Amazon AWS Hadoop: how to download a file from S3 in a map-reduce program


I am trying to write a fairly simple map-reduce program where, as part of the map task, it needs to load a small file from S3 (small enough to be held in local memory).

In the map task, the job is to check each line of input against this file and generate a feature vector (and so on...)
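As a rough illustration of that per-line step (the vocabulary contents and the binary encoding here are my own assumptions, not from the original code), the idea is: the small S3 file supplies a word list, and each input line is turned into a 0/1 vector over it:

```java
import java.util.Arrays;
import java.util.List;

public class FeatureVectorSketch {
    // Given a small vocabulary (e.g. loaded from the S3 file in setup()),
    // emit a binary feature vector: 1 if the line contains the word, else 0.
    static int[] toFeatureVector(String line, List<String> vocabulary) {
        int[] vector = new int[vocabulary.size()];
        for (int i = 0; i < vocabulary.size(); i++) {
            vector[i] = line.contains(vocabulary.get(i)) ? 1 : 0;
        }
        return vector;
    }

    public static void main(String[] args) {
        List<String> vocab = Arrays.asList("error", "warn", "info");
        System.out.println(Arrays.toString(toFeatureVector("warn: disk full", vocab)));
        // prints [0, 1, 0]
    }
}
```

In a real Mapper, `toFeatureVector` would be called once per input line in `map()`, with the vocabulary loaded once in `setup()`.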

I am having trouble making my setup method download and access this file.
I think the problem is with forwarding the credentials, but it might also be in the way I am accessing the file in the setup function (code snippets below).

In my map-reduce logs the error I'm getting is:
"profile file cannot be null"
for this line:

AWSCredentialsProvider credentialsProvider = new AWSStaticCredentialsProvider(new ProfileCredentialsProvider().getCredentials());

I tried many other ways and got nowhere; if there is any way you can guide me, it would be great.

In my main Java file (run locally on my PC) I do:

AmazonElasticMapReduce mapReduce = AmazonElasticMapReduceClientBuilder.standard()
        .withRegion("us-east-1")
        .build();
HadoopJarStepConfig hadoopJarStep = new HadoopJarStepConfig()
        .withJar("MYPATH")  // This should be a full map reduce application.
        .withMainClass("MYMAIN");

In the map-reduce jar (the one that runs on the EC2 task) I define the relevant jobs and the job control, and in the setup function of the map class I do:

AWSCredentialsProvider credentialsProvider = new AWSStaticCredentialsProvider(new ProfileCredentialsProvider().getCredentials());
AmazonS3 s3 = AmazonS3ClientBuilder.standard()
        .withCredentials(credentialsProvider)
        .withRegion("us-east-1")
        .build();
String S3Bucket = "MYBUCKET";  // the bucket where the file is located
String S3Key = "MYKEY";        // the name of the file
S3Object object = s3.getObject(new GetObjectRequest(S3Bucket, S3Key)); // bucket, key
S3ObjectInputStream summaryInputStream = object.getObjectContent();
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(summaryInputStream));
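Once the `BufferedReader` is open, the small file can be pulled fully into memory before the map phase runs. A minimal sketch of that read loop (using a `StringReader` stand-in for the S3 stream so it runs without AWS; in the real `setup()` the `Reader` would wrap `summaryInputStream`):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class LoadSmallFile {
    // Read every line of a (small) stream into memory. In the real setup()
    // the Reader would be new InputStreamReader(object.getObjectContent()).
    static List<String> readAllLines(Reader source) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = readAllLines(new StringReader("alpha\nbeta\ngamma"));
        System.out.println(lines);  // prints [alpha, beta, gamma]
    }
}
```

The try-with-resources block also closes the underlying stream, which matters for S3 object streams: leaving them open keeps the HTTP connection tied up.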

I also tried to hard-code my credentials in the main (run locally on my PC) like this:

String access_key = "aws_access_key_id=MYKEY";
String secret_key = "aws_secret_access_key=MYSECRETKEY";

BasicAWSCredentials creds = new BasicAWSCredentials(access_key, secret_key);
AWSStaticCredentialsProvider awsCred = new AWSStaticCredentialsProvider(creds);
AmazonElasticMapReduce mapReduce = AmazonElasticMapReduceClientBuilder.standard()
        .withRegion("us-east-1")
        .withCredentials(awsCred)
        .build();

also with no success (I also tried excluding the "aws_access_key_id=" and "aws_secret_access_key=" prefixes from the strings).

Thanks!

noampa
asked 4 years ago · 332 views
3 Answers
Accepted Answer

Your first issue looks like a credentials problem with the Java SDK V1.

Also, I would recommend moving to V2: https://github.com/awsdocs/aws-doc-sdk-examples/tree/master/javav2.

For V1, try following this code example to create a bucket, so we can focus on your credentials issue:

https://github.com/awsdocs/aws-doc-sdk-examples/blob/master/java/example_code/s3/src/main/java/aws/example/s3/CreateBucket.java

I recommend placing your credentials here, as discussed in the docs:

The default credential profiles file, typically located at ~/.aws/credentials (the location can vary per platform), is shared by many of the AWS SDKs and by the AWS CLI. The AWS SDK for Java uses the ProfileCredentialsProvider to load these credentials.
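For reference, a minimal ~/.aws/credentials file looks like this (the key values below are placeholders):

```ini
[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
```

With this file in place, ProfileCredentialsProvider (and the default provider chain) picks up the "default" profile automatically, with no credentials in code.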

Then you can create your S3 Service client like this:

AmazonS3 s3 = AmazonS3ClientBuilder.standard()
        .withRegion(Regions.DEFAULT_REGION)
        .build();

I just ran this Java V1 example and it created a bucket perfectly.

Try this and post back what happened.

answered 4 years ago

Thanks a lot, that did the trick

noampa
answered 4 years ago

I am glad it worked for you!!!

Edited by: PowerUserScott on Jul 9, 2020 10:24 AM

answered 4 years ago
