Orchestrate Sqoop with Step Functions

Can we use Step Functions to orchestrate a Sqoop job? The goal is to create a transient cluster that first loads data with Sqoop and then transforms it in Hive, but it looks like Command Runner does not have such an option.

If not, what are the alternatives?

1 Answer
Accepted Answer

You can run any script on the cluster with the script runner (script-runner.jar).

https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hadoop-script.html

From the Step Functions state machine, add an EMR step that invokes the script runner with the syntax it requires, for example:

    {
       "StartAt":"Step_1",
       "States":{ 
          "Step_1":{ 
             "Type":"Task",
             "Resource":"arn:aws:states:::elasticmapreduce:addStep.sync",
             "Parameters":{ 
                "ClusterId.$":"$.ClusterId",
                "Step":{ 
                   "Name":"1 - Step 1",
                   "ActionOnFailure":"CONTINUE",
                   "HadoopJarStep":{ 
                      "Jar":"s3://elasticmapreduce/libs/script-runner/script-runner.jar",
                      "Args":[ 
                         "s3://xxx/scripts/step1.sh"
                      ]
                   }
                }
             },
             "End":true
          }
       }
    }

Put all your code in the step1.sh script. The script runs on the master node and can do any task you want, including your Sqoop import.
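For the transient-cluster flow described in the question, the same addStep pattern can be chained between a cluster-creation state and a termination state. The following is only a sketch under several assumptions, not a tested definition: the cluster name, release label, instance types, IAM role names, and the transform.q Hive script are hypothetical placeholders, and the Hive step assumes the documented command-runner.jar `hive-script` invocation is available on your EMR release. The `Catch` blocks are there so that the cluster is still terminated if either step fails.

```json
{
  "Comment": "Sketch: transient EMR cluster running a Sqoop script, then a Hive script",
  "StartAt": "Create_Cluster",
  "States": {
    "Create_Cluster": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:createCluster.sync",
      "Parameters": {
        "Name": "transient-sqoop-hive",
        "ReleaseLabel": "emr-5.36.0",
        "Applications": [{ "Name": "Hive" }, { "Name": "Sqoop" }],
        "ServiceRole": "EMR_DefaultRole",
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "Instances": {
          "KeepJobFlowAliveWhenNoSteps": true,
          "InstanceFleets": [
            {
              "Name": "Master",
              "InstanceFleetType": "MASTER",
              "TargetOnDemandCapacity": 1,
              "InstanceTypeConfigs": [{ "InstanceType": "m5.xlarge" }]
            },
            {
              "Name": "Core",
              "InstanceFleetType": "CORE",
              "TargetOnDemandCapacity": 2,
              "InstanceTypeConfigs": [{ "InstanceType": "m5.xlarge" }]
            }
          ]
        }
      },
      "ResultPath": "$.Cluster",
      "Next": "Sqoop_Import"
    },
    "Sqoop_Import": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
      "Parameters": {
        "ClusterId.$": "$.Cluster.ClusterId",
        "Step": {
          "Name": "Sqoop import",
          "ActionOnFailure": "CONTINUE",
          "HadoopJarStep": {
            "Jar": "s3://elasticmapreduce/libs/script-runner/script-runner.jar",
            "Args": ["s3://xxx/scripts/step1.sh"]
          }
        }
      },
      "ResultPath": "$.SqoopResult",
      "Catch": [{ "ErrorEquals": ["States.ALL"], "ResultPath": "$.Error", "Next": "Terminate_Cluster" }],
      "Next": "Hive_Transform"
    },
    "Hive_Transform": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
      "Parameters": {
        "ClusterId.$": "$.Cluster.ClusterId",
        "Step": {
          "Name": "Hive transform",
          "ActionOnFailure": "CONTINUE",
          "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["hive-script", "--run-hive-script", "--args", "-f", "s3://xxx/scripts/transform.q"]
          }
        }
      },
      "ResultPath": "$.HiveResult",
      "Catch": [{ "ErrorEquals": ["States.ALL"], "ResultPath": "$.Error", "Next": "Terminate_Cluster" }],
      "Next": "Terminate_Cluster"
    },
    "Terminate_Cluster": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:terminateCluster.sync",
      "Parameters": { "ClusterId.$": "$.Cluster.ClusterId" },
      "End": true
    }
  }
}
```

Each state writes its result under its own `ResultPath` so that `$.Cluster.ClusterId` stays available to the later states. Outside us-east-1, the script-runner.jar location is typically region-specific, so check the path for your region before using it.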

EXPERT
answered a year ago