Orchestrate Sqoop with Step Functions


Can we use Step Functions to orchestrate a Sqoop job? The goal is to create a transient cluster that first loads data with Sqoop and then runs a transformation in Hive, but it looks like Command Runner does not have such an option.

If not, what are the alternatives?

Yes. You can run any script, including one that calls Sqoop, in script runner mode.


Basically, from the Step Functions state machine, pass the syntax that the script runner requires, for example:

       "StartAt":"Step 1",
                   "Name":"1 - Step 1",

Put all the code in the step1.sh script. The script executes on the master node and can do any task you want, including your Sqoop work.
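
For a fuller picture, here is a rough sketch in Python that builds a state machine definition with a single script runner step and prints it as JSON. It only illustrates the shape, it is not a tested deployment. The names "Step 1" and "1 - Step 1" come from the fragment above. Everything else is an assumption: the bucket and script path, the region, the "$.ClusterId" input path, the "CONTINUE" failure action, and the helper name build_definition. It also assumes the cluster already exists, so the states that create and terminate the transient cluster, and the Hive step, are left out.

```python
import json


def build_definition(cluster_id_path, script_s3_uri, region):
    """Build a one-state definition that runs a script on EMR via script runner.

    All three arguments are placeholders for illustration:
      cluster_id_path: JSONPath to the cluster ID in the state input
      script_s3_uri:   S3 location of the shell script to run
      region:          region used to locate the script-runner JAR
    """
    # Regional location of the script-runner JAR provided by EMR.
    jar = f"s3://{region}.elasticmapreduce/libs/script-runner/script-runner.jar"
    return {
        "StartAt": "Step 1",
        "States": {
            "Step 1": {
                "Type": "Task",
                # ".sync" makes the state wait until the EMR step finishes.
                "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
                "Parameters": {
                    "ClusterId.$": cluster_id_path,
                    "Step": {
                        "Name": "1 - Step 1",
                        "ActionOnFailure": "CONTINUE",  # assumed policy
                        "HadoopJarStep": {
                            "Jar": jar,
                            # script runner downloads this script and runs it
                            "Args": [script_s3_uri],
                        },
                    },
                },
                "End": True,
            }
        },
    }


# Hypothetical bucket, path, and region, for illustration only.
definition = build_definition(
    "$.ClusterId", "s3://my-bucket/scripts/step1.sh", "us-east-1"
)
print(json.dumps(definition, indent=2))
```

The printed JSON can be pasted in as the state machine definition once the placeholders are replaced with your own values. A Hive step would follow the same pattern as a second state.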

