Orchestrate Sqoop with Step Functions

0

Can we use Step Functions to orchestrate a Sqoop job? The goal is to create a transient cluster which load data with Sqoop first followed by transformation in Hive, but looks like Command Runner does not have such option.

If not, what are the alternatives?

AWS
posta 3 anni fa375 visualizzazioni
1 Risposta
0
Risposta accettata

You can run any script in script runner mode.

https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hadoop-script.html

Basically from the step function invoke the syntax thats required by the script runner, for example:

       "StartAt":"Step 1",
       "States":{ 
          "Step_1":{ 
             "Type":"Task",
             "Resource":"arn:aws:states:::elasticmapreduce:addStep.sync",
             "Parameters":{ 
                "ClusterId.$":"$.ClusterId",
                "Step":{ 
                   "Name":"1 - Step 1",
                   "ActionOnFailure":"CONTINUE",
                   "HadoopJarStep":{ 
                      "Jar":"s3://elasticmapreduce/libs/script-runner/script-runner.jar",
                      "Args":[ 
                         "s3://xxx/scripts/step1.sh"
                      ]
                   }
                }
             },
             "End":true
          }
       }
    }

Put all the code in the step1.sh script, this script will execute on master node and do any task you want including your sqoop stuff

AWS
ESPERTO
Behram
con risposta 3 anni fa

Accesso non effettuato. Accedi per postare una risposta.

Una buona risposta soddisfa chiaramente la domanda, fornisce un feedback costruttivo e incoraggia la crescita professionale del richiedente.

Linee guida per rispondere alle domande