Orchestrate Sqoop with Step Functions

Can we use Step Functions to orchestrate a Sqoop job? The goal is to create a transient cluster that first loads data with Sqoop and then runs a transformation in Hive, but it looks like Command Runner does not have such an option.

If not, what are the alternatives?

AWS
asked 3 years ago · 375 views
1 Answer
Accepted Answer

You can run any script in script runner mode.

https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hadoop-script.html

Basically, from the Step Functions state machine, add an EMR step using the syntax that script runner requires, for example:

    {
       "StartAt":"Step_1",
       "States":{
          "Step_1":{
             "Type":"Task",
             "Resource":"arn:aws:states:::elasticmapreduce:addStep.sync",
             "Parameters":{
                "ClusterId.$":"$.ClusterId",
                "Step":{
                   "Name":"1 - Step 1",
                   "ActionOnFailure":"CONTINUE",
                   "HadoopJarStep":{
                      "Jar":"s3://elasticmapreduce/libs/script-runner/script-runner.jar",
                      "Args":[
                         "s3://xxx/scripts/step1.sh"
                      ]
                   }
                }
             },
             "End":true
          }
       }
    }

Put all of your code in the step1.sh script. The script runs on the master node and can do whatever you need, including the Sqoop commands.
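
To cover the "Sqoop first, then Hive" ordering from the question, the same script-runner state can be chained. The following is only a rough sketch in Python that generates such a definition; it assumes two hypothetical scripts (`sqoop_import.sh` and `hive_transform.sh`) uploaded next to `step1.sh`, assumes the cluster already exists and its ID arrives as `$.ClusterId`, and leaves out creating and terminating the transient cluster. Setting `ResultPath` to null is my addition so that each state passes its input, including `ClusterId`, on to the next one.

```python
import json

# Script-runner JAR path as used in the answer above.
SCRIPT_RUNNER_JAR = "s3://elasticmapreduce/libs/script-runner/script-runner.jar"


def script_step_state(name, script_s3_uri, next_state=None):
    """Build one Task state that runs a shell script through script-runner."""
    state = {
        "Type": "Task",
        "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
        "Parameters": {
            "ClusterId.$": "$.ClusterId",
            "Step": {
                "Name": name,
                "ActionOnFailure": "CONTINUE",
                "HadoopJarStep": {
                    "Jar": SCRIPT_RUNNER_JAR,
                    "Args": [script_s3_uri],
                },
            },
        },
        # Discard the step result and pass the state input through unchanged,
        # so the next state can still read $.ClusterId.
        "ResultPath": None,
    }
    if next_state is None:
        state["End"] = True
    else:
        state["Next"] = next_state
    return state


def build_definition(steps):
    """Chain (state_name, script_uri) pairs in the order given."""
    names = [name for name, _ in steps]
    states = {}
    for i, (name, uri) in enumerate(steps):
        following = names[i + 1] if i + 1 < len(names) else None
        states[name] = script_step_state(name, uri, following)
    return {"StartAt": names[0], "States": states}


# Hypothetical script names; only the bucket placeholder comes from the answer.
definition = build_definition([
    ("Sqoop_Import", "s3://xxx/scripts/sqoop_import.sh"),
    ("Hive_Transform", "s3://xxx/scripts/hive_transform.sh"),
])

print(json.dumps(definition, indent=2))
```

Because `addStep.sync` waits for each EMR step to finish, the Hive script in this sketch would only start after the Sqoop script has completed.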

AWS
EXPERT
Behram
answered 3 years ago
