Converting Data Pipeline to Step Functions


I have to convert a pipeline to Step Functions, and I have this step which loads a DynamoDB table with data from S3. I am not sure how to convert this step to Step Functions. Below is the step definition. Any idea how I can do this? Thank you.

====

{
   "name": "LoadTable",
   "schedule": { "ref": "DefaultSchedule" },
   "resizeClusterBeforeRunning": "true",
   "step": "s3://dynamodb-emr-#{myDDBRegion}/emr-ddb-storage-handler/2.1.0/emr-ddb-2.1.0.jar,org.apache.hadoop.dynamodb.tools.DynamoDbImport,#{input.directoryPath},#{output.tableName},#{output.writeThroughputPercent}",
   "input": { "ref": "S3InputDataNodeInd" },
   "output": { "ref": "DDBDestinationTableInd" },
   "type": "EmrActivity",
   "id": "TableLoadActivityInd",
   "maximumRetries": "2",
   "runsOn": { "ref": "EmrClusterForLoadInd" }
}

====

1 Answer

Hello there!

I understand that you would like to migrate your Data Pipeline step to Step Functions. To implement this use case, you can follow the AWS documentation [1][2], which describes how to perform this kind of workload migration.

Additionally, you can refer to the following Amazon States Language example to get a general idea of how to perform this migration using an AWS Glue DataBrew job [3][4].

You can use the AWS Glue DataBrew "StartJobRun" action [5] to load the DynamoDB table with data from S3. Here's an example of how you can represent this step in a Step Functions state machine; you can find more information about the Amazon States Language in the AWS documentation [6].

Example JSON code for the Step Functions state machine:

====

{
   "Comment": "Runs a DataBrew job that loads the DynamoDB table with data from S3",
   "StartAt": "LoadTable",
   "States": {
      "LoadTable": {
         "Type": "Task",
         "Resource": "arn:aws:states:::databrew:startJobRun.sync",
         "Comment": "The databrew:startJobRun integration takes only the job name; the job's input dataset and output are configured on the DataBrew job itself",
         "Parameters": {
            "Name": "YourDataBrewJobName"
         },
         "Retry": [
            {
               "ErrorEquals": ["States.ALL"],
               "MaxAttempts": 2
            }
         ],
         "End": true
      }
   }
}

====
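Note that, unlike the original Data Pipeline definition, the S3 source and DynamoDB target are not passed in the state machine itself: with the DataBrew integration, the job's input dataset and output are part of the DataBrew job configuration, so the state only needs the job name. The "Retry" field above plays the role of the original "maximumRetries": "2", and the "DefaultSchedule" trigger can be replaced by an Amazon EventBridge Scheduler schedule that starts the state machine execution.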

I hope this information is of use to you! If you require detailed steps for your specific use case, I would suggest raising a support ticket with the Step Functions technical support team [7].

References:

[1]: AWS Data Pipeline --> Migrating workloads to AWS Step Functions https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/migration.html

[2]: AWS Step Functions --> Migrating workloads from AWS Data Pipeline to Step Functions https://docs.aws.amazon.com/step-functions/latest/dg/migrate-pipeline-workloads.html#sfn-sample-projects

[3]: AWS Glue DataBrew --> What is AWS Glue DataBrew? https://docs.aws.amazon.com/databrew/latest/dg/what-is.html

[4]: AWS Glue DataBrew --> Creating, running, and scheduling AWS Glue DataBrew jobs https://docs.aws.amazon.com/databrew/latest/dg/jobs.html

[5]: AWS Glue --> StartJobRun API https://docs.aws.amazon.com/glue/latest/webapi/API_StartJobRun.html

[6]: AWS Step Functions --> Amazon States Language (Example Amazon States Language Specification) https://docs.aws.amazon.com/step-functions/latest/dg/concepts-amazon-states-language.html

[7]: AWS Support --> Creating support cases and case management https://docs.aws.amazon.com/awssupport/latest/user/case-management.html

AWS
SUPPORT ENGINEER
answered a year ago
  • Thank you for your answer. I will try this. One more question though: is there a way I can add the step below to a step function?

    "step": "s3://dynamodb-emr-#{myDDBRegion}/emr-ddb-storage-handler/2.1.0/emr-ddb-2.1.0.jar,org.apache.hadoop.dynamodb.tools.DynamoDbImport,#{input.directoryPath},#{output.tableName},#{output.writeThroughputPercent}"
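On the follow-up question: the original EMR step can also be submitted directly from Step Functions using the optimized Amazon EMR integration, arn:aws:states:::elasticmapreduce:addStep.sync, which adds a step to a running cluster and waits for it to finish. Below is a minimal sketch of such a state machine; it assumes the cluster ID is passed in the execution input (for example, from a preceding createCluster state), and the region, input path, table name, and write-throughput values are placeholders standing in for the original #{...} Data Pipeline expressions:

====

{
   "StartAt": "LoadTableEmrStep",
   "States": {
      "LoadTableEmrStep": {
         "Type": "Task",
         "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
         "Comment": "Placeholder values below stand in for the original #{...} Data Pipeline expressions",
         "Parameters": {
            "ClusterId.$": "$.ClusterId",
            "Step": {
               "Name": "LoadTable",
               "ActionOnFailure": "CONTINUE",
               "HadoopJarStep": {
                  "Jar": "s3://dynamodb-emr-YOUR-REGION/emr-ddb-storage-handler/2.1.0/emr-ddb-2.1.0.jar",
                  "MainClass": "org.apache.hadoop.dynamodb.tools.DynamoDbImport",
                  "Args": [
                     "s3://your-bucket/your-input-path/",
                     "YourDynamoDBTableName",
                     "0.5"
                  ]
               }
            }
         },
         "Retry": [
            {
               "ErrorEquals": ["States.ALL"],
               "MaxAttempts": 2
            }
         ],
         "End": true
      }
   }
}

====

As with the DataBrew example, the "Retry" field stands in for the original "maximumRetries": "2".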
