Questions tagged with AWS Step Functions

Parameterized concurrent AWS Glue jobs called from Step Functions with bookmarks enabled throw a VersionMismatchException

I have a parameterized Glue job that is called in parallel (25 concurrent runs) through Step Functions. With job bookmarks enabled, a version mismatch exception is thrown; with bookmarks disabled, it runs fine.

Below is the input to the step function:

```
{
  "JobName": "gluejobname3",
  "Arguments": {
    "--cdb": "catalogdbname",
    "--ctbl": "tablename",
    "--dest": "s3://thepath/raw/",
    "--extra-py-files": "s3://thepath/mytransformation.py.zip"
  }
}
```

When bookmarks are disabled, the step function calls the parameterized Glue job and loads the data into the different S3 locations. Below is the Glue job script:

```
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
import mytransformation

args = getResolvedOptions(sys.argv, ["JOB_NAME", "cdb", "ctbl", "dest"])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)

# The table name is appended so that each concurrent run initializes with a unique job id
job.init(args["JOB_NAME"] + args["ctbl"], args)

# Custom parameters that are passed dynamically with each call to the parameterized Glue job
db = args["cdb"]
tbl = args["ctbl"]
dest = args["dest"]

# Script generated for node MySQL table.
# Dynamically creating a variable name so that transformation_ctx is unique for each job
globals()[f'MySQLtable_sourceDf{tbl}'] = glueContext.create_dynamic_frame.from_catalog(
    database=db,
    table_name=tbl,
    transformation_ctx='"' + f'MySQLtable_sourceDf{tbl}' + '"'
)

# Passing the source frame through the custom transformation
destination_Df = mytransformation.tansform(globals()[f'MySQLtable_sourceDf{tbl}'])

# Script generated for node Amazon S3
# : "s3://anvil-houston-preprod"
# Creating a dynamically unique transformation_ctx name since the jobs run concurrently
globals()[f'S3bucket_node{tbl}'] = glueContext.write_dynamic_frame.from_options(
    frame=destination_Df,
    connection_type="s3",
    format="glueparquet",
    connection_options={"path": dest, "partitionKeys": []},
    format_options={"compression": "snappy"},
    transformation_ctx='"' + f'S3bucket_node{tbl}' + '"'
)

job.commit()
```

The above runs fine: when the execution is started through Step Functions (25 parallel parameterized Glue jobs), the jobs run and load to 25 different locations. When bookmarks are enabled, the jobs fail with a version mismatch:

```
An error occurred while calling z:com.amazonaws.services.glue.util.Job.commit.
Continuation update failed due to version mismatch. Expected version 1 but found version 2
(Service: AWSGlueJobExecutor; Status Code: 400; Error Code: VersionMismatchException)
```

Please help.
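For context, the executions are started roughly along these lines, one per table (a minimal boto3 sketch; the state machine ARN, table list, and per-table destination paths are assumptions, not details from the post — a Map state fanning out inside a single execution would be equivalent for this purpose):

```python
import json
import boto3

sfn = boto3.client("stepfunctions")

# Hypothetical table list; the real setup fans out 25 tables
tables = ["tablename1", "tablename2", "tablename3"]

for tbl in tables:
    # Each execution passes its own table and destination through to the
    # parameterized Glue job via the input shown above
    sfn.start_execution(
        stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:GlueFanOut",  # assumed ARN
        input=json.dumps({
            "JobName": "gluejobname3",
            "Arguments": {
                "--cdb": "catalogdbname",
                "--ctbl": tbl,
                "--dest": f"s3://thepath/raw/{tbl}/",  # assumed per-table destination
                "--extra-py-files": "s3://thepath/mytransformation.py.zip",
            },
        }),
    )
```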
0 answers · 0 votes · 53 views · asked 6 months ago

States.Runtime error when nested step function fails in Step Functions Local

AWS Step Functions Local does not handle a failure in a nested step function correctly. What is the best place to report a bug? Is this forum a good place, or is an issue tracker available somewhere?

The issue happens if we have nested step functions: e.g. an inner step function that is invoked by the outer one. If the inner step function fails (e.g. via a `Fail` state), the outer step function incorrectly tries to parse the execution output (which is `null` due to the failure) and itself fails with a `States.Runtime` error and an `argument "content" is null` cause.

To reproduce, use the latest amazon/aws-stepfunctions-local:1.10.1 Docker image. Launch the container with the following command (note the `STEP_FUNCTIONS_ENDPOINT` pointing to itself to enable nested step function execution):

```
docker run -p 8083:8083 -e STEP_FUNCTIONS_ENDPOINT=http://localhost:8083 amazon/aws-stepfunctions-local
```

Then create a simple HelloWorld inner step function in the Step Functions Local container with a single `Fail` state:

```
aws stepfunctions --endpoint-url http://localhost:8083 create-state-machine \
  --definition "{\
    \"StartAt\": \"HelloFail\",\
    \"States\": {\
      \"HelloFail\": {\
        \"Type\": \"Fail\",\
        \"Error\": \"TestFailure\",\
        \"Cause\": \"Test cause\"\
      }\
    }}" \
  --name "HelloWorld" --role-arn "arn:aws:iam::012345678901:role/DummyRole"
```

Add a simple outer step function that executes the HelloWorld one:

```
aws stepfunctions --endpoint-url http://localhost:8083 create-state-machine \
  --definition "{\
    \"StartAt\": \"InnerInvoke\",\
    \"States\": {\
      \"InnerInvoke\": {\
        \"Type\": \"Task\",\
        \"Resource\": \"arn:aws:states:::states:startExecution.sync:2\",\
        \"Parameters\": {\
          \"StateMachineArn\": \"arn:aws:states:us-east-1:123456789012:stateMachine:HelloWorld\"\
        },\
        \"End\": true\
      }\
    }}" \
  --name "HelloWorldOuter" --role-arn "arn:aws:iam::012345678901:role/DummyRole"
```

Finally, start execution of the outer step function:

```
aws stepfunctions --endpoint-url http://localhost:8083 start-execution \
  --state-machine-arn arn:aws:states:us-east-1:123456789012:stateMachine:HelloWorldOuter
```

The execution fails with the `argument "content" is null` error in the logs:

```
arn:aws:states:us-east-1:123456789012:execution:HelloWorldOuter:720b9e13-efdc-4718-9a70-be3eab15f416 : {"Type":"ExecutionFailed","PreviousEventId":2,"ExecutionFailedEventDetails":{"Error":"States.Runtime","Cause":"argument \"content\" is null"}}
```

Debugging the stepfunctions-local Java application, I was able to narrow the error down to JSON parsing failing in the `DescribeExecutionJson` class:

```
resultNode.set("Output", this.mapper.readTree(executionResult.getOutput()));
```

The problem is that `getOutput()` is `null` in the `DescribeExecution` response for a `FAILED` step function execution. So this seems to be a bug; I would like to get it to the Step Functions Local developers to fix.
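The missing output is also visible from the client side. A minimal boto3 sketch against the local endpoint (dummy credentials; the ARN comes from the repro above) shows that `describe_execution` simply omits the `output` field for a `FAILED` execution — the value that `DescribeExecutionJson` then feeds to `readTree()` without a null check:

```python
import time
import boto3

# Client pointed at the Step Functions Local container started above
sfn = boto3.client(
    "stepfunctions",
    endpoint_url="http://localhost:8083",
    region_name="us-east-1",
    aws_access_key_id="dummy",      # Step Functions Local accepts dummy credentials
    aws_secret_access_key="dummy",
)

# Start the failing inner state machine directly and give it a moment to finish
start = sfn.start_execution(
    stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:HelloWorld"
)
time.sleep(2)

desc = sfn.describe_execution(executionArn=start["executionArn"])

# For a FAILED execution there is no "output" key in the response at all
print(desc["status"])       # -> FAILED
print(desc.get("output"))   # -> None
```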
1 answer · 0 votes · 207 views · asked 6 months ago