Coordinating Step Functions from AppFlow -> EventBridge -> DynamoDB


Having a really hard time getting data into DynamoDB using Step Functions. The data flow is:

  1. AppFlow from Salesforce to S3
  2. EventBridge watching the S3 bucket
  3. Map the data from S3 and store it in DynamoDB.

This almost works. Whenever AppFlow gets new data from Salesforce, the step function is invoked. After the data is pulled from S3 (using GetObject), the next step is a Lambda in C#. This Lambda just deserializes the JSONL, populates a C# model for each item, and returns an array of the models.

The next step is a Map state that passes each item in the array to a DynamoDB PutItem step.

Everything works until you try to write a boolean. Strings and numbers are written fine, but it blows up when it sees a boolean: DynamoDB won't convert the boolean value.

Things I have tried that didn't work:

  1. Write the entire model object to DynamoDB - blows up on bool values
  2. Update the PutItem config to map each field - blows up if there are any nulls (nulls get dropped when the return value from the Lambda is serialized into JSON)
  3. Serializing the values back into a JSON string inside the Lambda - blows up on bool values
  4. Returning a list of DynamoDB Documents - blows up because each item in the document gets serialized as an empty object
  5. Bypassing the Lambda altogether and passing the JSONL straight to the Map step - blows up because it's not an array

When trying to write bool values, the error is: "Cannot construct instance of com.amazonaws.services.dynamodbv2.model.AttributeValue (although at least one Creator exists): no boolean/Boolean-argument constructor/factory method to deserialize from boolean value (false)"

I can't see any obvious way to get this working, unless we convert every value to a string, which causes problems reading the model back out later: we lose type information, which makes deserializing the model difficult.

asked 2 years ago · 828 views
5 Answers

You did not show what your code looks like, but the item should be in the following format: "AttributeName": {"BOOL": false|true}. Is that what you are doing?
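
For comparison, the low-level .NET SDK makes the same typed format explicit. A minimal sketch (the attribute names here are illustrative, not from the question):

using Amazon.DynamoDBv2.Model;
using System.Collections.Generic;

// DynamoDB's wire format is type-qualified: every attribute carries its type.
// A boolean must arrive as {"BOOL": true}, never as a bare JSON boolean.
var item = new Dictionary<string, AttributeValue>
{
    ["name"]    = new AttributeValue { S = "foo" },
    ["isValid"] = new AttributeValue { BOOL = true }
};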

AWS
EXPERT
Uri
answered 2 years ago

I cannot post the source code. But in pseudocode, here is what it is doing:

// deserialize the JSONL payload into model objects
var jsonReader = new JsonTextReader(new StringReader(payload))
{
    // JSONL is a stream of concatenated JSON objects, so allow multiple roots
    SupportMultipleContent = true
};

var settings = new JsonSerializerSettings
{
    ReferenceLoopHandling = ReferenceLoopHandling.Ignore
};
var jsonSerializer = JsonSerializer.Create(settings);

// MyObject is the model
var objects = new List<MyObject>();
while (jsonReader.Read())
{
    var item = jsonSerializer.Deserialize<MyObject>(jsonReader);
    objects.Add(item);
}

return new Response()
{
    Data = objects
};

The response gets serialized as JSON automatically by the step function. I tried changing the model to a DynamoDB Document type (a Dictionary<string, object>), which is what the DynamoDB SDK uses, but then each value in the dictionary comes back as an empty object when the Step Functions serializer serializes the return value.

You need a Lambda to transform the output of the S3 GetObject step because the Map state doesn't seem to understand how to deserialize the JSONL that GetObject returns.

It seems the only way to avoid problems with the Step Functions serializer on the Lambda's output is to return a string containing the already-serialized JSON, with the DynamoDB type qualifier on each key (so AWS's serializer can't break it), and then deserialize it in the next step using States.StringToJson. But that is far too much overhead for something that should be straightforward. It's less work to have a Lambda write directly to DynamoDB using the AWS SDK than to use the built-in DynamoDB PutItem step, which feels like it defeats the purpose of Step Functions.
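
For reference, here is a rough sketch of that direct-write approach, assuming a flat model with string, number, and bool properties (the model, property, and table names are illustrative, not from my actual code):

using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;
using System.Collections.Generic;
using System.Threading.Tasks;

public class DirectWriter
{
    private static readonly AmazonDynamoDBClient Client = new AmazonDynamoDBClient();

    // Writes one model instance to DynamoDB with explicit AttributeValue types,
    // bypassing the Step Functions serializer entirely.
    public static async Task WriteAsync(MyObject obj)
    {
        var item = new Dictionary<string, AttributeValue>
        {
            ["name"]    = new AttributeValue { S = obj.Name },
            ["count"]   = new AttributeValue { N = obj.Count.ToString() }, // numbers are sent as strings
            ["isValid"] = new AttributeValue { BOOL = obj.IsValid }        // booleans use the BOOL flag
        };

        await Client.PutItemAsync(new PutItemRequest
        {
            TableName = "MyTable", // illustrative table name
            Item = item
        });
    }
}

This trades the no-code PutItem step for full control over the AttributeValue types, which is exactly the trade-off described above.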

On top of that, finding documentation on what to do when you receive that error message is very difficult. Almost all AWS documentation for DynamoDB covers using the SDK directly, which abstracts away the JSON format that DynamoDB expects.

answered 2 years ago

I would also like to point out that this seems to be a bug. If you have only strings, you don't need to add the type information - all strings work.

As soon as you have a number or a boolean, you need to add type information. However, JSON itself only allows a limited set of types:

  • Object
  • Array
  • String
  • Number
  • Boolean

I would expect that, given the very limited set of types in the JSON specification, you could get away without providing type qualifiers for flat JSON objects. The deserializer would be trivial:

{ "name": "foo", "count": 1, "isValid" : true } // could be easily tranformed to the following by a deserializer

{
  "name": {
     "S": "foo"
  },
  "count": {
    "N": "1"
  },
  "isValid": {
    "BOOL": true
  }
}

But I cannot change the deserializer, so I'm stuck with either making everything a string or jumping through hoops.
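
To show how small that transform really is, here is a rough sketch of such a helper (using Newtonsoft.Json, as in the Lambda above; the helper name is made up). Nested objects and arrays are left out of the sketch:

using System;
using Newtonsoft.Json.Linq;

public static class DynamoJson
{
    // Converts a flat JSON object into DynamoDB's type-qualified JSON,
    // e.g. { "isValid": true } => { "isValid": { "BOOL": true } }.
    public static JObject AddTypeQualifiers(JObject flat)
    {
        var typed = new JObject();
        foreach (var prop in flat.Properties())
        {
            typed[prop.Name] = prop.Value.Type switch
            {
                JTokenType.String  => new JObject { ["S"] = prop.Value },
                JTokenType.Integer => new JObject { ["N"] = prop.Value.ToString() }, // DynamoDB numbers travel as strings
                JTokenType.Float   => new JObject { ["N"] = prop.Value.ToString() },
                JTokenType.Boolean => new JObject { ["BOOL"] = prop.Value },
                JTokenType.Null    => new JObject { ["NULL"] = true },
                _ => throw new NotSupportedException(
                         $"Unsupported type for '{prop.Name}': {prop.Value.Type}")
            };
        }
        return typed;
    }
}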

answered 2 years ago
0

Just a side observation, away from the main issue:

  1. EventBridge watching the S3 bucket

AppFlow is actually integrated with EventBridge, and each flow execution will emit an event to EventBridge on the default bus. More info: here.

I'd recommend you subscribe to these events and kick-start the step function that way, rather than watching for S3 PutObject events.

AppFlow may produce several objects as part of the ingest (depending on the size of the data), and in that case, if you watch for PutObject on S3, you will invoke the Step Function multiple times (once per object).

With the recommended approach, you can run a single Step Functions execution and loop through all the S3 objects (ListObjects) stored as part of the flow execution.

I hope this makes sense.

Kind regards, Kamen

AWS
answered 2 years ago

That is how it is configured - AppFlow -> EventBridge -> Step Function invocation that gets passed the S3 objects.

The issue isn't how the objects are passed to the step function, or how many there are. The problem is how you take those objects and write them to DynamoDB. The objects are passed to the step function as JSONL (a stream of JSON objects with no separator), so to process them you need to:

  1. Convert the JSONL to a JSON array (you can't use DynamoDB batch write, since it is limited to 25 items per invocation)
  2. Add DynamoDB type information to each field of each object (because if any field is not a string, the write fails)
  3. Pass the array to a Map state with a DynamoDB PutItem step inside

There is no "no code" option for this flow, which is why I say it seems to defeat the purpose of Step Functions.
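
To make steps 1 and 2 concrete, here is a rough sketch of the Lambda-side conversion, reusing the hypothetical AddTypeQualifiers helper sketched in my earlier answer (all names are illustrative):

using System.IO;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public static class JsonlConverter
{
    // Step 1: split the JSONL payload into individual JSON objects.
    // Step 2: add DynamoDB type qualifiers to each one.
    public static JArray ToTypedArray(string jsonl)
    {
        var result = new JArray();
        using var reader = new JsonTextReader(new StringReader(jsonl))
        {
            SupportMultipleContent = true // JSONL has multiple root objects
        };
        while (reader.Read())
        {
            var flat = JObject.Load(reader);
            result.Add(DynamoJson.AddTypeQualifiers(flat));
        }
        return result;
    }
}

The output of this Lambda can then go straight into the Map state, leaving step 3 as the plain DynamoDB PutItem inside the Map.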

answered 2 years ago
