Kinesis Firehose Dynamic Partitioning with inline jq expression


Following the AWS blog post: Dynamic Partitioning.

So I have a Firehose delivery stream configured to convert API calls into Parquet output in S3. It is now partitioned according to the value of a field within my data: 'playtype'.

However, in some rows the value of this partitioning field is null or empty, and those rows fail with an error. I therefore want to adjust the jq expression for the dynamic partitioning key so that it substitutes a default value in those cases. I'm having trouble with the syntax, though.

The jq expression specified for the field is .playtype

So I have tried things like:

.playtype // "some-hardcoded-default"
.playtype | if . == null or . == "" then "some-hardcoded-default" else . end

These evaluate as valid jq in jqplay, but in practice (i.e. in Firehose) they return a 'jq syntax error'. How can I specify this logic correctly? That is, partition on this field, but if the value is null or empty, use 'some-hardcoded-default'. Thank you!

  • (.playtype | if . == null or . == "" then "some-hardcoded-default" else . end)

    Put it inside brackets and it will work.
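To make the intended semantics concrete, here is a minimal Python sketch (not Firehose code) of what the bracketed jq expression computes for one record. The function names are hypothetical. The sketch also illustrates a subtlety of the first attempt in the question. jq's alternative operator // substitutes the default only when the left side is null or false. So .playtype // "some-hardcoded-default" would still pass an empty string through, while the if/else form covers both null and "".

```python
from typing import Any

DEFAULT = "some-hardcoded-default"  # placeholder default taken from the question


def jq_if_else(record: dict[str, Any]) -> Any:
    """Model of (.playtype | if . == null or . == "" then DEFAULT else . end)."""
    # In jq, a missing key and an explicit JSON null both evaluate to null.
    value = record.get("playtype")
    if value is None or value == "":
        return DEFAULT
    return value


def jq_alternative(record: dict[str, Any]) -> Any:
    """Model of (.playtype // DEFAULT): only null and false trigger the default."""
    value = record.get("playtype")
    if value is None or value is False:
        return DEFAULT
    return value


if __name__ == "__main__":
    samples = ({"playtype": "single"}, {"playtype": None}, {"playtype": ""}, {})
    for rec in samples:
        print(rec, jq_if_else(rec), repr(jq_alternative(rec)))
```

With these models, the empty-string record is the case where the two forms disagree. jq_alternative returns '' for it, while jq_if_else returns the default.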

Asked 3 years ago, 3693 views
2 answers

Hi there, please try the suggestion from the user above in the comments and let us know the results. jqplay is currently blocked on our end, so we couldn't check its validity. I believe wrapping each expression in brackets should fix the issue. If that doesn't work, I would recommend using a transformation Lambda function, as mentioned here: in the case where the field is not present in the incoming data. That way you have much more control over the fields.
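For the Lambda route, a minimal Python sketch of a Firehose transformation handler might look like the following. It assumes the standard Firehose record transformation contract. Each input record has a recordId and base64 data, and each output record has a recordId, a result of "Ok" or "ProcessingFailed", the data, and, for dynamic partitioning, metadata.partitionKeys. The key name playtype_val and the default value are hypothetical. With Lambda-supplied keys, the S3 prefix would reference them through the partitionKeyFromLambda namespace, e.g. !{partitionKeyFromLambda:playtype_val}, instead of partitionKeyFromQuery.

```python
import base64
import json
from typing import Any

DEFAULT_PLAYTYPE = "some-hardcoded-default"  # hypothetical default value
PARTITION_KEY = "playtype_val"  # hypothetical key name; must match the S3 prefix


def lambda_handler(event: dict[str, Any], context: Any) -> dict[str, Any]:
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
        except ValueError:  # covers bad base64 and invalid JSON
            payload = None
        if not isinstance(payload, dict):
            # Not a JSON object: hand the record back as failed so Firehose
            # routes it to the error output instead of guessing a partition.
            output.append({"recordId": record["recordId"],
                           "result": "ProcessingFailed",
                           "data": record["data"]})
            continue
        playtype = payload.get("playtype")
        if playtype is None or playtype == "":
            playtype = DEFAULT_PLAYTYPE
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": record["data"],  # original bytes passed through unchanged
            "metadata": {"partitionKeys": {PARTITION_KEY: str(playtype)}},
        })
    return {"records": output}
```

Passing the data through unchanged keeps the Lambda responsible only for the partition key. The Parquet conversion configured on the stream still applies afterwards.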

Answered 3 years ago
  • AWS-User-4356880 is correct! Many thanks. I had also used a transformation Lambda function, as you suggest, and that works too, but the former was a much simpler, quicker way to go.


I can confirm that the example provided by AWS-User-4356880 works: an if statement inside a Kinesis Data Firehose dynamic partitioning jq expression is accepted. For example:

Source data might look like this:

  {
    "playerName": "Alex",
    "score": 1200,
    "playtype": "single",
    "level": 5,
    "duration": "15 minutes"
  }

Dynamic partitioning keys can look like:

  • Key: playtype_val
  • JQ expression: (.playtype| if . == null or . == "" then "hardcodedefault" else . end)

And the S3 bucket prefix as: intel/history/!{partitionKeyFromQuery:playtype_val}/year=!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}/

What this does is:

  1. If the field is empty or missing, it returns hardcodedefault.
  2. Otherwise, it returns the value of the field.
  3. The !{timestamp:...} placeholders (year=, month=, day=) are a built-in feature that creates S3 prefixes from the record's timestamp as seen by Firehose (its approximate arrival time, not a field in your data), so no expression needs to be defined for them.
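Firehose expands the prefix itself. The following minimal Python sketch only illustrates the object layout the prefix above produces for one partition key value and one arrival time. It assumes the prefix references the key playtype_val defined above. It handles only the two namespaces used here and translates only the date tokens yyyy, MM, dd and HH. The function and constant names are hypothetical, and the arrival time is passed in explicitly rather than taken from Firehose.

```python
import re
from datetime import datetime, timezone

PREFIX = ("intel/history/!{partitionKeyFromQuery:playtype_val}/"
          "year=!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}/")

# Java-style date tokens used by Firehose prefixes -> strftime equivalents.
# Only the tokens needed for this illustration are mapped.
JAVA_TO_STRFTIME = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H"}

PLACEHOLDER = re.compile(r"!\{(partitionKeyFromQuery|timestamp):([^}]+)\}")


def expand_prefix(prefix: str, keys: dict[str, str], arrival: datetime) -> str:
    """Replace each !{namespace:arg} placeholder with its resolved value."""
    def substitute(match) -> str:
        namespace, arg = match.group(1), match.group(2)
        if namespace == "partitionKeyFromQuery":
            # A KeyError here corresponds to a prefix that names an undefined key.
            return keys[arg]
        fmt = arg
        for java_token, py_token in JAVA_TO_STRFTIME.items():
            fmt = fmt.replace(java_token, py_token)
        return arrival.strftime(fmt)

    return PLACEHOLDER.sub(substitute, prefix)


if __name__ == "__main__":
    when = datetime(2024, 3, 7, tzinfo=timezone.utc)
    print(expand_prefix(PREFIX, {"playtype_val": "single"}, when))
    print(expand_prefix(PREFIX, {"playtype_val": "hardcodedefault"}, when))
```

Run as a script, the two calls print intel/history/single/year=2024/month=03/day=07/ and intel/history/hardcodedefault/year=2024/month=03/day=07/. Records with a null or empty playtype therefore land under their own default prefix instead of failing.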
AWS
Answered 1 month ago


