Kinesis Firehose Dynamic Partitioning with inline jq expression


Following the AWS blog post: Dynamic Partitioning.

So I have a Firehose delivery stream configured to convert API calls into Parquet output in S3, and it is now partitioned according to the value of a field within my data, 'playtype'.

However, in some rows the value of this partitioning field is null or empty, and this causes those rows to error. Therefore I want to adjust the jq expression for the dynamic partitioning key to substitute a default value in those null cases. I'm having trouble with the syntax, though.

The JQ expression specified for the field is .playtype

So I have tried things like:

.playtype // "some-hardcoded-default"
.playtype | if . == null or . == "" then "some-hardcoded-default" else . end

These evaluate as valid jq in jqplay, but in practice (i.e. in Firehose) they return a 'jq syntax error'. How can I specify this logic correctly, i.e. partition on this field, but if the value is null or empty, use 'some-hardcoded-default'? Thank you!
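One detail worth knowing when choosing between these two expressions: jq's alternative operator // only falls back to the right-hand side when the left side is null or false. An empty string is not falsy in jq, so .playtype // "some-hardcoded-default" would still pass "" through, while the if version catches both cases. The Python sketch below models the two jq expressions on a few sample records. It does not run jq, and the function names are hypothetical.

```python
import json

DEFAULT = "some-hardcoded-default"


def alternative_operator(record: dict) -> object:
    """Models: .playtype // "some-hardcoded-default"

    jq's // falls back only when the left side is null or false,
    so an empty string is returned unchanged.
    """
    value = record.get("playtype")  # a missing key reads as null in jq
    return DEFAULT if value is None or value is False else value


def if_expression(record: dict) -> object:
    """Models: .playtype | if . == null or . == "" then "some-hardcoded-default" else . end"""
    value = record.get("playtype")
    return DEFAULT if value is None or value == "" else value


samples = ['{"playtype": "single"}', '{"playtype": null}', '{}', '{"playtype": ""}']

for raw in samples:
    rec = json.loads(raw)
    print(raw, "->", repr(alternative_operator(rec)), repr(if_expression(rec)))
# {"playtype": "single"} -> 'single' 'single'
# {"playtype": null} -> 'some-hardcoded-default' 'some-hardcoded-default'
# {} -> 'some-hardcoded-default' 'some-hardcoded-default'
# {"playtype": ""} -> '' 'some-hardcoded-default'
```

In other words, if empty strings can occur in the data, the if ... end form is the one that actually covers them.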

  • (.playtype | if . == null or . == "" then "some-hardcoded-default" else . end)

    Put it inside brackets (parentheses) and it will work.

AWS-LDD
asked 3 years ago, 3675 views
2 answers
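A plausible explanation for why the parentheses matter (an assumption, not something confirmed in this thread): in the Firehose API, for example in CloudFormation, inline partitioning keys are expressed as a single MetadataExtractionQuery of the form {key:expression}, i.e. a jq object construction. jq only accepts a restricted form of expression as an unparenthesized object value, so operators such as // or a whole if ... end block may need to be wrapped in parentheses there, even though the same expression is valid on its own in jqplay. The hypothetical Python helper below only shows the shape of such a combined query with each expression wrapped. It does not call any AWS API.

```python
def build_metadata_extraction_query(keys: dict) -> str:
    """Hypothetical helper: combine key -> jq expression pairs into one
    jq object-construction query, wrapping every expression in parentheses
    so that operators like // or if ... end stay valid as object values."""
    pairs = ",".join(f"{name}:({expr})" for name, expr in keys.items())
    return "{" + pairs + "}"


query = build_metadata_extraction_query({
    "playtype_val": '.playtype | if . == null or . == "" then "some-hardcoded-default" else . end',
})
print(query)
# {playtype_val:(.playtype | if . == null or . == "" then "some-hardcoded-default" else . end)}
```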
Accepted answer

Hi there, please try the suggestion from the user in the comments above and let us know the results. jqplay is currently blocked on our end, so I couldn't check its validity. I believe wrapping each expression in brackets (parentheses) should fix the issue. However, if that doesn't work, I would recommend using a transformation Lambda function, as described here: https://docs.aws.amazon.com/firehose/latest/dev/dynamic-partitioning.html, for cases where the field is not present in the incoming data. That way you would have much more control over the fields.

AWS
SUPPORT ENGINEER
answered 3 years ago
  • AWS-User-4356880 is correct! Many thanks. I had also used a transformation Lambda function, as you suggest, and that works too, but the former was a much simpler, quicker way to go.
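For the transformation Lambda route mentioned in the accepted answer, a minimal Python sketch might look like the following. It assumes the record format described on the linked dynamic partitioning page. Each incoming record has a recordId and base64-encoded data. Each returned record carries a result ("Ok" or "ProcessingFailed" here), the data, and a metadata.partitionKeys map. The key name playtype_val and the default value are hypothetical. With this approach the S3 prefix would reference !{partitionKeyFromLambda:playtype_val} rather than partitionKeyFromQuery.

```python
import base64
import json

# Hypothetical default value used when playtype is missing, null or empty.
DEFAULT_PLAYTYPE = "some-hardcoded-default"


def lambda_handler(event, context):
    """Sketch of a Firehose transformation handler that adds a playtype_val partition key."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            playtype = payload.get("playtype")  # None if the field is missing or null
        except (ValueError, AttributeError):
            # Not valid base64/JSON, or not a JSON object: report it as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
            continue

        if playtype is None or playtype == "":
            playtype = DEFAULT_PLAYTYPE

        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            # Pass the original (still base64-encoded) payload through unchanged.
            "data": record["data"],
            # Referenced in the prefix as !{partitionKeyFromLambda:playtype_val}.
            "metadata": {"partitionKeys": {"playtype_val": str(playtype)}},
        })
    return {"records": output}
```

The inline JQ approach avoids deploying and maintaining this function. The Lambda becomes worthwhile mainly when the partitioning logic outgrows a single expression.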


I can confirm that the example provided by AWS-User-4356880 works! Using an if expression in a Kinesis Data Firehose dynamic partitioning key works. For example:

Source data might look like this:

{
  "playerName": "Alex",
  "score": 1200,
  "playtype": "single",
  "level": 5,
  "duration": "15 minutes"
}

Dynamic partitioning keys can look like:

  • Key: playtype_val
  • JQ expression: (.playtype | if . == null or . == "" then "hardcodedefault" else . end)

And the S3 bucket prefix as: intel/history/!{partitionKeyFromQuery:playtype_val}/year=!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}/

What this does is:

  1. If the field is null, empty, or missing, it returns "hardcodedefault".
  2. Otherwise, it returns the value of the field.
  3. The !{timestamp:...} placeholders are a built-in namespace that creates S3 partitions from Firehose's approximate arrival timestamp for the records, so there is no need to define an expression for them.
AWS
Lydon
answered 22 days ago
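To make the effect of the key and the prefix in the answer above concrete, here is a rough Python approximation of how a record's S3 prefix would be filled in. It is not Firehose's actual implementation. It supports only the partitionKeyFromQuery and timestamp namespaces and the yyyy/MM/dd/HH tokens, and it assumes a UTC arrival time. It also shows that the name inside !{partitionKeyFromQuery:...} has to match the name of the key defined for the JQ expression (playtype_val here). Otherwise there is no value to substitute.

```python
import re
from datetime import datetime, timezone

# Subset of the Java-style date tokens used in the prefix, mapped to strftime codes.
_TIMESTAMP_FORMATS = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H"}


def playtype_key(record: dict) -> str:
    """Mirrors (.playtype | if . == null or . == "" then "hardcodedefault" else . end)."""
    value = record.get("playtype")
    return "hardcodedefault" if value is None or value == "" else value


def render_prefix(template: str, partition_keys: dict, arrival: datetime) -> str:
    """Approximates how the !{namespace:arg} placeholders in a prefix are filled in."""

    def substitute(match):
        namespace, arg = match.group(1), match.group(2)
        if namespace == "partitionKeyFromQuery":
            # Raises KeyError if the prefix names a key that was never defined.
            return partition_keys[arg]
        if namespace == "timestamp":
            return arrival.strftime(_TIMESTAMP_FORMATS[arg])
        raise ValueError(f"unsupported namespace: {namespace}")

    return re.sub(r"!\{(\w+):([^}]+)\}", substitute, template)


PREFIX = ("intel/history/!{partitionKeyFromQuery:playtype_val}/"
          "year=!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}/")
arrival = datetime(2024, 5, 7, 13, 0, tzinfo=timezone.utc)  # assumed UTC arrival time

for record in ({"playtype": "single"}, {"playtype": ""}, {"score": 1200}):
    keys = {"playtype_val": playtype_key(record)}
    print(render_prefix(PREFIX, keys, arrival))
# intel/history/single/year=2024/month=05/day=07/
# intel/history/hardcodedefault/year=2024/month=05/day=07/
# intel/history/hardcodedefault/year=2024/month=05/day=07/
```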
