Kinesis Firehose Dynamic Partitioning with inline jq expression

1

I am following the AWS blog post on Dynamic Partitioning.

I have a Firehose delivery stream configured to convert API calls into Parquet output in S3, and it is now partitioned according to the value of a field within my data: 'playtype'.

However, in some rows the value of this partitioning field is null/empty, and this causes those rows to error. I therefore want to adjust the jq expression for the dynamic partitioning key to substitute a default value in those cases, but I am having trouble with the syntax.

JQ expression specified for the field is .playtype

So I have tried things like:

.playtype // "some-hardcoded-default"
.playtype | if . == null or . == "" then "some-hardcoded-default" else . end

These evaluate as valid jq in jqplay, but in practice (i.e. in Firehose) they return a 'jq syntax error'. How can I specify this logic correctly, i.e. partition on this field, but if the value is null or empty, use 'some-hardcoded-default'? Thank you!

  • (.playtype | if . == null or . == "" then "some-hardcoded-default" else . end)

    Put the whole expression inside brackets (parentheses) and it will work.

AWS-LDD
asked 3 years ago · 3689 views
2 Answers
1
Accepted Answer

Hi there, please try the suggestion from the user in the comments above and let us know the results. jqplay is currently blocked on our end, so I couldn't check the validity of the expression. I believe wrapping each expression in brackets should fix the issue. If that doesn't work, I would recommend using a transformation Lambda function, as described here: https://docs.aws.amazon.com/firehose/latest/dev/dynamic-partitioning.html, to handle the case where the field is not present in the incoming data. That way you would have a lot more control over the fields.
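For the Lambda fallback, a minimal sketch of a transformation function in Python might look like the following. It assumes the record payloads are JSON objects and that the partition key is named playtype; the default value is the placeholder from the question, and the handler name is only illustrative. The payload is passed through unchanged and only the partition key is attached.

```python
import base64
import json

# Assumed default, taken from the question; replace with your own value.
DEFAULT_PLAYTYPE = "some-hardcoded-default"


def lambda_handler(event, context):
    """Attach a 'playtype' partition key to every Firehose record."""
    output = []
    for record in event["records"]:
        # Firehose delivers each payload base64-encoded.
        payload = json.loads(base64.b64decode(record["data"]))
        playtype = payload.get("playtype")  # None when the field is missing or null
        if playtype is None or playtype == "":
            playtype = DEFAULT_PLAYTYPE
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": record["data"],  # payload returned unchanged
            "metadata": {"partitionKeys": {"playtype": playtype}},
        })
    return {"records": output}
```

With this approach the S3 prefix would reference the key through the partitionKeyFromLambda namespace, e.g. !{partitionKeyFromLambda:playtype}, rather than partitionKeyFromQuery.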

AWS
SUPPORT ENGINEER
answered 3 years ago
  • AWS-User-4356880 is correct! Many thanks. I had also used a transformation Lambda function, as you suggest, and this works too, but the former was a much simpler, quicker way to go.

0

I can confirm the example provided by AWS-User-4356880 works! Using an if statement in a Kinesis Data Firehose jq expression works. For example:

Source data might look like this:

{
  "playerName": "Alex",
  "score": 1200,
  "playtype": "single",
  "level": 5,
  "duration": "15 minutes"
}

Dynamic partitioning keys can look like:

  • Key: playtype_val
  • JQ expression: (.playtype | if . == null or . == "" then "hardcodeddefault" else . end)

And the S3 bucket prefix (the name after partitionKeyFromQuery: must match the partitioning key name defined above): intel/history/!{partitionKeyFromQuery:playtype_val}/year=!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}/

What this does is:

  1. If the field is null, empty, or missing (a missing field evaluates to null in jq), the expression returns hardcodeddefault.
  2. Otherwise it returns the value of the field.
  3. The !{timestamp:...} parts of the prefix are a built-in feature that lets you create S3 partitions from the timestamp (no need to define an expression).
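
To check the fallback logic locally before deploying, the behaviour of the jq expression can be mirrored in a few lines of Python. This is only a sketch of the same logic, not what Firehose runs; the function name partition_key is hypothetical and the default is the placeholder used above.

```python
import json


def partition_key(record_json: str, default: str = "hardcodeddefault") -> str:
    """Mirror of (.playtype | if . == null or . == "" then default else . end)."""
    # A missing key yields None here, just as .playtype yields null in jq.
    value = json.loads(record_json).get("playtype")
    return default if value is None or value == "" else value


print(partition_key('{"playerName": "Alex", "playtype": "single"}'))
print(partition_key('{"playerName": "Alex", "playtype": ""}'))
print(partition_key('{"playerName": "Alex"}'))
```

Note that the shorter jq form .playtype // "default" only replaces null (and false), so an empty string would still pass through; the if/else form is the one that covers both cases.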
AWS
Lydon
answered a month ago
