Custom parameters for a visual ETL job

0

I want to use Glue Studio to create a Glue ETL job. This job needs to filter the data in its first step based on input parameters given to it at run time. Is there a way, with visual ETL jobs, to provide custom input parameters at run time? For example, can I pass two date arguments, "--date-start" and "--date-end", to the job at run time so that the job can filter the data based on the provided date range?

In other words, is there a way to provide custom Arguments (for example, { "--date-start" : "2024-01-01", "--date-end" : "2024-01-03", ...}) to a visual ETL Glue job if I run it using boto3?

response = glue_client.start_job_run(
    JobName='string',
    JobRunId='string',
    Arguments={
        '--date-start': '2024-01-01',
        '--date-end': '2024-01-03'
    },
    ....
)
Anshu
asked 3 months ago · 313 views
2 Answers
0
Accepted Answer

It is possible to provide custom runtime Arguments to your Glue job that are resolved using getResolvedOptions. The default script generated by Glue already contains the resolved arguments in the args variable.

Simply provide the parameter name as the key to args['param'] to access the value passed to your job.
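For illustration, here is a minimal sketch of what the top of the generated script could look like once the custom arguments are added to getResolvedOptions; the underscore-style names ("--date_start", "--date_end") are an assumption to keep the argument parsing straightforward, not something the thread specifies:

import sys
from awsglue.utils import getResolvedOptions

# The default generated script only resolves JOB_NAME; the custom parameter
# names (assumed here, passed without the leading "--") are added to the list.
args = getResolvedOptions(sys.argv, ["JOB_NAME", "date_start", "date_end"])

date_start = args["date_start"]  # e.g. "2024-01-01"
date_end = args["date_end"]      # e.g. "2024-01-03"

The start_job_run call would then pass '--date_start' and '--date_end' in its Arguments dictionary.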

If you have parameters that do not change between job runs, it is better to configure them on the job itself, under Job details > Job parameters.

AWS
answered 3 months ago
  • Thanks for the answer. Yes, it is possible to provide the arguments at run time to the generated script, but as I understand it, the generated script would need to be modified manually in order to access those arguments, and doing so would take away the visual aspect of the job. I do not want that to happen. As Gonzalo mentioned, I think adding a custom code node is the way to access custom parameters (other than the source/target databases/tables) in visual Glue jobs.

0

It's important to highlight that the suggestion from edokemwa is valid for custom code nodes, but you cannot configure visual nodes with parameters (with the exception of the input table source, which does allow using parameters). So, for instance, you could have a custom transform that parses the arguments and filters the data flowing through so that it stays within those dates.
However, you are probably thinking of controlling the read from the source for efficiency. For that you only have the table option I mentioned, or you could use a placeholder source (an S3 location without data) and then read the actual data in your custom code node.
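As a rough sketch (not taken from this thread) of such a custom transform node: the column name event_date and the underscore-style parameter names are illustrative assumptions, and the imports are repeated only so the sketch stands on its own (the generated script already has most of them at the top).

import sys
from awsglue.dynamicframe import DynamicFrame, DynamicFrameCollection
from awsglue.utils import getResolvedOptions
from pyspark.sql import functions as F

def FilterByDateRange(glueContext, dfc) -> DynamicFrameCollection:
    # Resolve the custom runtime arguments inside the custom code node.
    args = getResolvedOptions(sys.argv, ["date_start", "date_end"])

    # Take the single incoming DynamicFrame and keep only rows in the date range.
    df = dfc.select(list(dfc.keys())[0]).toDF()
    filtered = df.filter(
        (F.col("event_date") >= args["date_start"])
        & (F.col("event_date") <= args["date_end"])
    )

    return DynamicFrameCollection(
        {"FilterByDateRange": DynamicFrame.fromDF(filtered, glueContext, "filtered")},
        glueContext,
    )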

AWS
EXPERT
answered 3 months ago
  • Thanks for the clarification. I was indeed looking to control the read from the source, as you mentioned. It looks like that can only be done through a custom code node, as I suspected.
