Job Parameters in AWS Glue Notebooks

How can I define custom job parameters when working with an AWS Glue notebook? I tried to use the %%configure magic, but it does not seem to work. I want to define some parameters, resolve them with getResolvedOptions, and make them overridable when running the notebook.

asked 8 months ago, 2136 views
2 Answers
Hello Andrea,

In AWS Glue notebooks, you can define custom job parameters and make them overridable when running the notebook. Use the getResolvedOptions function to read the parameters, then fall back to default values in your own code. Here's a step-by-step guide:

  1. Import AWS Glue Libraries:

    At the beginning of your AWS Glue Notebook, import the necessary libraries and create a glueContext object. This context will be used to access and resolve custom job parameters.

    import sys
    from awsglue.context import GlueContext
    from pyspark.context import SparkContext
    
    sc = SparkContext()
    glueContext = GlueContext(sc)

  2. Define Custom Job Parameters:

    Below the context setup, define your custom job parameters using getResolvedOptions. You can provide default values for these parameters. Here's an example:

    from awsglue.utils import getResolvedOptions
    
    # Define your custom parameters with default values
    args = getResolvedOptions(sys.argv, ['myParam1', 'myParam2'])
    myParam1 = args['myParam1'] if 'myParam1' in args else 'default_value1'
    myParam2 = args['myParam2'] if 'myParam2' in args else 'default_value2'

  3. Access and Use the Parameters:

    Now that you have defined your custom parameters, you can access and use them in your notebook as needed. For example:

    print(f"Custom Parameter 1: {myParam1}")
    print(f"Custom Parameter 2: {myParam2}")

  4. Running the Notebook:

    When you run the notebook, you can override these custom parameters by providing them as command-line arguments. For example:

    --myParam1 my_value1 --myParam2 my_value2
    

    These values will override the default values you defined in the notebook.

  5. Using %%configure Magic (Optional):

    While %%configure is typically used for configuring the Spark environment in AWS Glue Notebooks, it can also be used to set custom parameters if needed. Here's an example:

    %%configure -f
    {
        "myParam1": "custom_value1",
        "myParam2": "custom_value2"
    }

    This will set the parameters to the specified values for the duration of the notebook's session.

By following these steps, you can define custom job parameters in AWS Glue Notebooks, set default values, and make them overridable when running the notebook by passing command-line arguments or using the %%configure magic command.

Please give me a thumbs up if my suggestion helps.

answered 8 months ago
  • args = getResolvedOptions(sys.argv, ['myParam1', 'myParam2'])
    myParam1 = args['myParam1'] if 'myParam1' in args else 'default_value1'
    myParam2 = args['myParam2'] if 'myParam2' in args else 'default_value2'
    

    This raises an exception:

    GlueArgumentError: the following arguments are required: --myParam1, --myParam2

    I guess the issue is that the arguments must be defined somewhere.

  • Notice that the arguments are taken from sys.argv. When you run an interactive session (IS), the Python engine you use is local and forwards the commands to the cluster; in a job, the Python interpreter runs inside the cluster. What I would do is detect this from sys.argv and build the args variable some other way, so that you can test in a notebook.
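One way to sketch the suggestion above is to parse sys.argv yourself and fall back to defaults for anything that was not passed. The helper below is hypothetical: resolve_with_defaults is not part of awsglue, and it uses only the standard-library argparse module as a stand-in for getResolvedOptions, so it should also run in a notebook where no job arguments exist. The parameter names and default values are the ones from the answer.

```python
import argparse


def resolve_with_defaults(argv, defaults):
    """Return a dict of parameters, taking values from argv when present.

    `defaults` maps each parameter name to its fallback value.
    Arguments in argv that are not listed in `defaults` are ignored.
    Hypothetical helper, not part of the awsglue library.
    """
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    for name, default in defaults.items():
        # Optional argument: no error is raised when it is missing.
        parser.add_argument(f"--{name}", dest=name, default=default)
    # parse_known_args skips unrelated arguments instead of failing on them.
    known, _unknown = parser.parse_known_args(argv[1:])
    return vars(known)


defaults = {"myParam1": "default_value1", "myParam2": "default_value2"}

# Simulated job invocation that overrides only myParam1.
# In real code you would pass sys.argv instead of a hand-built list.
job_argv = ["script.py", "--myParam1", "my_value1", "--unrelated", "x"]
print(resolve_with_defaults(job_argv, defaults))
# {'myParam1': 'my_value1', 'myParam2': 'default_value2'}

# Simulated notebook run with no job arguments: every default applies.
print(resolve_with_defaults(["script.py"], defaults))
# {'myParam1': 'default_value1', 'myParam2': 'default_value2'}
```

Values read from the command line come back as strings, so convert them if a parameter is meant to be a number or a boolean.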

A question. What does -f mean after the %%configure magic?

answered 8 months ago
  • I don't think it does anything, since the Glue configure magic doesn't have flags defined. I think people use it out of habit, because on other kernels it does something with the config format.
