AWS SageMaker PySparkProcessor "print statements" not showing

Hello, how can I view the output of print statements from a PySparkProcessor script run as a SageMaker processing job? I wrote a script that collects and manipulates data from S3, then saves the results back to S3, but nothing is saved to S3 in the end. The job is launched from a SageMaker Jupyter notebook.

I searched for my print statements in the job logs, but did not find them.

Amazon SageMaker > Processing jobs > [job name-id] > View logs > Log stream > ...

Here is an excerpt of my code. Is there any other configuration info that would be helpful? Any tips? Thanks.

spark_processor = PySparkProcessor(
    base_job_name=job_name,
    framework_version="2.4",...

...

spark_processor.run(
    submit_app="src/preprocess.py",
    arguments=['--s3_output_path', save_location,
...                  ],
    spark_event_logs_s3_uri=s3_log_location,
    logs=True,
    wait=False,
)

1 Answer

Hi there,

According to the documentation for the run method of the PySparkProcessor class, the logs parameter is defined as follows: "Whether to show the logs produced by the job. Only meaningful when wait is True (default: True)" [1].

In your case, I see that you have set wait to False. Could you please confirm whether you see the logs when you set the wait parameter to True?

spark_processor.run(
    submit_app="src/preprocess.py",
    arguments=['--s3_output_path', save_location,
...                  ],
    spark_event_logs_s3_uri=s3_log_location,
    logs=True,
    wait=True,
)

Reference

[1] https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.spark.processing.PySparkProcessor

AWS
SUPPORT ENGINEER
Njabs
answered 2 years ago
  • Thanks for the help. Yes, I tried setting 'logs' and 'wait' to 'True'. I am still unable to find my debug statements. Below is a snippet from the submitted job script. Any other suggestions? Thanks.

    def main():
        print("*** in main", flush=True)
        logging.info("*** in main info")
        logging.debug("*** in main debug")
        ...

    if __name__ == "__main__":
        main()
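One thing that may explain the missing logging.info and logging.debug lines: Python's root logger defaults to the WARNING level, so INFO and DEBUG records are dropped unless logging is configured first. Below is a minimal sketch of a script entry point that sends all levels to stdout. The configure_logging helper is a hypothetical name introduced here, and the assumption that the driver's stdout ends up in the job's CloudWatch logs should be checked against your container. The print call is not affected by the logging level, so if it is also missing, it is worth double-checking that the guard at the bottom of the script is spelled exactly as shown.

```python
import logging
import sys


def configure_logging(level=logging.DEBUG, stream=None):
    """Attach a handler to the root logger so records at `level` and above are emitted.

    Hypothetical helper for illustration; writes to stdout unless a stream is given.
    """
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    root = logging.getLogger()
    root.setLevel(level)  # the default root level is WARNING, which hides INFO and DEBUG
    root.addHandler(handler)
    return root


def main():
    configure_logging()
    print("*** in main", flush=True)
    logging.info("*** in main info")
    logging.debug("*** in main debug")


if __name__ == "__main__":
    main()
```

Calling configure_logging() once at the start of main() is enough; later logging calls anywhere in the script go through the same root handler.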

    These are the places where I am looking for the debug statements:

    1. In S3, as specified by the 'spark_event_logs_s3_uri' option, nothing posted there recently
    2. Within the Jupyter notebook cell output which initiates the job
    3. SageMaker processing job logs at the link below (under the latest job):
       3.1. https://us-east-2.console.aws.amazon.com/sagemaker/home?region=us-east-2#/processing-jobs
       3.2. Under 'view logs' there are 5 logs listed; I opened each one and searched for 'in main' using the 'filter events' box
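Instead of opening each of the log streams by hand, the search in step 3.2 can also be scripted. The sketch below assumes the job's streams live in the /aws/sagemaker/ProcessingJobs log group with names that start with the job name, which is the usual layout for processing jobs but worth verifying in your account. find_log_events is a hypothetical helper name; the only AWS call it makes is filter_log_events on a CloudWatch Logs client that you pass in.

```python
def find_log_events(logs_client, job_name, text,
                    log_group="/aws/sagemaker/ProcessingJobs"):
    """Return (stream name, message) pairs for log events containing `text`.

    `logs_client` is expected to be a CloudWatch Logs client, for example
    boto3.client("logs"). The log group name is an assumption about where
    processing job logs are written.
    """
    kwargs = {
        "logGroupName": log_group,
        "logStreamNamePrefix": job_name,   # restrict the search to this job's streams
        "filterPattern": '"' + text + '"',  # quoted term: match the exact phrase
    }
    matches = []
    while True:
        page = logs_client.filter_log_events(**kwargs)
        for event in page.get("events", []):
            matches.append((event["logStreamName"], event["message"]))
        token = page.get("nextToken")
        if not token:
            break
        kwargs["nextToken"] = token  # keep paging until no token is returned
    return matches


# Example usage (requires AWS credentials; the job name is a placeholder):
# import boto3
# logs = boto3.client("logs", region_name="us-east-2")
# for stream, message in find_log_events(logs, "<processing job name>", "in main"):
#     print(stream, message)
```

An empty result across all streams suggests the statements were never written, rather than being hidden in a stream that was overlooked.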
