AWS SageMaker PySparkProcessor: print statements not showing


Hello, how can I view the output of print statements in a PySparkProcessor script run as a SageMaker processing job? I created a script that collects and manipulates data from S3 and then saves the results back to S3, but in the end nothing is saved to S3. The job is launched from a SageMaker Jupyter notebook.

I searched for my print statements in the job logs but did not find them:

Amazon SageMaker > Processing jobs > [job name-id] > View logs > Log stream > ...
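The console search along the path above can also be run from the notebook with boto3, covering all of the job's log streams at once. This is a minimal sketch, assuming the job writes to the CloudWatch log group /aws/sagemaker/ProcessingJobs with stream names that start with the job name (the usual layout, but worth verifying in the console). find_log_lines is a hypothetical helper, not part of any SDK.

```python
def find_log_lines(logs_client, job_name, phrase,
                   log_group="/aws/sagemaker/ProcessingJobs"):
    """Return (log stream, message) pairs whose message contains `phrase`.

    `logs_client` is expected to be boto3.client("logs"). It is passed in
    rather than created here so the function can be tried without AWS access.
    Every stream whose name starts with `job_name` is searched, and
    nextToken is followed until all result pages have been read.
    """
    request = {
        "logGroupName": log_group,          # assumed log group for processing jobs
        "logStreamNamePrefix": job_name,    # assumed stream naming: "<job name>/..."
        # Double quotes make CloudWatch match the phrase as a whole, spaces included.
        "filterPattern": f'"{phrase}"',
    }
    matches = []
    while True:
        response = logs_client.filter_log_events(**request)
        for event in response.get("events", []):
            matches.append((event["logStreamName"], event["message"]))
        token = response.get("nextToken")
        if not token:
            return matches
        request["nextToken"] = token
```

For example: find_log_lines(boto3.client("logs", region_name="us-east-2"), "<job name-id>", "in main"). An empty result suggests the lines were never written to stdout, rather than being hidden somewhere in the console.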

Here is an excerpt of my code. Is there any other configuration information that would be helpful? Any tips? Thanks.

from sagemaker.spark.processing import PySparkProcessor

spark_processor = PySparkProcessor(
    base_job_name=job_name,
    framework_version="2.4",
    ...
)

...

spark_processor.run(
    submit_app="src/preprocess.py",
    arguments=[
        '--s3_output_path', save_location,
        ...
    ],
    spark_event_logs_s3_uri=s3_log_location,
    logs=True,
    wait=False,
)
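Since nothing reaches S3, it may also be worth confirming that src/preprocess.py actually receives the --s3_output_path value passed in arguments. This is a minimal sketch, assuming the script parses its arguments with argparse; parse_args is a hypothetical name, not taken from the original script. parse_known_args() is used so that the other (elided) arguments do not make the script exit with an error.

```python
import argparse


def parse_args(argv=None):
    """Parse the arguments passed via PySparkProcessor.run(arguments=[...]).

    argv=None means "read sys.argv", which is what the job does at runtime.
    Returns the parsed namespace plus any arguments this parser does not know.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--s3_output_path", type=str, required=True)
    args, unknown = parser.parse_known_args(argv)
    return args, unknown
```

Calling args, unknown = parse_args() at the start of main() and printing both with flush=True gives an early line that should show up in the job's output if the script starts at all.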

Asked 2 years ago · Viewed 1,156 times
1 answer

Hi there,

As per the documentation for the run method of the PySparkProcessor class, the logs parameter is defined as follows: "Whether to show the logs produced by the job. Only meaningful when wait is True (default: True)" [1].

In your case, I see that you have set wait to False. Could you please confirm whether you see the logs when you set the wait parameter to True?

spark_processor.run(
    submit_app="src/preprocess.py",
    arguments=[
        '--s3_output_path', save_location,
        ...
    ],
    spark_event_logs_s3_uri=s3_log_location,
    logs=True,
    wait=True,
)

Reference

[1] https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.spark.processing.PySparkProcessor

AWS
Support Engineer
Njabs
Answered 2 years ago
  • Thanks for the help. Yes, I tried setting both logs and wait to True, but I still cannot find my debug statements. Below is a snippet from the submitted job script. Any other suggestions? Thanks.

    import logging

    def main():
        print("*** in main", flush=True)
        logging.info("*** in main info")
        logging.debug("*** in main debug")
        ...

    if __name__ == "__main__":
        main()
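Separately from where to look, there is one likely reason the logging.info and logging.debug lines never appear: Python's root logger defaults to the WARNING level, so those calls are dropped unless logging is configured. This is a minimal sketch that sends log records to stdout at DEBUG level; configure_logging is a hypothetical helper. It avoids logging.basicConfig(force=True) because force needs Python 3.8+, and the Python version inside the Spark 2.4 container is not something the thread confirms.

```python
import logging
import sys


def configure_logging(level=logging.DEBUG, stream=None):
    """Attach a stdout handler to the root logger and lower its level.

    Without this, the root logger only emits WARNING and above, so
    logging.info() and logging.debug() produce no output at all.
    Call it once; each call adds another handler.
    """
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    root = logging.getLogger()
    root.setLevel(level)
    root.addHandler(handler)
    return handler


def main():
    configure_logging()  # before the first logging call
    print("*** in main", flush=True)
    logging.info("*** in main info")
    logging.debug("*** in main debug")
```

This only affects code running on the Spark driver. A print or logging call inside a function that Spark runs on executors (for example in a map or a UDF) is written to the executor's output, not the driver's.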

    Below are the places where I have looked for the debug statements:

    1. In S3, at the location specified by the spark_event_logs_s3_uri option: nothing has been posted there recently.
    2. In the output of the Jupyter notebook cell that starts the job.
    3. In the SageMaker processing job logs (under the latest job):
        3.1. https://us-east-2.console.aws.amazon.com/sagemaker/home?region=us-east-2#/processing-jobs
        3.2. Under "View logs" there are 5 logs listed. I opened each one and searched for "in main" using the "Filter events" box.
