Step logs not available in console/S3 in EMR

0

Hi,

We have an EMR cluster that runs multiple concurrent steps, and until recently it worked seamlessly. I am not sure exactly what happened, but since yesterday the step logs and application logs are no longer published to S3. They do still exist on the primary node under the /emr directory. Asking developers to check logs on the primary node for each step is not practical, as this cluster executes more than 10 steps per day. Any suggestions for troubleshooting would be greatly appreciated. I could not find any relevant troubleshooting steps in the AWS documentation.

Thanks in advance

Scott M
asked 4 months ago, 301 views
3 Answers
3
Accepted Answer

Hello,

Thanks for sharing the error stack. Please follow the steps below to fix the issue:

  1. Stop the logpusher service on the primary node:
sudo systemctl stop logpusher
  2. Move all the files from /emr/logpusher/db to a different location. You will likely see files whose names start with data.
  3. Start the logpusher service again:
sudo systemctl start logpusher
  4. Check the latest logpusher log file to confirm that the exception above has disappeared and that logs have started uploading to the S3 bucket. Allow some time for all the logs to become available in the S3 location. Let me know if the issue persists.
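
The steps above can be sketched as a small script. This is only an illustration: the service name and /emr/logpusher/db come from the answer, while the function names (move_logpusher_db, reset_logpusher) and the default backup location under /tmp are assumptions made for the example. The files are moved rather than deleted so they can be put back if needed.

```shell
#!/usr/bin/env bash
# Sketch only: the paths and service name are taken from the answer above;
# the function names and the default backup location are assumptions.

# Move everything out of the logpusher DB directory into a backup directory.
# Moving (not deleting) keeps the old state around in case it must be restored.
move_logpusher_db() {
  local db_dir="${1:-/emr/logpusher/db}"
  local backup_dir="${2:-/tmp/logpusher-db-backup-$(date +%Y%m%d%H%M%S)}"
  mkdir -p "$backup_dir" || return 1
  find "$db_dir" -mindepth 1 -maxdepth 1 -exec mv {} "$backup_dir"/ \; || return 1
  # Print where the files went so the caller can record it.
  echo "$backup_dir"
}

# Stop the service, move the DB files aside, then start the service again.
# Must run as root, since it calls systemctl and touches /emr.
reset_logpusher() {
  systemctl stop logpusher || return 1
  move_logpusher_db "$@" || return 1
  systemctl start logpusher
}
```

The block only defines functions. On the primary node, one way to use it would be to save it as a script that ends with a call to reset_logpusher and run that script as root (for example with sudo bash).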
AWS
SUPPORT ENGINEER
answered 4 months ago
  • Excellent! This fixed the issue, and I can see that logs have started publishing to S3 and the console. Thank you very much!

3

Hello,

It seems that logpusher failed to push the logs to S3. Note that logpusher is a daemon on EMR that publishes the application logs to S3 every 5 minutes. If no files have been pushed for a full day, logpusher is likely the problem. You can check the service status with the first command below. If it is running, restart it with the second command and, after some time, check whether the files appear in S3.

sudo systemctl status logpusher
sudo systemctl restart logpusher

If the above does not work and the files are still not pushed to S3, go to /emr/logpusher/log and look at the latest logpusher log file to see whether any issues are explicitly reported.
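
As a rough sketch of that check, the function below picks the most recently modified file in the log directory and prints its lines containing ERROR or Exception, with line numbers. The directory comes from the text above. The function name show_logpusher_errors is made up for the example, and it assumes the newest file by modification time is the current logpusher log and that the directory is readable by the user running it.

```shell
#!/usr/bin/env bash
# Sketch only: assumes the newest file (by modification time) in the log
# directory is the current logpusher log. The function name is hypothetical.

show_logpusher_errors() {
  local log_dir="${1:-/emr/logpusher/log}"
  local latest
  # ls -t sorts newest first; take the first entry.
  latest="$(ls -1t "$log_dir" | head -n 1)"
  if [ -z "$latest" ]; then
    echo "no log files found in $log_dir" >&2
    return 1
  fi
  echo "Scanning $log_dir/$latest" >&2
  # -n prefixes each match with its line number; exit status is 1 if nothing matches.
  grep -nE 'ERROR|Exception' "$log_dir/$latest"
}
```

Calling show_logpusher_errors with no arguments scans /emr/logpusher/log. A non-zero exit status with no output means no matching lines were found in the newest file.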

AWS
SUPPORT ENGINEER
answered 4 months ago
1

Thanks for the response. I followed your steps to restart logpusher, but it didn't fix the issue. More than 30 minutes after the restart, the logs are still only on the node and have not been published to S3 or the console.

I also found the constraint violation exception below several times in the logpusher log file. Could you please let me know whether this is causing the issue?

2024-01-08 21:34:01,048 ERROR logspusher-1: integrity constraint violation: unique constraint or index violation; SYS_PK_10100 table: LOGFILE
2024-01-08 23:34:01,048 WARN logspusher-1: SQLException doing action 'Performing a transaction': java.sql.BatchUpdateException: integrity constraint violation: unique constraint or index violation; SYS_PK_10100 table: LOGFILE
2024-01-08 23:34:01,048 WARN logspusher-1: SQLState: 23505
2024-01-08 23:34:01,048 WARN logspusher-1: VendorError: -104
2024-01-08 23:34:01,048 ERROR logspusher-1: Failed to schedule logs in logpusher in normal phase
org.hibernate.exception.ConstraintViolationException: Could not execute JDBC batch update
    
Scott M
answered 4 months ago
