Need more guidance on how to check the data pipeline objects

1

Every day a new EMR cluster is spun up and then terminated after completing its step job. Checking CloudTrail, it seems a Data Pipeline created it. I am not sure how to get more details, such as who created it, what the script is, and what the schedule is. Importantly, this incurs additional, unexpected cost. I would highly appreciate any assistance with this.

Vaas
asked 5 months ago, 222 views
2 Answers
7

Hi,

Using CloudTrail, you can find out who generated the request. CloudTrail records events and actions related to the creation, modification, and execution of an AWS Data Pipeline. The logs identify the user or role that initiated actions in AWS services such as AWS Data Pipeline and Amazon EMR, so you can get the username or role name behind a specific operation. CloudTrail logs also include timestamps, which let you track the exact date and time of each event.

To view the actual scripts and schedules, review the pipeline definition or configuration. If the script is stored in a version control system, you can also check the repository directly. For further information, refer to "Logging and Monitoring in AWS Data Pipeline" and "Logging Amazon EMR API calls in AWS CloudTrail".
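As a rough illustration of the CloudTrail lookup described above, the sketch below wraps `aws cloudtrail lookup-events` in a small shell function. The function name is made up for this example, and the event source value `datapipeline.amazonaws.com` is an assumption that should be checked against the records in your own account. Note that `lookup-events` only searches the last 90 days of management events, so an older pipeline creation may not show up.

```shell
# Sketch only: the function name is hypothetical, and the event source value
# is an assumption to verify against your own CloudTrail records.
recent_datapipeline_events() {
  # $1 = maximum number of events to return (defaults to 20)
  aws cloudtrail lookup-events \
    --lookup-attributes AttributeKey=EventSource,AttributeValue=datapipeline.amazonaws.com \
    --max-items "${1:-20}" \
    --query 'Events[].{Time:EventTime,Event:EventName,User:Username}' \
    --output table
}
```

Calling `recent_datapipeline_events 10` should print a table of event time, event name, and username for up to ten recent events, assuming your credentials allow `cloudtrail:LookupEvents` in the relevant region.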

I hope it helps.

BezuW
answered 5 months ago
AWS
SUPPORT ENGINEER
reviewed 7 days ago
  • Thank you!

4
Accepted Answer

Hello,

As @BezuW mentioned, you can look up the CloudTrail event for the ActivatePipeline API call to check who triggered the pipeline; ActivatePipeline is the call that starts processing pipeline tasks. I presume the event gives the user ID. You can then run the "aws iam list-users" command to map that ID to the actual IAM user name (roles are listed separately by "aws iam list-roles").
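A minimal sketch of that ID-to-name mapping, assuming the CloudTrail record gives you a principal ID: IAM user IDs start with `AIDA` and role IDs with `AROA`, and for an assumed role the principal ID has the form `AROA...:session-name`. Both function names are hypothetical helpers, not part of the AWS CLI.

```shell
# Sketch only: hypothetical helper names wrapping real AWS CLI commands.
iam_user_name_for_id() {
  # $1 = IAM user ID from the CloudTrail record (starts with AIDA)
  aws iam list-users --query "Users[?UserId=='$1'].UserName" --output text
}

iam_role_name_for_id() {
  # $1 = IAM role ID (starts with AROA); any ":session-name" suffix
  # from an assumed-role principal ID is stripped before the lookup.
  aws iam list-roles --query "Roles[?RoleId=='${1%%:*}'].RoleName" --output text
}
```

If the ID begins with `AROA`, the pipeline was most likely activated through a role rather than by a named user, so the role lookup is the one to try first.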

On this date, you will no longer be able to access AWS Data Pipeline through the console. You will continue to have access to AWS Data Pipeline through the command line interface and API. Please note that the AWS Data Pipeline service is in maintenance mode and we are not planning to expand the service to new regions. We recommend migrating if the workload can be run on AWS Glue, Amazon MWAA, or AWS Step Functions. More details here.

You can use only AWS CLI commands to check further.

To check a particular pipeline:

aws datapipeline describe-pipelines --pipeline-ids df-0examplepipeline
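If the pipeline ID is not yet known, something along these lines may help find it and then dump its definition, which is typically where the schedule and the script location appear. This is a sketch with hypothetical function names; the output field names `pipelineIdList`, `id`, and `name` are what `list-pipelines` is expected to return.

```shell
# Sketch only: hypothetical helper names wrapping real AWS CLI commands.
list_pipelines() {
  # Prints one "id<TAB>name" line per pipeline in the current region.
  aws datapipeline list-pipelines \
    --query 'pipelineIdList[].[id,name]' --output text
}

show_pipeline_definition() {
  # $1 = pipeline ID, e.g. one printed by list_pipelines
  aws datapipeline get-pipeline-definition --pipeline-id "$1"
}
```

Pipelines are regional, so these calls may need `--region` or a matching `AWS_REGION` for the region where the EMR cluster is launched.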

To check individual objects in a given pipeline:

aws datapipeline describe-objects --pipeline-id <value> --object-ids <value>
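The object IDs that `describe-objects` needs can be discovered with `aws datapipeline query-objects`. The sketch below chains the two calls for the `COMPONENT` sphere (the objects from the pipeline definition); the function name is hypothetical, and it assumes the pipeline has few enough objects to describe in a single call, since `describe-objects` limits how many IDs it accepts at once.

```shell
# Sketch only: hypothetical helper name wrapping real AWS CLI commands.
describe_pipeline_components() {
  # $1 = pipeline ID
  pipeline_id="$1"
  # query-objects returns the IDs of objects in the requested sphere.
  ids="$(aws datapipeline query-objects --pipeline-id "$pipeline_id" \
    --sphere COMPONENT --query 'ids' --output text)"
  if [ -z "$ids" ]; then
    echo "no component objects found for $pipeline_id" >&2
    return 1
  fi
  # $ids is left unquoted on purpose so each ID becomes its own argument.
  aws datapipeline describe-objects --pipeline-id "$pipeline_id" --object-ids $ids
}
```

Switching `--sphere` to `INSTANCE` or `ATTEMPT` should return the per-run objects instead, which can help correlate a specific daily EMR cluster with a pipeline run.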

More datapipeline CLI commands here.

AWS
SUPPORT ENGINEER
answered 5 months ago
  • Thank you!
