All Content tagged with AWS Glue

AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development.

[Extracting key insights from Amazon S3 access logs with AWS Glue for Ray](https://aws.amazon.com/blogs/big-data/extracting-key-insights-from-amazon-s3-access-logs-with-aws-glue-for-ray/) introduces a...
2 answers, 1 vote, 19 views, asked 17 hours ago
**Issue:** Our aim is to reduce logging to control data ingestion as reported by the 'PutLogEvent' metrics of CloudWatch. In the past, when we ran our Glue job against 35 GB of data, we were billed ~2K for CloudWatch, most of...
1 answer, 0 votes, 36 views, asked a day ago
I have the Python script below, which currently generates several 4 MB gz files in an S3 bucket. That is what AWS Glue creates by default. But now I want to create multiple files of each file size...
2 answers, 0 votes, 28 views, asked 4 days ago by RahulD
I have the Python script below in an AWS Glue job. For the incremental load logic, I have now set the job bookmark option to Enable. I then tried to run the Glue job again, but it did not create any temporary...
1 answer, 0 votes, 23 views, asked 4 days ago by RahulD
We have data stored in Cosmos DB NoSQL and need to migrate it to Snowflake using AWS Glue with a Change Data Capture (CDC) approach. Our objective is to perform CRUD operations based on CDC to handle...
1 answer, 0 votes, 10 views, asked 5 days ago by sowndar
I am trying to create a Kinesis Firehose stream that can directly write to Iceberg tables in S3. I have defined the Glue Data Catalog in the same account and created a bucket to hold the metadata. ...
2 answers, 0 votes, 40 views, asked 6 days ago by Humaid
Hi Team, I have an AWS Glue job connection to MongoDB Atlas and am getting this error: ServerSelectionTimeoutError: xyz.mongodb.net:27017: timed out. How can I resolve this using AWS PrivateLink and...
1 answer, 0 votes, 22 views, asked 7 days ago by MD
Steps taken: 1. Select existing ETL Job (let's call it "sample-job"). 2. Clone job. 3. New job created, called "sample-job-copy". 4. Rename job. 5. Hit enter immediately after renaming. Outcome: New...
Accepted Answer, AWS Glue, 2 answers, 0 votes, 40 views, asked 8 days ago
We are new to the Glue environment and are dealing with a huge CloudWatch bill, so we changed the log level in our PySpark script from INFO to ERROR. We are using both the Python logger and the Spark logger, as below, in PySpark (Glue...
0 answers, 0 votes, 19 views, asked 8 days ago
I have the Python script below, which currently generates several 4 MB gz files in an S3 bucket. That is what AWS Glue creates by default. But now I want to create multiple files of a specific size...
0 answers, 0 votes, 28 views, asked 8 days ago by RahulD
![My Cost](/media/postImages/original/IMQAIamlJDTlC0OjRCQSZWWg) I'm using the Free Tier. EC2 is using t2.micro. I understand the crawling cost of Glue, but I don't understand why the cost of VPC...
1 answer, 0 votes, 19 views, asked 9 days ago