Need AWS Glue to write bad/erroneous records to an S3 path when reading MongoDB data, and process the rest of the data

Hi, I am working on a solution that reads MongoDB data and stores it in S3. Because of conflicting schemas, the job succeeds on its first run but fails on the second. I need Glue to handle the records with conflicting schemas. I tried setting badRecordsPath and mode = PERMISSIVE, but the options are ignored. I tried reading the data both from the catalog and with from_options. Both snippets are below.

  1. dynamic_frame = glueContext.create_dynamic_frame_from_catalog(database="mongo_app_config_db", table_name="s3app_config_intf_logs", additional_options={"database": "app_config", "collection": "intf_logs", "badRecordsPath": s3_path, "mode": "PERMISSIVE"}, transformation_ctx="dynamic_frame")
  2. dynamic_frame = glueContext.create_dynamic_frame.from_options(connection_type="mongodb", connection_options=read_mongo_options, transformation_ctx="dynamic_frame", additional_options={"badRecordsPath": s3_path, "mode": "PERMISSIVE"})

Please help me handle such records so that the job can continue with the rest of the records.

Asked 2 years ago, 764 views
1 Answer
Hello,

Glue (through Spark) uses the MongoDB Spark Connector when reading from MongoDB. According to its read configuration documentation at https://www.mongodb.com/docs/spark-connector/current/configuration/read/, I do not see any options for handling corrupted or bad records. I also see that a feature request for this was filed recently with the community at https://jira.mongodb.org/browse/SPARK-327.

You can try exporting the MongoDB collection as JSON or CSV files to an S3 location using a tool such as mongoexport (https://www.mongodb.com/docs/database-tools/mongoexport/) and then reading those files with Spark DataFrames. Spark should support bad-record handling for these file formats.
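
To illustrate the export-then-read idea, here is a minimal sketch, not a tested Glue job. It assumes the collection has already been exported as JSON lines to S3, and it uses Spark's PERMISSIVE mode with the `columnNameOfCorruptRecord` option to route unparseable rows into a separate column. The function names, the schema string, and the S3 paths are hypothetical placeholders, and exactly which rows Spark treats as malformed can depend on the Spark version and the schema you supply.

```python
CORRUPT_COL = "_corrupt_record"


def permissive_json_options(corrupt_col=CORRUPT_COL):
    """Reader options that keep malformed rows instead of failing the read."""
    return {"mode": "PERMISSIVE", "columnNameOfCorruptRecord": corrupt_col}


def split_good_and_bad(spark, input_path, schema_ddl, bad_records_path):
    """Read exported JSON lines, write malformed rows to S3, return the clean rows.

    `spark` is an existing SparkSession (in a Glue job: glueContext.spark_session).
    `schema_ddl` is a DDL string for the expected fields; it must not include
    the corrupt-record column, which is appended here.
    """
    full_schema = f"{schema_ddl}, {CORRUPT_COL} STRING"
    df = (
        spark.read.schema(full_schema)
        .options(**permissive_json_options())
        .json(input_path)
        # Cache first: since Spark 2.3, queries on raw JSON/CSV files that
        # reference only the corrupt-record column are disallowed.
        .cache()
    )
    # Rows Spark could not parse carry their raw text in the corrupt column.
    bad = df.filter(f"{CORRUPT_COL} IS NOT NULL").select(CORRUPT_COL)
    bad.write.mode("append").text(bad_records_path)
    # Everything else continues through the job without the helper column.
    return df.filter(f"{CORRUPT_COL} IS NULL").drop(CORRUPT_COL)


# Hypothetical usage inside a Glue job (bucket names and fields are placeholders):
# clean_df = split_good_and_bad(
#     glueContext.spark_session,
#     "s3://my-bucket/mongo-export/",
#     "_id STRING, status STRING",
#     "s3://my-bucket/bad-records/",
# )
```

Supplying an explicit schema avoids relying on schema inference, which only adds the corrupt-record column when the sampled data happens to contain a malformed row.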

You can also convert between a Spark DataFrame and a Glue DynamicFrame easily, as shown in the links below:

https://stackoverflow.com/a/59668469

https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-dynamic-frame.html
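
As a rough sketch of the conversion described in those links, the two helpers below wrap `DynamicFrame.fromDF` and `DynamicFrame.toDF` from the Glue PySpark extensions. The helper names and the default frame name are my own placeholders, and the code assumes it runs inside a Glue job where `awsglue` is available.

```python
def to_dynamic_frame(df, glue_context, name="converted"):
    """Wrap a Spark DataFrame as a Glue DynamicFrame."""
    # Imported lazily because awsglue is only available inside the Glue runtime.
    from awsglue.dynamicframe import DynamicFrame

    return DynamicFrame.fromDF(df, glue_context, name)


def to_data_frame(dynamic_frame):
    """Convert a Glue DynamicFrame back to a Spark DataFrame."""
    return dynamic_frame.toDF()
```

This lets you do the bad-record filtering on a Spark DataFrame and then hand the clean rows back to Glue sinks or transforms as a DynamicFrame.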

AWS
Support Engineer
Answered 2 years ago
