Custom classifier for AWS Glue crawler


I have a set of files in my S3 bucket that use ASCII 31 (unit separator) as the delimiter. I am using a crawler to read these files and create the tables in the AWS Glue catalog. I tried setting a custom delimiter in the classifier, but with no luck, since this is a non-printable character. What is the best way to use this delimiter with a crawler?

2 Answers

To incorporate the ASCII 31 delimiter within a Glue Crawler, follow the steps below:

  1. Create a Custom Classifier - Because ASCII 31 is non-printable, you'll need to use its escape sequence. In the classifier's "Delimiter" field, enter "\u001F", which represents the unit separator.

  2. Update your Crawler Configuration - In order to use the custom classifier created above, configure the Glue crawler's "CSV Classifier" settings by selecting the ASCII 31 custom classifier.

  3. Modify Glue Job (Depending on Job Code) - If your job code contains delimiter-handling logic, make sure it also uses the "\u001F" delimiter.
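
If the console field does not accept the escape sequence, steps 1 and 2 can also be attempted through the API. The following is a minimal sketch using boto3's `create_classifier` and `update_crawler` calls; the crawler and classifier names are hypothetical placeholders, and whether the crawler then splits columns correctly on ASCII 31 is an assumption you should verify against your own data.

```python
UNIT_SEPARATOR = "\x1f"  # ASCII 31, the same character as "\u001F"


def build_csv_classifier(name: str, has_header: bool = False) -> dict:
    """Build the CsvClassifier argument for Glue's CreateClassifier call."""
    return {
        "Name": name,
        # Pass the raw one-character string, not the six-character escape text.
        "Delimiter": UNIT_SEPARATOR,
        "ContainsHeader": "PRESENT" if has_header else "ABSENT",
    }


def register_classifier(crawler_name: str, classifier_name: str) -> None:
    """Create the classifier and attach it to an existing crawler."""
    import boto3  # imported lazily so the builder above runs without AWS access

    glue = boto3.client("glue")
    glue.create_classifier(CsvClassifier=build_csv_classifier(classifier_name))
    glue.update_crawler(Name=crawler_name, Classifiers=[classifier_name])


# Hypothetical usage (requires AWS credentials and an existing crawler):
# register_classifier("my-crawler", "unit-separator-csv")
```

Building the request separately from the AWS call makes it easy to check that the delimiter is a single character before anything is sent.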

Below are links to the official AWS documentation on writing custom classifiers and adding them to a Glue crawler:
https://docs.aws.amazon.com/glue/latest/dg/custom-classifier.html
https://docs.aws.amazon.com/glue/latest/dg/add-classifier.html

If you have any further questions or encounter further issues, feel free to reach out with more information!

Answered 8 months ago

I had a similar issue with the crawler, so I used the Spark code below. It may help you.

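# ASCII 31 (unit separator) is non-printable, so build it with chr()
delimiter_char31 = chr(31)
# `spark` is the SparkSession that a Glue job or PySpark shell already provides
df = spark.read.option("header", "false").option("delimiter", delimiter_char31).csv("s3://abc/test.txt")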
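
Before pointing Spark or a crawler at the files, it can help to confirm locally that a sample really splits on ASCII 31. This is a small sketch using Python's standard `csv` module; the sample text and the `parse_unit_separated` helper are made up for illustration.

```python
import csv
import io

UNIT_SEPARATOR = chr(31)  # ASCII 31, non-printable


def parse_unit_separated(text: str) -> list:
    """Split each line of `text` into fields on the unit separator."""
    return list(csv.reader(io.StringIO(text), delimiter=UNIT_SEPARATOR))


# Made-up sample: two rows, two fields each
sample = "id\x1fname\n1\x1fAda\n"
rows = parse_unit_separated(sample)
print(rows)  # [['id', 'name'], ['1', 'Ada']]
```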
Answered 10 months ago
