Does AWS Glue support fixed-byte-length source data?


Hello. Can AWS Glue read a source data file like the one below?

  20220101E00011000AAABBBCCC
  20220101E00021000あいCCC

The second record contains Japanese characters and has the same byte length as the first record. The number of characters varies by record.

Thank you.

  • If every record had the same number of characters, I could use a Grok pattern like "(?<col0>.{6})(?<col1>.{2})..." when building the Data Catalog with a crawler. In this case, though, the character counts differ while the byte lengths are the same. Can anyone tell me whether AWS Glue supports a data set like this?
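To illustrate the mismatch described above, here is a minimal sketch that compares character counts and byte counts for the two sample records. It assumes the file is UTF-8 encoded, which the question does not state; with another encoding (for example Shift_JIS) the byte counts would differ.

```python
# Assumption: the source file is UTF-8; the question does not say which encoding is used.
ENCODING = "utf-8"

records = ["20220101E00011000AAABBBCCC", "20220101E00021000あいCCC"]

# Characters per record vs. bytes per record after encoding.
char_counts = [len(r) for r in records]
byte_counts = [len(r.encode(ENCODING)) for r in records]

for rec, chars, size in zip(records, char_counts, byte_counts):
    print(f"{rec}: {chars} characters, {size} bytes")
# First record:  26 characters, 26 bytes
# Second record: 22 characters, 26 bytes
```

A pattern such as `.{6}` counts characters, so it lines up with the first record but drifts out of position on the second one.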

Asked 2 years ago · 413 views
1 Answer
Accepted Answer

Unfortunately, AWS Glue does not currently support parsing data by byte length. Your data is unstructured, so the only built-in ways to parse it are the Grok SerDe or the Regex SerDe. Both parse data by matching patterns rather than by byte offsets, so they are not feasible here. I recommend preprocessing the data first and then loading it into Glue.

unstructured data -> preprocess with a custom-built parser function (output CSV) -> S3 -> crawl and create the database in Glue.
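The preprocessing step in the pipeline above could look roughly like the sketch below. It slices each record by byte offsets, decodes each field, and writes CSV. The column names and widths in `FIELDS` are hypothetical (the question does not give the real layout), and UTF-8 is assumed as the encoding; adjust both to match the actual file.

```python
import csv
import io

# Hypothetical layout: (column name, width in bytes). Not given in the question.
FIELDS = [("date", 8), ("code", 5), ("amount", 4), ("name", 6), ("tag", 3)]
ENCODING = "utf-8"  # assumption; change if the real file uses e.g. "shift_jis"
RECORD_BYTES = sum(width for _, width in FIELDS)


def parse_record(raw: bytes) -> list:
    """Slice one record by byte offsets, then decode each field to text."""
    if len(raw) != RECORD_BYTES:
        raise ValueError(f"expected {RECORD_BYTES} bytes, got {len(raw)}")
    values, offset = [], 0
    for _, width in FIELDS:
        values.append(raw[offset:offset + width].decode(ENCODING).strip())
        offset += width
    return values


def to_csv(data: bytes) -> str:
    """Convert newline-delimited fixed-byte records to CSV text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([name for name, _ in FIELDS])
    for line in data.splitlines():
        if line:  # skip blank lines
            writer.writerow(parse_record(line))
    return out.getvalue()


sample = "20220101E00011000AAABBBCCC\n20220101E00021000あいCCC\n".encode(ENCODING)
print(to_csv(sample))
```

The resulting CSV can then be uploaded to S3 and crawled as usual. Note that the slicing only works if every field boundary falls between characters; a multi-byte character that straddles two fields would raise a decode error, which is a useful signal that the assumed widths or encoding are wrong.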
Shubh (AWS, Support Engineer)
Answered 2 years ago
