S3 Select SelectObjectContent with AllowQuotedRecordDelimiter & ScanRange

We're using S3 Select SelectObjectContent to convert CSV input to JSON output.

The input CSV files are very large, so we pass them in chunks using ScanRange. We recently ran into an issue with CSV files that contain record delimiters (\r\n) inside quoted fields. Both the docs and the API error indicate that AllowQuotedRecordDelimiter can't be used together with ScanRange.
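For context, here is a minimal sketch of the kind of chunked request described above. It assumes the boto3 S3 client and non-overlapping byte ranges with an inclusive end; the helper names, chunk size, and SQL expression are hypothetical and not taken from the original setup.

```python
from typing import Any, Dict, Iterator, Tuple


def scan_ranges(object_size: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield non-overlapping (start, end) byte ranges, with end inclusive."""
    start = 0
    while start < object_size:
        end = min(start + chunk_size, object_size) - 1
        yield start, end
        start = end + 1


def build_select_params(bucket: str, key: str, start: int, end: int) -> Dict[str, Any]:
    """Build keyword arguments for the S3 client's select_object_content call."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": "SELECT * FROM S3Object s",  # assumed query
        "InputSerialization": {
            "CSV": {
                "FileHeaderInfo": "NONE",
                "FieldDelimiter": ",",
                # AllowQuotedRecordDelimiter is deliberately left unset: per the
                # question, the API rejects it when ScanRange is present.
            },
            "CompressionType": "NONE",
        },
        "OutputSerialization": {"JSON": {"RecordDelimiter": "\n"}},
        "ScanRange": {"Start": start, "End": end},
    }


def select_chunk_as_json(s3_client: Any, params: Dict[str, Any]) -> str:
    """Run one request and join the JSON bytes from the response event stream."""
    response = s3_client.select_object_content(**params)
    parts = []
    for event in response["Payload"]:
        if "Records" in event:
            parts.append(event["Records"]["Payload"])
    # Decode once at the end so multi-byte characters split across events survive.
    return b"".join(parts).decode("utf-8")
```

With a real client this would be driven by something like `s3 = boto3.client("s3")`, then one `select_chunk_as_json(s3, build_select_params(bucket, key, start, end))` call per range from `scan_ranges`. A quoted field containing \r\n is what breaks this pattern, because the line break inside the field is treated as a record boundary.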

Is there any workaround for this issue? Is this something AWS is likely to add support for in the future? Is there another service that could do this type of conversion instead of S3 Select?

Thanks for any help!

asked 6 months ago, 407 views
1 Answer

Hi,

I understand your concern about using S3 Select with AllowQuotedRecordDelimiter and ScanRange. As you found, the two options can't currently be combined in one request.

There are a few ways you could work around it, though.

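  1. Instead of a line break (\r\n), use a record delimiter that never appears inside a quoted field, and set RecordDelimiter in the CSV input settings to match. A semicolon (";") can work, for example, as long as it is not also the field delimiter and does not occur in the data; a comma (",") only works if your fields are not comma-separated. The file itself has to be written with that delimiter.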

  2. If the CSV file is generated by another process, you can modify that process so record delimiters never end up inside quoted fields, for example by replacing or escaping embedded line breaks before the file is uploaded.

  3. If neither of the above works for you, consider switching to a different AWS service. You can perform the transformation with AWS Glue, which supports more advanced features such as handling quoted record delimiters.
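As a rough illustration of the second workaround, the sketch below rewrites a CSV so that no field contains a line break, using only Python's standard csv module. The function name and the space placeholder are hypothetical choices, and it assumes the files can be streamed through a preprocessing step before they are uploaded.

```python
import csv
import io
from typing import TextIO


def flatten_quoted_newlines(src: TextIO, dst: TextIO, placeholder: str = " ") -> int:
    """Copy CSV rows from src to dst, replacing line breaks inside fields.

    Rows are processed one at a time, so large files are not loaded into
    memory. Returns the number of rows written.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    rows = 0
    for row in reader:
        cleaned = [
            field.replace("\r\n", placeholder)
            .replace("\n", placeholder)
            .replace("\r", placeholder)
            for field in row
        ]
        writer.writerow(cleaned)
        rows += 1
    return rows


if __name__ == "__main__":
    # Small in-memory demo; real files should be opened with newline="".
    source = io.StringIO('id,note\r\n1,"line1\r\nline2"\r\n', newline="")
    target = io.StringIO(newline="")
    flatten_quoted_newlines(source, target)
    print(target.getvalue())
```

After this step every record sits on exactly one line, so the ScanRange-based requests no longer need AllowQuotedRecordDelimiter. If the original line breaks matter downstream, a visible placeholder such as a literal `\n` (passed as `"\\n"`) keeps the change reversible.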

answered 6 months ago
EXPERT reviewed 4 months ago
