Need to load a column containing commas (,) from a CSV file


Hi, I need to load data into the address column in Redshift. The address value looks like this: Address "{""Source"":""MITS_SMTP"", ""Machine"":""ATL1S""}"

I have set the following properties:

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ('separatorChar' = ',')
STORED AS
  INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'S3 path '
TABLE PROPERTIES ('skip.header.line.count'='1', 'data_cleansing_enabled'='true', 'invalid_char_handling'='REPLACE', 'replacement_char'=',')

The data loads as "{""Source"":""MITS_LOG"" in the address column. Can you please guide me on what needs to be done to fix this?

Asked 2 years ago · 267 views

2 Answers

Hello,

Thank you for reaching out to us. Please find the details below:

The following example shows the contents of a text file with the field values separated by commas.

12,Shows,Musicals,Musical theatre
13,Shows,Plays,All "non-musical" theatre
14,Shows,Opera,All opera, light, and "rock" opera
15,Concerts,Classical,All symphony, concerto, and choir concerts

If you load the file using the DELIMITER parameter to specify comma-delimited input, the COPY command fails because some input fields contain commas. You can avoid that problem by using the CSV parameter and enclosing the fields that contain commas in quotation mark characters. If the quotation mark character appears within a quoted string, you need to escape it by doubling the quotation mark character. The default quotation mark character is a double quotation mark, so you need to escape each double quotation mark with an additional double quotation mark. Your new input file looks something like this.

12,Shows,Musicals,Musical theatre
13,Shows,Plays,"All ""non-musical"" theatre"
14,Shows,Opera,"All opera, light, and ""rock"" opera"
15,Concerts,Classical,"All symphony, concerto, and choir concerts"

Assuming the file name is category_csv.txt, you can load the file by using the following COPY command:

copy category from 's3://mybucket/data/category_csv.txt' iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole' csv;

Alternatively, to avoid the need to escape the double quotation marks in your input, you can specify a different quotation mark character by using the QUOTE AS parameter. For example, the following version of category_csv.txt uses '%' as the quotation mark character.

12,Shows,Musicals,Musical theatre
13,Shows,Plays,%All "non-musical" theatre%
14,Shows,Opera,%All opera, light, and "rock" opera%
15,Concerts,Classical,%All symphony, concerto, and choir concerts%

The following COPY command uses QUOTE AS to load category_csv.txt:

copy category from 's3://mybucket/data/category_csv.txt' iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole' csv quote as '%';

Reference:

https://docs.aws.amazon.com/redshift/latest/dg/r_COPY_command_examples.html

Thank you !

AWS
Answered 2 years ago

I am not using the COPY command. I am loading using a Redshift external table. It would be helpful if you could detail the SerDe properties.
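For the external-table case, OpenCSVSerde's quoteChar property is the relevant setting: it tells the SerDe that fields wrapped in the quote character may contain embedded separator characters. Below is a minimal sketch, not a definitive fix; the schema, table, and column names are assumptions, and the S3 location is a placeholder. Note also that 'replacement_char'=',' in the original DDL would replace invalid characters with commas, which may itself split fields.

```sql
-- Sketch only: schema, table, and column names are assumptions.
-- quoteChar lets double-quoted fields contain commas; doubled
-- quotes ("") inside a quoted field are read as a literal quote.
CREATE EXTERNAL TABLE spectrum_schema.sample_table (
  address VARCHAR(512)
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'separatorChar' = ',',
  'quoteChar'     = '"'
)
STORED AS TEXTFILE
LOCATION 's3://your-bucket/your-prefix/'  -- placeholder: replace with your S3 path
TABLE PROPERTIES ('skip.header.line.count' = '1');
```

With quoteChar set, a value such as "{""Source"":""MITS_SMTP"", ""Machine"":""ATL1S""}" should be read as a single field rather than being split at the first comma.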

Answered 2 years ago
