Need to load column having comas (,) in csv file

0

Hi , I need to load the data address colum in redshift Address "{""Source"":""MITS_SMTP"", ""Machine"":""ATL1S""}"

I have given the below properties ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES ( 'separatorChar'=',') STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'S3 path ' TABLE PROPERTIES('skip.header.line.count'='1','data_cleansing_enabled'='true','invalid_char_handling'='REPLACE','replacement_char'=',')

The data is loading as "{""Source"":""MITS_LOG"" in the address colums .Can you please guide what needs to be done to get this sorted ..

asked 2 years ago341 views
2 Answers
0

Hello,

Thank you for reaching out to us. Please find the details below :

The following example shows the contents of a text file with the field values separated by commas.

12,Shows,Musicals,Musical theatre 13,Shows,Plays,All "non-musical" theatre
14,Shows,Opera,All opera, light, and "rock" opera 15,Concerts,Classical,All symphony, concerto, and choir concerts

If you load the file using the DELIMITER parameter to specify comma-delimited input, the COPY command fails because some input fields contain commas. You can avoid that problem by using the CSV parameter and enclosing the fields that contain commas in quotation mark characters. If the quotation mark character appears within a quoted string, you need to escape it by doubling the quotation mark character. The default quotation mark character is a double quotation mark, so you need to escape each double quotation mark with an additional double quotation mark. Your new input file looks something like this.

12,Shows,Musicals,Musical theatre 13,Shows,Plays,"All ""non-musical"" theatre" 14,Shows,Opera,"All opera, light, and ""rock"" opera" 15,Concerts,Classical,"All symphony, concerto, and choir concerts" Assuming the file name is category_csv.txt, you can load the file by using the following COPY command:

copy category from 's3://mybucket/data/category_csv.txt' iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole' csv;

Alternatively, to avoid the need to escape the double quotation marks in your input, you can specify a different quotation mark character by using the QUOTE AS parameter. For example, the following version of category_csv.txt uses '%' as the quotation mark character.

12,Shows,Musicals,Musical theatre 13,Shows,Plays,%All "non-musical" theatre% 14,Shows,Opera,%All opera, light, and "rock" opera% 15,Concerts,Classical,%All symphony, concerto, and choir concerts% The following COPY command uses QUOTE AS to load category_csv.txt:

copy category from 's3://mybucket/data/category_csv.txt' iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole' csv quote as '%';

Reference :-

https://docs.aws.amazon.com/redshift/latest/dg/r_COPY_command_examples.html

Thank you !

AWS
answered 2 years ago
0

I am not using command .I am loading using redshift external table .Would be helpful if you could detail the serde properties

answered 2 years ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions