- Newest
- Most votes
- Most comments
so as you have given the table with little change use this:
CREATE EXTERNAL TABLE employees (
userID string,
Name string,
Color string, -- Changed to string, array creation in query
Region string,
Country string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
'field.delim' = ',',
'quoteChar' = ' " ',
'skip.header.line.count' = '1'
)
LOCATION 's3://inventory/test.csv'
TBLPROPERTIES ('classification' = 'csv');
[ so instead of Lazysimplesere need to use openCSVserde for CSV files, use quote char property too handle comma-separated strings within fields. use the split, quoteChar = ' " ' is crucial it tells Athena ti treat any comma-separated values enclosed in double as single fields. ]
here is the link where i find it. "https://docs.aws.amazon.com/athena/latest/ug/csv-serde.html
After creating the table query with
SELECT
userID,
Name,
split(Color, ',') AS ColorArray,
Region,
Country
FROM
employees;
this should give you updated output.
Hope it will resolve. :-)
To handle CSV fields containing comma-separated values in Athena, you need to use the ARRAY type with a custom ROW FORMAT definition:
CREATE EXTERNAL TABLE your_table ( id string, multi_value_field array<string>, other_field string ) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES ( 'separatorChar' = ',', 'quoteChar' = '"', 'escapeChar' = '\\' ) STORED AS TEXTFILE LOCATION 's3://your-bucket/path/';
so that is what i understand "as you mention only one cell comma separated values like "red, blue, green" in CSV file something like userID, Name, Color.
in Athena console editor page where it says " Table and views" create section choose S3 bucket data. now it is going to ask bunch of questions, table name <mytable> create database name it <queryDB> dataset ---browse you s3 bucket where CSV file is---now data format Apache Hive from drop down---file formate --CSV---and here comes the main SerDeLibrary choose "org.apache.hadop.hive.....than SerDe properties though it is optional but choose --field.delim , --Quote character " ----- Escape character . (choose as your CSV file is) now Column details user_id type string, "name" type string---color string ( as given column in your CSV file, i take example as i gave you above csv example). it will give you preview table to query. (magic of aws :-) )
now query with simple select SELECT user_id, name, colors FROM your_table_name WHERE colors LIKE '%blue%';"
if you want each comma separated value as separate items: SELECT user_id, name, color_item FROM your_table_name CROSS JOIN UNNEST(split(color, ',')) AS t(color_item);
I hope this may help!
Yes. You are right. I have "red, blue, green" in one cell. Let's say I have two other attributes in the CSV file following the
Colorfield in your example, which areRegionandCountry. I ran this statement to create the table:CREATE EXTERNAL TABLE employees ( `userID` string, `Name` string, `Color` array<string>, `Region` string, `Country` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('field.delim' = ',') STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 's3://inventory/test.csv' TBLPROPERTIES ('classification' = 'csv', "skip.header.line.count"="1");After I ran this statement, the value for the 'Color' attribute became 'red' alone. The value in 'Region' for this row became 'blue' and the value in 'Country' for this row became 'green'. My only guess is that since I set the field.delim to ',', are values that are delimited by ',' are separated. How could I create an array of strings for this attribute?
Relevant content
- asked a year ago
- asked 2 years ago
- asked 2 years ago
- AWS OFFICIALUpdated 6 months ago

Got it! I will try this today and update it! So I looked into my query and I created the table with
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'and I also hadSTORED AS INPUTFORMATin the query. Did these two statement cause the issue?