Skip to content

Redhisft CSV unload differentiation between nulls and empty string

0

I'm currently working with the UNLOAD command and need some help with formatting the output CSV file. Specifically, I want to achieve the following:

Empty strings should be quoted values: ,"", NULL values should be represented as blanks: ,,, However, I am encountering an issue when trying to use the addquotes and null as parameters together. The problem is that the null as parameter expects a literal value, and I cannot provide a blank value for it. And I end up getting output where there is no way to differentiate between empty strings and null values

Has anyone faced a similar issue or can suggest a workaround to achieve this specific CSV formatting? Any help or pointers would be greatly appreciated!

asked 2 years ago863 views
2 Answers
0

1. Define Placeholder Values

Choose placeholder values that you can easily identify and replace. For this example, let’s use NULL_PLACEHOLDER for NULLs and EMPTY_STRING_PLACEHOLDER for empty strings.

2. Transform Your Data

Use SQL to transform your data before unloading. Replace NULLs with NULL_PLACEHOLDER and empty strings with EMPTY_STRING_PLACEHOLDER.

SELECT
  CASE
    WHEN column_name IS NULL THEN 'NULL_PLACEHOLDER'
    WHEN column_name = '' THEN 'EMPTY_STRING_PLACEHOLDER'
    ELSE column_name
  END AS column_name
FROM your_table;

3. Use the UNLOAD Command

Unload the transformed data, specifying the NULL AS parameter to recognize the NULL_PLACEHOLDER.

UNLOAD ('
  SELECT
    CASE
      WHEN column_name IS NULL THEN ''NULL_PLACEHOLDER''
      WHEN column_name = '''' THEN ''EMPTY_STRING_PLACEHOLDER''
      ELSE column_name
    END AS column_name
  FROM your_table
')
TO 's3://your-bucket/path/prefix'
IAM_ROLE 'your-iam-role'
NULL AS 'NULL_PLACEHOLDER'
ADDQUOTES
ALLOWOVERWRITE
PARALLEL OFF
;

4. Post-Processing (Optional)

After unloading, you might need to post-process the resulting CSV files to replace NULL_PLACEHOLDER with actual blanks and EMPTY_STRING_PLACEHOLDER with quoted empty strings. This can be done using a script in your preferred programming language.

EXPERT
answered 2 years ago
EXPERT
reviewed 2 years ago
  • Thank you for your reply. I am looking for a solution that uses only the unload command, without any post-processing. My output is directly consumed by a third-party service, and I want to avoid post-processing to ensure it doesn't impact performance.

0

Try adding the quotes explicitly in your SELECT statement for all the columns, '"'||col1||'"'.

If a column has an empty string then the concatenation will succeed and you should see "".

If a column has a NULL value then the concatenation will not happen and result will be NULL.

Based on this test below

drop table if exists tbl;
create table tbl (col1 varchar(20));
insert into tbl values ('E137 E028 E098 E999'),('this is a test'),(NULL),('');
select col1, '"'||col1||'"' my_col from tbl;

test query results

AWS
EXPERT
answered 2 years ago
EXPERT
reviewed 2 years ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.