Questions tagged with Amazon Redshift
Hi all,
I turned on cluster relocation for one of my Redshift clusters, which was created in the past with accessibility on port 8192, and doing so succeeded.
The Redshift page in the console also shows that cluster relocation is enabled.
I am going through the Redshift documentation - https://docs.aws.amazon.com/redshift/latest/mgmt/managing-cluster-recovery.html - and the limitations section says that I can't turn on relocation for a cluster running on a port other than 5439, whereas I was able to do it successfully.
Can you please help? Am I missing something here?
Hi! In Oracle I could issue the `ALTER SYSTEM FLUSH SHARED_POOL;` command to clear existing data and re-load fresh data. Is there an alternative for AWS Redshift and AWS Redshift Serverless? I searched but am perhaps not using the right terminology.
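For reference, here is a minimal sketch of the closest thing I have come across so far, assuming the session-level result cache setting is the relevant knob (`my_table` is a placeholder):
```sql
-- Sketch: disable the session-level result cache so queries are
-- re-executed against current data instead of served from cached results.
SET enable_result_cache_for_session TO off;

-- my_table is a placeholder; this query should now bypass the result cache.
SELECT count(*) FROM my_table;
```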
Many thanks in advance,
MZ
Hi!
I have two tables in Redshift with a one-to-many relation; each table has ~300K rows.
If I execute this query:
```
EXPLAIN
SELECT Table1.field, Table2.field
FROM Table1
INNER JOIN Table2 ON Table1.Reference = Table2.ReferenceTable1
```
Here is the query plan
```
XN Hash Join DS_DIST_ALL_NONE  (cost=3929.08..29669880.27 rows=323034878 width=104)
  Hash Cond: (("outer".Reference)::text = ("inner".ReferenceTable1)::text)
  ->  XN Seq Scan on Table1 e  (cost=0.00..3143.26 rows=314326 width=104)
  ->  XN Hash  (cost=3143.26..3143.26 rows=314326 width=17)
        ->  XN Seq Scan on Table2 o  (cost=0.00..3143.26 rows=314326 width=17)
```
I don't understand: what does rows=323034878 mean?
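In case it helps, here is a sketch of what I tried to refresh the planner statistics before re-running EXPLAIN (same table names as above):
```sql
-- Refresh optimizer statistics so the rows=... estimates in EXPLAIN
-- reflect the current contents of the tables.
ANALYZE Table1;
ANALYZE Table2;

-- Re-run the plan to see whether the join row estimate changes.
EXPLAIN
SELECT Table1.field, Table2.field
FROM Table1
INNER JOIN Table2 ON Table1.Reference = Table2.ReferenceTable1;
```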
Hi All,
Is there any document that illustrates the best / must-have CloudWatch alarms to keep track of Redshift resource usage, resource contention, and blocking queries?
Background - We have seen instances where our Redshift cluster becomes unresponsive until some of the blocking queries are killed. However, this activity is manual in nature and time consuming. Please note that in most of these instances the CPU utilization / cluster performance parameters were nominal. I have seen that the "Average queue wait time by priority" was around 9 minutes. Total data scanned looked nominal too. I am looking to see if there is a way to identify such abnormalities via CloudWatch alarms.
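For reference, this is roughly the manual check we run today (a sketch only; the views and the PID value are assumptions/placeholders):
```sql
-- List transactions that are waiting for a lock (not yet granted); this is
-- usually where the blocked/blocking sessions show up.
SELECT xid, pid, txn_owner, txn_db, lock_mode, relation, granted, txn_start
FROM svv_transactions
WHERE granted = false
ORDER BY txn_start;

-- Once a blocking session is identified, we terminate it manually;
-- 12345 is a placeholder PID.
SELECT pg_terminate_backend(12345);
```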
Any input in this regard is greatly appreciated.
Hello. I have an Athena query utilizing UNLOAD to bring data over to my S3 buckets. The query works quite well. However, I do not get the associated header information (column names) in the transferred files. I do not see an explicit parameter that I could use to ensure the headers are attached to the compressed (.gz) CSV files. Any help would be appreciated. Thanks.
```
UNLOAD (SELECT * FROM dataplace.datatable WHERE file_date = date '2022-07-01')
TO 's3://my/super/bucket'
WITH (format='TEXTFILE', field_delimiter = ',')
```
Hi,
I was testing Redshift's new features - auto-copy from S3 and streaming ingestion.
I have a few questions regarding these features.
1. I understand that Redshift automatically decides the number of files to load in a batch. I wonder how often Redshift detects new files and tries to load them. Does it decide the timing based on a specific file size or a specific time interval?
2. When streaming data is transferred to Redshift from Kinesis Data Streams with the streaming ingestion feature, where will the data be stored? Will the data be kept in the Kinesis stream for 24 hours, or not stored anywhere?
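For reference, my streaming ingestion setup looks roughly like this (stream name, schema name, and IAM role ARN are placeholders):
```sql
-- Map the Kinesis data stream into Redshift through an external schema.
CREATE EXTERNAL SCHEMA kinesis_schema
FROM KINESIS
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-streaming-role';

-- Materialized view over the stream; refreshing it ingests new records.
CREATE MATERIALIZED VIEW stream_mv AS
SELECT approximate_arrival_timestamp,
       partition_key,
       shard_id,
       sequence_number,
       json_parse(from_varbyte(kinesis_data, 'utf-8')) AS payload
FROM kinesis_schema."my-kinesis-stream";

-- Pull the latest records from the stream into the view.
REFRESH MATERIALIZED VIEW stream_mv;
```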
Thanks.
Is there any feature in Redshift that allows adding metadata dynamically to database objects?
I mean something similar to object tagging in Snowflake.
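The closest built-in I'm aware of is attaching comments to objects, which is static rather than dynamic metadata; a sketch with placeholder names:
```sql
-- COMMENT ON attaches a single free-text annotation to an object; it is
-- static metadata rather than key/value tags, but it is queryable via
-- the catalog. Names below are placeholders.
COMMENT ON TABLE my_schema.my_table IS 'owner=data-eng; contains_pii=false';
COMMENT ON COLUMN my_schema.my_table.email IS 'contains_pii=true';
```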
I'm trying to copy a CSV file from an S3 bucket to Redshift, and these are the options I'm using: IGNOREHEADER AS 1
delimiter ','
region as 'us-east-1'
csv;
Even though the table schema and the CSV file contain the same data without nulls, I'm still facing errors such as: Invalid digit, Value 'B', Pos 0, Type: Integer; Invalid dateformat; Char length exceeded.
These are the columns present in the data: Uploaddate, Batchid, Inventoryid, Itemtype, Model, Supplierid, Quantity, Unit, Costcenter, Inventorydate, Embodiedco2, Usageco2, Riskadjustment.
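For completeness, this is roughly the full COPY I'm running, plus the query I use to inspect the load errors (bucket, table, and IAM role are placeholders):
```sql
-- Sketch of the COPY as I'm running it; bucket, table, and role are placeholders.
COPY my_schema.inventory
FROM 's3://my-bucket/inventory/'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
REGION AS 'us-east-1'
IGNOREHEADER AS 1
DELIMITER ','
CSV;

-- Check the most recent load errors to see the exact column, row, and
-- raw value that failed.
SELECT starttime, filename, line_number, colname, type, raw_field_value, err_reason
FROM stl_load_errors
ORDER BY starttime DESC
LIMIT 10;
```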
I am trying to create a new table based on a simple query that contains boolean type columns.
When I run the query it gives me this error: `cannot cast type character varying to boolean`.
_Why would I want to do this?_
I'm trying to define the datatype of a column in my new table as a BOOLEAN but without having a value to put in it (yet). Basically, the outcome I'm looking for is the equivalent of running the following DDL: `create table test_table (test_bool boolean);`
A simple example to reproduce this is:
```sql
CREATE table test_table AS (
SELECT CAST(NULL AS BOOLEAN) AS test_bool
);
```
**Note: I am not using a string or casting to a varchar in this code!!**
Conversely, the following works as expected - i.e., it has the correct column types:
```sql
CREATE table test_table AS (
SELECT
CAST(NULL AS VARCHAR(256)) AS test_varchar,
CAST(NULL AS TIMESTAMP) AS test_ts,
CAST(NULL AS INT) AS test_int,
CAST(NULL AS NUMERIC(10,2)) AS test_numeric
);
```
Hopefully I'm missing something fairly basic with the implicit data conversion or the conversion from a result set to a table definition.
Also, FWIW, this was on Redshift Serverless; I did not try it on a provisioned cluster, so it could be specific to Serverless, but I can't say one way or the other.
**---CLARIFICATION BY EXAMPLE---**
Here's a more concrete example:
I have source data that has people with eye_color and hair_color but nothing else. I want my target table to have the following schema: `person(eye_color VARCHAR(256), hair_color VARCHAR(256), is_left_handed BOOLEAN, salary DECIMAL(10,2))`.
I am creating this table new each run of my pipeline, and I'm using DBT to create a table from a source table (containing the source data mentioned above). This means I don't plan on running DDL to create the table and then fill it - rather I will "CREATE TABLE AS".
Since I want the destination table (person) to have all the columns (i.e. the correct schema) I need placeholders for them in my select statement. Right now it would look something like this:
```sql
CREATE TABLE person AS (
SELECT
eye_color,
hair_color,
CAST (NULL AS BOOLEAN) AS is_left_handed,
CAST (NULL AS DECIMAL(10,2)) AS salary
FROM source_data
);
```
This command fails with the error about converting to varchar. My question is about determining when the boolean value is converted to a varchar and how to prevent it from happening.
If I remove the offending column `CAST (NULL AS BOOLEAN) AS is_left_handed` it works as expected, which can be verified with this query:
```sql
SELECT column_name, is_nullable, data_type, character_maximum_length, numeric_precision, numeric_scale, datetime_precision
FROM information_schema.columns
WHERE table_schema = 'my_schema' AND table_name = 'person';
```
Hi All,
Greetings for the day.
I am curious to understand the significance and accuracy of the field "vacuum_sort_benefit" in the system view "svv_table_info".
In the cluster I am currently working on, I see tables where the "unsorted" column is 100 but vacuum_sort_benefit is 0.0, which makes me wonder whether it's really worth running VACUUM and ANALYZE on these tables.
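This is the kind of check I'm running against svv_table_info (a sketch):
```sql
-- Compare how unsorted each table is with the estimated benefit of sorting
-- it, to decide whether a VACUUM/ANALYZE is worthwhile.
SELECT "table", unsorted, vacuum_sort_benefit
FROM svv_table_info
WHERE unsorted > 0
ORDER BY unsorted DESC;
```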
Any input in this regard is greatly appreciated.

Does anybody know why the trailing zeros in a timestamp (e.g. 2022-11-08 07:50:08.100 becomes 2022-11-08 07:50:08.10) are truncated by Redshift automatically? Please refer to my screenshot for details.
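A minimal sketch of what I mean, plus how I can force a fixed-width fraction when formatting, assuming the shortening only happens when the value is displayed:
```sql
-- The stored value keeps its fractional seconds (Redshift timestamps have
-- microsecond precision); the shortening appears to be display formatting.
SELECT TIMESTAMP '2022-11-08 07:50:08.100' AS ts;

-- TO_CHAR with the MS pattern pads milliseconds to three digits.
SELECT TO_CHAR(TIMESTAMP '2022-11-08 07:50:08.100', 'YYYY-MM-DD HH24:MI:SS.MS');
```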
Greetings,
I have a really simple ETL that should take a CSV from S3 and insert it into Redshift. However, I can't configure Redshift as a target because, for some reason, the dropdown in the target properties only shows Glue Data Catalog databases and not Redshift ones. I have tried different browsers thinking it was a caching issue, but am now convinced it's an AWS error.
