Questions tagged with Amazon Redshift


Hi, I created a Redshift database and ran some queries in the Query Editor v2. I can see the option to export to CSV or JSON, but when I click on it, I get the error *"Exporting result sets is disabled for this account."* Searching Google for this specific error message doesn't return much. I have tried adding the policy **AmazonRedshiftQueryEditorV2FullAccess** (taken from https://docs.aws.amazon.com/redshift/latest/mgmt/query-editor-v2-getting-started.html) to both my own user and the IAM role used for the created Redshift database, but the error still persists. 1. Which account specifically is it talking about? 2. What do I need to do in order to be able to export these query results to CSV from the Query Editor v2 directly? Thanks, Becky
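Not an answer to which account the message refers to, but one stopgap while the export button is disabled is to export the result set yourself with `UNLOAD` through the Redshift Data API. A minimal sketch, assuming a hypothetical workgroup, bucket, IAM role, and table:

```python
import boto3

# Hypothetical placeholders -- replace with your own identifiers.
WORKGROUP = "my-workgroup"          # serverless; use ClusterIdentifier=... for a provisioned cluster
DATABASE = "dev"
IAM_ROLE = "arn:aws:iam::123456789012:role/my-redshift-unload-role"
TARGET = "s3://my-export-bucket/query-results/"

# UNLOAD writes the result set from Redshift directly to S3 as CSV,
# bypassing the Query Editor v2 export button entirely.
sql = f"""
UNLOAD ('SELECT * FROM my_table')
TO '{TARGET}'
IAM_ROLE '{IAM_ROLE}'
FORMAT AS CSV
HEADER
PARALLEL OFF;
"""

client = boto3.client("redshift-data")
resp = client.execute_statement(WorkgroupName=WORKGROUP, Database=DATABASE, Sql=sql)
print(resp["Id"])  # poll describe_statement(Id=...) to check completion
```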
1 answer · 0 votes · 56 views · asked 2 months ago
Hi @Everyone, Could someone enlighten me? 1. What is the difference between a clustered index and a SORTKEY? 2. In Redshift, the concept of an index has been replaced by the SORTKEY, right? 3. Can I partition tables in Redshift like in a transactional DB? 4. What considerations should I take into account when choosing between a SORTKEY and a DISTKEY? Sorry for asking several questions at the same time. Highly appreciate the clarifications.
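For question 4, the trade-off usually shows up at table definition time: the DISTKEY decides how rows are spread across slices (pick a frequent join key), while the SORTKEY decides on-disk ordering within each slice (pick columns you filter or range-scan on). A small sketch with hypothetical table and column names, run through the Redshift Data API:

```python
import boto3

# Hypothetical table: customer_id is a common join key (DISTKEY),
# sold_at is a common range filter (SORTKEY).
ddl = """
CREATE TABLE sales (
    sale_id     BIGINT,
    customer_id BIGINT,
    sold_at     TIMESTAMP,
    amount      DECIMAL(12, 2)
)
DISTKEY (customer_id)
SORTKEY (sold_at);
"""

client = boto3.client("redshift-data")
client.execute_statement(
    ClusterIdentifier="my-cluster",  # hypothetical; use WorkgroupName=... for serverless
    Database="dev",
    DbUser="awsuser",
    Sql=ddl,
)
```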
1 answer · 0 votes · 32 views · Amy · asked 2 months ago
What tables (SVL, STL, etc.) should I consider in order to work out which local tables are used in SQL executed in Redshift? I was thinking of running a regex against stl_ddltext to find local tables, but it does not seem to work.
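One approach that avoids regex entirely: `stl_scan` records which permanent tables each query actually scanned, and its `tbl` column joins to `svv_table_info`. A sketch, with a hypothetical cluster identifier, run through the Redshift Data API:

```python
import boto3

sql = """
SELECT q.query,
       TRIM(q.querytxt)      AS query_text,
       ti."schema"           AS schema_name,
       ti."table"            AS table_name
FROM stl_scan s
JOIN stl_query q       ON q.query = s.query
JOIN svv_table_info ti ON ti.table_id = s.tbl   -- only permanent local tables join here
WHERE s.userid > 1                              -- skip internal system activity
GROUP BY 1, 2, 3, 4
ORDER BY q.query;
"""

client = boto3.client("redshift-data")
client.execute_statement(
    ClusterIdentifier="my-cluster",  # hypothetical
    Database="dev",
    DbUser="awsuser",
    Sql=sql,
)
```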
2 answers · 0 votes · 24 views · asked 2 months ago
I am trying to create a process that will ingest .csv files dropped into an S3 bucket daily and add those records to an existing table in a serverless Redshift database. To accomplish this, I'm trying to build a Glue job that will scan the S3 bucket for new files and update/add those records to the Redshift table accordingly. I have no Spark or Python skills, so I am using Glue Studio.

I have selected the 'Visual with a source and target' option with my source being Amazon S3 and my target being Amazon Redshift. I click 'Create' and am taken to the canvas. I can find my S3 bucket and apply the transform mappings, but when I click the Redshift Cluster node, I do not see my serverless Redshift database. I do understand there is a difference between a serverless Redshift database and a Redshift cluster, but there is no option to select serverless Redshift as the target. My question seems similar to the one posted here, but the answer provided did not really help me: https://repost.aws/questions/QU33GP1YziR5OXIn-vehIxwA/aws-glue-studio-source-s-3-target-redshift-job-wants-to-select-gdc-not-redshift-table-in-target

I followed the steps outlined here https://docs.aws.amazon.com/glue/latest/dg/setup-vpc-for-glue-access.html (though this specifically applies to Redshift clusters) to address the inbound/outbound rules of the VPC security group in my serverless Redshift workgroup, and everything looks to be set up correctly. I also reviewed the associated IAM roles, and they also look to be set up correctly.

My question is: how do I set up a Glue job to read from an S3 bucket and populate a table in a serverless Redshift database? What steps am I missing? I can't tell if I'm running into permission issues (i.e. Redshift won't allow Glue access) or if there is something else I'm missing. I cannot find any documentation that specifically addresses Glue and serverless Redshift databases. I am open to any suggestions.
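Since the visual Redshift target may not list a Serverless workgroup, one workaround is to edit the generated script and write through a Glue JDBC connection whose URL points at the workgroup endpoint. A sketch under that assumption; the connection name, bucket path, table, and temp dir are hypothetical placeholders:

```python
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME", "TempDir"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the daily .csv drops from S3.
source = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    format="csv",
    format_options={"withHeader": True},
    connection_options={"paths": ["s3://my-ingest-bucket/daily/"], "recurse": True},
)

# Write through a Glue JDBC connection pointed at the Redshift Serverless
# workgroup endpoint (hypothetical connection and table names).
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=source,
    catalog_connection="redshift-serverless-conn",
    connection_options={"dbtable": "public.daily_records", "database": "dev"},
    redshift_tmp_dir=args["TempDir"],
)

job.commit()
```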
2 answers · 0 votes · 276 views · asked 2 months ago
I'm trying to estimate how much it will cost to enable AWS Backup across several Redshift clusters. How can I find out the total storage usage of a Redshift Cluster using the AWS API?
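One hedged way to approximate this: the AWS/Redshift CloudWatch namespace exposes PercentageDiskSpaceUsed per cluster, which can be combined with the node count and node type from describe_clusters. A sketch; the per-node capacity map is a hypothetical placeholder (check the spec sheet for your node types, and note that RA3 managed storage is reported differently):

```python
import datetime
import boto3

cw = boto3.client("cloudwatch")
rs = boto3.client("redshift")

# Hypothetical per-node usable storage in TB -- verify against your node types.
NODE_CAPACITY_TB = {"dc2.large": 0.16, "ra3.xlplus": 32, "ra3.4xlarge": 128}

end = datetime.datetime.utcnow()
start = end - datetime.timedelta(hours=3)

for cluster in rs.describe_clusters()["Clusters"]:
    cid = cluster["ClusterIdentifier"]
    stats = cw.get_metric_statistics(
        Namespace="AWS/Redshift",
        MetricName="PercentageDiskSpaceUsed",
        Dimensions=[{"Name": "ClusterIdentifier", "Value": cid}],
        StartTime=start,
        EndTime=end,
        Period=3600,
        Statistics=["Average"],
    )
    points = stats["Datapoints"]
    if not points:
        continue
    pct = sorted(points, key=lambda p: p["Timestamp"])[-1]["Average"]
    capacity = NODE_CAPACITY_TB.get(cluster["NodeType"], 0) * cluster["NumberOfNodes"]
    print(f"{cid}: ~{capacity * pct / 100:.2f} TB used of ~{capacity} TB")
```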
1 answer · 0 votes · 28 views · Blaine · asked 2 months ago
I have a Redshift table that I would like to grant a user access to. If I run `GRANT SELECT ON ALL TABLES IN SCHEMA schema_name TO user_name;` it works. However, if I run `GRANT SELECT ON schema_name.table_name TO user_name;` I get the following error: `ERROR: Operation not supported on external tables`. Any ideas why? Thanks!
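The error suggests `schema_name` may be an external (Spectrum) schema, where permissions are generally granted at the schema level rather than on individual external tables. A hedged sketch of the schema-level grants via the Data API, with a hypothetical cluster identifier:

```python
import boto3

client = boto3.client("redshift-data")

# For an external schema, grant USAGE on the schema (and optionally SELECT on
# all tables in it) instead of a per-table GRANT.
client.batch_execute_statement(
    ClusterIdentifier="my-cluster",  # hypothetical
    Database="dev",
    DbUser="awsuser",
    Sqls=[
        "GRANT USAGE ON SCHEMA schema_name TO user_name;",
        "GRANT SELECT ON ALL TABLES IN SCHEMA schema_name TO user_name;",
    ],
)
```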
Accepted Answer · Amazon Redshift
1 answer · 0 votes · 35 views · asked 2 months ago
I have an array stored inside an S3 bucket that looks like this:

```
[
  {
    "bucket_name": "ababa",
    "bucket_creation_date": "130999",
    "additional_data": {
      "bucket_acl": [
        {
          "Grantee": {
            "DisplayName": "abaabbb",
            "ID": "abaaaa",
            "Type": "CanonicalUser"
          },
          "Permission": "FULL_CONTROL"
        }
      ],
      "bucket_policy": {
        "Version": "2012-10-17",
        "Id": "abaaa",
        "Statement": [
          {
            "Sid": "iddd",
            "Effect": "Allow",
            "Principal": { "Service": "logging.s3.amazonaws.com" },
            "Action": "s3:PutObject",
            "Resource": "aarnnn"
          },
          {
            "Effect": "Deny",
            "Principal": "*",
            "Action": [ "s3:GetBucket*", "s3:List*", "s3:DeleteObject*" ],
            "Resource": [ "arn:aws:s3:::1111-aaa/*", "arn:aws:s3:::1111-bbb" ],
            "Condition": { "Bool": { "aws_SecureTransport": "false" } }
          }
        ]
      },
      "public_access_block_configuration": {
        "BlockPublicAcls": true,
        "IgnorePublicAcls": true,
        "BlockPublicPolicy": true,
        "RestrictPublicBuckets": true
      },
      "website_hosting": {},
      "bucket_tags": [ { "Key": "keyyy", "Value": "valueee" } ]
    },
    "processed_data": {}
  },
  .......................
]
```

NOTE: some of the fields may be a string/array/struct depending on the data we get (e.g. Action can be an array or a string).

END GOAL: I want to query inside this data, check multiple conditions, and then create a field inside processed_data and set it to true/false based on the result, using AWS Glue. Example: for each object inside the array, I want to check:

```
1- if bucket_acl has Grantee.Type=CanonicalUser and Permission=FULL_CONTROL
   AND
2- if bucket_policy has a Statement that contains Effect=Allow and Principal=* and Action = ...... and Resource = ...... and Condition is empty
   AND
3- website_hosting is empty

and then create a field inside processed_data and set it to true if the above conditions are satisfied,
e.g. processed_data: { isPublic: True }
```

Approaches I tried:

1- I tried saving the data to the S3 bucket in Parquet format using aws-wrangler/aws-pandas for faster querying, and then reading it in AWS Glue using a Glue dynamic frame:

```
S3bucket_node1 = glueContext.create_dynamic_frame.from_options(
    format_options={},
    connection_type="s3",
    format="parquet",
    connection_options={"paths": ["s3://abaabbb/abaaaaa/"], "recurse": True},
    transformation_ctx="S3bucket_node1",
)
S3bucket_node1.printSchema()
S3bucket_node1.show()
```

Output:

```
root
|-- bucket_name: string
|-- bucket_creation_date: string
|-- additional_data: string
|-- processed_data: string

{"bucket_name": "abaaaa", "bucket_creation_date": "139999", "additional_data": "{'bucket_acl': [{'Grantee': {'DisplayName': 'abaaaaaa', 'ID': 'abaaa', 'Type': 'CanonicalUser'}, 'Permission': 'FULL_CONTROL'}], 'bucket_policy': {}, 'public_access_block_configuration': {'BlockPublicAcls': True, 'IgnorePublicAcls': True, 'BlockPublicPolicy': True, 'RestrictPublicBuckets': True}, 'website_hosting': {}, 'bucket_tags': []}", "processed_data": "{}"}
```

I get everything as a string; it seems most of these libraries don't support nested data types.

2- I tried saving the data as-is (in JSON) using the PutObject API and then reading it in AWS Glue using a Glue dynamic frame:

```
piece1 = glueContext.create_dynamic_frame.from_options(
    format_options={"multiline": True},
    connection_type="s3",
    format="json",
    connection_options={"paths": ["s3://raghav-test-df/raghav3.json"], "recurse": True},
    transformation_ctx="S3bucket_node1",
)
piece1.printSchema()
piece1.show()
piece1.count()
```

Output:

```
root
0
```

I get no schema and a count of 0.

3- I tried reading the data using a Spark data frame:

```
sparkDF = spark.read.option("inferSchema", "true").option("multiline", "true").json("s3://ababa/abaa.json")
sparkDF.printSchema()
sparkDF.count()
sparkDF.show()
```

Output:

```
root
 |-- additional_data: struct (nullable = true)
 |    |-- bucket_acl: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- Grantee: struct (nullable = true)
 |    |    |    |    |-- DisplayName: string (nullable = true)
 |    |    |    |    |-- ID: string (nullable = true)
 |    |    |    |    |-- Type: string (nullable = true)
 |    |    |    |-- Permission: string (nullable = true)
 |    |-- bucket_policy: struct (nullable = true)
 |    |    |-- Id: string (nullable = true)
 |    |    |-- Statement: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- Action: string (nullable = true)
 |    |    |    |    |-- Condition: struct (nullable = true)
 |    |    |    |    |    |-- Bool: struct (nullable = true)
 |    |    |    |    |    |    |-- aws:SecureTransport: string (nullable = true)
 |    |    |    |    |    |-- StringEquals: struct (nullable = true)
 |    |    |    |    |    |    |-- AWS:SourceAccount: string (nullable = true)
 |    |    |    |    |    |    |-- AWS:SourceArn: string (nullable = true)
 |    |    |    |    |    |    |-- aws:PrincipalAccount: string (nullable = true)
 |    |    |    |    |    |    |-- s3:x-amz-acl: string (nullable = true)
 |    |    |    |    |-- Effect: string (nullable = true)
 |    |    |    |    |-- Principal: string (nullable = true)
 |    |    |    |    |-- Resource: string (nullable = true)
 |    |    |    |    |-- Sid: string (nullable = true)
 |    |    |-- Version: string (nullable = true)
 |    |-- bucket_tags: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- Key: string (nullable = true)
 |    |    |    |-- Value: string (nullable = true)
 |    |-- public_access_block_configuration: struct (nullable = true)
 |    |    |-- BlockPublicAcls: boolean (nullable = true)
 |    |    |-- BlockPublicPolicy: boolean (nullable = true)
 |    |    |-- IgnorePublicAcls: boolean (nullable = true)
 |    |    |-- RestrictPublicBuckets: boolean (nullable = true)
 |-- bucket_creation_date: string (nullable = true)
 |-- bucket_name: string (nullable = true)
```

I get the schema and the correct count, but some of the fields have different data types (e.g. Action can be a string or an array) and Spark defaults them to string; I think querying the data based on multiple conditions using SQL will be too complex. Do I need to change the approach, or something else? I am stuck here. Can someone please help in achieving the end goal?
1 answer · 0 votes · 84 views · asked 2 months ago
Hi All, In one of our Redshift databases, most of the columns are defined with a length of 500 bytes even though they hold real data of 50 bytes or less. Our understanding was that, because Redshift compresses columns, physical disk storage would get rid of those blank ~450 bytes (500 bytes - 50 bytes) automatically and only consume the compressed version of the ~50 bytes of actual data, so defining a larger length for a column would not harm us. However, after seeing the doc below, that assumption seems to be simply wrong. It says that during query execution Redshift parks intermediate results in temporary tables in each compute node's memory, and that data is stored uncompressed, so it consumes the defined data type size rather than the actual or compressed data length. https://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-smallest-column-size.html Want to understand from the experts here: 1) Is this a real issue, and is there any ready-made dictionary/system table/view which logs all such tables/columns that we need to fix to avoid the performance overhead? 2) We also ran some sample queries to test the above behaviour and verified the results from SVL_QUERY_REPORT and SVL_QUERY_SUMMARY; they show a 75% degradation in performance. Is this the correct way of checking, or is there another way to verify the performance (or DB resource usage like CPU, I/O, and memory consumption) of a query execution in Redshift? Please guide me here.
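For question 1 there is, as far as I know, no single ready-made view, but one way to build the list is to pull every declared VARCHAR width from PG_TABLE_DEF and then, as a second pass, compare it against MAX(OCTET_LENGTH(col)) per column. A sketch of the first pass, with a hypothetical cluster identifier (PG_TABLE_DEF only shows schemas on your search_path):

```python
import boto3

# Candidate VARCHAR columns and their declared widths; heavily over-declared
# columns are the ones worth resizing per the linked best-practice page.
sql = """
SELECT schemaname, tablename, "column", type
FROM pg_table_def
WHERE type LIKE 'character varying%'
  AND schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY schemaname, tablename;
"""

client = boto3.client("redshift-data")
client.execute_statement(
    ClusterIdentifier="my-cluster",  # hypothetical
    Database="dev",
    DbUser="awsuser",
    Sql=sql,
)
```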
2 answers · 0 votes · 40 views · asked 2 months ago
Until about two days ago, I used to be able to see queries that are currently executing; now filtering by "running" is always empty. No matter how much activity is occurring, queries don't show up until after they have finished. Querying tables like `stv_inflight` and `stv_query_metrics` does let me get the information I need, but it was nice when it was all available on the console. Is this a bug, or did something possibly change on my side?
1 answer · 0 votes · 35 views · Jon · asked 2 months ago
Pretty basic newbie Redshift question here. We want to upload historical test data to a Redshift database. The data as we have it is in multiple CSVs in a typical wide table format, with columns set up like this:

| Time | Var1 | Var2 | Var3 |
| --- | --- | --- | --- |
| 0 | 2 | 3 | 4 |
| 1 | 4 | 5 | 6 |
| 2 | 6 | 7 | 8 |

In the past, for other projects, we've had data imported to Redshift using COPY from S3, and the data was already formatted like below, which I've read is more efficient for querying in Redshift:

| Time | Var | Value |
| --- | --- | --- |
| 0 | Var1 | 2 |
| 0 | Var2 | 3 |
| 0 | Var3 | 4 |
| 1 | Var1 | 4 |
| 1 | Var2 | 5 |
| 1 | Var3 | 6 |
| 2 | Var1 | 6 |
| 2 | Var2 | 7 |
| 2 | Var3 | 8 |

For this new data, do we need to transform it to match the long key/value setup before uploading to Redshift? Are there situations where this reformatting is not necessary?
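If the long layout is the one you standardize on, the reshaping itself is a one-liner in pandas before staging the files back to S3 for COPY. A sketch, with the file names as hypothetical placeholders:

```python
import pandas as pd

# Wide CSV: Time, Var1, Var2, Var3 ... one column per variable.
wide = pd.read_csv("historical_test_data.csv")  # hypothetical input file

# Melt to long form: one (Time, Var, Value) row per measurement.
long = wide.melt(id_vars=["Time"], var_name="Var", value_name="Value")

# Write back out; stage this to S3 and COPY it into the target table.
long.to_csv("historical_test_data_long.csv", index=False)
```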
2 answers · 0 votes · 44 views · asked 2 months ago
Following this documentation page: https://docs.aws.amazon.com/redshift/latest/dg/c_serial_isolation.html, it states: "A database snapshot is created within a transaction on the first occurrence of most SELECT statements, DML commands such as COPY, DELETE, INSERT, UPDATE". We have a requirement for Redshift snapshot isolation to also create a snapshot on RENAME commands. Is this supported? If not, how should we approach this when we have concurrent transactions issuing RENAME commands?
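Not a statement about when snapshots are created, but one commonly used way to keep concurrent transactions that rename the same table from racing each other is to take an explicit table lock first, so the RENAMEs serialize. A sketch via the Data API, with cluster and table names as hypothetical placeholders:

```python
import boto3

client = boto3.client("redshift-data")

# The statements in a batch_execute_statement call are expected to run
# together; LOCK forces concurrent writers to wait until this rename commits.
client.batch_execute_statement(
    ClusterIdentifier="my-cluster",  # hypothetical
    Database="dev",
    DbUser="awsuser",
    Sqls=[
        "LOCK TABLE public.my_table;",
        "ALTER TABLE public.my_table RENAME TO my_table_old;",
    ],
)
```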
1 answer · 0 votes · 29 views · AWS · asked 2 months ago
I found the SLA for Redshift (clustered) at https://aws.amazon.com/redshift/sla/, but I can't find anything related to Redshift Serverless. Is its SLA the same as for multi-node Redshift? Could you help me with this?
1 answer · 0 votes · 77 views · asked 2 months ago