Questions tagged with Analytics
All,
I recently updated an Avro schema and verified that the changes are backward compatible. Next, I saved an Avro file with the new schema to an S3 bucket and created a crawler.
Please note that I have updated the crawler with the following configs:
- Update the table definition in the data catalog
- Update all new and existing partitions with metadata from the table
- Mark the table as deprecated in the data catalog
The crawler runs successfully and creates a table.
However, when I run an Athena query against the created table I get this error:
`GENERIC_INTERNAL_ERROR: Unknown object inspector category: UNION`
I have checked the AWS documentation on [GENERIC_INTERNAL_ERROR](https://aws.amazon.com/premiumsupport/knowledge-center/athena-generic-internal-error/) issue types, but this particular error isn't mentioned. Any ideas on how to resolve this?
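For reference, one plausible source of this error is a union-typed field in the updated schema: Avro unions other than a simple nullable pairing of `null` with one other type are mapped by the Hive Avro SerDe to a UNION object inspector, which Athena does not handle. A minimal sketch, assuming the updated schema is available as a local `.avsc` file (the path below is hypothetical), that flags union fields worth checking:
```
import json

# Hypothetical path to the updated Avro schema; adjust to your own .avsc file.
SCHEMA_PATH = "updated_schema.avsc"

def non_nullable_unions(schema):
    """Yield (field name, branches) for fields whose type is a union
    with more than one non-null branch."""
    for field in schema.get("fields", []):
        ftype = field["type"]
        if isinstance(ftype, list):  # Avro encodes unions as JSON arrays
            branches = [t for t in ftype if t != "null"]
            if len(branches) > 1:
                yield field["name"], branches

with open(SCHEMA_PATH) as f:
    schema = json.load(f)

for name, branches in non_nullable_unions(schema):
    print(f"{name}: union of {branches} -- candidate for the UNION object inspector error")
```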
I have a database of large .csv files which I am trying to query, but it does not fetch any results when I filter on a string column with the LIKE operator. Here is my query:
abc is declared as INT
pc is declared as STRING
-- View Example
CREATE OR REPLACE VIEW view_name AS
SELECT abc
FROM "test_log"
WHERE pc like 'IG%';
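A minimal diagnostic sketch, assuming the cause is hidden whitespace or casing in `pc` rather than the view itself (the database name and S3 output location below are hypothetical); it runs a normalized variant of the filter through the Athena API and reports the match count:
```
import time
import boto3

athena = boto3.client("athena")

# Normalized variant of the original filter: trim whitespace and ignore case.
DIAGNOSTIC_SQL = """
SELECT count(*) AS matches
FROM "test_log"
WHERE upper(trim(pc)) LIKE 'IG%'
"""

# Hypothetical database and query-result location; replace with your own.
qid = athena.start_query_execution(
    QueryString=DIAGNOSTIC_SQL,
    QueryExecutionContext={"Database": "my_database"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/diagnostics/"},
)["QueryExecutionId"]

# Poll until the query finishes, then print the match count.
while True:
    state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]
    print("matches:", rows[1]["Data"][0]["VarCharValue"])  # row 0 is the header
else:
    print("query ended in state:", state)
```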
We need QuickSight dashboard usage and access data so that we can analyze the metrics on a dashboard itself and send out scheduled reports automatically via email.
Can anyone help us get usage and access data from QuickSight and set up a dashboard usage and access summary?
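One common approach is to pull QuickSight access events out of AWS CloudTrail and build a dataset or report on top of them. A minimal sketch, assuming CloudTrail is enabled in the account (the default `lookup_events` window only covers recent management events, and the exact event names you care about may differ):
```
import boto3

cloudtrail = boto3.client("cloudtrail")

# Page through recent management events emitted by QuickSight.
paginator = cloudtrail.get_paginator("lookup_events")
pages = paginator.paginate(
    LookupAttributes=[
        {"AttributeKey": "EventSource", "AttributeValue": "quicksight.amazonaws.com"}
    ]
)

for page in pages:
    for event in page["Events"]:
        # EventName and Username indicate who did what and when.
        print(event["EventTime"], event["EventName"], event.get("Username", "unknown"))
```
The output can be written to S3 and registered as a QuickSight dataset so the usage summary lives on a dashboard like everything else.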
Time series OpenSearch Serverless collections don't support specifying a document ID for ingested data, which can lead to duplicate data. Is there a way to dedupe this data other than at query time, which is expensive, or is the time-series-optimized version of collections already designed to group duplicate documents for efficient querying? There is also a storage cost to consider: if I have two copies of all my data because I was unable to specify a document ID, that is far from ideal.
Hi all,
I wanted to ask if anyone has had success connecting to an AWS Redshift datashare's consumer database from an analytics tool like Power BI. I am trying to connect my Redshift datashare's consumer DB to Power BI, but while connecting I am only shown options for the cluster's own databases, not the shared database.
Anyone able to accomplish this successfully? Thanks.
Regards,
Raunak
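For reference, one workaround that is sometimes used when a BI tool only lists the cluster's local databases is to create late-binding views in a local database that reference the datashare objects with three-part names, and point the tool at those views instead. A hedged sketch via the Redshift Data API (the cluster, database, schema, and table names below are placeholders):
```
import boto3

redshift_data = boto3.client("redshift-data")

# Late-binding view in the local database that selects from the consumer
# database created from the datashare (three-part name: db.schema.table).
CREATE_VIEW_SQL = """
CREATE OR REPLACE VIEW public.shared_sales AS
SELECT *
FROM consumer_db.public.sales
WITH NO SCHEMA BINDING;
"""

resp = redshift_data.execute_statement(
    ClusterIdentifier="my-cluster",   # placeholder cluster identifier
    Database="local_db",              # local database that Power BI can see
    DbUser="awsuser",                 # or pass SecretArn instead
    Sql=CREATE_VIEW_SQL,
)
print("statement id:", resp["Id"])
```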
I have a QuickSight analysis that needs to be published as a dashboard. The analysis changes based on demand, and I need to update and replace the contents of the existing dashboard. The only way I know of is to click 'Update and replace existing dashboard' in the UI. Is there a way to automate this process?
Edit:
Follow-up question: ultimately, what I want is for the dashboard to be refreshed alongside the refresh of the analysis (which contains the refreshed dataset) without always having to run commands or click 'Publish dashboard' in the UI. I'm looking for a way to trigger and automate that part. Thank you for your answers!
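A hedged sketch of the API-based route, assuming the classic template flow (capture the analysis in a template, point the dashboard at it, then publish the new version); all IDs, ARNs, and dataset placeholders below are hypothetical and would need to match your account:
```
import boto3

ACCOUNT_ID = "123456789012"   # placeholder account ID
qs = boto3.client("quicksight")

DATASET_REFS = [
    {"DataSetPlaceholder": "main",
     "DataSetArn": "arn:aws:quicksight:us-east-1:123456789012:dataset/my-dataset"}
]

# 1. Capture the current state of the analysis in a template
#    (use create_template the very first time).
template = qs.update_template(
    AwsAccountId=ACCOUNT_ID,
    TemplateId="my-template",
    SourceEntity={
        "SourceAnalysis": {
            "Arn": "arn:aws:quicksight:us-east-1:123456789012:analysis/my-analysis",
            "DataSetReferences": DATASET_REFS,
        }
    },
)

# 2. Update the dashboard from that template (this creates a new dashboard version).
qs.update_dashboard(
    AwsAccountId=ACCOUNT_ID,
    DashboardId="my-dashboard",
    Name="My Dashboard",
    SourceEntity={
        "SourceTemplate": {"Arn": template["Arn"], "DataSetReferences": DATASET_REFS}
    },
)

# 3. Publish the latest version (it may take a moment to finish creating).
latest = qs.describe_dashboard(AwsAccountId=ACCOUNT_ID, DashboardId="my-dashboard")
version = latest["Dashboard"]["Version"]["VersionNumber"]
qs.update_dashboard_published_version(
    AwsAccountId=ACCOUNT_ID, DashboardId="my-dashboard", VersionNumber=version
)
```
Wrapped in a Lambda function and triggered on a schedule or after the dataset refresh, this would remove the manual 'publish' click.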
I've tried to UNLOAD data from MySQL to S3 using Athena engine version 3 and got an error: __INVALID_SESSION_PROPERTY: Unknown session property ***.manifest_location.__
*** is my data source name.
But it works with Athena engine version 2. I can't find any info about this error.
UNLOAD (SELECT * FROM database.table)
TO 's3://my-bucket/table/'
WITH (format = 'PARQUET',compression = 'SNAPPY')
Hi everyone,
Could someone enlighten me?
1. What is the difference between a clustered index and a SORTKEY?
2. In Redshift, the concept of an index has been replaced by the SORTKEY, right?
3. Can I partition tables in Redshift like in a transactional DB?
4. What considerations should I take into account when choosing between a SORTKEY and a DISTKEY?
Sorry for asking several questions at the same time.
I'd highly appreciate any clarifications.
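Regarding questions 1, 2, and 4, a short DDL sketch may help make the terms concrete; the table, column, and connection details below are hypothetical, and the right key choices depend entirely on your query patterns:
```
import redshift_connector

# Hypothetical DDL: DISTKEY controls how rows are distributed across nodes,
# SORTKEY controls the on-disk sort order that range-restricted scans exploit
# (roughly the role a clustered index plays in a transactional database).
DDL = """
CREATE TABLE sales (
    sale_id     BIGINT,
    customer_id BIGINT,
    sale_date   DATE,
    amount      DECIMAL(12, 2)
)
DISTKEY (customer_id)   -- column commonly used in joins and aggregations
SORTKEY (sale_date);    -- column commonly used in range filters
"""

# Placeholder connection details.
conn = redshift_connector.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    database="dev",
    user="awsuser",
    password="...",
)
cur = conn.cursor()
cur.execute(DDL)
conn.commit()
```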
Which system tables (SVL, STL, etc.) should I consider in order to work out which local tables are used by SQL executed in Redshift? I am thinking of using a regex to query STL_DDLTEXT for local tables, but that does not seem to work.
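One approach worth trying is to join the scan steps logged in STL_SCAN against SVV_TABLE_INFO instead of regexing the raw statement text; a minimal sketch via the Redshift Data API (cluster and database names are placeholders, and STL tables only retain a few days of history):
```
import boto3

redshift_data = boto3.client("redshift-data")

# Map scan steps (STL_SCAN) back to permanent local tables via SVV_TABLE_INFO,
# rather than parsing STL_DDLTEXT / STL_QUERYTEXT with regexes.
SQL = """
SELECT DISTINCT s.query,
       ti."schema" AS table_schema,
       ti."table"  AS table_name
FROM stl_scan s
JOIN svv_table_info ti ON ti.table_id = s.tbl
WHERE s.userid > 1            -- skip internal system work
ORDER BY s.query;
"""

resp = redshift_data.execute_statement(
    ClusterIdentifier="my-cluster",   # placeholder
    Database="dev",
    DbUser="awsuser",
    Sql=SQL,
)
print("statement id:", resp["Id"])    # fetch rows later with get_statement_result
```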
Hello,
I just started using AWS QuickSight and ran into a problem when trying to upload new datasets using SPICE.
According to the Manage QuickSight > SPICE capacity page, my SPICE usage is 297 GB, way over my capacity limit of 31 GB. The problem is that I have already deleted all datasets in QuickSight to get more SPICE capacity, and that only gained me 14 MB of free capacity. How can I free the remaining 297 GB?
I cannot see an option to release capacity in the QuickSight console.
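A hedged sketch for tracking down what is still consuming SPICE, assuming leftover datasets that are not visible to your user are the culprit; it lists every dataset in the account together with its reported SPICE footprint (the account ID is a placeholder, and `describe_data_set` can fail for some file-based datasets, hence the try/except):
```
import boto3

ACCOUNT_ID = "123456789012"   # placeholder
qs = boto3.client("quicksight")

paginator = qs.get_paginator("list_data_sets")
for page in paginator.paginate(AwsAccountId=ACCOUNT_ID):
    for summary in page["DataSetSummaries"]:
        dataset_id = summary["DataSetId"]
        try:
            detail = qs.describe_data_set(AwsAccountId=ACCOUNT_ID, DataSetId=dataset_id)
            spice_bytes = detail["DataSet"].get("ConsumedSpiceCapacityInBytes", 0)
        except qs.exceptions.InvalidParameterValueException:
            # Some datasets (e.g. file uploads) cannot be described through this API.
            spice_bytes = None
        print(dataset_id, summary.get("Name"), summary.get("ImportMode"), spice_bytes)
```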

I'm following this tutorial to perform text analytics using UDFs.
https://aws.amazon.com/blogs/machine-learning/translate-and-analyze-text-using-sql-functions-with-amazon-athena-amazon-translate-and-amazon-comprehend/
The example where data are not saved works fine in Athena engine v3:
USING
EXTERNAL FUNCTION detect_sentiment_all(col1 VARCHAR, lang VARCHAR) RETURNS VARCHAR LAMBDA 'textanalytics-udf',
EXTERNAL FUNCTION detect_entities_all(col1 VARCHAR, lang VARCHAR) RETURNS VARCHAR LAMBDA 'textanalytics-udf'
SELECT *,
detect_sentiment_all(review_body, language) AS sentiment,
detect_entities_all(review_body, language) AS entities
FROM amazon_reviews_with_language
WHERE language IN ('ar', 'hi', 'ko', 'zh-TW', 'ja', 'zh', 'de', 'pt', 'en', 'it', 'fr', 'es')
However, the example where a table is created to save the results does not work in Athena v.3, but it works in Athena v.2.
CREATE TABLE amazon_reviews_with_text_analysis WITH (format='parquet') AS
USING
EXTERNAL FUNCTION detect_sentiment_all(col1 VARCHAR, lang VARCHAR) RETURNS VARCHAR LAMBDA 'textanalytics-udf',
EXTERNAL FUNCTION detect_entities_all(col1 VARCHAR, lang VARCHAR) RETURNS VARCHAR LAMBDA 'textanalytics-udf'
SELECT *,
detect_sentiment_all(review_body, language) AS sentiment,
detect_entities_all(review_body, language) AS entities
FROM amazon_reviews_with_language
WHERE language IN ('ar', 'hi', 'ko', 'zh-TW', 'ja', 'zh', 'de', 'pt', 'en', 'it', 'fr', 'es')
It fails with an error:
line 2:1: mismatched input 'USING'. Expecting: '(', <query>
The question is: how can I save data into a table when using an external function in Athena engine v3?
I've tried to use a Glue Spark job for very basic partitioning of roughly 50 GB of GZIP JSON data.
The reason for trying a Glue job is that my data could have more than 100 partitions, and it is not really convenient to do that in Athena.
The code is almost verbatim from the AWS template.
```
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.dynamicframe import DynamicFrame
from pyspark.sql.functions import concat, col, lit, substring
## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
# Push down the partition predicate so only one day's data is read from the catalog table
my_partition_predicate = "partition='main' and year='2022' and month='12' and day='17'"
datasource0 = glueContext.create_dynamic_frame.from_catalog(
    database="my_db",
    table_name="my_table",
    push_down_predicate=my_partition_predicate,
    transformation_ctx="datasource0",
    additional_options={"recurse": True},
)
# Derive the output partition column "od" from two substrings of originalrequest
datasource0 = datasource0.toDF()
datasource0.printSchema()
datasource0 = datasource0.withColumn(
    "od",
    concat(substring("originalrequest", 9, 3), lit("_"), substring("originalrequest", 12, 3)),
)
datasource0 = DynamicFrame.fromDF(datasource0, glueContext, "datasource0")
# Write back to S3 as gzipped JSON, partitioned by "od"
datasink4 = glueContext.write_dynamic_frame.from_options(
    frame=datasource0,
    connection_type="s3",
    connection_options={"path": "s3://my_b/glue4", "partitionKeys": ["od"], "compression": "gzip"},
    format="json",
)
job.commit()
```
The job was executed with 60 DPUs and timed out after 20 minutes... This failed execution cost $8.80.
Meanwhile, exactly the same job was done in Athena in about 2 minutes and cost $0.25.
Am I doing something wrong, or is Athena (Presto) leaps ahead of Spark in terms of speed and cost effectiveness?
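For comparison, a hedged sketch of one variation that is sometimes faster for this kind of job: read the gzipped JSON with plain Spark, repartition by the derived column so each output partition is written by a small number of tasks, and write with the DataFrame writer instead of a DynamicFrame (the input path is a placeholder, and the substring positions are taken from the snippet above):
```
from pyspark.sql import SparkSession
from pyspark.sql.functions import concat, lit, substring

spark = SparkSession.builder.appName("basic-partitioning").getOrCreate()

# Read only the day's worth of gzipped JSON directly from S3 (placeholder path).
df = spark.read.json("s3://my_b/raw/partition=main/year=2022/month=12/day=17/")

# Derive the partition column the same way as in the Glue job above.
df = df.withColumn(
    "od",
    concat(substring("originalrequest", 9, 3), lit("_"), substring("originalrequest", 12, 3)),
)

# Shuffle rows by "od" first so the partitioned write produces fewer, larger files.
(
    df.repartition("od")
      .write.mode("overwrite")
      .partitionBy("od")
      .option("compression", "gzip")
      .json("s3://my_b/glue4/")
)
```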