Questions tagged with Analytics
Is it possible to alias the attributes of an IoT SiteWise asset, so that we can update the attributes dynamically for many devices? Say I have many IoT things, modelled as assets of an `iot-car` model. Cars have a 'color' attribute (a string) which is updated very rarely, so it isn't really time-series data — a car can be painted maybe once a year. I know I can manually change the attribute via the CLI or console, and I know I can automatically update it from an IoT rule if I specify the asset ID and the property ID (and I can see that the 'color' property ID is the same for all my different cars). Is it possible, using aliasing/substitution/similar, to have cars update themselves automatically, without requiring a different rule per car? Since I have to specify both the property ID and the asset ID in the rule, it seems like I can't.
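For reference, the scripted workaround I'm trying to avoid looks roughly like this — a minimal boto3 sketch that pushes the same attribute value to every asset of the model in one batch call (the model ID, property ID, and color value are placeholders; the AWS calls are commented out here):

```python
# Sketch: update the shared 'color' attribute across many car assets at once,
# instead of one IoT rule per car. Untested against a live account.
import time

def build_attribute_entries(asset_ids, property_id, value):
    """One BatchPutAssetPropertyValue entry per asset, all sharing property_id."""
    now = int(time.time())
    return [
        {
            "entryId": f"color-{i}",
            "assetId": asset_id,
            "propertyId": property_id,
            "propertyValues": [
                {"value": {"stringValue": value},
                 "timestamp": {"timeInSeconds": now}},
            ],
        }
        for i, asset_id in enumerate(asset_ids)
    ]

# import boto3
# sitewise = boto3.client("iotsitewise")
# car_ids = [a["id"] for a in
#            sitewise.list_assets(assetModelId="MODEL_ID")["assetSummaries"]]
# sitewise.batch_put_asset_property_value(
#     entries=build_attribute_entries(car_ids, "PROPERTY_ID", "red"))
```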
Hi, when running a query with Athena I'm getting the error below. I think it might be the `DefaultConnectionString`; any advice will help. DefaultConnectionString: `postgres://jdbc:postgresql://xxxxx-dbstack-xxxxxx-mydb-xxxxxxxx.xxxxxxxx.us-west-2.rds.amazonaws.com:5432/salesdb?user=xxxxxx&password=xxxxxxxxx` > GENERIC_USER_ERROR: Encountered an exception[java.lang.RuntimeException] from your LambdaFunction[arn:aws:lambda:us-west-2:xxxxxxxxxxx:function:postgresqlconnector] executed in context[retrieving meta-data] with message[Could not find table in dms_sample] This query ran against the "pg_catalog" database, unless qualified by the query. Please post the error message on our forum
I just want to compare something like a post's tags ```["bird","fruits","books"]``` with a user's interests ```["cars","software","laptop","movies","books","coding"]```. Here one element of each array matches, and I want to create a sort so that I can get the posts related to the user's interests.
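A minimal sketch of the matching I'm after, in plain Python (in Athena/Trino SQL the equivalent score would be `cardinality(array_intersect(tags, interests))`; the post data below is made up for illustration):

```python
def overlap_score(post_tags, user_interests):
    """Number of elements the two lists share."""
    return len(set(post_tags) & set(user_interests))

posts = [
    {"id": 1, "tags": ["bird", "fruits", "books"]},
    {"id": 2, "tags": ["cars", "racing"]},
    {"id": 3, "tags": ["cooking"]},
]
interests = ["cars", "software", "laptop", "movies", "books", "coding"]

# Sort posts by how many tags match the user's interests, best matches first.
ranked = sorted(posts, key=lambda p: overlap_score(p["tags"], interests),
                reverse=True)
```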
Hi, I'm getting the below error after running the query. > This query ran against the "xxxx" database, unless qualified by the query. Please post the error message on our
As of this writing, applying a background color using a **gradient** through conditional formatting of a numeric (DECIMAL or INTEGER) column in a Table visual causes the table to render with the message: > The data type of a field used in this visual has changed too much for QuickSight to automatically update your analysis. You can adjust field data types by editing or replacing the current dataset. This is a new issue that has appeared in recent days and is affecting a QuickSight dashboard that was published months ago and used heavily just last month. As a workaround, I've removed conditional formatting from all columns in the table, and the table is now successfully rendering. Here is an example using a subset of the data in a new analysis: ![Table with one DECIMAL and one INTEGER column](/media/postImages/original/IM-CuEKZFSRrO3fF_Q35ItLA) And here is the same table once a gradient is applied to one of the columns. The issue appears regardless of the column type chosen. ![Table with a 1-5 gradient applied to one column](/media/postImages/original/IMnxplOmIMRrKE2v_cOaUiNA)
When using the Athena ODBC driver on a Windows computer/laptop, is it possible to use an IAM role instead of creating an IAM access key and secret access key? Is it even possible? I've been unable to find a clear mention of this.
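I haven't found this clearly documented either. What I'm planning to try: point the driver at a named profile that assumes the role via the standard AWS config file. The profile syntax below is standard AWS CLI/SDK format; the `AuthenticationType`/`AWSProfile` connection properties are my guess at the Simba Athena ODBC driver's key names, so check the driver's install guide (the role ARN and profile names are placeholders):

```
# ~/.aws/config — a profile that assumes a role
[profile athena-role]
role_arn = arn:aws:iam::111122223333:role/AthenaQueryRole
source_profile = default
region = us-west-2

# DSN properties (assumed Simba driver key names — verify against the driver docs):
# AuthenticationType=IAM Profile
# AWSProfile=athena-role
```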
We run our ETLs using the architecture below to populate the data lake: MySQL -> DMS -> S3 -> Glue -> S3. Though this architecture works fine, it's heavily dependent on the database, and the object data is scattered across multiple tables. An ETL based on object data could be another way to retain the object structure and extract information from it. Below is what I am thinking: Application -> Kinesis Firehose -> S3 -> Glue -> S3. Has anyone tried this? Any pros/cons/architecture documentation would be helpful. Note: at this point we do not have any real-time data requirement, but we might in the future. Let me know if any other information is required.
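A sketch of what the producer side might look like under my assumptions (one JSON object per record with a trailing newline, since Firehose does not add record delimiters itself and Glue/Athena need them to split records once they land in S3; the stream name and event shape are placeholders):

```python
import json

def to_firehose_record(event: dict) -> bytes:
    """Serialize one application event as newline-delimited JSON."""
    return (json.dumps(event, separators=(",", ":")) + "\n").encode("utf-8")

# import boto3
# firehose = boto3.client("firehose")
# firehose.put_record(
#     DeliveryStreamName="app-events",  # placeholder stream name
#     Record={"Data": to_firehose_record({"order_id": 123, "status": "NEW"})},
# )
```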
Hello, by default Glue runs one executor per worker. I want to run more executors per worker. I set the following Spark configuration in the Glue job parameters, but it didn't work: `--conf spark.executor.instances=10`. Say I have 5 G.2X workers: in that case Glue starts 4 executors, because 1 worker is reserved for the driver, and I can see all 4 executors in the Spark UI. The configuration above does not increase the executor count at all, and I'm getting the following warning in the driver logs — it seems like glue.ExecutorTaskManagement controls the number of executors. `WARN [allocator] glue.ExecutorTaskManagement (Logging.scala:logWarning(69)): executor task creation failed for executor 5, restarting within 15 secs. restart reason: Executor task resource limit has been temporarily hit` Any help would be appreciated. Thanks!
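My mental model of why it caps at 4, based on the warning above (assumption: Glue pins one executor per worker and one worker hosts the driver, so `spark.executor.instances` can't exceed workers minus one):

```python
def max_executors(number_of_workers: int) -> int:
    """Executor ceiling under the one-executor-per-worker assumption."""
    return number_of_workers - 1  # one worker is taken by the driver

# Matches what I observe with 5 G.2X workers: 4 executors in the Spark UI.
```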
Hi there, I'm only able to see two out of four columns in Athena. I don't know the reason; could it be because of my Glue schema version? I tried changing the precision and scale numbers, but it didn't work. ![Athena output](/media/postImages/original/IMc0QRhlHeRai1GfeLDITGPQ) ![Glue Schema](/media/postImages/original/IMwByedqIpQMqRteDHa0L6xw) ![SQL-Query](/media/postImages/original/IMtTq4UnkXRLCk5BsYFwN3yQ) ![CSV-File-originalData](/media/postImages/original/IMGQWkeeRTRQapGVZJyc7Y0g) ![expanded table](/media/postImages/original/IM0PzaGq-9Rrqyp8hJS16mEw) ![Glue Jobs ETL](/media/postImages/original/IMHmQqzLBnTgSzVai6L6056g) I hope someone is able to help me out. Thanks in advance! - Ellie
I'm evaluating different BI solutions and I have a specific requirement. Our setup has *multiple databases with the same schema*, e.g. Customer1DB, Customer2DB, etc. **Can multiple DBs all be ingested into the same dataset?**
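The fallback I'm considering if direct multi-database ingestion isn't supported: generating a single UNION ALL query over the identically-shaped tables and feeding that to the BI tool as a custom SQL dataset. A minimal sketch (database, table, and column names are placeholders):

```python
def union_all_sql(databases, table, columns):
    """Build one UNION ALL query over the same table in each customer DB,
    tagging every row with its source database."""
    selects = [
        f"SELECT '{db}' AS source_db, {', '.join(columns)} FROM {db}.{table}"
        for db in databases
    ]
    return "\nUNION ALL\n".join(selects)

sql = union_all_sql(["Customer1DB", "Customer2DB"], "orders", ["id", "total"])
```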
I onboarded a native Delta table using ``` CREATE EXTERNAL TABLE [table_name] LOCATION '[s3_location]' TBLPROPERTIES ( 'table_type'='DELTA' ); ``` It works great when I query it. However, when I run ``` drop table [table_name] ``` I get the following error: "Routed statement type 'DROP_TABLE' to DeltaLakeDDLEngine, expected to route to DATACATALOG_DDL_ENGINE"
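A workaround sketch I'm considering (assumption: since the `CREATE EXTERNAL TABLE` registered the table in the Glue Data Catalog, deleting the catalog entry directly through the Glue API may sidestep whatever routing `DROP TABLE` hits in Athena; database/table names are placeholders, and the AWS call is commented out):

```python
def delete_table_params(database: str, table: str) -> dict:
    """Parameters for glue.delete_table for the stuck Delta table entry."""
    return {"DatabaseName": database, "Name": table}

# import boto3
# glue = boto3.client("glue")
# glue.delete_table(**delete_table_params("my_db", "my_delta_table"))
```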
Hi all, I have one query that started costing me way too much; it went unnoticed for almost a year, but it's now costing me $5k a month. I have a rather large table, 750 GB, with 3 partition columns, let's call them colA, colB, colC. I counted the number of distinct partition values for each of those columns, and it comes to 2993, 14030, and 16520 respectively. There's a hierarchy in those partitions, A -> B -> C. When I query this table, I always use those 3 partition columns, in that order (even if that shouldn't matter). My data is Hive-partitioned in S3, in Parquet with SNAPPY compression. Now, I have a little 3 MB fact table that tells me the latest partitions added since a certain timestamp; normally this resolves to 1 or 2 partitions. I use this table in a RIGHT JOIN with the table above so I can filter the results down to just the partitions of interest. Here is my interesting experiment and its results. The following query uses the partitions correctly: ```sql WITH filter_result AS( SELECT * FROM my_fact_table WHERE timestamp > 1671633935 ) SELECT * FROM my_large_table l RIGHT JOIN filter_result fr ON l.colA = fr.colA AND l.colB = fr.colB AND l.colC = fr.colC WHERE l.colA IN ('ABC') AND l.colB IN ('DEF') AND l.colC IN ('GHI') ``` but this one does a full table scan every time: ```sql WITH filter_result AS( SELECT * FROM my_fact_table WHERE timestamp > 1671633935 ) SELECT * FROM my_large_table l RIGHT JOIN filter_result fr ON l.colA = fr.colA AND l.colB = fr.colB AND l.colC = fr.colC ``` I have tried pretty much every order of operations possible. I'm using Athena engine version 3. I cannot get the filter predicate pushed down no matter what I try; it always ends up scanning the 750 GB of data, at $4 a pop per query. I cannot use the version that works, because that's not how my query works: I need the fact table to filter the results of the second table. This seems to me like a pretty standard use of a SQL engine.
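In case it helps others, the workaround I'm experimenting with is a two-step approach: run the fact-table query first, then inject the resulting partition values as literal predicates into the big query so the engine can prune (this mimics the WHERE clause of the version that works above; the Athena execution itself is elided, this just builds the second query string):

```python
def pruned_query(partitions):
    """Build the large-table query with explicit partition predicates.

    partitions: list of (colA, colB, colC) tuples returned by the
    fact-table query. Column/table names match my schema above.
    """
    preds = " OR ".join(
        f"(l.colA = '{a}' AND l.colB = '{b}' AND l.colC = '{c}')"
        for a, b, c in partitions
    )
    return f"SELECT * FROM my_large_table l WHERE {preds}"

sql = pruned_query([("ABC", "DEF", "GHI")])
```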
If Athena is not capable of doing this, that is worrying at the very least, and it makes me rethink my choice of technology for pushing this sort of workload.