
Analytics

AWS provides the broadest selection of analytics services for all your data analytics needs and enables organizations of all sizes and industries to reinvent their business with data. From data movement, data storage, data lakes, big data analytics, and machine learning to everything in between, AWS offers purpose-built services that deliver the best price performance and scalability at the lowest cost.

Recent questions


Amazon Athena error on querying DynamoDB exported data

**Background**

We've configured an export to S3 from DynamoDB using the native DynamoDB S3 export, with ION as the output format. After this, we created a table in Athena:

```
CREATE EXTERNAL TABLE export_2022_07_01_v4 (
  `_PK` STRING,
  URL STRING,
  Item struct<
    `_PK`:string,
    URL:string
  >
)
ROW FORMAT SERDE 'com.amazon.ionhiveserde.IonHiveSerDe'
WITH SERDEPROPERTIES (
  "ignore_malformed_ion" = "true"
)
STORED AS ION
LOCATION '...';
```

Querying this works fine for small, simple queries, but attempting to produce a full output with

```
UNLOAD (
  SELECT Item.URL
  FROM "helioptileexports"."export_2022_07_01_v4"
  WHERE Item.URL IS NOT NULL
)
TO '...'
WITH (format = 'TEXTFILE')
```

results in this error:

```
HIVE_CURSOR_ERROR: Syntax error at line 1 offset 2: invalid syntax [state:STATE_BEFORE_ANNOTATION_DATAGRAM on token:TOKEN_SYMBOL_OPERATOR]
This query ran against the "helioptileexports" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: f4ca5812-1194-41f1-bfda-00a1a5b2471b
```

**Questions**

1. Is there a way to make Athena more tolerant of formatting errors in specific files? As shown in the example, we are attempting, without success, to use `ignore_malformed_ion`. Is there anything beyond that that can be done?
2. Is this a bug in the DynamoDB ION export process?
3. Is there any mechanism or logging to identify the files that contain the malformed data so they can be removed?
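One possible angle on question 3 is to validate the exported objects directly instead of through Athena. The sketch below is only an illustration: the bucket name and export prefix are placeholders, and it assumes the export objects are gzip-compressed Ion files, that boto3 credentials are configured, and that the `amazon.ion` package is installed. It lists the objects under the export prefix and flags any that fail to parse.

```
# Hypothetical sketch: scan a DynamoDB export prefix and flag objects that fail
# to parse as Ion, so suspect files can be excluded or re-exported.
import gzip

import boto3
from amazon.ion import simpleion

BUCKET = "my-export-bucket"              # placeholder bucket name
PREFIX = "AWSDynamoDB/EXPORT_ID/data/"   # placeholder export data prefix

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        body = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
        try:
            # Assumes the export files are gzip-compressed ('.gz'); pass raw bytes otherwise.
            data = gzip.decompress(body) if key.endswith(".gz") else body
            # single_value=False parses the object as a stream of top-level Ion values.
            simpleion.loads(data, single_value=False)
        except Exception as exc:
            print(f"Possible malformed Ion in {key}: {exc}")
```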
0 answers | 0 votes | 14 views | asked 3 days ago

AWS Glue Studio Data Preview Fails instantly (Py4JJavaError)

Hi, I'm using AWS Glue Studio, and as soon as I click "Data Preview" it fails with the error below. The flow consists of two actions: a PostgreSQL JDBC data source and a Select Fields action. The error is thrown instantly once the "Data Preview" button is clicked. The overall flow runs successfully if I click the "Run" button: there are no errors, and I get the expected output if I add a target to write the results back to a PostgreSQL table. It's just the Data Preview functionality that fails. Any idea what could be wrong and how to troubleshoot it?

```
Py4JJavaError: An error occurred while calling o538.getSampleDynamicFrame.
: java.lang.UnsupportedOperationException: empty.reduceLeft
	at scala.collection.TraversableOnce$class.reduceLeft(TraversableOnce.scala:180)
	at scala.collection.AbstractTraversable.reduceLeft(Traversable.scala:104)
	at scala.collection.TraversableOnce$class.reduce(TraversableOnce.scala:208)
	at scala.collection.AbstractTraversable.reduce(Traversable.scala:104)
	at com.amazonaws.services.glue.SparkSQLDataSource.com$amazonaws$services$glue$SparkSQLDataSource$$getPaths(DataSource.scala:724)
	at com.amazonaws.services.glue.SparkSQLDataSource$$anonfun$getSampleDynamicFrame$7.apply(DataSource.scala:799)
	at com.amazonaws.services.glue.SparkSQLDataSource$$anonfun$getSampleDynamicFrame$7.apply(DataSource.scala:793)
	at com.amazonaws.services.glue.util.FileSchemeWrapper$$anonfun$executeWithQualifiedScheme$1.apply(FileSchemeWrapper.scala:89)
	at com.amazonaws.services.glue.util.FileSchemeWrapper$$anonfun$executeWithQualifiedScheme$1.apply(FileSchemeWrapper.scala:89)
	at com.amazonaws.services.glue.util.FileSchemeWrapper.executeWith(FileSchemeWrapper.scala:82)
	at com.amazonaws.services.glue.util.FileSchemeWrapper.executeWithQualifiedScheme(FileSchemeWrapper.scala:89)
	at com.amazonaws.services.glue.SparkSQLDataSource.getSampleDynamicFrame(DataSource.scala:792)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
```
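One hedged way to narrow this down is to bypass the Studio preview pane and sample the same JDBC source from a small Glue job or notebook; if that reads rows cleanly, the problem is likely isolated to the preview path. In the sketch below the connection and table names are placeholders, and it assumes the Glue connection used in Studio can be referenced by name via the documented `useConnectionProperties`/`connectionName` options.

```
# Hypothetical troubleshooting sketch: read a small sample from the same
# PostgreSQL source outside of the Studio "Data Preview" pane.
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

dyf = glue_context.create_dynamic_frame.from_options(
    connection_type="postgresql",
    connection_options={
        "useConnectionProperties": "true",
        "connectionName": "MY_GLUE_CONNECTION",  # placeholder: the connection used in Studio
        "dbtable": "MY_SCHEMA.MY_TABLE",         # placeholder table name
    },
)

# Print a handful of rows to confirm the source itself reads correctly.
dyf.toDF().show(5, truncate=False)
```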
0 answers | 0 votes | 13 views | asked 3 days ago

MWAA apache-airflow-providers-amazon DAGS can't import operator

I'm trying to use the newest package for Amazon integration in MWAA. In this particular case I want to use `GlueJobOperator`, which is part of the latest `apache-airflow-providers-amazon` package ([link to the documentation](https://airflow.apache.org/docs/apache-airflow-providers-amazon/4.0.0/operators/glue.html)).

MWAA Airflow version: `2.2.2`

I added this to *requirements.txt*:

```
apache-airflow-providers-amazon==4.0.0
```

and tried to import it and use it like in the examples:

```
from airflow.providers.amazon.aws.operators.glue import GlueJobOperator

glue_job = GlueJobOperator(
    task_id='airflow_id',
    job_name='job_name',
    wait_for_completion=True,
    script_location='bucket_script_prefix',
    s3_bucket='bucket_name',
    iam_role_name='iam_role',
    create_job_kwargs=job_arguments,
    script_args=script_arguments
)
```

Unfortunately, whenever the DAG is parsed I get this error:

```
...
from airflow.providers.amazon.aws.operators.glue import GlueJobOperator
ImportError: cannot import name 'GlueJobOperator' from 'airflow.providers.amazon.aws.operators.glue' (/usr/local/lib/python3.7/site-packages/airflow/providers/amazon/aws/operators/glue.py)
```

It's not my first rodeo with MWAA and extra packages, plugins, etc., but I am lost. In this case I tried many things, went through the docs from cover to cover, and I still couldn't find the reason. I verified in MWAA, both in the logs and in the UI, that the package was installed with the version pinned in *requirements.txt*:

| Package Name | Version | Description |
| --- | --- | --- |
| apache-airflow-providers-amazon | 4.0.0 | Amazon integration (including Amazon Web Services (AWS)). |

Fun fact: I'm using `S3Hook` in some other DAGs and it parses just fine.

```
from airflow.providers.amazon.aws.hooks.s3 import S3Hook

s3_hook = S3Hook()
...
```
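A hedged debugging sketch for this kind of import error is a throwaway DAG whose single task logs which provider version the MWAA workers actually resolved and which operator names the glue module really exports; if the logged version or names differ from what the UI table shows, the pin did not take effect where the DAG is parsed. The DAG and task ids below are placeholders, and nothing beyond the standard Airflow 2.2.2 runtime is assumed.

```
# Hypothetical sketch: log the resolved provider version and the names exported
# by the glue operators module from inside the MWAA environment.
import importlib
import logging
from datetime import datetime

import pkg_resources
from airflow import DAG
from airflow.operators.python import PythonOperator


def report_provider_state():
    version = pkg_resources.get_distribution("apache-airflow-providers-amazon").version
    glue_mod = importlib.import_module("airflow.providers.amazon.aws.operators.glue")
    glue_names = [name for name in dir(glue_mod) if "Glue" in name]
    logging.info("amazon provider version: %s", version)
    logging.info("glue operator names: %s", glue_names)


with DAG(
    dag_id="debug_amazon_provider",   # placeholder DAG id
    start_date=datetime(2022, 7, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    PythonOperator(task_id="report_provider_state", python_callable=report_provider_state)
```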
0 answers | 0 votes | 19 views | asked 4 days ago
