
Analytics

AWS provides the broadest selection of analytics services to fit all your data analytics needs, enabling organizations of all sizes and across all industries to reinvent their business with data. From data movement, data storage, data lakes, big data analytics, and machine learning to everything in between, AWS offers purpose-built services that deliver the best price performance and scalability at the lowest cost.

Recent questions


Unable to perform OpenSearch text queries from Gremlin using AWS Lambda written in JavaScript

I am syncing my AWS Neptune nodes into an AWS OpenSearch cluster as per the documentation at https://docs.aws.amazon.com/neptune/latest/userguide/full-text-search.html. The name of the OpenSearch index is `amazon_neptune` and the index type is `_doc`. The following is the index configuration:

```json
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1,
    "analysis": {
      "normalizer": {
        "useLowercase": { "type": "custom", "filter": "lowercase" }
      }
    }
  },
  "mappings": {
    "properties": {
      "document_type": {
        "type": "text",
        "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
      },
      "entity_id": {
        "type": "text",
        "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
      },
      "entity_type": {
        "type": "text",
        "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
      },
      "predicates": {
        "properties": {
          "content": {
            "type": "text",
            "fields": { "keyword": { "type": "keyword", "ignore_above": 1000, "normalizer": "useLowercase" } }
          },
          "visibilityType": { "type": "keyword" },
          "status": { "type": "keyword" },
          "type": { "type": "keyword" },
          "firstName": {
            "type": "text",
            "fields": { "keyword": { "type": "keyword", "normalizer": "useLowercase" } }
          },
          "lastName": {
            "type": "text",
            "fields": { "keyword": { "type": "keyword", "normalizer": "useLowercase", "ignore_above": 1000 } }
          }
        }
      }
    }
  }
}
```

Using the npm `gremlin` package, I'm trying to query my documents. The following is the code:

```javascript
'use strict';

const gremlin = require('gremlin');

exports.handler = async (event, context) => {
  try {
    const DriverRemoteConnection = gremlin.driver.DriverRemoteConnection;
    const Graph = gremlin.structure.Graph;

    const dc = new DriverRemoteConnection(<neptune_endpoint>, {});
    const graph = new Graph();
    const dbClient = graph.traversal().withRemote(dc);

    const res = await dbClient
      .withSideEffect("Neptune#fts.endpoint", <https_opensearch_endpoint>)
      .withSideEffect('Neptune#fts.queryType', 'term')
      .V().has("visibilityType", "Neptune#fts PUBLIC")
      .toList();

    console.log('res:', res);
  } catch (err) {
    console.error('Failed to query', err);
  }
};
```

But I'm getting the following error:

```
Failed to query ResponseError: Server error: {"detailedMessage":"method [POST], host [<https_opensearch_endpoint>], URI [/amazon_neptune/_search?typed_keys=true&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 403 Forbidden]\n{\"Message\":\"User: anonymous is not authorized to perform: es:ESHttpPost\"}","requestId":"23a9e7d7-7dde-465b-bf29-9c59cff12e86","code":"BadRequestException"} (500)
```

I have given the following permissions to my Lambda:

```yaml
Type: AWS::IAM::Policy
Properties:
  PolicyName: <Policy_Name>
  Roles:
    - 'Ref': <lambda_role>
  PolicyDocument:
    Version: '2012-10-17'
    Statement:
      - Effect: Allow
        Action:
          - es:ESHttpGet
          - es:ESHttpPost
          - es:ESHttpPut
          - es:ESHttpDelete
        Resource: <opensearch_cluster_arn>
```

My OpenSearch cluster and my Neptune cluster are located inside the same VPC, and my Lambda is hosted inside that VPC as well. Please help me understand why I'm getting the 403 error when I've given my Lambda the proper read permissions. Any help would be highly appreciated.
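A hedged observation on the error above, offered as an assumption rather than a confirmed diagnosis: "User: anonymous" indicates the `_search` request reached the OpenSearch domain unsigned, and with Neptune full-text search that call against `amazon_neptune` is made by the Neptune engine rather than by the Lambda, so the Lambda's IAM policy is not what the domain evaluates. One direction worth checking is the domain's resource-based access policy, for example one that allows the HTTP actions and relies on the VPC security groups to restrict access. The region, account ID, and domain name below are placeholders, not values from the original post.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "*" },
      "Action": "es:ESHttp*",
      "Resource": "arn:aws:es:us-east-1:123456789012:domain/my-neptune-search-domain/*"
    }
  ]
}
```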
1 answer · 0 votes · 31 views · asked a day ago

How to create dynamic dataframe from AWS Glue catalog in local environment?

I have been testing some AWS Glue version 3.0 jobs locally using Docker containers, as detailed [here](https://aws.amazon.com/blogs/big-data/develop-and-test-aws-glue-version-3-0-jobs-locally-using-a-docker-container/). The following code outputs two lists, one per connection, with the names of the tables in a database:

```python
import boto3

db_name_s3 = "s3_connection_db"
db_name_mysql = "glue_catalog_mysql_connection_db"

def retrieve_tables(database_name):
    session = boto3.session.Session()
    glue_client = session.client("glue")
    response_get_tables = glue_client.get_tables(DatabaseName=database_name)
    return response_get_tables

s3_tables_list = [table_dict["Name"] for table_dict in retrieve_tables(db_name_s3)["TableList"]]
mysql_tables_list = [table_dict["Name"] for table_dict in retrieve_tables(db_name_mysql)["TableList"]]

print(f"These are the tables from {db_name_s3} db: {s3_tables_list}\n")
print(f"These are the tables from {db_name_mysql} db {mysql_tables_list}")
```

Now, I try to create a dynamic frame with the *from_catalog* method in this way:

```python
import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.dynamicframe import DynamicFrame

source_activities = glueContext.create_dynamic_frame.from_catalog(
    database=db_name,
    table_name=table_name
)
```

When `database="s3_connection_db"`, everything works fine; however, when I set `database="glue_catalog_mysql_connection_db"`, I get the following error:

```python
Py4JJavaError: An error occurred while calling o45.getDynamicFrame.
: java.lang.ClassNotFoundException: com.mysql.cj.jdbc.Driver
```

I understand the issue is related to the fact that I am trying to fetch data from a MySQL table, but I am not sure how to solve this. By the way, the job runs fine on the Glue console. I would really appreciate some help, thanks!
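A hedged sketch of one way this is sometimes resolved in the local Glue Docker setup: the `ClassNotFoundException` suggests the MySQL Connector/J jar is not on the local Spark classpath, so one option is to download the jar into the container and point Spark at it before the `SparkContext` is created. The jar path, driver version, and table name below are placeholders, not values from the original post.

```python
# Minimal sketch, assuming mysql-connector-java has been downloaded into the container.
# The jar path, version, and table name are assumptions, not taken from the post.
from pyspark import SparkConf
from pyspark.context import SparkContext
from awsglue.context import GlueContext

mysql_jar = "/home/glue_user/workspace/jars/mysql-connector-java-8.0.30.jar"

conf = (
    SparkConf()
    .set("spark.jars", mysql_jar)                    # ship the jar with the job
    .set("spark.driver.extraClassPath", mysql_jar)   # make the driver class visible locally
    .set("spark.executor.extraClassPath", mysql_jar)
)

sc = SparkContext(conf=conf)
glueContext = GlueContext(sc)

source_activities = glueContext.create_dynamic_frame.from_catalog(
    database="glue_catalog_mysql_connection_db",
    table_name="my_mysql_table",  # placeholder table name
)
```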
0 answers · 0 votes · 12 views · asked 2 days ago

Simple Join query errors out with "table not found"

I'm trying to join a simple "spine" table with a "feature" table. The spine table may or may not have additional columns besides the join key. If there are additional columns (in this case, `random_column`), the following query works fine (note that these queries are self-sufficient; they don't depend on any external tables, as all tables are created inline by the query itself):

```sql
with spine_table as
(select 'abc' as user_id, 'random_value' as random_column),
feature_table as
(select 'abc' as user_id, '123' as feature_column)
select user_id, spine_table.*, feature_column
from spine_table
join feature_table using (user_id)
```

If, however, there is no additional column, the query throws an exception:

```sql
with spine_table as
(select 'abc' as user_id),
feature_table as
(select 'abc' as user_id, '123' as feature_column)
select user_id, spine_table.*, feature_column
from spine_table
join feature_table using (user_id)
```

`Error: SYNTAX_ERROR: line 5:17: Table 'spine_table' not found`

The second query works fine if I omit `spine_table.*`:

```sql
with spine_table as
(select 'abc' as user_id),
feature_table as
(select 'abc' as user_id, '123' as feature_column)
select user_id, feature_column
from spine_table
join feature_table using (user_id)
```

The problem is that my application generates the query dynamically, and it doesn't know ahead of time whether there are additional columns in the spine table besides the join keys.
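One workaround sketch, offered as an assumption and not verified against the poster's engine: switching from `USING` to an explicit `ON` condition keeps the `spine_table` qualifier resolvable even when the spine table contains only the join key, at the cost of having to qualify the key column in the select list.

```sql
-- Hedged workaround sketch (assumption, not from the original post):
-- join with ON instead of USING so the spine_table qualifier stays valid.
with spine_table as
(select 'abc' as user_id),
feature_table as
(select 'abc' as user_id, '123' as feature_column)
select spine_table.user_id, spine_table.*, feature_column
from spine_table
join feature_table on spine_table.user_id = feature_table.user_id
```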
0 answers · 0 votes · 3 views · asked 2 days ago

AWS Glue problems reading from PostgreSQL DB that has uppercased table and column names

I have an RDS PostgreSQL database whose table names and column names contain upper-case characters. I have created a Glue crawler that connects to the database via JDBC and populates the Glue Data Catalog with the database schemas, but in that translation the upper-case characters are converted to lower case when stored in the Data Catalog tables. When I run a Glue job to query, I get this error:

```
An error occurred while calling o94.getDynamicFrame. ERROR: column "id" does not exist
```

I made a copy of this table and changed the table names and column names to all lower-case characters, and the same Glue jobs and queries run successfully. Changing the table names and column names to lower case in our production environment is simply not an option due to the extensive work it would require.

I found the 'Edit Schema' option in the Glue UI where you can change column names and data types, and thought for a moment the solution had been found. However, when you change a character to upper case and then select "Save", it is reverted to lower case as it is saved.

I have also edited the PySpark script directly and worked with the `glueContext.create_dynamic_frame.from_catalog` method, using the `additional_options` parameter to build my select statement with upper- and lower-case characters, but that still fails with the error message noted above.

```python
# Script generated for node PostgreSQL table
PostgreSQLtable_node1 = glueContext.create_dynamic_frame.from_catalog(
    database="mydatabase",
    table_name="mytable",
    additional_options={"query": "SELECT id from mytable;"},
    transformation_ctx="PostgreSQLtable_node1",
)
```

I believe the failure occurs because the schema as stored in the Data Catalog contains lower-case characters while the actual schema in the database uses upper-case characters, so when Glue tries to work with the table it looks for "id" while the actual column is "ID", and so "not found" is returned. I have read about the CaseSensitive option and am looking in that direction next for a solution. I have not seen any recent posts (less than a couple of years old) about this issue, so I'm not sure if I'm missing something. Any assistance would be greatly appreciated.
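One hedged direction, an assumption rather than a confirmed fix: PostgreSQL folds unquoted identifiers to lower case, so a pushed-down query against mixed- or upper-case names generally has to double-quote them. A sketch of what that might look like with the `additional_options` query already used above; the table and column names are illustrative placeholders standing in for the real upper-cased identifiers.

```python
# Hedged sketch: double-quote identifiers so PostgreSQL does not fold them to lower case.
# "MyTable" and "ID" are placeholder names, not the poster's actual identifiers.
PostgreSQLtable_node1 = glueContext.create_dynamic_frame.from_catalog(
    database="mydatabase",
    table_name="mytable",
    additional_options={"query": 'SELECT "ID" FROM "MyTable";'},
    transformation_ctx="PostgreSQLtable_node1",
)
```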
1 answer · 0 votes · 20 views · asked 4 days ago
