Questions tagged with Analytics

Hi community, I am trying to perform an ETL job using AWS Glue. Our data is stored in MongoDB Atlas inside a VPC, and our AWS account is connected to MongoDB Atlas using VPC peering. To perform the ETL job in AWS Glue, I first created a connection using the VPC details and the MongoDB Atlas URI along with the username and password. This connection is used by the AWS Glue crawlers to extract the schema into AWS Data Catalog tables, and it works. However, when I attempt to perform the actual ETL job using the following PySpark code:

```python
# My temp variables
source_database = "d*********a"
source_table_name = "main_businesses"
source_mongodb_db_name = "main"
source_mongodb_collection = "businesses"

glueContext.create_dynamic_frame.from_catalog(
    database=source_database,
    table_name=source_table_name,
    additional_options={"database": source_mongodb_db_name, "collection": source_mongodb_collection},
)
```

the connection times out and MongoDB Atlas blocks the connection from the ETL job. It's as if the ETL job uses the connection differently than the crawler does. Maybe the ETL job is not running inside our AWS VPC that is peered with the MongoDB Atlas VPC (so VPC peering is not possible?). Does anyone have any idea what might be going on or how I can fix this? Thank you!
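For comparison, a minimal sketch of reading the same collection through `from_options` instead of the Data Catalog can help isolate whether the catalog connection or the network path is at fault; it is also worth checking whether the Glue connection is actually attached to the job, since a Spark job only gets ENIs inside the peered VPC when a connection is attached. The URI and credentials below are placeholders, not values from the question.

```python
# Hypothetical sketch: read the MongoDB Atlas collection directly with from_options,
# bypassing the Data Catalog table. All connection values below are placeholders.
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

dynamic_frame = glue_context.create_dynamic_frame.from_options(
    connection_type="mongodb",
    connection_options={
        "uri": "mongodb://<atlas-host>:27017",  # placeholder URI
        "database": "main",
        "collection": "businesses",
        "username": "<user>",       # placeholder
        "password": "<password>",   # placeholder
        "ssl": "true",
    },
)
dynamic_frame.printSchema()
```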
0
answers
0
votes
6
views
asked 17 hours ago
Hi, I'm creating a dashboard for operators to download the Athena query results. The ID column values contain hyphens (`-`). For example, if the table contains the following data:

| id | name |
| --- | --- |
| `-xyz` | `First example` |
| `a-b-c` | `Second example` |

the generated CSV contains an extra single quote in the id column of the first row:

```csv
"id","name"
"'-xyz","First example"
"a-b-c","Second example"
```

Is there any way to avoid it?
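If post-processing the exported file is acceptable, a small sketch like the following strips a leading single quote from the id column; the file names are hypothetical, and the quote is assumed to be an export-side guard against spreadsheet formula injection for values starting with `-`.

```python
# Hypothetical post-processing sketch: remove a leading single quote from the "id"
# column of an exported CSV. File names are placeholders.
import csv

with open("results.csv", newline="") as src, open("results_clean.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, quoting=csv.QUOTE_ALL)
    writer.writeheader()
    for row in reader:
        if row["id"].startswith("'"):
            row["id"] = row["id"][1:]  # drop the protective quote
        writer.writerow(row)
```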
0
answers
0
votes
16
views
hota
asked 4 days ago
I have a KPI visual that displays the count of records from a dataset. Is it possible to make that KPI show all the records that were included in this count?
0
answers
0
votes
3
views
asked 5 days ago
In Redshift, I'm trying to update a table using another table from another database. The error details:

```
SQL Error [XX000]: ERROR: Assert
Detail:
-----------------------------------------------
error:    Assert
code:     1000
context:  scan->m_src_id == table_id - query: 17277564
location: xen_execute.cpp:5251
process:  padbmaster [pid=30866]
```

The context is not helpful. I have used a similar join-based approach for other tables, and there the UPDATE statement has been working fine. Update syntax used:

```
UPDATE ods.schema.tablename
SET "TimeStamp" = GETDATE(),
    "col" = S."col"
FROM ods.schema.tablename T
INNER JOIN stg.schema.tablename S
    ON T.Col = S.Col;
```
1
answers
0
votes
19
views
asked 6 days ago
Hello, I am running a job to apply an ETL to a semicolon-separated CSV on S3. However, when I read the file using the AWS Glue DynamicFrame feature and try to use any method like `printSchema` or `toDF`, I get the following error:

```
py4j.protocol.Py4JJavaError: An error occurred while calling o77.schema.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0
failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1)
(52bff5da55da executor driver): com.amazonaws.services.glue.util.FatalException:
Unable to parse file: s3://my-bucket/my-file.csv
```

I have already verified the encoding; it is UTF-8, so there should be no problem. When I read the CSV using `spark.read.csv` it works fine, and the crawlers can also recognize the schema. The data has some special characters that shouldn't be there, and removing them is part of the ETL I am looking to perform. Neither the `from_catalog` nor the `from_options` function from AWS Glue works, and the problem is the same whether I run the job locally on Docker or in Glue Studio. My data has a date folder partition, so I would prefer to avoid reading it directly with Spark and instead take advantage of the Glue Data Catalog as well. Thanks in advance.
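In case it helps reproduce the issue, here is a minimal sketch of reading a semicolon-separated CSV with `from_options` and explicit CSV format options; the S3 path and header flag are assumptions, not values from the question.

```python
# Hypothetical sketch: read a semicolon-separated CSV from S3 as a DynamicFrame,
# passing the separator explicitly. The S3 path and header flag are placeholders.
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

dyf = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://my-bucket/my-prefix/"], "recurse": True},
    format="csv",
    format_options={"separator": ";", "withHeader": True, "quoteChar": '"'},
)
dyf.printSchema()
```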
1
answers
0
votes
32
views
Aftu
asked 9 days ago
We have been able to connect to a Microsoft SQL Server DB using both Glue's DynamicFrame and Spark's own JDBC write option, thanks to the Glue connection option. However, considering the transactional nature of the data, we want to switch to sending data programmatically using a Python library, which would allow us to move from a Glue ETL job to a Python Shell job. Our initial option was pyodbc; however, we were unable to integrate the required driver with Glue. Another option we looked at was pymssql. Our experience connecting to Microsoft SQL Server using pymssql was seamless, but it was restricted to Python 3.6 and we were unable to import it with Python 3.9. We need Python 3.9 because of boto3 version compatibility, as we are also trying to access the boto3 redshift-data client in the same script. Having considered all of the above, is it possible to query Microsoft SQL Server using a Python library with Python 3.9 in an AWS Glue Python Shell job?
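For reference, a minimal pymssql sketch of the kind of query being described; the host, credentials, and table are placeholders, and whether a current pymssql wheel can be installed into a Python 3.9 Python Shell job (for example via the `--additional-python-modules` job parameter) is an assumption that would need to be verified.

```python
# Hypothetical sketch: query SQL Server with pymssql from a Python script.
# Host, credentials, database, and table are placeholders; installing pymssql
# into the job environment is assumed to work.
import pymssql

conn = pymssql.connect(
    server="my-sqlserver-host",   # placeholder
    user="my_user",               # placeholder
    password="my_password",       # placeholder
    database="my_database",       # placeholder
)
try:
    cursor = conn.cursor(as_dict=True)
    cursor.execute("SELECT TOP 10 * FROM dbo.my_table")  # placeholder table
    for row in cursor.fetchall():
        print(row)
finally:
    conn.close()
```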
1
answers
0
votes
39
views
asked 9 days ago
Hello. I'm trying to update my stack in CloudFormation to add a custom vocabulary for AWS Transcribe. Every time I update the stack it rolls back the changes and the status changes to UPDATE_ROLLBACK_COMPLETE. How can I find out the reason for the rollback? I need to apply my change. Can anyone tell me what I should do? Regards,
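One way to see why a stack rolled back is to look through the stack's event history for the first resources that report a failed status; a minimal boto3 sketch, with the stack name as a placeholder:

```python
# Hypothetical sketch: print the failure reasons recorded in the stack's event
# history. The stack name is a placeholder.
import boto3

cfn = boto3.client("cloudformation")
paginator = cfn.get_paginator("describe_stack_events")

for page in paginator.paginate(StackName="my-pca-stack"):
    for event in page["StackEvents"]:
        if event["ResourceStatus"].endswith("FAILED"):
            print(
                event["LogicalResourceId"],
                event["ResourceStatus"],
                event.get("ResourceStatusReason", ""),
            )
```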
1
answers
0
votes
29
views
asked 9 days ago
Hello, I have JSON files for jobs created and extracted from Amazon Transcribe, and I need to add them to the CloudFormation stack of PCA (Post Call Analytics). On my PCA dashboard there is no option to create a new data source to add the JSON files to. How can I add a new data source (JSON files) from Transcribe to PCA? And how can I display the new jobs that I created manually within Transcribe in PCA as a data source?
0
answers
0
votes
6
views
asked 10 days ago
Hello, I'm using the Post Call Analytics (PCA) platform to display analytics for call recordings, with AWS Transcribe extracting text from the audio. I need to add a custom vocabulary to my AWS Transcribe jobs to improve accuracy. I noticed that when I create a custom job manually I can add the custom vocabulary I need, but the final results are not added to PCA. How can I display my manually created jobs in PCA?
0
answers
0
votes
10
views
asked 10 days ago
Hello, I'm using AWS Transcribe to extract text from recordings. I have a recording in Arabic that also contains some English words and names. When I create a Transcribe job it skips the English words/names or changes them to the wrong Arabic words. Can I have two languages in one transcription job? I also added a custom vocabulary for all recordings, but it's not improving the results. I see low accuracy on Arabic recordings, especially for names and for the English words within the Arabic audio. How can I improve the accuracy? Regards,
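For what it's worth, the Transcribe batch API exposes a multi-language identification option; a minimal boto3 sketch follows, where the job name, media URI, output bucket, and candidate languages are placeholders, and whether Arabic/English code-switching is supported should be verified against the Transcribe documentation.

```python
# Hypothetical sketch: start a batch transcription job that identifies multiple
# languages in the same audio. All names and URIs below are placeholders, and
# language support for multi-language identification should be verified.
import boto3

transcribe = boto3.client("transcribe")

transcribe.start_transcription_job(
    TranscriptionJobName="arabic-english-call-001",     # placeholder
    Media={"MediaFileUri": "s3://my-bucket/call.wav"},   # placeholder
    IdentifyMultipleLanguages=True,
    LanguageOptions=["ar-SA", "en-US"],                  # candidate languages
    OutputBucketName="my-transcripts-bucket",            # placeholder
)
```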
1
answers
0
votes
14
views
asked 10 days ago
New to Glue and Athena. I have a great toy example by an AWS community builder working, but in my real use case I want to capture all the fields from an EventBridge event's 'detail' section and have columns created for them. The structure is nested multiple levels deep, and I can't figure out the schema discovery process. I tried posting a text file to S3 and having a Glue crawler work on it, but no luck. Thanks in advance.
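One way to see what schema the nested 'detail' payload implies, before deciding how to model it in the Data Catalog, is to let Spark infer it from a sample of the JSON; a small sketch, where the S3 path and the assumption that the sample files contain a top-level `detail` field are both hypothetical:

```python
# Hypothetical sketch: infer and print the nested schema of sample EventBridge
# events stored as JSON lines in S3. The path is a placeholder, and the events
# are assumed to carry a top-level "detail" field.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

events = spark.read.json("s3://my-bucket/eventbridge-samples/")
events.select("detail").printSchema()  # shows the nested struct fields under "detail"
```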
1
answers
0
votes
10
views
asked 13 days ago
I've configured an Athena connection to an RDS SQL Server database using a JDBC driver. After choosing the data source, the database does not load and a "Network Failure" error is shown without any further information. What might be the cause of such an error, and where can I find more information on how to resolve it?
1
answers
0
votes
18
views
asked 13 days ago