Questions tagged with Analytics
We have been able to connect to a Microsoft SQL Server database using both Glue's DynamicFrame and Spark's own JDBC write option, thanks to the Glue connection. However, given the transactional nature of the data, we want to switch to sending the data programmatically with a Python library, which would let us move from a Glue ETL job to a Python Shell job. Our first option was pyodbc, but we were unable to integrate the required driver with Glue, so that attempt was unsuccessful. Another option we looked at was pymssql. Connecting to Microsoft SQL Server with pymssql was seamless, but it was restricted to Python 3.6 and we were unable to import it with Python 3.9. We need Python 3.9 because of boto3 version compatibility, as we are also trying to access the boto3 redshift-data client in the same script. Having considered all of the above, is it possible to query Microsoft SQL Server using a Python library with Python 3.9 in an AWS Glue Python Shell job?
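For reference, a minimal sketch of what we are trying to run is below, assuming pymssql can be supplied to the Python 3.9 shell job through the `--additional-python-modules` job parameter; the connection details and cluster identifiers are placeholders:

```python
import boto3
import pymssql  # assumed to be installed via --additional-python-modules

# Query SQL Server transactionally from the Python Shell job.
conn = pymssql.connect(
    server="my-sqlserver-host.example.com",  # placeholder host
    user="etl_user",
    password="****",
    database="sales",
)
cur = conn.cursor()
cur.execute("SELECT TOP 10 order_id, amount FROM dbo.orders")
for order_id, amount in cur.fetchall():
    print(order_id, amount)
conn.close()

# In the same script we also need the redshift-data client
# (this is the reason for the Python 3.9 requirement).
rsd = boto3.client("redshift-data")
resp = rsd.execute_statement(
    ClusterIdentifier="my-redshift-cluster",  # placeholder
    Database="analytics",
    DbUser="etl_user",
    Sql="SELECT 1",
)
print(resp["Id"])
```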
Hello.
I'm trying to update my stack in CloudFormation to add a custom vocabulary for Amazon Transcribe.
Every time I update the stack it rolls back the changes and the status changes to UPDATE_ROLLBACK_COMPLETE.
How can I find out the reason for the rollback? I need to apply my change. Can anyone tell me what I should do?
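So far the only way I can think of is to walk the stack's event history with boto3, something like the sketch below (the stack name is a placeholder); the first `*_FAILED` events should carry the actual reason for the rollback:

```python
import boto3

cfn = boto3.client("cloudformation")

# Walk the stack's event history; the earliest *_FAILED events explain
# why CloudFormation rolled the update back.
paginator = cfn.get_paginator("describe_stack_events")
for page in paginator.paginate(StackName="my-transcribe-stack"):  # placeholder
    for event in page["StackEvents"]:
        if event["ResourceStatus"].endswith("_FAILED"):
            print(
                event["Timestamp"],
                event["LogicalResourceId"],
                event["ResourceStatus"],
                event.get("ResourceStatusReason", ""),
            )
```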
Regards,
Hello
I have JSON files for jobs created and extracted from Amazon Transcribe, and I need to add them to the CloudFormation stack of PCA (Post Call Analytics).
On my PCA dashboard there is no option to create a new data source to add the JSON files to.
How can I add a new data source (JSON files) from Transcribe to PCA?
And how can I display the new jobs I created manually in Transcribe on PCA as a data source?
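In case it clarifies what I'm trying to do: I don't know whether PCA can ingest raw Transcribe JSON directly, but if the deployment exposes an input bucket or prefix for transcripts (in the stack's Outputs), I imagine something like the sketch below; the bucket name and prefix are assumptions on my part:

```python
import json
import urllib.request

import boto3

transcribe = boto3.client("transcribe")
s3 = boto3.client("s3")

JOB_NAME = "my-manual-transcribe-job"       # placeholder
PCA_INPUT_BUCKET = "my-pca-input-bucket"    # assumption: taken from the PCA stack Outputs
PCA_INPUT_PREFIX = "originalTranscripts/"   # assumption: whatever prefix PCA watches

# Fetch the finished job's transcript JSON from the URI Transcribe returns.
job = transcribe.get_transcription_job(TranscriptionJobName=JOB_NAME)
uri = job["TranscriptionJob"]["Transcript"]["TranscriptFileUri"]
with urllib.request.urlopen(uri) as resp:
    transcript = json.loads(resp.read())

# Drop it into the location the PCA stack processes.
s3.put_object(
    Bucket=PCA_INPUT_BUCKET,
    Key=f"{PCA_INPUT_PREFIX}{JOB_NAME}.json",
    Body=json.dumps(transcript).encode("utf-8"),
)
```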
Hello
I'm using the Post Call Analytics (PCA) platform to display analytics for call recordings; I use Amazon Transcribe to extract the text from the audio.
I need to add a custom vocabulary to my Transcribe jobs to improve accuracy.
I noticed that when I create a job manually I can add the custom vocabulary I need, but the final result is not added to PCA.
So how can I display my manually created jobs in PCA?
Hello,
I'm using Amazon Transcribe to extract text from recordings.
I have a recording in Arabic that also contains some English words and names. When I create a Transcribe job it skips the English words/names or changes them to the wrong Arabic words. Can I have two languages in one transcription job?
I also added a custom vocabulary for all recordings, but it's not improving the results.
Accuracy is low for text extraction from the Arabic recordings, especially for names and for English words within the Arabic audio.
So how can I improve the accuracy?
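For reference, this is roughly what I think the job should look like, assuming multi-language identification plus per-language custom vocabularies is the right approach; the job name, S3 URIs and vocabulary names are placeholders:

```python
import boto3

transcribe = boto3.client("transcribe")

# Let Transcribe identify both Arabic and English within one job and
# attach a custom vocabulary per language. All names/URIs are placeholders.
transcribe.start_transcription_job(
    TranscriptionJobName="mixed-ar-en-call-001",
    Media={"MediaFileUri": "s3://my-audio-bucket/calls/call-001.wav"},
    IdentifyMultipleLanguages=True,
    LanguageOptions=["ar-SA", "en-US"],
    LanguageIdSettings={
        "ar-SA": {"VocabularyName": "my-arabic-vocabulary"},
        "en-US": {"VocabularyName": "my-english-vocabulary"},
    },
    OutputBucketName="my-transcripts-bucket",
)
```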
Regards,
New to Glue and Athena.
I have a great toy example by an AWS community builder working. But in my real use case, I want to capture all the fields from an EventBridge event's 'detail' section and have columns created. This is nested multiple levels deep. I can't figure out the schema discovery process. I tried posting a text file to S3 and having a Glue crawler work on it, but no luck.
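To show the direction I've been trying: declaring the nested `detail` schema by hand as Athena structs instead of relying on the crawler. The field names below are made-up placeholders for my real event fields, and the S3 locations are assumptions:

```python
import boto3

# Hand-written DDL with nested struct columns for the EventBridge 'detail' payload.
# Database, table, field names and S3 paths are hypothetical placeholders.
ddl = """
CREATE EXTERNAL TABLE IF NOT EXISTS events_db.order_events (
  `detail-type` string,
  source string,
  detail struct<
    orderId:string,
    customer:struct<id:string,region:string>,
    items:array<struct<sku:string,qty:int>>
  >
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://my-events-bucket/raw/'
"""

athena = boto3.client("athena")
athena.start_query_execution(
    QueryString=ddl,
    QueryExecutionContext={"Database": "events_db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results-bucket/"},
)
```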
Thanks in advance.
I've configured the Athena connection to an RDS SQL Server database using a JDBC driver. After choosing the data source, the database does not load and a "Network Failure" is shown without any further information. What might be the cause of such an error, or where can I find more information on how to solve it?

I am trying to use the AWS Glue Studio to build a simple ETL workflow. Basically, I have a bunch of `csv` files in different directories in S3.
I want those csvs to be accessible via a database and have chosen Redshift for the job. The directories will be updated every day with new csv files. The file structure is:
YYYY-MM-DD (e.g. 2023-03-07)
|---- groupName1
|     |---- groupName1.csv
|---- groupName2
|     |---- groupName2.csv
...
|---- groupNameN
|     |---- groupNameN.csv
We will be keeping historical data, so every day I will have a new date-based directory.
I've read that AWS Glue can automatically copy data on a schedule, but I can't see my Redshift databases or tables (screenshot below). I'm using my AWS admin account and I do have `AWSGlueConsoleFullAccess` permission (screenshot below).
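For context, this is the kind of daily load I'm after, sketched as a Redshift COPY from the new date prefix via the redshift-data API (cluster, role, table and bucket names are placeholders); I'd still prefer to get the Glue Studio route working:

```python
from datetime import date

import boto3

rsd = boto3.client("redshift-data")

# COPY today's date-based prefix into a target table; the COPY picks up
# every csv under the nested groupName directories for that date.
# Cluster, database, role ARN, table and bucket names are placeholders.
prefix = date.today().isoformat()  # e.g. 2023-03-07
copy_sql = f"""
COPY analytics.group_data
FROM 's3://my-data-bucket/{prefix}/'
IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-copy-role'
FORMAT AS CSV
IGNOREHEADER 1;
"""

rsd.execute_statement(
    ClusterIdentifier="my-redshift-cluster",
    Database="analytics",
    DbUser="etl_user",
    Sql=copy_sql,
)
```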


Hello, I'm looking to push Salesforce data to a Data Lake. This data lake table needs to hold different versions of the record. I experimented with AppFlow, but I couldn't really get the control I wanted over the process (mainly notifying when an event came in).
To cover the requirement of storing the changes I'm thinking of implementing Iceberg or Hudi. In addition to storing the data for analytics, there are some additional requirements to push data back to Salesforce in more of a real-time nature. Because of this, I've created an EventBridge rule to capture Salesforce CDC events in realtime.
The plan is to take those events and process them back into the lake. I was thinking of just sending the event to Lambda and then using Athena to update the Iceberg/Hudi table. One of the difficulties is that since the stream contains only what changed, I have to query Athena for the last full record and then overlay the changes.
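Concretely, I imagine the Lambda running something like the MERGE below against the Iceberg table through Athena; the database, table and column names are placeholders, and this assumes Athena engine v3 with an Iceberg table:

```python
import boto3

athena = boto3.client("athena")

# Merge a staged CDC change set into the Iceberg table.
# Database, table and column names are placeholders.
merge_sql = """
MERGE INTO lake.sf_account AS t
USING staging.sf_account_changes AS s
  ON t.account_id = s.account_id
WHEN MATCHED THEN
  UPDATE SET name = s.name, status = s.status, updated_at = s.updated_at
WHEN NOT MATCHED THEN
  INSERT (account_id, name, status, updated_at)
  VALUES (s.account_id, s.name, s.status, s.updated_at)
"""

athena.start_query_execution(
    QueryString=merge_sql,
    QueryExecutionContext={"Database": "lake"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results-bucket/"},
)
```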
Good solution? Bad solution? Other things to try? We are a small company so data volume isn't huge so I'm not wanting to overengineer this process in the amount of work or $$$. My team is primarily database developers so anything we can do to stay more sql-like is a plus.
Thoughts?
Hi AWS,
What are the best practices for reading messages from a topic and putting them into S3? Could there be a rare condition where a message is missed or not posted due to some unknown error? In that case, can we put a Kafka topic between the compute that reads the messages and the step that posts them to S3, and what are the pros and cons of that?
That way, even if a record is missed it can be fetched later, and we would not need a second custom application, assuming we use the AWS Kafka-S3 sink connector?
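For discussion, this is roughly the connector configuration we have in mind for MSK Connect with the Confluent S3 sink connector plugin (topic, bucket and region are placeholders); as we understand it, the connector tracks offsets in Kafka and retries failed writes, which is the at-least-once behavior we're asking about:

```python
# Connector properties we'd supply when creating the MSK Connect connector
# (via the console or the kafkaconnect API). Topic, bucket and region are placeholders.
s3_sink_config = {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "tasks.max": "2",
    "topics": "orders",                       # placeholder topic
    "s3.region": "us-east-1",
    "s3.bucket.name": "my-landing-bucket",    # placeholder bucket
    "flush.size": "1000",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
    "partitioner.class": "io.confluent.connect.storage.partitioner.DefaultPartitioner",
}
```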
I am trying to analyze CloudFront standard logs using Amazon Athena. I get the following error after running a query:
GENERIC_INTERNAL_ERROR: S3 service error
This query ran against the "<DatabaseName>" database, unless qualified by the query.
Can anyone explain what this error means and how to resolve it?
Data layer is not my thing and I need some guidance.
I created a Glue crawler over compressed JSON files stored in an AWS S3 bucket. I recently learned that I can use Athena to connect directly to the Glue database. When I do select * from *table-name* it starts to load but then errors out with a long string of stuff:
HIVE_METASTORE_ERROR: Error: : expected at the position 407 of 'struct<http:struct<status_code:int,url_details:struct<path:string,queryString:struct<prefix:string,versioning:string,logging:string,encoding-type:string,nodes:string,policy:string,acl:string,policyStatus:string,replication:string,notification:string,tagging:string,website:string,encryption:string,size:string,limit:string,hash:string,accelerate:string,publicAccessBlock:string,code:string,protocol:string,G%EF%BF%BD%EF%BF%BD%EF%BF%BD\,%EF%BF%BD%EF%BF%BD%EF%BF%BD`~%EF%BF%BD%00%EF%BF%BD%EF%BF%BD{%EF%BF%BD%D5%96%EF%BF%BDw%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%3C:string,cors:string,object- etc etc etc.
I can load one small table but the others fail.