Questions tagged with Analytics

Browse through the questions and answers listed below or filter and sort to narrow down your results.

We have been able to connect to a Microsoft SQL Server database using both Glue's DynamicFrame and Spark's own JDBC write option, thanks to the Glue connection. However, given the transactional nature of the data, we want to switch to writing the data programmatically with a Python library, which would let us move from a Glue ETL job to a Python Shell job. Our first option was pyodbc, but we were unable to integrate the required ODBC driver with Glue, so that attempt failed. We then looked at pymssql; connecting to SQL Server with it was seamless, but it only worked on Python 3.6 and we could not import it under Python 3.9. We need Python 3.9 because of boto3 version compatibility, since the same script also uses the boto3 redshift-data client. Considering all of the above, is it possible to query Microsoft SQL Server using a Python library with Python 3.9 in an AWS Glue Python Shell job?
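If a newer pymssql wheel can be made available to the Python 3.9 shell job (for example via the `--additional-python-modules` job parameter or an `--extra-py-files` wheel — worth verifying for your Glue version), a minimal sketch of querying SQL Server and the Redshift Data API from one script could look like this; all hosts, names, and credentials below are placeholders:

```python
import boto3
import pymssql  # assumed to be supplied to the job, e.g. "pymssql==2.2.8"

# --- Query Microsoft SQL Server ---
# Placeholder connection details; in practice these would come from
# Secrets Manager or job arguments rather than being hard-coded.
conn = pymssql.connect(
    server="my-sqlserver.example.internal",
    user="etl_user",
    password="***",
    database="sales",
)
cur = conn.cursor(as_dict=True)
cur.execute("SELECT TOP 10 * FROM dbo.orders")
rows = cur.fetchall()
conn.close()

# --- Use the Redshift Data API from the same script ---
rsd = boto3.client("redshift-data")
resp = rsd.execute_statement(
    ClusterIdentifier="my-cluster",          # placeholder
    Database="analytics",                    # placeholder
    SecretArn="arn:aws:secretsmanager:...",  # placeholder secret with DB credentials
    Sql="SELECT count(*) FROM staging.orders",
)
print(len(rows), resp["Id"])
```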
1 answer · 0 votes · 69 views · asked 19 days ago
Hello. I'm trying to update my CloudFormation stack to add a custom vocabulary for Amazon Transcribe. Every time I update the stack it rolls the changes back and the status changes to UPDATE_ROLLBACK_COMPLETE. How can I find out the reason for the rollback? I need to apply my change. Can anyone tell me what I should do? Regards,
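One way to find the reason is to look through the stack's event history for the first resource that reports a failed status; its ResourceStatusReason usually contains the root cause. A small boto3 sketch (the stack name is a placeholder):

```python
import boto3

cfn = boto3.client("cloudformation")

# Walk the stack events and print every failure with its reason.
# "my-pca-stack" is a placeholder stack name.
paginator = cfn.get_paginator("describe_stack_events")
for page in paginator.paginate(StackName="my-pca-stack"):
    for event in page["StackEvents"]:
        if event["ResourceStatus"].endswith("FAILED"):
            print(
                event["Timestamp"],
                event["LogicalResourceId"],
                event["ResourceStatus"],
                event.get("ResourceStatusReason", ""),
            )
```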
1 answer · 0 votes · 35 views · asked 19 days ago
Hello. I have JSON files for jobs that were created in and exported from Amazon Transcribe, and I need to add them to the CloudFormation stack of PCA (Post Call Analytics). On my PCA dashboard there is no option to create a new data source and attach the JSON files to it. How can I add a new data source (JSON files) from Transcribe to PCA? And how can I display jobs that were created manually in Transcribe on PCA as a data source?
0 answers · 0 votes · 6 views · asked 20 days ago
Hello. I'm using the Post Call Analytics (PCA) platform to display analytics for call recordings, with Amazon Transcribe extracting the text from the audio. I need to add a custom vocabulary to my Transcribe jobs to improve accuracy. I noticed that when I create a job manually I can attach the custom vocabulary I need, but the resulting transcript is not added to PCA. How can I display my manually created jobs in PCA?
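For reference, this is roughly how a Transcribe job can be started with a custom vocabulary from boto3 (job name, buckets, and vocabulary name are placeholders). The separate problem is that a job started this way bypasses PCA's own ingestion flow, which, as I understand it, normally starts the Transcribe job itself when audio lands in its input bucket:

```python
import boto3

transcribe = boto3.client("transcribe")

# Placeholder names throughout; the vocabulary must already exist
# in the same language as the job.
transcribe.start_transcription_job(
    TranscriptionJobName="call-0001",
    Media={"MediaFileUri": "s3://my-audio-bucket/calls/call-0001.wav"},
    LanguageCode="ar-SA",
    Settings={"VocabularyName": "my-custom-vocabulary"},
    OutputBucketName="my-transcripts-bucket",
)
```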
0 answers · 0 votes · 13 views · asked 20 days ago
Hello, I'm using Amazon Transcribe to extract text from call recordings. I have a recording in Arabic that also contains some English words and names. When I create a transcription job it either skips the English words/names or converts them into the wrong Arabic words. Can I have two languages in one transcription job? I also added a custom vocabulary for all recordings, but it is not improving accuracy. Accuracy is low for Arabic recordings, especially for names and for English words embedded in the Arabic audio. How can I improve it? Regards,
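Transcribe can identify and switch between multiple languages within a single job; a minimal boto3 sketch, with placeholder names throughout (whether this helps for short code-switched phrases would need testing):

```python
import boto3

transcribe = boto3.client("transcribe")

# Let Transcribe detect and switch between Arabic and English in one job.
# Job name, bucket, and URIs are placeholders.
transcribe.start_transcription_job(
    TranscriptionJobName="arabic-english-call-0001",
    Media={"MediaFileUri": "s3://my-audio-bucket/calls/call-0001.wav"},
    IdentifyMultipleLanguages=True,
    LanguageOptions=["ar-SA", "en-US"],
    OutputBucketName="my-transcripts-bucket",
)
```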
1 answer · 0 votes · 19 views · asked 20 days ago
New to Glue and Athena. I have a great toy example from an AWS community builder working, but in my real use case I want to capture all the fields from the 'detail' section of an EventBridge event and have columns created for them. The detail is nested several levels deep, and I can't figure out the schema discovery process. I tried posting a text file to S3 and running a Glue crawler over it, but no luck. Thanks in advance.
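If the crawler can't infer the nesting, one option is to declare the table yourself with struct columns and query the nested fields directly. A hedged sketch that runs the DDL through Athena with boto3; the database, buckets, and the field names inside `detail` are made-up placeholders and would have to match the real event shape:

```python
import boto3

athena = boto3.client("athena")

# Illustrative DDL: "detail" is declared as a nested struct so Athena can
# address fields such as detail.customer.region. All names are placeholders.
ddl = """
CREATE EXTERNAL TABLE IF NOT EXISTS events_db.eventbridge_events (
  `source`      string,
  `detail_type` string,
  `time`        string,
  `detail`      struct<
                   status: string,
                   customer: struct<id: string, region: string>
                 >
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ('mapping.detail_type' = 'detail-type')
LOCATION 's3://my-event-archive-bucket/events/'
"""

athena.start_query_execution(
    QueryString=ddl,
    ResultConfiguration={"OutputLocation": "s3://my-athena-results-bucket/"},
)
```

A query can then reference nested fields directly, e.g. `SELECT detail.customer.region FROM events_db.eventbridge_events`.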
1 answer · 0 votes · 10 views · asked 23 days ago
I've configured an Athena connection to an RDS SQL Server database using the JDBC driver. After choosing the data source, the database does not load and a "Network Failure" message is shown with no further information. What might cause such an error, and where can I find more information on how to troubleshoot it? ![Enter image description here](/media/postImages/original/IMQajw0sQHTgyDgntZCvLPJg)
1 answer · 0 votes · 22 views · asked 23 days ago
I am trying to use AWS Glue Studio to build a simple ETL workflow. Basically, I have a bunch of `csv` files in different directories in S3. I want those CSVs to be accessible via a database and have chosen Redshift for the job. The directories will be updated every day with new CSV files. The file structure is:

    YYYY-MM-DD (e.g. 2023-03-07)
    |---- groupName1
    |     |---- groupName1.csv
    |---- groupName2
    |     |---- groupName2.csv
    ...
    |---- groupNameN
    |     |---- groupNameN.csv

We will be keeping historical data, so every day there will be a new date-based directory. I've read that AWS Glue can automatically copy data on a schedule, but I can't see my Redshift databases or tables (screenshot below). I'm using my AWS admin account and I do have the `AWSGlueConsoleFullAccess` permission (screenshot below). ![Enter image description here](/media/postImages/original/IMLGj4xk83RSiWw_X-q368iA) ![Enter image description here](/media/postImages/original/IMdY_iM6ckSMOvvFgXb7FRsw)
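As an alternative to wiring this entirely through Glue Studio, one hedged sketch is to schedule a Redshift COPY of each day's prefix through the Redshift Data API; the cluster, database, table, bucket, and IAM role below are placeholders, not names from the question:

```python
import datetime
import boto3

# Hypothetical sketch: load today's date-based prefix into an existing
# Redshift table with a COPY command issued via the Redshift Data API.
rsd = boto3.client("redshift-data")

prefix = datetime.date.today().isoformat()  # e.g. "2023-03-07"
copy_sql = f"""
COPY analytics.daily_groups
FROM 's3://my-data-bucket/{prefix}/'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
FORMAT AS CSV
IGNOREHEADER 1;
"""

rsd.execute_statement(
    ClusterIdentifier="my-redshift-cluster",
    Database="analytics",
    SecretArn="arn:aws:secretsmanager:...",  # placeholder secret with DB credentials
    Sql=copy_sql,
)
```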
1 answer · 0 votes · 13 views · asked 24 days ago
Hello, I'm looking to push Salesforce data to a data lake. The data lake table needs to hold different versions of each record. I experimented with AppFlow, but I couldn't get the control I wanted over the process (mainly being notified when an event came in). To cover the requirement of storing the changes, I'm thinking of implementing Iceberg or Hudi.

In addition to storing the data for analytics, there are some additional requirements to push data back to Salesforce in more of a real-time fashion. Because of this, I've created an EventBridge rule to capture Salesforce CDC events in real time. The plan is to take those events and process them back into the lake. I was thinking of sending each event to Lambda and then using Athena to update the Iceberg/Hudi table. One of the difficulties is that since the stream contains only what changed, I have to query Athena for the last full record and then overlay the changes.

Good solution? Bad solution? Other things to try? We are a small company and data volume isn't huge, so I don't want to over-engineer this in terms of effort or cost. My team is primarily database developers, so anything that keeps us closer to SQL is a plus. Thoughts?
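Since the team prefers SQL, one way to apply a CDC event without first reading back the old record is an Athena MERGE INTO against an Iceberg table, issued from the Lambda handler. A rough sketch; the database, table, columns, and the assumed shape of the Salesforce CDC payload are all placeholders:

```python
import boto3

athena = boto3.client("athena")

def handler(event, context):
    # Assumed shape: the Salesforce CDC payload lives in event["detail"].
    detail = event["detail"]
    record_id = detail["Id"]
    name = detail.get("Name", "")

    # MERGE INTO is supported by Athena engine v3 on Iceberg tables.
    # Real code should use parameterized values rather than f-strings.
    merge_sql = f"""
    MERGE INTO lake_db.accounts AS t
    USING (SELECT '{record_id}' AS id, '{name}' AS name) AS s
      ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET name = s.name
    WHEN NOT MATCHED THEN INSERT (id, name) VALUES (s.id, s.name)
    """

    athena.start_query_execution(
        QueryString=merge_sql,
        WorkGroup="primary",  # assumed workgroup with a result location configured
    )
```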
1 answer · 0 votes · 22 views · Blake · asked 24 days ago
Hi AWS, what are the best practices for reading messages from a Kafka topic and writing them into S3? Is there a rare condition where a message could be missed or not written due to some unknown error? In that case, can we put a Kafka topic in between the compute that reads the data and the step that writes it to S3, and what are the pros and cons of that? That way, even if a record is missed it can be fetched later, and we would not need a second custom application, assuming we use the AWS Kafka-to-S3 sink connector?
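For reference, the managed path the question alludes to (MSK Connect running the Confluent S3 sink connector, which tracks offsets and retries on its own) is configured with properties roughly like the following sketch; every value is a placeholder and the exact property set depends on the connector version:

```python
# Illustrative MSK Connect configuration for the Confluent S3 sink connector.
# Topic, bucket, and region are placeholders.
connector_config = {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "tasks.max": "2",
    "topics": "my-topic",
    "s3.region": "us-east-1",
    "s3.bucket.name": "my-sink-bucket",
    "flush.size": "1000",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
    "partitioner.class": "io.confluent.connect.storage.partitioner.DefaultPartitioner",
}
```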
0 answers · 0 votes · 32 views · asked a month ago
I am trying to analyze CloudFront standard logs using Amazon Athena. I get the following error after running a query:

GENERIC_INTERNAL_ERROR: S3 service error
This query ran against the "<DatabaseName>" database, unless qualified by the query.

Can anyone explain what this error means and how to resolve it?
0 answers · 0 votes · 30 views · asked a month ago
The data layer is not my thing and I need some guidance. I created a Glue crawler to crawl compressed JSON files stored in an AWS S3 bucket. I recently learned that I can use Athena to query the Glue database directly. When I run select * from *table-name* it starts to load but then errors out with a long string of stuff:

HIVE_METASTORE_ERROR: Error: : expected at the position 407 of 'struct<http:struct<status_code:int,url_details:struct<path:string,queryString:struct<prefix:string,versioning:string,logging:string,encoding-type:string,nodes:string,policy:string,acl:string,policyStatus:string,replication:string,notification:string,tagging:string,website:string,encryption:string,size:string,limit:string,hash:string,accelerate:string,publicAccessBlock:string,code:string,protocol:string,G%EF%BF%BD%EF%BF%BD%EF%BF%BD\,%EF%BF%BD%EF%BF%BD%EF%BF%BD`~%EF%BF%BD%00%EF%BF%BD%EF%BF%BD{%EF%BF%BD%D5%96%EF%BF%BDw%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%3C:string,cors:string,object- etc etc etc.

I can load one small table but the others fail.
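If the crawler inferred those garbled struct keys from dynamic query-string values, one possible repair is to re-type the offending column so the schema definition parses again. A rough boto3 sketch; database and table names are placeholders, and whether the collapsed column stays usefully queryable depends on the table's SerDe:

```python
import boto3

glue = boto3.client("glue")

# Fetch the crawled table, keep only the fields update_table accepts,
# and collapse any column whose struct type contains garbled keys.
# "logs_db" and "json_logs" are placeholder names.
table = glue.get_table(DatabaseName="logs_db", Name="json_logs")["Table"]

table_input = {
    k: v
    for k, v in table.items()
    if k in ("Name", "Description", "Owner", "Retention", "StorageDescriptor",
             "PartitionKeys", "TableType", "Parameters")
}

for col in table_input["StorageDescriptor"]["Columns"]:
    if col["Type"].startswith("struct<") and "%EF%BF%BD" in col["Type"]:
        # Placeholder heuristic: re-type the garbled struct as a plain string
        # so the type definition can at least be parsed.
        col["Type"] = "string"

glue.update_table(DatabaseName="logs_db", TableInput=table_input)
```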
0 answers · 0 votes · 35 views · asked a month ago