Glue Crawler and Classifiers - Supported file encodings: is UTF16 supported?
Hi, AWS Glue crawlers with CSV and XML classifiers work well with files encoded in UTF-8 but not with files encoded in UTF-16. The public documentation does not clarify this point:
- Do Glue crawlers and classifiers support UTF-16?
- Is there any documentation available on the encodings supported by Glue crawlers and classifiers?
Best regards, Nicolas.
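Since the question reports that UTF-8 files are crawled correctly, one possible workaround is to transcode UTF-16 files to UTF-8 before the crawler sees them. The following is a minimal sketch, not a statement about what Glue supports: the helper names `detect_utf16_bom` and `to_utf8` are hypothetical, and it assumes each file is either UTF-8 or UTF-16 with a byte order mark.

```python
import codecs
from typing import Optional


def detect_utf16_bom(data: bytes) -> Optional[str]:
    """Return 'utf-16' if the bytes start with a UTF-16 byte order mark, else None.

    Assumption: inputs are never UTF-32 (whose little-endian BOM also begins
    with the UTF-16 little-endian BOM).
    """
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    return None


def to_utf8(data: bytes) -> bytes:
    """Transcode UTF-16 (with BOM) to UTF-8; pass UTF-8 content through.

    The 'utf-16' codec reads the BOM to pick the byte order and drops it.
    'utf-8-sig' also strips a UTF-8 BOM if one is present.
    """
    encoding = detect_utf16_bom(data) or "utf-8-sig"
    return data.decode(encoding).encode("utf-8")


if __name__ == "__main__":
    sample = "id,name\n1,Zoé\n".encode("utf-16")  # encoded with a BOM
    print(to_utf8(sample).decode("utf-8"))
```

In practice the bytes would come from the object in S3 and the converted copy would be written to the prefix the crawler scans; that plumbing is left out here.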
Redshift UNLOAD parquet file size
My customer has a Redshift cluster of **2 to 4 dc2.8xlarge nodes** and wants to export data to Parquet at an optimal size of about 1 GB per file, using the option (MAXFILESIZE AS 1GB). However, the engine exported a total of 500 MB of data as **64 files** (file sizes ranging from 5 MB to 25 MB). My questions: 1. How can we control the size of each Parquet file? 2. How does Redshift determine the optimal file size?
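A back-of-the-envelope check may help frame the 64-file result. The sketch below assumes that a dc2.8xlarge node has 16 slices, that a parallel UNLOAD writes at least one file per slice, and that MAXFILESIZE is an upper bound and not a target size. The function names are hypothetical and this is only arithmetic, not a description of Redshift internals.

```python
# Assumption: each dc2.8xlarge node exposes 16 slices.
SLICES_PER_DC2_8XLARGE_NODE = 16


def expected_unload_files(nodes: int,
                          slices_per_node: int = SLICES_PER_DC2_8XLARGE_NODE) -> int:
    """Minimum file count if a parallel UNLOAD writes one file per slice."""
    return nodes * slices_per_node


def average_file_size_mb(total_mb: float, file_count: int) -> float:
    """Average size per file when the data is spread across all files."""
    return total_mb / file_count


if __name__ == "__main__":
    files = expected_unload_files(nodes=4)
    print(f"files: {files}")
    print(f"average size: {average_file_size_mb(500, files):.1f} MB")
```

Under these assumptions a 4-node cluster yields 64 files, so 500 MB of data stays far below a 1 GB cap per file, and the cap never comes into play.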
Redshift Spectrum Access to Lake Formation
My customer is looking to provide fine-grained access from Redshift Spectrum to data governed by Lake Formation. They are wondering how access can be controlled and if this can be done through users in Redshift or if this can only be done via IAM. From the various documentation I have looked at, it appears that you can only modify the policy on the Redshift cluster to access the data and that individual users cannot be restricted further. Documentation I've read through: https://docs.aws.amazon.com/lake-formation/latest/dg/tut-query-redshift.html https://docs.aws.amazon.com/redshift/latest/dg/r_REVOKE.html
PowerBI access to S3/Athena
Hi all, My customer mentioned that they have been able to connect and secure Alteryx directly to S3. Can they connect Power BI directly to S3, through Athena, or via some other method in order to do similar reporting? I know this is possible via Redshift, but their various analytics requirements may not necessitate the use of a DW. Thanks, Chris
Questions on Data lake using DMS
A customer is implementing a data lake and I am assessing DMS for pulling data into S3 from an on-premises MS SQL DB. As part of this I would like to understand the following points on DMS:
1. Other users of DMS: is it widely used?
2. Examples of using DMS with SQL Server Change Tracking
3. Volumetrics: how much data can it handle?
4. Running over VPN: any issues that you're aware of?
5. Known issues/limitations of DMS (big binary objects etc.)
6. What ports/access (accounts) does it require for the on-premises SQL Server?
7. What is the overhead on the on-premises SQL Server?