Questions tagged with Data Lakes
Hello,
As part of a SaaS solution, I'm currently setting up the structure for an S3 bucket which will contain multiple clients' data.
The idea is to use one access point per client, in order to...
1 answer, 1 vote, 551 views, asked 2 years ago
Are governed tables insert/append only? Is it possible to update data already in the table? ...
2 answers, 0 votes, 1610 views, asked 2 years ago
With the Glue Console (Glue 3.0 - Python and Spark), I need to overwrite the data in an S3 bucket in an automated daily process. I tried with the `glueContext.purge_s3_path( ...
2 answers, 0 votes, 5483 views, asked 2 years ago
I've been trying this for a week but I'm starting to give up - I need some help understanding this. I have an S3 bucket full of XML files, and I am creating a pyspark ETL job to convert them to...
3 answers, 0 votes, 1809 views, asked 2 years ago
We have a BI feature where a web app that uses non-AWS authentication queries Athena for data that is Hive-partitioned by customer. Currently any BI query gets modified to filter data down to just...
2 answers, 0 votes, 997 views, asked 2 years ago
When defining blueprints in AWS Lake Formation, can we specify a particular snapshot? Does Lake Formation always use the most recent snapshot by default?
2 answers, 0 votes, 516 views, asked 3 years ago
A customer is interested in doing analytics using the data stored in multiple platforms like NetSuite ERP and Magenta (RDS MariaDB db backend) in AWS. They are looking to integrate the data (about 8...
1 answer, 0 votes, 717 views, asked 4 years ago
I have a customer who uses Ab Initio at enterprise scale for on-prem ETL workloads. They now want to build a data lake on AWS and would prefer to use this already established tool to write from source...
1 answer, 0 votes, 2214 views, asked 4 years ago
The customer is going to implement a data lake solution in an AWS Region without Lake Formation, and they want to implement **column-level access** with AWS-native IAM/services. Is there any workaround...
1 answer, 0 votes, 567 views, asked 4 years ago
Hi,
AWS Glue crawlers with CSV and XML classifiers work well with files encoded in UTF-8 but not with files encoded in UTF-16.
Public documentation does not clarify this point:
- Do Glue...
1 answer, 0 votes, 990 views, asked 4 years ago
My customer has a Redshift cluster of **2-4 dc2.8xlarge nodes**, and they want to export data to Parquet at the optimal file size (~1 GB) using the option (MAXFILESIZE AS 1GB). But the engine somehow...
1 answer, 0 votes, 1642 views, asked 4 years ago
My customer is looking to provide fine-grained access from Redshift Spectrum to data governed by Lake Formation. They are wondering how access can be controlled and if this can be done through users...
1 answer, 0 votes, 1227 views, asked 5 years ago