Update/Create Governed Table from Glue job
Currently it's possible to create or update catalog tables from a Glue job: https://docs.aws.amazon.com/glue/latest/dg/update-from-job.html Two questions: 1. Is it possible to create or update a Governed table from a Glue job? 2. Can a crawler create a Governed table? Thanks!
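For context, the linked page covers standard (non-governed) tables; whether the same flags apply to Governed tables is exactly the open question. A minimal sketch of the documented pattern, with placeholder bucket/database/table names, might look like this:

```python
def catalog_sink_options(path, partition_keys):
    """Keyword arguments for glueContext.getSink() that let a Glue job
    create/update a *standard* catalog table as it writes (the pattern
    from the linked doc; Governed-table support is the open question)."""
    return {
        "connection_type": "s3",
        "path": path,
        "enableUpdateCatalog": True,
        "updateBehavior": "UPDATE_IN_DATABASE",
        "partitionKeys": partition_keys,
    }

# Usage inside a Glue job (requires the Glue runtime; names are placeholders):
#   sink = glueContext.getSink(**catalog_sink_options("s3://my-bucket/out/", ["year"]))
#   sink.setFormat("glueparquet")
#   sink.setCatalogInfo(catalogDatabase="my_db", catalogTableName="my_table")
#   sink.writeFrame(dynamic_frame)
```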
HIVE_METASTORE_ERROR: Error: name expected at the position 10 of 'decimal(5, 2)' but ' ' is found. (Service: null; Status Code: 0; Error Code: null; Request ID: null; Proxy: null)
Hello! I encountered the following problem when creating a table through Lake Formation, although creating the same table through Athena works without issue. I have some int and string columns, but the decimal column fails. Can you help? Thanks. ```HIVE_METASTORE_ERROR: Error: name expected at the position 10 of 'decimal(5, 2)' but ' ' is found. (Service: null; Status Code: 0; Error Code: null; Request ID: null; Proxy: null)``` This query ran against the "db_name" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: fcd81991-c7e0-4a4c-80e7-5c4bde2f9f32
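The parser is pointing at the space inside `decimal(5, 2)` (position 10 is the blank after the comma): Athena's DDL tolerates the space, but the Hive/Glue type string stored in the catalog must not contain whitespace. A small sketch of stripping it before calling `glue.create_table` (column names here are made up):

```python
import re

def normalize_type(hive_type):
    # Hive/Glue type strings must not contain whitespace:
    # "decimal(5, 2)" -> "decimal(5,2)". The space after the comma is
    # exactly what the parser trips over at "position 10".
    return re.sub(r"\s+", "", hive_type)

def column(name, hive_type):
    # Column entry for the StorageDescriptor of glue.create_table's TableInput.
    return {"Name": name, "Type": normalize_type(hive_type)}

# Usage (requires boto3 and credentials; names are placeholders):
#   import boto3
#   boto3.client("glue").create_table(
#       DatabaseName="db_name",
#       TableInput={"Name": "my_table", "StorageDescriptor": {
#           "Columns": [column("id", "int"), column("price", "decimal(5, 2)")],
#           ...}})
```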
Lake Formation on Iceberg tables
We have created Iceberg tables in an S3 bucket through an EMR notebook with the Glue catalog enabled. We want to implement Lake Formation access control on those Iceberg tables. When IAM grants full access I can query the Iceberg table from Athena, but once a data filter is implemented, instead of returning only the accessible rows it fails with an "Access denied" error. The same setup works for me if I create an external table directly on top of a CSV or Parquet file. Any idea whether Lake Formation data filters work on Iceberg tables?
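Fine-grained Lake Formation permissions on Iceberg tables have depended on engine support (e.g. the Athena engine version), so the current documentation is worth checking. For reference, a row-level data filter is created with the CreateDataCellsFilter API; a sketch of the request shape, with placeholder names and IDs:

```python
def build_row_filter(catalog_id, database, table, filter_name, expression):
    """Request shape for lakeformation.create_data_cells_filter: a
    row-level filter over all columns. All names/IDs are placeholders."""
    return {
        "TableData": {
            "TableCatalogId": catalog_id,
            "DatabaseName": database,
            "TableName": table,
            "Name": filter_name,
            "RowFilter": {"FilterExpression": expression},
            # Expose every column; use "ColumnNames" instead to restrict columns.
            "ColumnWildcard": {},
        }
    }

# Usage (requires boto3 and credentials):
#   import boto3
#   boto3.client("lakeformation").create_data_cells_filter(
#       **build_row_filter("123456789012", "my_db", "iceberg_tbl",
#                          "eu_rows_only", "region = 'EU'"))
```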
Athena Query Generic User Error
I am trying to query data in my own account and am getting the following error: ```GENERIC_USER_ERROR: Cannot execute query. User in account <account id> is not authorized for any columns in table <table_name>``` Has anyone experienced this before, or does anyone have insight into why this may be occurring? It is a Lake Formation governed table.
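That message suggests the querying principal has no Lake Formation column-level grants on the table. A sketch of granting SELECT on specific columns via the GrantPermissions API (principal ARN, table, and column names are all placeholders):

```python
def build_column_grant(principal_arn, database, table, columns):
    """Request shape for lakeformation.grant_permissions: SELECT on a
    specific set of columns. Principal and table names are placeholders."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "TableName": table,
                "ColumnNames": columns,
            }
        },
        "Permissions": ["SELECT"],
    }

# Usage (requires boto3 and credentials):
#   import boto3
#   boto3.client("lakeformation").grant_permissions(
#       **build_column_grant("arn:aws:iam::111122223333:role/AnalystRole",
#                            "my_db", "my_table", ["id", "name"]))
```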
Querying Latest Available Partition
I am building an ETL pipeline using primarily state machines, Athena, and the Glue catalog. In general, things work in the following way: 1. A table, partitioned by "version", exists in the Glue Catalog. The table represents the output destination of some ETL process. 2. A step function (managed by some other process) executes "INSERT INTO" Athena queries. The step function supplies a "version" that is used as part of the "INSERT INTO" query so that new data can be appended to the table defined in (1). The table contains all "versions" - it's a historical table that grows over time. My question is: what is a good way of exposing a view/table that allows someone (or something) to query only the latest "version" partition of a given historically partitioned table? I've looked into other table types AWS offers, including Governed tables and Iceberg tables. Each seems to have some incompatibility with our existing or planned future architecture: 1. Governed tables do not support writes via Athena INSERT queries; only Glue ETL/Spark seems to be supported at the moment. 2. Iceberg tables do not support Lake Formation data filters (which we'd like to use in the future to control data access). 3. Iceberg tables also seem to have poor write performance. Anecdotally, it can take several seconds to insert a small handful of rows into a given Iceberg table, so I'd worry about future performance when we want to insert a million rows. Any guidance would be appreciated!
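One plain-Athena approach to the latest-partition question, assuming nothing beyond standard tables: resolve the newest version first (from `glue.get_partitions` or a `SHOW PARTITIONS` query) and pin the query to it; a view defined as `... WHERE version = (SELECT max(version) FROM the_table)` is another option if a scalar subquery per read is acceptable. A minimal sketch, where the table name and version values are placeholders:

```python
def latest_version_query(table, versions):
    """Build an Athena query pinned to the newest "version" partition.

    Assumes the version strings sort correctly lexicographically
    (ISO dates, zero-padded numbers); convert them first otherwise.
    """
    latest = max(versions)
    return f"SELECT * FROM {table} WHERE version = {latest!r}"
```

The version list itself would typically come from the Glue catalog (boto3 `glue.get_partitions`), keeping the expensive table scan restricted to a single partition.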
Lake Formation (Database Snapshot) blueprint creates a workflow that does not work. "Internal service error: Invalid Input Provided"
I have followed the guide in this blog post: https://aws.amazon.com/blogs/big-data/integrating-aws-lake-formation-with-amazon-rds-for-sql-server/ I am using an RDS MySQL instance in the eu-west-1 region. The blueprint, and subsequently the workflows, jobs, and crawlers, are created successfully. The workflow runs the crawler jobs successfully and the tables in Glue are populated, but the workflow then fails when reaching the ETL jobs with the error message "failed to execute with exception Internal service error: Invalid Input Provided". I have looked at the job definition itself and the code behind the script, and the input seems to match. EDIT: I found out where the error is coming from. The ETL job created by Lake Formation uses "Glue 1.0", and that simply doesn't work: an empty Spark job runs fine on Glue 3.0, while the same job on Glue 1.0 fails with the error "Internal service error: Invalid Input Provided".
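Given the edit, one workaround is to bump the generated job to Glue 3.0 with UpdateJob. `glue.get_job()` returns fields that UpdateJob rejects, so they need stripping first; a hedged sketch (the exact field list is an assumption, verify against the current Glue API reference):

```python
def bump_glue_version(job):
    """Turn the output of glue.get_job()["Job"] into a JobUpdate payload
    pinned to Glue 3.0. The fields stripped below are read-only or
    deprecated in UpdateJob per my reading of the API; verify against
    the current Glue reference before relying on this."""
    read_only = {"Name", "CreatedOn", "LastModifiedOn", "AllocatedCapacity"}
    update = {k: v for k, v in job.items() if k not in read_only}
    if "WorkerType" in update:
        # MaxCapacity cannot be combined with WorkerType/NumberOfWorkers.
        update.pop("MaxCapacity", None)
    update["GlueVersion"] = "3.0"
    return update

# Usage (requires boto3 and credentials; the job name is a placeholder):
#   import boto3
#   glue = boto3.client("glue")
#   job = glue.get_job(JobName="lf-blueprint-etl-job")["Job"]
#   glue.update_job(JobName="lf-blueprint-etl-job", JobUpdate=bump_glue_version(job))
```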
Data Mesh on AWS Lake Formation
Hi, I'm building a data mesh in AWS Lake Formation. The idea is to have 4 accounts:

account 0: main account
account 1: central data governance
account 2: data producer
account 3: data consumer

I have been looking for information about how to implement the mesh in AWS, and I'm following some tutorials that are very similar to what I'm doing:

https://catalog.us-east-1.prod.workshops.aws/workshops/78572df7-d2ee-4f78-b698-7cafdb55135d/en-US/lakeformation-basics/cross-account-data-mesh
https://aws.amazon.com/blogs/big-data/design-a-data-mesh-architecture-using-aws-lake-formation-and-aws-glue/
https://aws.amazon.com/blogs/big-data/build-a-data-sharing-workflow-with-aws-lake-formation-for-your-data-mesh/

However, after creating the bucket and uploading some CSV data to it (in the producer account), I don't know whether I have to register it first in the Glue catalog in the producer account, or whether I just do it in Lake Formation as described here: https://catalog.us-east-1.prod.workshops.aws/workshops/78572df7-d2ee-4f78-b698-7cafdb55135d/en-US/lakeformation-basics/databases (does this depend on whether one uses Glue permissions or Lake Formation permissions in the Lake Formation configuration?) I created the database and the table in Glue first, and when I then go to the database and table sections in Lake Formation, the database and table created from Glue appear there without my doing anything. Even if I disable the options "Use only IAM access control for new databases" and "Use only IAM access control for new tables in new databases", both the database and table still appear there. Do you know whether Glue and Lake Formation share the data catalog, and whether I'm doing this correctly? Thanks, John
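One point that may help here: Lake Formation does not maintain a separate catalog; it layers permissions on top of the Glue Data Catalog, which is why objects created in Glue show up in the Lake Formation console immediately. Registering the S3 location with Lake Formation (in the producer account) is a separate step from cataloging; a sketch of the RegisterResource request shape, with placeholder ARNs:

```python
def build_register_resource(bucket_arn, role_arn):
    """Request shape for lakeformation.register_resource, run in the
    producer account to put an S3 location under Lake Formation
    management. Both ARNs are placeholders."""
    return {
        "ResourceArn": bucket_arn,
        "RoleArn": role_arn,
        "UseServiceLinkedRole": False,
    }

# Usage (requires boto3 and credentials):
#   import boto3
#   boto3.client("lakeformation").register_resource(
#       **build_register_resource("arn:aws:s3:::producer-data-bucket",
#                                 "arn:aws:iam::111122223333:role/LFRegisterRole"))
```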
Lake Formation federated access
Could you please confirm whether AWS Lake Formation federated access is eligible for any SAML 2.0-based IdP? I have a customer who needs to query a dataset from Qlik Sense via Athena (with Lake Formation permission access). Qlik is federated; is it possible to use AWS Lake Formation federated access from Qlik? Thanks for your support.
Usage of enforcedMatches in Find Incremental Matches
There is a field with the key **enforcedMatches** in the FindIncrementalMatches function of AWS Glue 2.0. It is mentioned in [AWS Docs | FindIncrementalMatches.apply](https://docs.aws.amazon.com/glue/latest/dg/glue-etl-scala-apis-glue-ml-findincrementalmatches.html), but there is no implementation example for it. It expects a data frame, but how does it affect the output? ### Expectation I passed the existing source of data, hoping that it would re-use the match_id values from the previous result and enforce the matches, but that did not work. The prior thread is: [AWS Question on FindIncrementalMatches](https://repost.aws/questions/QUivX_GVKSTSiCWJlwIQ5gbw/enforce-find-incremental-matches-in-aws-glue-to-use-existing-match-id) ### Question What is the purpose and usage of **enforcedMatches** in FindIncrementalMatches?
Enforce FindIncrementalMatches in AWS Glue to use existing match_id
The use case is that we have an incremental source of data from which we need to identify matching records. For this purpose, we are running Find Matches using AWS Glue 2.0. ### Result after Find Matches is run When I run FindMatches on the initial source, the following result is generated. Note the **match_id** generated for each record. ![Find Matches Result](https://repost.aws/media/postImages/original/IMDRqUKyW8Qo2lexw6o4mS0Q) ### Result after Find Incremental Matches is run When I run FindIncrementalMatches using the above Find Matches result as the existing source, the following result is generated, with completely different match IDs: ![Find Incremental Matches Result](https://repost.aws/media/postImages/original/IMDp-IyxegQ9-4nFAZ5rKZgg) ### Question Is there a way to force FindIncrementalMatches in AWS Glue to reuse the existing match_id values while processing matches from an incremental source? ### Important links We've followed the steps described in the following link: 1. [AWS Blog - Incremental Data Matching](https://aws.amazon.com/blogs/big-data/incremental-data-matching-using-aws-lake-formation/)
Show existing permissions while granting/modifying Lake Formation permissions
Hi, this is a request for enhancement of the current behavior of the permission-granting mechanism in Lake Formation (navigation: Lake Formation -> Permissions -> Data lake permissions). Currently, while modifying the permissions for a principal (user or role), the system shows a fresh screen as if no permissions had been granted so far, whereas we might in fact be modifying an existing set of permissions on tables and columns. It would be very helpful and informative if the existing permissions were displayed on the "Grant permissions" screen. Thank you for your time, Suresh Babu
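Until the console shows this, one workaround is to pull the principal's current grants with the ListPermissions API before editing them; a sketch of the request shape (the ARN and database name are placeholders):

```python
def build_list_permissions(principal_arn, database=None):
    """Request shape for lakeformation.list_permissions, scoped to one
    principal and optionally one database. ARN/name are placeholders."""
    request = {"Principal": {"DataLakePrincipalIdentifier": principal_arn}}
    if database is not None:
        request["Resource"] = {"Database": {"Name": database}}
    return request

# Usage (requires boto3 and credentials):
#   import boto3
#   lf = boto3.client("lakeformation")
#   resp = lf.list_permissions(
#       **build_list_permissions("arn:aws:iam::111122223333:role/AnalystRole"))
#   for grant in resp["PrincipalResourcePermissions"]:
#       print(grant["Resource"], grant["Permissions"])
```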
Unable to Run DBT Glue Job in eu-west-2 in DEV Account
Dear all, we are looking to implement Glue ETL jobs in our data pipeline processing. To achieve this we are using the DBT Glue adapter (https://github.com/aws-samples/dbt-glue) and following this AWS tutorial to get started: https://aws.amazon.com/blogs/big-data/build-your-data-pipeline-in-your-aws-modern-data-platform-using-aws-lake-formation-aws-glue-and-dbt-core/ We're able to successfully deploy and run this tutorial in the us-west-1, us-east-1, eu-west-1, and eu-central-1 regions. We are also able to deploy it in eu-west-2 in our STAGING account. However, we are NOT able to run it in eu-west-2 (London) in our DEV account. Please note we use eu-west-2 as our primary region and have previously used Lake Formation there for proof-of-concept projects, so I'm not sure if this has something to do with it. We've tried everything, including deleting all stacks and IAM permissions/roles/policies related to Glue and Lake Formation in eu-west-2, and it doesn't seem to work. We don't think it's a problem with the dbt-glue connector, as it works correctly across other regions, including eu-west-2 in our STAGING account. Can anyone kindly advise how to debug or resolve this? Thanks in advance, Eric Sales De Andrade
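Since Lake Formation state is kept per account per region, one thing worth comparing between the working STAGING and failing DEV setups is each region's data lake settings (e.g. whether the IAM-only defaults are still in place, and who the data lake administrators are). A hedged sketch of inspecting that, assuming the usual GetDataLakeSettings response shape:

```python
def uses_iam_only_defaults(settings):
    """Given lakeformation.get_data_lake_settings()["DataLakeSettings"],
    report whether new tables still default to IAM-only access (the
    "IAM_ALLOWED_PRINCIPALS" grant). Response shape per my reading of
    the API; verify against the current Lake Formation reference."""
    defaults = settings.get("CreateTableDefaultPermissions", [])
    return any(
        entry.get("Principal", {}).get("DataLakePrincipalIdentifier")
        == "IAM_ALLOWED_PRINCIPALS"
        for entry in defaults
    )

# Usage (requires boto3 and credentials; run once per account/region to compare):
#   import boto3
#   lf = boto3.client("lakeformation", region_name="eu-west-2")
#   settings = lf.get_data_lake_settings()["DataLakeSettings"]
#   print(uses_iam_only_defaults(settings), settings.get("DataLakeAdmins"))
```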