AWS Lake Formation Data Filter Issue

The "Row filter expression" field of a Lake Formation data filter is mandatory, so True has to be entered there even when no rows should be filtered out. That always-true condition causes performance issues. Does anyone have a way to run a Lake Formation data filter without entering True? If not, is there a way to improve the runtime? Averaged over 10 runs, a query with the data filter and the True expression takes 5.443 seconds, while the same query without the data filter and the True expression takes 984 ms.
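
For anyone who wants to reproduce this kind of comparison, here is a minimal timing sketch. `run_query` is a hypothetical callable (not part of any AWS SDK) that is assumed to execute the query and block until the result is available; the helper names are illustrative only.

```python
import statistics
import time
from typing import Callable


def average_runtime(run_query: Callable[[], object], runs: int = 10) -> float:
    """Return the mean wall-clock time in seconds over `runs` calls."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()  # hypothetical: runs the query and waits for the result
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations)


def slowdown(with_filter_s: float, without_filter_s: float) -> float:
    """Ratio between the two average runtimes."""
    return with_filter_s / without_filter_s


# Averages quoted in the question: 5.443 s with the filter, 984 ms without it.
print(f"slowdown: {slowdown(5.443, 0.984):.2f}x")
```

With the figures above, the filtered query is roughly five and a half times slower.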

asked 2 years ago, 687 views
1 Answer
If you do not need row-level filtering, you can achieve your goal by including or excluding columns through LF-tag-based access control (LF-TBAC): https://docs.aws.amazon.com/lake-formation/latest/dg/TBAC-security.html

As an example, the code at the end of this answer follows these steps:

  1. Create a Lake Formation tag "Confidential" with the values True and False
  2. Assign Confidential=False to the table and Confidential=True to its confidential columns
  3. Grant the user SELECT privileges on objects that have the Confidential=False tag assigned (or inherited)
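
To make the "assigned (or inherited)" part concrete, here is a toy model in plain Python. It is only a sketch of the idea, not Lake Formation's actual evaluation logic: a column uses its own tag value when one is assigned and otherwise falls back to the table's value. The column names and helper functions are hypothetical.

```python
from typing import Dict, List, Optional


def effective_tag(table_value: str, column_value: Optional[str]) -> str:
    """A column keeps its own tag value if assigned, else inherits the table's."""
    return column_value if column_value is not None else table_value


def visible_columns(table_value: str,
                    column_overrides: Dict[str, Optional[str]],
                    granted_values: List[str]) -> List[str]:
    """Columns whose effective tag value is covered by the grant expression."""
    return [
        name
        for name, override in column_overrides.items()
        if effective_tag(table_value, override) in granted_values
    ]


# Hypothetical table tagged Confidential=False, with one column overridden to True.
columns = {"id": None, "city": None, "confidential_salary": "True"}
print(visible_columns("False", columns, granted_values=["False"]))
```

In this model a user granted Confidential=False sees `id` and `city` but not `confidential_salary`.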

Alternatively, you can configure LF-TBAC through the console. I find this article especially helpful: https://medium.com/towards-aws/lake-formation-data-security-and-data-governance-with-lf-tbac-79b6f44a50e9

If this answer helped, please let me know by marking it as accepted. Regards, Stas

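import pprint

import awswrangler as wr
import boto3

# Assumption: DATABASE, TABLE, USER_NAME, USE_NAMED_DATA_CELL_FILTER and FILTER
# are defined elsewhere; they are not shown in this snippet.
# Assumption: the catalog ID is the account ID of the calling credentials.
account_id = boto3.client('sts').get_caller_identity()['Account']

# Create a Lake Formation client
lakeformation_client = boto3.client('lakeformation')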

# Define the tag key and values
tag_key = 'Confidential'
tag_values = ['True', 'False']


# Create the tag if it does not exist yet
try:
    response = lakeformation_client.get_lf_tag(
        CatalogId=account_id,
        TagKey=tag_key
    )
    print(f"Tag {tag_key} already exists.")
except lakeformation_client.exceptions.EntityNotFoundException:
    response = lakeformation_client.create_lf_tag(
        CatalogId=account_id,
        TagKey=tag_key,
        TagValues=tag_values
    )
    if response['ResponseMetadata']['HTTPStatusCode'] == 200:
        print(f"Tag {tag_key} has been created.")
    else:
        print(f"Tag {tag_key} has not been created. See response:")
        print(response)
        
## Assign tags to the table and columns
# Assign `Confidential=False` to the table (to be used as a default tag)
# Assign `Confidential=True` to the columns whose names start with "confidential"

# Assign the tag to the table
response = lakeformation_client.add_lf_tags_to_resource(
    CatalogId=account_id,
    LFTags=[
        {
            'CatalogId': account_id,
            'TagKey': "Confidential",
            'TagValues': ["False"]
        }
    ],
    Resource={
        'Table': {
            'CatalogId': account_id,
            'DatabaseName': DATABASE,
            'Name': TABLE
        }
    }
)
# Get the names of all columns in the table
columns = wr.catalog.get_table_types(database=DATABASE, table=TABLE)
columns = list(columns.keys())
# Keep only the confidential columns (names starting with "confidential")
columns = [column for column in columns if column.startswith("confidential")]
# Assign the tag to the confidential columns
response = lakeformation_client.add_lf_tags_to_resource(
    CatalogId=account_id,
    LFTags=[
        {
            'CatalogId': account_id,
            'TagKey': "Confidential",
            'TagValues': ["True"]
        }
    ],
    Resource={
        'TableWithColumns': {
            'CatalogId': account_id,
            'DatabaseName': DATABASE,
            'Name': TABLE,
            "ColumnNames": columns
        }
    }
)

if not USE_NAMED_DATA_CELL_FILTER:
    print(f"Assigning LF-TAG to the user {USER_NAME}")
    # Grant DESCRIBE on databases tagged Confidential=False to the user USER_NAME
    response = lakeformation_client.grant_permissions(
        CatalogId=account_id,
        Principal={
            'DataLakePrincipalIdentifier': f'arn:aws:iam::{account_id}:user/{USER_NAME}'
        },
        Resource={
            "LFTagPolicy": {
                "CatalogId": account_id,
                "ResourceType": "DATABASE",
                "Expression": [
                    {
                        "TagKey": "Confidential",
                        "TagValues": ["False"]
                    }
                ]
            }
        },
        Permissions=[
            "DESCRIBE",
        ],
        PermissionsWithGrantOption=[

        ]
    )
    if response['ResponseMetadata']['HTTPStatusCode'] != 200:
        print("Error:")
        pprint.pprint(response)

    # Grant SELECT on tables and columns tagged Confidential=False to the user USER_NAME
    response = lakeformation_client.grant_permissions(
        CatalogId=account_id,
        Principal={
            'DataLakePrincipalIdentifier': f'arn:aws:iam::{account_id}:user/{USER_NAME}'
        },
        Resource={
            "LFTagPolicy": {
                "CatalogId": account_id,
                "ResourceType": "TABLE",
                "Expression": [
                    {
                        "TagKey": "Confidential",
                        "TagValues": ["False"]
                    }
                ]
            }
        },
        Permissions=[
            "SELECT",
        ],
        PermissionsWithGrantOption=[

        ]
    )
    if response['ResponseMetadata']['HTTPStatusCode'] != 200:
        print("Error:")
        pprint.pprint(response)
else:
    print(f"User {USER_NAME} permissions are set using named data cell filter {FILTER}")
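
After running the script above, it can be useful to check which tag values actually ended up on each column. The following is a minimal sketch that assumes the boto3 `get_resource_lf_tags` call and its `LFTagsOnColumns` response field; the helper names `column_tag_values` and `fetch_column_tags` are my own, and the key comparison is case-insensitive in case the service normalizes the casing of tag keys.

```python
from typing import Any, Dict, List


def column_tag_values(response: Dict[str, Any], tag_key: str) -> Dict[str, List[str]]:
    """Map column name -> values of `tag_key` from a get_resource_lf_tags response."""
    result: Dict[str, List[str]] = {}
    for column in response.get("LFTagsOnColumns", []):
        for tag in column.get("LFTags", []):
            # Compare case-insensitively (assumption: key casing may be normalized).
            if tag["TagKey"].lower() == tag_key.lower():
                result[column["Name"]] = tag["TagValues"]
    return result


def fetch_column_tags(client: Any, catalog_id: str, database: str, table: str,
                      tag_key: str = "Confidential") -> Dict[str, List[str]]:
    """Query the tags on a table and return the per-column values of `tag_key`."""
    response = client.get_resource_lf_tags(
        CatalogId=catalog_id,
        Resource={
            "Table": {
                "CatalogId": catalog_id,
                "DatabaseName": database,
                "Name": table,
            }
        },
        ShowAssignedLFTags=True,
    )
    return column_tag_values(response, tag_key)
```

With the variables from the script, a call would look like `fetch_column_tags(lakeformation_client, account_id, DATABASE, TABLE)`.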


answered a year ago
