AWS Lake Formation Data Filter Issue


The "Row filter expression" field of a Lake Formation data filter is mandatory, so True has to be entered there even when no row-level filtering is needed, and this True condition appears to cause performance issues. Does anyone have a solution to run the Lake Formation data filter without having to enter True? If not, is there a way to improve the runtime? Averaged over 10 runs, the runtime is 5.443 seconds with a data filter and the True expression, and 984 ms without the filter and the True expression.

asked 2 years ago, 557 views
1 Answer

If you do not need row-level filtering, you can achieve your goal by including or excluding columns via LF-tag-based access control (LF-TBAC): https://docs.aws.amazon.com/lake-formation/latest/dg/TBAC-security.html

As an example, the code below follows these steps:

  1. Create a Lake Formation tag "Confidential" with the values True and False
  2. Assign Confidential=False to the table and Confidential=True to the confidential columns
  3. Grant the user SELECT privileges on objects that have the Confidential=False tag assigned (or inherited)

Alternatively, you can configure LF-TBAC via the console. I find this article especially helpful: https://medium.com/towards-aws/lake-formation-data-security-and-data-governance-with-lf-tbac-79b6f44a50e9

Please let me know if this answer helped by marking it as accepted.

Regards,
Stas

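import pprint

import awswrangler as wr
import boto3

# NOTE: DATABASE, TABLE, USER_NAME, USE_NAMED_DATA_CELL_FILTER and FILTER
# are expected to be defined by the caller before running this code.

# Create a Lake Formation client
lakeformation_client = boto3.client('lakeformation')

# Assumption: the Data Catalog lives in the caller's own AWS account
account_id = boto3.client('sts').get_caller_identity()['Account']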

# Define the tag key and values
tag_key = 'Confidential'
tag_values = ['True', 'False']


# Create the tag if it does not exist yet
try:
    response = lakeformation_client.get_lf_tag(
        CatalogId=account_id,
        TagKey=tag_key
    )
    print(f"Tag {tag_key} already exists.")
except lakeformation_client.exceptions.EntityNotFoundException:
    response = lakeformation_client.create_lf_tag(
        CatalogId=account_id,
        TagKey=tag_key,
        TagValues=tag_values
    )
    if response['ResponseMetadata']['HTTPStatusCode'] == 200:
        print(f"Tag {tag_key} has been created.")
    else:
        print(f"Tag {tag_key} has not been created. See response:")
        print(response)

# Assign tags to the table and its columns:
# - `Confidential=False` on the table (used as the default tag, inherited by its columns)
# - `Confidential=True` on the columns whose names start with "confidential"

# Assign the tag to the table
response = lakeformation_client.add_lf_tags_to_resource(
    CatalogId=account_id,
    LFTags=[
        {
            'CatalogId': account_id,
            'TagKey': "Confidential",
            'TagValues': ["False"]
        }
    ],
    Resource={
        'Table': {
            'CatalogId': account_id,
            'DatabaseName': DATABASE,
            'Name': TABLE
        }
    }
)
# Get the names of all columns in the table
columns = wr.catalog.get_table_types(database=DATABASE, table=TABLE)
columns = list(columns.keys())
# Keep only the confidential columns (names starting with "confidential")
columns = [column for column in columns if column.startswith("confidential")]
# Assign the tag to the confidential columns, overriding the inherited table tag
response = lakeformation_client.add_lf_tags_to_resource(
    CatalogId=account_id,
    LFTags=[
        {
            'CatalogId': account_id,
            'TagKey': "Confidential",
            'TagValues': ["True"]
        }
    ],
    Resource={
        'TableWithColumns': {
            'CatalogId': account_id,
            'DatabaseName': DATABASE,
            'Name': TABLE,
            "ColumnNames": columns
        }
    }
)
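
To confirm that the tags landed where expected, the assignments can be read back. The sketch below assumes the boto3 `get_resource_lf_tags` response shape (`LFTagsOnColumns` entries with `Name` and `LFTags`); the helper names `columns_with_tag` and `fetch_table_tags` are illustrative and not part of the original answer. The comparison is case-insensitive on the assumption that the service may normalize tag keys and values to lowercase.

```python
def columns_with_tag(response, tag_key, tag_value):
    """Return the names of columns that carry tag_key=tag_value
    in a get_resource_lf_tags response (case-insensitive match)."""
    key = tag_key.lower()
    value = tag_value.lower()
    matches = []
    for column in response.get("LFTagsOnColumns", []):
        for tag in column.get("LFTags", []):
            values = [v.lower() for v in tag.get("TagValues", [])]
            if tag.get("TagKey", "").lower() == key and value in values:
                matches.append(column["Name"])
                break
    return matches


def fetch_table_tags(account_id, database, table):
    """Read back the LF-tags of a table (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the parsing helper also works offline

    client = boto3.client("lakeformation")
    return client.get_resource_lf_tags(
        CatalogId=account_id,
        Resource={
            "Table": {
                "CatalogId": account_id,
                "DatabaseName": database,
                "Name": table,
            }
        },
        ShowAssignedLFTags=True,
    )
```

A call such as `columns_with_tag(fetch_table_tags(account_id, DATABASE, TABLE), "Confidential", "True")` should then list the columns tagged as confidential.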

if not USE_NAMED_DATA_CELL_FILTER:
    print(f"Assigning LF-TAG to the user {USER_NAME}")
    # Grant LF-tag-based permissions to the user USER_NAME:
    # DESCRIBE on databases and SELECT on tables tagged Confidential=False
    response = lakeformation_client.grant_permissions(
        CatalogId=account_id,
        Principal={
            'DataLakePrincipalIdentifier': f'arn:aws:iam::{account_id}:user/{USER_NAME}'
        },
        Resource={
            "LFTagPolicy": {
                "CatalogId": account_id,
                "ResourceType": "DATABASE",
                "Expression": [
                    {
                        "TagKey": "Confidential",
                        "TagValues": ["False"]
                    }
                ]
            }
        },
        Permissions=[
            "DESCRIBE",
        ],
        PermissionsWithGrantOption=[]
    )
    if response['ResponseMetadata']['HTTPStatusCode'] != 200:
        print("Error granting DESCRIBE:")
        pprint.pprint(response)

    response = lakeformation_client.grant_permissions(
        CatalogId=account_id,
        Principal={
            'DataLakePrincipalIdentifier': f'arn:aws:iam::{account_id}:user/{USER_NAME}'
        },
        Resource={
            "LFTagPolicy": {
                "CatalogId": account_id,
                "ResourceType": "TABLE",
                "Expression": [
                    {
                        "TagKey": "Confidential",
                        "TagValues": ["False"]
                    }
                ]
            }
        },
        Permissions=[
            "SELECT",
        ],
        PermissionsWithGrantOption=[]
    )
    if response['ResponseMetadata']['HTTPStatusCode'] != 200:
        print("Error granting SELECT:")
        pprint.pprint(response)
else:
    print(f"User {USER_NAME} permissions are set using named data cell filter {FILTER}")
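
Regarding the named data cell filter route and the mandatory row filter expression: the Lake Formation API accepts `AllRowsWildcard` in place of a `FilterExpression` when a data cells filter is created, so a column-only filter does not need a True expression. This is a minimal sketch, assuming boto3's `create_data_cells_filter`; the function names and the excluded-column argument are illustrative, and whether this changes query runtime has not been measured in this thread.

```python
def build_column_only_filter(account_id, database, table, filter_name, excluded_columns):
    """Build the TableData payload for a data cells filter that
    keeps all rows and hides the given columns."""
    return {
        "TableCatalogId": account_id,
        "DatabaseName": database,
        "TableName": table,
        "Name": filter_name,
        # AllRowsWildcard stands in for a row filter expression such as "true"
        "RowFilter": {"AllRowsWildcard": {}},
        "ColumnWildcard": {"ExcludedColumnNames": list(excluded_columns)},
    }


def create_column_only_filter(table_data):
    """Create the filter in Lake Formation (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the payload builder also works offline

    client = boto3.client("lakeformation")
    return client.create_data_cells_filter(TableData=table_data)
```

For example, `create_column_only_filter(build_column_only_filter(account_id, DATABASE, TABLE, FILTER, columns))` would create a filter that excludes the confidential columns; SELECT can then be granted on that filter instead of on the LF-tag policy.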


answered a year ago
