- Neueste
- Die meisten Stimmen
- Die meisten Kommentare
If you are trying to replace values with NULL in a Redshift table using Glue job, you can use the replace_null() transformation function in Glue. Here's an example of how to do it:
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
## Create a DynamicFrame using a Glue Catalog table
dynamic_frame = glueContext.create_dynamic_frame.from_catalog(database = "database_name", table_name = "table_name")
## Replace values with NULL using replace_null() function
dynamic_frame = ApplyMapping.apply(frame = dynamic_frame, mappings = [("column_name", "string", "column_name", "string"), ("column_with_value_to_replace", "string", "column_with_value_to_replace", "string")])
dynamic_frame = Map.apply(frame = dynamic_frame, f = lambda x: {"column_name": x["column_name"], "column_with_value_to_replace": replace_null(x["column_with_value_to_replace"])})
## Convert DynamicFrame back to DataFrame and write to Redshift
data_frame = dynamic_frame.toDF()
data_frame.write.format("com.databricks.spark.redshift").option("url", "jdbc:redshift://<redshift_endpoint>:<port>/<database>?user=<user>&password=<password>").option("dbtable", "table_name").option("tempdir", "s3://<s3_bucket_path>/temp").mode("overwrite").save()
In the above code, you can see that we use the replace_null() function to replace any value in the "column_with_value_to_replace" column with NULL. You can add more columns as needed in the mappings list. The Map function is used to apply the replace_null() function to each row.
Make sure to replace the placeholders in the option function with your own values for the Redshift endpoint, port, database, user, password, and S3 bucket path.
If your Glue job is still dropping null records after implementing this, please provide more details on your Glue job and the data you are trying to load.
Relevanter Inhalt
- AWS OFFICIALAktualisiert vor einem Jahr
- AWS OFFICIALAktualisiert vor 2 Jahren
- AWS OFFICIALAktualisiert vor einem Jahr
You mean you are uploading rows to Redshift from Glue and then in the table they have a empty strings instead of NULL values?