We are using a crawler with a custom classifier to parse a fixed-length file.
As part of our requirement, we need to extract the input file name for each record.
The input files are stored in an S3 folder.
S3 folder ---> Crawler (custom classifier) ---> Data Catalog <--- AWS Glue job (ETL) ---> S3 (output)
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.sql.functions import input_file_name
from awsglue.dynamicframe import DynamicFrame
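# Resolve the job arguments; job.init() below reads args['JOB_NAME']
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()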
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
datasource0 = glueContext.create_dynamic_frame.from_catalog(
    database="test-poc",
    table_name="test-raw",
    transformation_ctx="datasource0",
    groupFiles='none',
)
# Create a DataFrame and add a new column containing the file name of every record
dataframe1 = datasource0.toDF().withColumn("filename", input_file_name())
dataframe1.show()
Problem: input_file_name() returns an empty string.
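To make the expected result concrete: when input_file_name() does return a value, it is the full path of the source file, so the bare file name would still have to be taken from it. Below is a minimal sketch in plain Python of that last step. The helper name file_name_from_uri and the example URI are hypothetical and only for illustration; they are not part of the job above.

```python
from posixpath import basename
from urllib.parse import urlparse


def file_name_from_uri(uri: str) -> str:
    """Return the last path component of an S3-style URI, or "" for an empty input."""
    if not uri:
        # Mirrors the current situation, where the column is an empty string
        return ""
    # urlparse splits "s3://bucket/key" into netloc="bucket" and path="/key"
    return basename(urlparse(uri).path)


# Hypothetical URI, for illustration only
print(file_name_from_uri("s3://example-bucket/input/sample_20200101.txt"))
```

In the Glue job the same logic would have to run per row (for example through a UDF or Spark's string functions), but that only helps once the filename column is actually populated.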