1 Answer
- Tried reading the mentioned sample data from a CSV file using both a Glue dynamic frame and a Spark DataFrame. In both cases, the special character was preserved.
df = spark.read.option("header", "true").csv("<s3-path-to-file>")
df.show()
+----------+
| colA|
+----------+
|Hello\John|
+----------+
datasource0 = glueContext.create_dynamic_frame_from_options(
    "s3", {'paths': ["<s3-path-to-file>"]}, format="csv", transformation_ctx="datasource0")
datasource0.toDF().show()
+----------+
| col0|
+----------+
|Hello\John|
+----------+
So, please make sure your code is not doing any other processing before writing the data to MySQL; a minimal write path is sketched below for comparison.
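A no-transform write to MySQL through a Glue JDBC connection might look roughly like this sketch; the connection name and the database/table names are hypothetical placeholders, not values from your account.
# Sketch: write the frame to MySQL with no intermediate transforms.
# "my-mysql-connection", "mydb" and "mytable" are hypothetical placeholders.
glueContext.write_dynamic_frame.from_jdbc_conf(
    frame=datasource0,
    catalog_connection="my-mysql-connection",
    connection_options={"database": "mydb", "dbtable": "mytable"},
    transformation_ctx="datasink0")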
- Created a CSV file with the mentioned sample data and ran a crawler with a custom CSV classifier (quote symbol: "). The crawler was able to create the table, and I was able to query the data from Athena; a programmatic sketch of the same classifier setup follows the query output below.
SELECT * FROM "default"."csv" limit 10;
col0                            col1                            col2                            col3
----------------------------------------------------------------------------------------------------
aaa\blah blah blah\blah blah\   aaa\blah blah blah\blah blah\   aaa\blah blah blah\blah blah\   aaa\blah blah blah\blah blah\
aaa\blah blah blah\blah blah\   aaa\blah blah blah\blah blah\   aaa\blah blah blah\blah blah\   aaa\blah blah blah\blah blah\
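If you want to reproduce that classifier and crawler setup programmatically, a rough sketch with boto3 follows; the classifier name, crawler name, and role here are hypothetical placeholders.
import boto3

glue = boto3.client("glue")

# Custom CSV classifier with an explicit quote symbol.
# "my-csv-classifier" and "my-csv-crawler" are hypothetical names.
glue.create_classifier(
    CsvClassifier={
        "Name": "my-csv-classifier",
        "Delimiter": ",",
        "QuoteSymbol": '"',
        "ContainsHeader": "PRESENT",
    }
)

# Crawler that applies the custom classifier to the S3 path.
glue.create_crawler(
    Name="my-csv-crawler",
    Role="<glue-service-role>",
    DatabaseName="default",
    Classifiers=["my-csv-classifier"],
    Targets={"S3Targets": [{"Path": "<s3-path-to-folder>"}]},
)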
Because you mentioned the crawler produced a blank table, make sure your CSV file has at least two columns and two rows of data, as the documentation requires for the crawler to be able to determine a table; a minimal example follows.
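For illustration, a minimal file that satisfies that requirement could look like this (hypothetical contents):
colA,colB
Hello\John,value1
Hello\Jane,value2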
For any other questions specific to a job/crawler in your account, please reach out to AWS Premium Support.
Thank you for your answer. In my case I don't read the file directly with from_options; I just create a crawler pointing to the S3 path (all files have more than 2 columns), and to load the data I create a new job and choose the crawled source table and the crawled destination table.
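For reference, the core of such a catalog-to-catalog job script might look roughly like the sketch below; the database and table names are hypothetical placeholders for the ones in your Data Catalog.
# Sketch of a catalog-to-catalog Glue job; "mydb", "source_table" and
# "target_table" are hypothetical placeholder names.
source = glueContext.create_dynamic_frame.from_catalog(
    database="mydb", table_name="source_table", transformation_ctx="source")

glueContext.write_dynamic_frame.from_catalog(
    frame=source, database="mydb", table_name="target_table",
    transformation_ctx="sink")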