If you're using a Glue crawler to add the files to the Glue Data Catalog, you can set this table property on the resulting table:
skip.header.line.count=1
I set that property manually in the Glue console and was then able to query the table in Athena with the header rows ignored. You can also set the table property through the API or in a CloudFormation template.
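For the API route, a minimal sketch with boto3 might look like the following. It assumes the boto3 Glue client's `get_table` and `update_table` calls and reuses the `default` / `headertest_headertest` names from the Spark example in this answer. `TABLE_INPUT_KEYS`, `build_table_input` and `set_skip_header` are hypothetical helpers. Because `update_table` replaces the table definition with whatever `TableInput` you send, the sketch copies the existing writable fields and only adds the property; it is aimed at ordinary (non-view) tables.

```python
# Hypothetical helper set: fields that a Glue TableInput accepts. get_table
# also returns read-only fields (DatabaseName, CreateTime, ...) that
# update_table would reject, so they are filtered out.
TABLE_INPUT_KEYS = {
    "Name", "Description", "Owner", "LastAccessTime", "LastAnalyzedTime",
    "Retention", "StorageDescriptor", "PartitionKeys", "ViewOriginalText",
    "ViewExpandedText", "TableType", "Parameters", "TargetTable",
}


def build_table_input(table: dict, header_lines: int = 1) -> dict:
    """Copy the writable fields of a get_table() result and set the property."""
    table_input = {k: v for k, v in table.items() if k in TABLE_INPUT_KEYS}
    # Copy Parameters so the caller's dict is not mutated.
    params = dict(table_input.get("Parameters", {}))
    # Table properties are string-valued, so store the count as a string.
    params["skip.header.line.count"] = str(header_lines)
    table_input["Parameters"] = params
    return table_input


def set_skip_header(database: str, table_name: str,
                    header_lines: int = 1, client=None) -> None:
    """Add skip.header.line.count to an existing catalog table (sketch)."""
    if client is None:
        import boto3  # imported lazily so build_table_input works without boto3
        client = boto3.client("glue")
    table = client.get_table(DatabaseName=database, Name=table_name)["Table"]
    client.update_table(DatabaseName=database,
                        TableInput=build_table_input(table, header_lines))


# Example (needs AWS credentials):
# set_skip_header("default", "headertest_headertest")
```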
This also works if you use Glue's Spark libraries to read the table through the Data Catalog:
```python
from pyspark.context import SparkContext
from awsglue.context import GlueContext

glueContext = GlueContext(SparkContext.getOrCreate())

# Read the crawled table through the Data Catalog; the table's
# skip.header.line.count property is applied, so header rows are not returned as data.
df = glueContext.create_dynamic_frame.from_catalog(
    database="default",
    table_name="headertest_headertest")
df.printSchema()
df.toDF().show()
```
If you are reading the CSV files directly into a DynamicFrame, you can use the `withHeader` format option instead:
```python
# Reuses the glueContext created above.
dfs3 = glueContext.create_dynamic_frame_from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://rd-mb3/headertest/"]},
    format="csv",
    format_options={"withHeader": True})
dfs3.toDF().show()
```
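To make the difference between the two settings concrete, here is a plain-Python analogy using the standard `csv` module. It is not how Glue or Athena implement either option, and the function names are hypothetical. Roughly, `skip.header.line.count=1` drops the first line of each file so it is not read as data, while `withHeader: True` treats the first line as the column names.

```python
import csv
import io
from typing import Iterable, Iterator


def skip_header_lines(files: Iterable[str], count: int = 1) -> Iterator[list[str]]:
    """Analogy for skip.header.line.count: drop the first `count` rows of each
    file and yield the rest as data (assumes no quoted field spans lines)."""
    for text in files:
        for row_no, row in enumerate(csv.reader(io.StringIO(text))):
            if row_no >= count:
                yield row


def read_with_header(text: str) -> list[dict[str, str]]:
    """Analogy for withHeader=True: the first row supplies the column names."""
    return list(csv.DictReader(io.StringIO(text)))


files = ["id,name\n1,a\n2,b\n", "id,name\n3,c\n"]
print(list(skip_header_lines(files)))         # header row of each file dropped
print(read_with_header("id,name\n1,a\n"))     # rows keyed by header names
```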