Hi,
My first guess would be classifiers:
- https://docs.aws.amazon.com/glue/latest/dg/add-classifier.html
- https://docs.aws.amazon.com/glue/latest/dg/custom-classifier.html
I haven't tried it myself, but have a look at this:
For the given JSON structure, you would likely want to create a custom JSON classifier in AWS Glue. Here's the configuration you'd use:
- Classification: `json` (this is the default for JSON classifiers)
- JSON path: `$._1[*]`
Explanation:
- `$`: represents the root of the JSON document.
- `_1`: refers to the top-level key in your JSON.
- `[*]`: the wildcard indicates that the classifier should process every element of the array under the `_1` key.
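To make the path concrete, here is a small Python sketch of which part of a document `$._1[*]` selects. The sample document and its values are hypothetical (only the `AccountId` and `Goods` field names come from this answer), and the code merely imitates the effect of the path; it is not how Glue itself evaluates classifiers.

```python
import json

# Hypothetical sample: a top-level "_1" key holding an array of records
SAMPLE = '{"_1": [{"AccountId": "A-001", "Goods": 12.5}, {"AccountId": "A-002", "Goods": 3.0}]}'


def records_for_path(document: str) -> list:
    """Imitate $._1[*]: every element of the array under "_1" becomes one record."""
    root = json.loads(document)  # "$"   -> the root object
    array = root["_1"]           # "._1" -> the value of the top-level key
    return list(array)           # "[*]" -> each element of that array


def record_without_path(document: str) -> list:
    """Rough imitation of having no JSON path: the whole object is one record with a single "_1" field."""
    return [json.loads(document)]


for row in records_for_path(SAMPLE):
    print(row)
```

With the path you get one record per array element; without it, the sketch yields a single record whose only field is the `_1` array.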
Why a custom classifier?
- The built-in JSON classifier might not infer the schema you want: without a JSON path it treats the whole object as one record, so you would likely get a single `_1` column holding an array instead of one row per array element.
- The custom classifier with the specified JSON path lets you explicitly target the records you want to extract.
Additional considerations:
- Data types: by default, the crawler infers the data type of each column (e.g., `AccountId` as string, `Goods` as double). If you want to enforce specific types, you can modify the schema in the AWS Glue Data Catalog after the initial crawl.
- Nested structures: if your JSON had more complex nested structures, you would need to adjust the JSON path accordingly.
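For nested data, the path simply gets longer. The sketch below evaluates only the tiny subset of JSON path syntax used in this answer (`$`, `.key` and `[*]`), so you can check which elements a candidate path would turn into records before creating the classifier. The nested `Orders` field and all values are hypothetical, not part of the original question, and this is not Glue's own path engine.

```python
import re

# Matches one path step: ".key" (group 1 holds the key) or the "[*]" wildcard
STEP = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[\*\]")


def select(doc, path: str) -> list:
    """Evaluate a minimal JSON path subset ('$', '.key', '[*]') against a parsed document."""
    if not path.startswith("$"):
        raise ValueError("path must start with $")
    nodes = [doc]
    pos = 1
    while pos < len(path):
        m = STEP.match(path, pos)
        if m is None:
            raise ValueError(f"unsupported syntax at {path[pos:]!r}")
        key = m.group(1)
        if key is not None:
            # ".key": step into that key of every object that has it
            nodes = [n[key] for n in nodes if isinstance(n, dict) and key in n]
        else:
            # "[*]": replace every array with its elements
            nodes = [item for n in nodes if isinstance(n, list) for item in n]
        pos = m.end()
    return nodes


# Hypothetical nested variant of the data
nested = {"_1": [
    {"AccountId": "A-001", "Orders": [{"Goods": 1.5}, {"Goods": 2.0}]},
    {"AccountId": "A-002", "Orders": [{"Goods": 4.0}]},
]}

print(len(select(nested, "$._1[*]")))          # one record per account
print(select(nested, "$._1[*].Orders[*]"))     # one record per order
```

Each selected element becomes a record, so in this sketch fields outside the selected elements (such as `AccountId` when selecting `Orders[*]`) are not part of the result.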
How to create the custom classifier:
- Go to the AWS Glue console.
- Navigate to "Classifiers" and click "Add classifier".
- Select "JSON" as the classifier type.
- Provide a name for your classifier (e.g., "MyJsonClassifier").
- Enter the JSON path as described above.
- Click "Create classifier".
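The console steps above can also be scripted. This sketch assumes boto3 is installed and AWS credentials and a default region are configured; `create_classifier` with a `JsonClassifier` argument (a `Name` and a `JsonPath`) is the Glue API call for this, while the two helper function names are hypothetical.

```python
def json_classifier_request(name: str, json_path: str) -> dict:
    """Build the keyword arguments for glue.create_classifier (custom JSON classifier)."""
    return {"JsonClassifier": {"Name": name, "JsonPath": json_path}}


def create_json_classifier(name: str = "MyJsonClassifier", json_path: str = "$._1[*]") -> None:
    """Create the custom JSON classifier in the default configured region."""
    import boto3  # lazy import: the request builder above works without boto3 installed

    glue = boto3.client("glue")
    glue.create_classifier(**json_classifier_request(name, json_path))
```

Calling `create_json_classifier()` creates `MyJsonClassifier` with the path `$._1[*]`; keeping the request builder separate makes it easy to check the arguments without touching AWS.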
How to use the classifier in your crawler:
- In your crawler configuration, under "Classifiers", add your newly created custom classifier (e.g., "MyJsonClassifier").
- Mind the order if you list several custom classifiers: the crawler tries custom classifiers in the order listed, before any built-in classifier, so your custom logic gets the first chance to match.
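Likewise, the classifier can be attached to an existing crawler with boto3's `update_crawler`, whose `Classifiers` parameter takes a list of custom classifier names, and a new crawl can be started with `start_crawler`. This is a sketch assuming the crawler already exists, is not currently running, and credentials are configured; the helper names and the crawler name in the usage line are hypothetical.

```python
def classifier_update_request(crawler_name: str, classifier_names: list) -> dict:
    """Build the arguments for glue.update_crawler; custom classifiers are tried in list order."""
    return {"Name": crawler_name, "Classifiers": list(classifier_names)}


def attach_classifiers_and_run(crawler_name: str, classifier_names: list) -> None:
    """Attach the custom classifiers to an existing crawler, then start a new crawl."""
    import boto3  # lazy import: the request builder above works without boto3 installed

    glue = boto3.client("glue")
    glue.update_crawler(**classifier_update_request(crawler_name, classifier_names))
    glue.start_crawler(Name=crawler_name)
```

For example, `attach_classifiers_and_run("my-crawler", ["MyJsonClassifier"])`, where `my-crawler` stands in for your crawler's name.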
answered 22 days ago