Is there an option in AWS Glue DataBrew to keep only the required columns and delete the rest?


I have many datasets in AWS Glue with hundreds of columns, but I need only a few of those columns for feature selection. I don't see an option in AWS Glue DataBrew for keeping the required columns and deleting the rest. Is there a feature that does this?

AWS
EXPERT
pechung
asked 3 years ago, 1263 views
1 Answer
Accepted Answer

You can use the text box in the Columns tab to view the required columns in the AWS Glue DataBrew console. You can search for the columns, select the required columns, and deselect the rest.

To remove columns from your final dataset, you need to apply the delete column recipe step. That step doesn't have the global filter/search functionality, so you can't search for and select columns there the way you can in the Columns tab.

To delete multiple columns, you can download the recipe as a JSON file, add your columns in the DELETE step, and then upload the recipe.

Example:

{
  "Action": {
    "Operation": "DELETE",
    "Parameters": {
      "sourceColumns": "[\"victory_status\",\"winner\",\"turns\"]"
    }
  }
}
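
With hundreds of columns, typing the delete list by hand is error-prone, so the step above could be generated from the short list of columns you want to keep. The following is a minimal sketch, not an official method: the helper name `build_delete_step` and the sample column names are hypothetical, and it assumes the `DELETE` step accepts the `sourceColumns` parameter as a JSON-encoded string exactly as in the example above. Check that parameter against the current DataBrew documentation before relying on it.

```python
import json


def build_delete_step(all_columns, keep_columns):
    """Build a DELETE recipe step that removes every column not in keep_columns.

    Hypothetical helper; the step layout mirrors the example in this answer.
    """
    keep = set(keep_columns)
    missing = keep - set(all_columns)
    if missing:
        # Fail early if a column to keep is misspelled or absent.
        raise ValueError(f"Columns not found in dataset: {sorted(missing)}")

    # Preserve the dataset's column order in the delete list.
    to_delete = [col for col in all_columns if col not in keep]

    return {
        "Action": {
            "Operation": "DELETE",
            "Parameters": {
                # Assumption: the value is a JSON array encoded as a string,
                # without spaces, as shown in the example above.
                "sourceColumns": json.dumps(to_delete, separators=(",", ":")),
            },
        }
    }


# Sample data for illustration only.
all_columns = ["id", "victory_status", "winner", "turns", "rated"]
keep_columns = ["id", "rated"]

step = build_delete_step(all_columns, keep_columns)
print(json.dumps(step, indent=2))
```

You would then add the printed step to the downloaded recipe JSON and upload the recipe again, as described above.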
AWS
answered 3 years ago
  • Is this still true? I don't see the sourceColumns parameter documented anywhere, only sourceColumn
