Is there an option in AWS Glue DataBrew to keep only the required columns and delete the rest?


I have many datasets in AWS Glue with hundreds of columns, but I need only a few of those columns for feature selection. I don't see an option in AWS Glue DataBrew to keep the required columns and delete the rest. Is there a feature that does this?

pechung (AWS, Expert)
Asked 3 years ago, viewed 1264 times

1 Answer

Accepted Answer

You can use the text box in the Columns tab to view the required columns in the AWS Glue DataBrew console. You can search for the columns, select the required columns, and deselect the rest.

To remove columns from your final dataset, you need to apply the delete column recipe step. Unlike the Columns tab, that step doesn't have the global filter/search functionality.

To delete multiple columns, you can download the recipe as a JSON file, add your columns in the DELETE step, and then upload the recipe.

Example:

{
  "Action": {
    "Operation": "DELETE",
    "Parameters": {
      "sourceColumns": "[\"victory_status\",\"winner\",\"turns\"]"
    }
  }
}
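
With hundreds of columns, typing the delete list by hand is error-prone, so it may be easier to generate the DELETE step from the list of columns you want to keep. The following is a minimal sketch, not an official AWS tool: the helper name `build_delete_step` and the column names other than `victory_status`, `winner`, and `turns` are made up for illustration, and it assumes the `sourceColumns` parameter takes a JSON array encoded as a string, exactly as in the example above.

```python
import json


def build_delete_step(all_columns, keep_columns):
    """Build a DELETE recipe step that drops every column not in keep_columns.

    Hypothetical helper: the step layout mirrors the example in this answer.
    """
    keep = set(keep_columns)
    missing = keep - set(all_columns)
    if missing:
        # Catch typos in the keep list before anything is deleted.
        raise ValueError(f"columns to keep not found in dataset: {sorted(missing)}")

    # Preserve the dataset's column order for readability.
    to_delete = [col for col in all_columns if col not in keep]

    return {
        "Action": {
            "Operation": "DELETE",
            "Parameters": {
                # Assumption: sourceColumns is a JSON array serialized as a string.
                "sourceColumns": json.dumps(to_delete, separators=(",", ":"))
            },
        }
    }


# Illustrative column names; replace with your dataset's schema.
all_columns = ["id", "rated", "turns", "victory_status", "winner", "opening_name"]
step = build_delete_step(all_columns, keep_columns=["id", "rated", "opening_name"])

# Print the step so it can be pasted into the downloaded recipe JSON.
print(json.dumps(step, indent=2))
```

You would paste the printed step into the downloaded recipe JSON and upload the recipe as described above. Whether the upload accepts this parameter on current DataBrew versions is worth checking against the recipe actions reference.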
AWS, answered 3 years ago
  • Is this still true? I don't see the sourceColumns parameter documented anywhere, only sourceColumn.
