1 Answer
The XGBoost framework container (v0.90+) can read Parquet for training (see example notebook).
The full list of valid content types is CSV, LIBSVM, PARQUET, and RECORDIO_PROTOBUF (see source).
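To make the content-type point concrete, here is a rough sketch of how a Parquet training channel might be described for a training job. It only builds the `InputDataConfig` entry as a plain dict (the shape used by the `CreateTrainingJob` API). The helper `make_channel`, the bucket name, and the exact content-type strings are my assumptions, so check them against the container source before relying on them.

```python
# Content-type strings are assumptions derived from the four formats listed
# above; verify them against the container source.
CONTENT_TYPES = {
    "CSV": "text/csv",
    "LIBSVM": "text/libsvm",
    "PARQUET": "application/x-parquet",
    "RECORDIO_PROTOBUF": "application/x-recordio-protobuf",
}


def make_channel(name, s3_uri, data_format):
    """Build one InputDataConfig entry (hypothetical helper, not an AWS API)."""
    if data_format not in CONTENT_TYPES:
        raise ValueError(f"unsupported format: {data_format}")
    return {
        "ChannelName": name,
        "ContentType": CONTENT_TYPES[data_format],
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }
        },
    }


# "my-bucket" is a placeholder, not a real location.
train_channel = make_channel("train", "s3://my-bucket/train/", "PARQUET")
```

The resulting dict would go into the `InputDataConfig` list of the training job request. Rejecting unknown formats up front is just a convenience so a typo fails before the job is submitted.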
Additionally:
Uber's Petastorm can read Parquet into TensorFlow, PyTorch, and PySpark inputs.
Because XGBoost accepts NumPy arrays, you can also convert from PySpark to NumPy/pandas using the PyArrow mentioned above.
Answered 4 years ago
Hi, I'm facing the same issue, but for testing. It doesn't seem that testing in SageMaker accepts PyArrow or Parquet files. Do you know whether SageMaker accepts Parquet files for testing, or only for training? If not, what's the workaround?