Training a classifier on Parquet with SageMaker?


Hi,

What Parquet data-loading logic is known to work well for training with SageMaker on Parquet data? ml-io? pyarrow? Are there any examples? The goal is to train a classifier: logistic regression, XGBoost, or a custom TensorFlow model.

  • Hi, I'm facing the same issue, but for testing. It doesn't seem that testing in SageMaker accepts PyArrow or Parquet files. Do you know whether SageMaker accepts Parquet files for testing, or only for training? If not, what's the workaround?

AWS
EXPERT
asked 4 years ago · 839 views
1 Answer
Accepted Answer

XGBoost as a framework container (v0.90+) can read Parquet for training (see example notebook).
The full list of valid content types is CSV, LIBSVM, PARQUET, and RECORDIO_PROTOBUF (see source).
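
As a minimal sketch of how the Parquet content type might be declared for a training channel: the helper below builds one entry of the `InputDataConfig` list that the SageMaker `CreateTrainingJob` API accepts. The MIME string `application/x-parquet` is my understanding of what the XGBoost container expects for PARQUET, so check it against the container source. The helper name `parquet_channel` and the bucket paths are hypothetical.

```python
def parquet_channel(name, s3_uri):
    """Build one InputDataConfig entry that marks the channel's data as Parquet."""
    return {
        "ChannelName": name,
        # Assumed MIME type for PARQUET in the XGBoost framework container.
        "ContentType": "application/x-parquet",
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }
        },
    }


# Hypothetical bucket and prefixes, for illustration only.
input_data_config = [
    parquet_channel("train", "s3://my-bucket/train/"),
    parquet_channel("validation", "s3://my-bucket/validation/"),
]
```

With the SageMaker Python SDK, the equivalent should be to pass `content_type="application/x-parquet"` to `sagemaker.inputs.TrainingInput` for each channel given to `estimator.fit`.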

Additionally:
Uber's Petastorm can read Parquet into TensorFlow, PyTorch, and PySpark inputs.
As XGBoost accepts NumPy arrays, you can convert from PySpark to NumPy/pandas using PyArrow, which you mentioned.

AWS
answered 4 years ago
