I am trying to train a linear learner model in SageMaker. My training set is 422 rows split across 4 files on AWS S3. The mini-batch size I set is 50.
I keep getting this error in SageMaker:
Customer Error: No training data processed. Either the training
channel is empty or the mini-batch size is too high. Verify that
training data contains non-empty files and the mini-batch size is less
than the number of records per training host.
I am using this InputDataConfig:
    InputDataConfig=[
        {
            'ChannelName': 'train',
            'DataSource': {
                'S3DataSource': {
                    'S3DataType': 'S3Prefix',
                    'S3Uri': 's3://MY_S3_BUCKET/REST_OF_PREFIX/exported/',
                    'S3DataDistributionType': 'FullyReplicated'
                }
            },
            'ContentType': 'text/csv',
            'CompressionType': 'Gzip'
        }
    ],
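To rule out the two causes the error message names (empty files, or a mini-batch size that is not smaller than the record count), a minimal sketch like the following could be run against a local copy of the files under the S3 prefix. It assumes the files were downloaded to a local directory and are gzip-compressed CSV with a `.gz` suffix. The helper names `count_csv_records` and `check_channel` are hypothetical and are not part of SageMaker. With 'FullyReplicated', every training host receives all of the files, so the total row count is treated here as the records per host.

```python
import csv
import gzip
from pathlib import Path


def count_csv_records(path):
    """Count the non-empty rows in one gzip-compressed CSV file."""
    with gzip.open(path, mode="rt", newline="") as handle:
        return sum(1 for row in csv.reader(handle) if row)


def check_channel(local_dir, mini_batch_size):
    """Return per-file row counts, the total, and whether the batch size fits.

    Assumes FullyReplicated distribution, i.e. each host sees every file.
    """
    counts = {
        path.name: count_csv_records(path)
        for path in sorted(Path(local_dir).glob("*.gz"))
    }
    total = sum(counts.values())
    return counts, total, mini_batch_size < total
```

Calling `check_channel("exported_local", 50)` (the directory name is a placeholder) would show whether any file decompresses to zero rows and whether 50 is below the total. If no file matches `*.gz`, or the files do not decompress, that would point at a mismatch between the data and 'CompressionType': 'Gzip' rather than at the batch size.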
I am not sure what I am doing wrong here. I tried increasing the number of records to 5547495 split across 6 files and got the same error. That makes me think the config itself is missing something, so that SageMaker treats the training channel as if it were not present. I tried changing 'train' to 'training', since that is the word the error message uses, but then I got:
Customer Error: Unable to initialize the algorithm. Failed to validate
input data configuration. (caused by ValidationError)
Caused by: {u'training': {u'TrainingInputMode': u'Pipe',
u'ContentType': u'text/csv', u'RecordWrapperType': u'None',
u'S3DistributionType': u'FullyReplicated'}} is not valid under any of
the given schemas
I went back to 'train', as that seems to be the channel name that is needed. But what am I doing wrong with it?
Edited by: anshbansal on Jun 3, 2019 12:06 AM