Consider reviewing your 'script.py' entry point. A training job can fail for many reasons, but judging from the description and output, the most likely cause is where your script writes the model artifacts.
The SageMaker GitHub examples include an example of using a RandomForestRegressor in a script: https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-script-mode/sagemaker-script-mode.ipynb
I'm sharing this example because its Scikit-learn section references the "train_deploy_scikitlearn_without_dependencies.py" script, which dumps the model to model_dir: joblib.dump(model, os.path.join(args.model_dir, "model.joblib")). If we changed that to some arbitrary location in the script, the example training job would also fail with "AlgorithmError: framework error". As long as the 10-second training time is expected, I see the output location as a likely cause.
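As a minimal sketch of the point above, a training script can take its output directory from the SM_MODEL_DIR environment variable that SageMaker sets in the training container, and save the model there. The helper names parse_args and save_model are hypothetical, and pickle is used only as a standard-library stand-in for the joblib.dump call in the referenced example:

```python
import argparse
import os
import pickle


def parse_args(argv=None):
    """Parse script arguments, defaulting to the path SageMaker exposes."""
    parser = argparse.ArgumentParser()
    # SM_MODEL_DIR is set by SageMaker inside the training container;
    # the fallback mirrors the documented /opt/ml/model location.
    parser.add_argument(
        "--model-dir",
        type=str,
        default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"),
    )
    return parser.parse_args(argv)


def save_model(model, model_dir, filename="model.pkl"):
    """Write the fitted model under model_dir and return the full path."""
    os.makedirs(model_dir, exist_ok=True)
    path = os.path.join(model_dir, filename)
    with open(path, "wb") as f:
        # Stand-in for: joblib.dump(model, os.path.join(model_dir, "model.joblib"))
        pickle.dump(model, f)
    return path
```

In a real script you would call save_model(model, args.model_dir) after fitting, so the artifact lands in the directory SageMaker packages at the end of the job rather than in an arbitrary location.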
For more details on this you can refer to the following two resources:
- How Amazon SageMaker Processes Training Output - https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo-output.html
- Using the SageMaker Python SDK - https://sagemaker.readthedocs.io/en/stable/overview.html
In the first resource, you'll find that your algorithm should write all final model artifacts to /opt/ml/model. In the second resource, you'll find more information on proper use of the SageMaker Python SDK and various implementations.
I found out that the error results from no files being found in SM_CHANNEL_TRAIN and SM_CHANNEL_TEST. I don't understand why, because when I run the script locally with ! python script_rf.py --model-dir ./ --train ./ --test ./ it works fine.
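One way to narrow this down is to have the script report what it actually sees in each channel before it tries to load data. The sketch below is an assumption-laden diagnostic, not part of the original script: list_channel_files and require_channel are hypothetical helpers, and they rely only on the SM_CHANNEL_<NAME> environment variables, which SageMaker sets for each input channel passed to the training job. A local run with --train ./ reads files that already sit next to the script, whereas inside the container the channel directories contain only what the job's input channels delivered.

```python
import os


def list_channel_files(channel_dir):
    """Return sorted relative paths of all regular files under channel_dir."""
    found = []
    for root, _dirs, files in os.walk(channel_dir):
        for name in files:
            full_path = os.path.join(root, name)
            found.append(os.path.relpath(full_path, channel_dir))
    return sorted(found)


def require_channel(name, env=os.environ):
    """Resolve SM_CHANNEL_<NAME> and fail with a clear message if it is empty."""
    var = "SM_CHANNEL_" + name.upper()
    channel_dir = env.get(var)
    if not channel_dir:
        raise RuntimeError(
            f"{var} is not set; was a '{name}' channel passed to the training job?"
        )
    files = list_channel_files(channel_dir)
    if not files:
        raise RuntimeError(f"No files found in {channel_dir} ({var})")
    # Printed output ends up in the training job logs.
    print(f"{var} -> {channel_dir}: {files}")
    return channel_dir, files
```

Calling require_channel("train") and require_channel("test") at the top of the main block makes an empty or missing channel fail with an explicit message in the job logs instead of a generic framework error.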
The script looks like this:

    def model_fn(model_dir):
        clf = joblib.load(os.path.join(model_dir, "model.joblib"))
        return clf

    if __name__ == '__main__':
        parser.add_argument('--max_depth', type=int, default=2)
        parser.add_argument('--n_estimators', type=int, default=100)
        parser.add_argument('--random_state', type=int, default=0