Questions tagged with Amazon SageMaker Model Training
Hello,
When running an HPO job with the SageMaker SDK, training is much slower than in a SageMaker Jupyter notebook.
Both variants have the same:
1. Hyperparameters (the same Model)
2. Data - Train /...
2 answers, 0 votes, 317 views, asked a year ago
Hi!
I have a training set of images for which I manually created a manifest file respecting the format required to train a Rekognition Custom Labels model for object detection. Both the images and...
1 answer, 0 votes, 310 views, asked a year ago
When trying to start a training job in SageMaker using AlgorithmEstimator (by passing the algorithm ARN), I get an error saying that the algorithm ARN does not exist. I have tried this with...
1 answer, 0 votes, 235 views, asked a year ago
I used the training script from [here](https://sagemaker.readthedocs.io/en/stable/frameworks/xgboost/using_xgboost.html) and am trying to train the model. Here is the code I used for configuring...
0 answers, 0 votes, 201 views, asked a year ago
I'm using SageMaker to train on my data.
It uses the pre-trained model
“tensorflow-od1-ssd-resnet50-v1-fpn-640x640-coco17-tpu-8”
**Create the SageMaker model instance. Note that we need to pass Predictor...
0 answers, 0 votes, 132 views, asked a year ago
I would like to fine-tune large language models (starting with 10+B parameters) on SageMaker.
Since we are working with PyTorch and Lightning, the idea would be to use DeepSpeed in combination with...
1 answer, 1 vote, 697 views, asked a year ago
I am trying to train the GPT2-large model on SageMaker Studio, using an 'ml.g4dn.2xlarge' instance. The training file is very small (13 KB). It gives the following error:
ExitCode 1
ErrorMessage...
1 answer, 0 votes, 495 views, asked a year ago
I'm using the functionality of `sagemaker.experiments`, where a run object is defined for tracking a job.
For logging of metrics, I'm using the `log_metric()` method of the run object, with name,...
2 answers, 0 votes, 227 views, asked a year ago
I have trained a time series model on SageMaker Canvas through 'Standard Build' and made predictions with it. But I am unable to see the trained time series model as an AutoMLJob in SageMaker Studio. Is...
1 answer, 0 votes, 381 views, asked a year ago
Hi!
As of a few days ago, the "**Uploading**" phase of my SageMaker training jobs jumped from **2 minutes to 3+ hours.** The size of my artifacts did not change, but I did enable checkpointing...
0 answers, 0 votes, 71 views, asked a year ago
A pipeline training step saves a custom JSON file to the output path set in the estimator's `output_path` parameter, as seen below:
```
estimator = TensorFlow(
    entry_point=code_entry,
...
```
2 answers, 0 votes, 852 views, asked a year ago
I built a custom training image so that it can be run through CreateTrainingJob rather than through the SageMaker Training Toolkit (requiring the "ContainerEntrypoint" option).
But when I try to run...
1 answer, 0 votes, 281 views, asked a year ago