Incremental training with custom keras code in script mode

0

Hi,
I'm moving my first steps in sagemaker. I'm using script mode to train a classification algorithm. Training is fine, however I'm not able to do incremental training. I want to train again the same model with new data. Here what I did. This is my script

import sagemaker
from sagemaker.tensorflow import TensorFlow
from sagemaker import get_execution_role

bucket = 'sagemaker-blablabla'
train_data = 's3://{}/{}'.format(bucket,'train')
validation_data = 's3://{}/{}'.format(bucket,'test')

s3_output_location = 's3://{}'.format(bucket)

tf_estimator = TensorFlow(entry_point='main.py', 
                          role=get_execution_role(),
                          train_instance_count=1, 
                          train_instance_type='ml.p2.xlarge',
                          framework_version='1.12', 
                          py_version='py3',
                          output_path=s3_output_location)

inputs = {'train': train_data, 'test': validation_data}
tf_estimator.fit(inputs)

The entry point is my custom keras code, which I adapted to receive arguments from the script.
Now the training is successfully completed and I have in my s3 bucket the model.tar.gz. I want to train again, but it's not clear to me how to do it.. I tried this

trained_model = 's3://sagemaker-blablabla/sagemaker-tensorflow-scriptmode-2019-11-27-12-01-42-300/output/model.tar.gz'

tf_estimator = sagemaker.estimator.Estimator(image_name='blablabla-west-1.amazonaws.com/sagemaker-tensorflow-scriptmode:1.12-gpu-py3', 
                                              role=get_execution_role(),
                                              train_instance_count=1, 
                                              train_instance_type='ml.p2.xlarge',
                                              output_path=s3_output_location,
                                              model_uri = trained_model)

inputs = {'train': train_data, 'test': validation_data}

tf_estimator.fit(inputs)

Doesn't work.. first I don't know how to retrieve the training image name (for this I looked for it in the aws console, but I guess there should be a smarter solution), second this code throws an exception about the entry point.. but it is my understanding that I shouldn't need it when I do incremental learning with a ready image..
I'm surely missing something important, any help? Thank you

rokk07
gefragt vor 4 Jahren408 Aufrufe
1 Antwort
0

Found by myself, here the answer: https://stackoverflow.com/a/59096925/4267439

rokk07
beantwortet vor 4 Jahren

Du bist nicht angemeldet. Anmelden um eine Antwort zu veröffentlichen.

Eine gute Antwort beantwortet die Frage klar, gibt konstruktives Feedback und fördert die berufliche Weiterentwicklung des Fragenstellers.

Richtlinien für die Beantwortung von Fragen