
How to create a custom model tar file via the create model step in a SageMaker pipeline?


Based on the sample/docs provided here => https://www.philschmid.de/mlops-sagemaker-huggingface-transformers, I am fine-tuning a Hugging Face DistilBERT model in SageMaker Studio via a pipeline, and this example works. When the model is created, I specify entry_point = 'predict.py' and source_dir = 'script' (see below), which produces the directory structure shown below in model.tar.gz. There are other files such as tokenizer.json, tokenizer_config.json, and special_tokens_map.json; I assume these are downloaded from Hugging Face along with the PyTorch model file and placed at the root of the generated model tar file. Is it possible to put these files into another folder next to my script folder during the model create/package step?

model directory structure

model.tar.gz/
|- pytorch_model.bin
|- tokenizer.json
|- tokenizer_config.json
|- special_tokens_map.json
|- ...
|- script/
   |- predict.py
   |- requirements.txt 
# Create Model
model = Model(
    entry_point = 'predict.py', 
    source_dir = 'script',
    ....
)
huggingface_estimator = HuggingFace(entry_point='train.py',
                            source_dir='./scripts',
                            instance_type='ml.p3.2xlarge',
                            instance_count=1,
                            role=role,
                            transformers_version='4.6',
                            pytorch_version='1.7',
                            py_version='py36',
                            hyperparameters = hyperparameters)

step_train = TrainingStep(
    name="TrainHuggingFaceModel",
    estimator=huggingface_estimator,
    inputs={
        "train": TrainingInput( ...  ),
        "test": TrainingInput( ...     ),
    },
....
)

# Create Model
model = Model(
    entry_point = 'predict.py', 
    source_dir = 'script',
    image_uri=image_uri,
    model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
    sagemaker_session=pipeline_session,
    role=role,
    ....
)

step_create_model = ModelStep(
    name="CreateModel",
    step_args=model.create("ml.m4.large"),
)
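
For completeness, here is a minimal sketch of how these steps are typically wired together into a pipeline, following the linked post. The pipeline name below is a placeholder and the argument list is trimmed; the philschmid example shows the full version.

from sagemaker.workflow.pipeline import Pipeline

# Sketch: assemble the training and model-creation steps into one pipeline.
# The pipeline name is a placeholder, not taken from the original question.
pipeline = Pipeline(
    name="HuggingFaceTrainAndCreateModelPipeline",
    steps=[step_train, step_create_model],
    sagemaker_session=pipeline_session,
)

pipeline.upsert(role_arn=role)   # create or update the pipeline definition
execution = pipeline.start()     # start a run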
asked 2 years ago · 853 views
1 Answer

You are correct, these JSON files come with distilbert-base-uncased - https://huggingface.co/distilbert-base-uncased/tree/main

I'm not familiar with the model training process, but it might be that these files were used to train the model initially and are then not required later for a deployed endpoint. I can see from https://github.com/huggingface/transformers/blob/3f936df66287f557c6528912a9a68d7850913b9b/src/transformers/models/bert/tokenization_bert_fast.py that the tokenizer code references these files.

These files are used there, and it might be possible to train from another location initially.
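
If the goal is to have the tokenizer files in their own folder inside model.tar.gz, one possible approach (a sketch only, not verified against this exact pipeline) is to write them into a subdirectory of the model output directory from train.py, since everything under /opt/ml/model is packaged into the tar at the end of training. The "tokenizer" folder name and the trainer/tokenizer variables below are assumptions; a custom predict.py would then need to load the tokenizer from that subfolder.

# Inside train.py (sketch; assumes a Hugging Face Trainer and tokenizer already exist)
import os

model_dir = os.environ["SM_MODEL_DIR"]  # /opt/ml/model, tarred into model.tar.gz by SageMaker

trainer.save_model(model_dir)  # writes pytorch_model.bin, config.json, ... to the root
tokenizer.save_pretrained(os.path.join(model_dir, "tokenizer"))  # hypothetical subfolder

In predict.py the tokenizer could then be loaded with AutoTokenizer.from_pretrained(os.path.join(model_dir, "tokenizer")).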

AWS
answered 2 years ago
