Training error for Llama 2 fine-tuning


I am trying to instruction fine-tune my Llama 2 model on SageMaker JumpStart, but I keep getting errors like this:

We encountered an error while training the model on your data. AlgorithmError: ExecuteUserScriptError: ExitCode 1 ErrorMessage "raise TypeError( TypeError: Invalid function argument. Expected parameter tensor to be of type torch.Tensor. Traceback (most recent call last) File "/opt/ml/code/llama_finetuning.py", line 335, in <module> fire.Fire(main)

This is an example of my training.jsonl:

{"input": "1/8 to 1/4 teaspoon of cinnamon", "output": "{\"templateString\": \"1/8 to 1/4 teaspoon of cinnamon\", \"ingredient\": \"cinnamon\", \"quantityFrom\": 0.125, \"quantityTo\": 0.25, \"quantityType\": \"range\", \"unit\": \"teaspoon\"}"}

This is what my template.json file looks like:

{
    "prompt": "### Input:\n{input}\n\n",
    "completion": " {output}"
}
AWS
aykazmi
Asked 4 months ago, 2014 views
1 Answer
Accepted Answer

Hi Ayman,

Try increasing the amount of training data, or set the max_seq_len hyperparameter to a small value (for example, 128), and see whether the error persists.

The computation works like this: all of the text is processed, combined, and then split into samples, each of length equal to the max input length. The samples are then batched according to the batch size. If you are using an instance with 8 GPUs, you need at least 8 non-empty batches. That means you need enough data to fill 8 batches, or a smaller batch size, or a smaller max input length.
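
To make that arithmetic concrete, here is a rough sketch for checking whether a dataset is likely to produce enough batches. It is not the JumpStart script's actual logic: the function names, the token counts, and the choice to drop a trailing partial sample are all assumptions made for illustration.

```python
import math


def count_batches(total_tokens: int, max_input_length: int, batch_size: int) -> int:
    """Estimate the number of non-empty batches after packing.

    Assumption: all tokens are concatenated and cut into samples of exactly
    max_input_length tokens, and a trailing partial sample is dropped.
    """
    num_samples = total_tokens // max_input_length
    return math.ceil(num_samples / batch_size)


def enough_for_gpus(total_tokens: int, max_input_length: int,
                    batch_size: int, num_gpus: int) -> bool:
    """True if there is at least one non-empty batch per GPU."""
    return count_batches(total_tokens, max_input_length, batch_size) >= num_gpus


# Hypothetical dataset: 200 short examples of about 60 tokens each.
total_tokens = 200 * 60

# A long input length leaves only 5 samples, i.e. 2 batches of size 4.
print(count_batches(total_tokens, 2048, 4), enough_for_gpus(total_tokens, 2048, 4, 8))

# A short input length gives 93 samples, i.e. 24 batches of size 4.
print(count_batches(total_tokens, 128, 4), enough_for_gpus(total_tokens, 128, 4, 8))
```

With short records like the one in the question, a small dataset packs into very few samples at a long input length, which is why lowering the input length or the batch size can get past the error.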

AWS
autrin
Answered 4 months ago
Expert
Reviewed 4 months ago
