Error: "Couldn't cast; because columns don't match" on sagemaker jumpstart

0

I was instruction finetuning on sagemaker jumpstart. I provided a template.json file, describing the template used, in the train directory alongside the dataset on S3 as directed in the documentation. However, when I run the training I get the error: Couldn't cast because columns don't match. Am I configuring the template.json wrong? Please help. Thanks

Exact error:

We encountered an error while training the model on your data. AlgorithmError: ExecuteUserScriptError:
ExitCode 1
ErrorMessage "raise ValueError(f"Couldn't cast\n{table.schema}\nto\n{features}\nbecause column names don't match")
 ValueError: Couldn't cast
 question: string
 context: string
 response: string
 to
 {'prompt': Value(dtype='string', id=None), 'completion': Value(dtype='string', id=None)}
 because column names don't match
 
 The above exception was the direct cause of the following exception
 Traceback (most recent call last)
 File "/opt/ml/code/llama_finetuning.py", line 301, in <module>
 fire.Fire(main)
 File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
 component_trace = _Fire(component, args, parsed_flag_args, context, name)
 File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
 component, remaining_args = _CallAndUpdateTrace(
 File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
 component = fn(*varargs, **kwargs)
 File "/opt/ml/code/llama_finetuning.py", line

Custom template

## template.json
{
  "prompt": "Below is a dialog between three people. The first person is the system, the second person is the user and the third person is the assistant. The system starts the dialog with a context and none other input for the rest of the dialog. The user asks a question and the assistant responds. Write a response that appropriately completes the conversation.\n\n### Dialog:\n[\n\t{\n\t\t\"role\": \"system\",\n\t\t\"content\": \"{context}\"\n\t},\n\t{\n\t\t\"role\": \"user\",\n\t\t\"content\": \"{question}\"\n\t},\n\t{\n\t\t\"role\": \"assistant\",\n\t\t\"content\": \"{response}\"\n\t}\n]",
  "completion": "{response}"
}

Dataset

{
  "question": "What is the color of water",
  "context": "",
  "response": "Water is colorless"
}

{
  "question": "What do humans inhale",
  "context": "",
  "response": "Humans inhale oxygen and exhale carbon dioxide"
}
Geeb
asked 8 months ago469 views
No Answers

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions