Trying to make sense of InvalidInputException


Full Exception:

InvalidInputException: An error occurred (InvalidInputException) when calling the CreateDatasetImportJob operation: Input csv has rows that do not conform to the dataset schema. Please ensure all required data fields are present and that they are of the type specified in the schema.

I am trying to create an interactions dataset import job in Amazon Personalize from Amazon SageMaker. My interactions schema looks like this:

interactions_schema = {"type" : "record", 
                       "name" : "Interactions",
                       "namespace" : "com.amazonaws.personalize.schema",
                       "fields" : [{"name" : "ITEM_ID",
                                    "type" : "string"},
                                   {"name" : "USER_ID",
                                    "type" : "string"},
                                   {"name" : "TIMESTAMP",
                                    "type" : "float"},
                                   {"name" : "EVENT_VALUE",
                                    "type" : "long"},
                                   {"name" : "EVENT_TYPE",
                                    "type" : "string"}],
                       "version" : "1.0"}

The only cause I can think of is that, when I save my CSV to my S3 bucket, the fields with type 'string' in the schema are saved as type 'object'. Otherwise, I'm not sure what I could be doing wrong.

Edit: I have now tried the following:

interactions_schema = {"type" : "record", 
                       "name" : "Interactions",
                       "namespace" : "com.amazonaws.personalize.schema",
                       "fields" : [{"name" : "ITEM_ID",
                                    "type" : "string"},
                                   {"name" : "USER_ID",
                                    "type" : "string"},
                                   {"name" : "TIMESTAMP",
                                    "type" : "long"},
                                   {"name" : "EVENT_VALUE",
                                    "type" : "float"},
                                   {"name" : "EVENT_TYPE",
                                    "type" : "string"}],
                       "version" : "1.0"}
cmq
asked a year ago, 711 views
1 Answer

One likely issue is that the TIMESTAMP column type is float; it should be type long. Also, EVENT_VALUE is typically float, but I believe it also works as a long.
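
Changing the schema type alone may not be enough if the CSV itself still contains values like 1546300800.0 in the TIMESTAMP column. Below is a minimal sketch, using only the Python standard library, of rewriting a CSV so that TIMESTAMP is written as a whole number and EVENT_VALUE as a float. The helper names `normalize_row` and `normalize_csv` and the sample data are made up for illustration, and it assumes the timestamps are already Unix epoch seconds.

```python
import csv
import io


def normalize_row(row):
    """Return a copy of row with TIMESTAMP as an integer string
    and EVENT_VALUE as a float string (hypothetical helper)."""
    fixed = dict(row)
    # Go through float() first so both "1546300800" and "1546300800.0" parse.
    fixed["TIMESTAMP"] = str(int(float(row["TIMESTAMP"])))
    fixed["EVENT_VALUE"] = str(float(row["EVENT_VALUE"]))
    return fixed


def normalize_csv(text):
    """Rewrite CSV text row by row, keeping the original column order."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(normalize_row(row))
    return out.getvalue()


# Made-up sample: TIMESTAMP was saved as a float, EVENT_VALUE as an integer.
sample = (
    "ITEM_ID,USER_ID,TIMESTAMP,EVENT_VALUE,EVENT_TYPE\n"
    "item1,user1,1546300800.0,5,click\n"
)
print(normalize_csv(sample))
```

If the timestamps are in milliseconds or are date strings, they would need a real conversion rather than this cast.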

AWS
James_J
answered a year ago
  • So I tried that (making TIMESTAMP a long and EVENT_VALUE a float) and I'm still getting the same exception. Is there anything else I can try (or more information I can give to help diagnose the problem)?

  • @cmq What does your interactions dataset look like? Can you share the first few rows? Are the USER_ID, ITEM_ID, TIMESTAMP, and EVENT_TYPE columns fully populated?
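
One way to answer the questions above before uploading is to scan the CSV locally. The following is a rough sketch, not Personalize's actual validation logic: it flags rows where USER_ID, ITEM_ID, TIMESTAMP, or EVENT_TYPE is empty, or where TIMESTAMP or EVENT_VALUE does not parse as the type in the edited schema (long and float). The function name `find_bad_rows` and the sample rows are hypothetical.

```python
import csv
import io

# Columns the comment above asks about; treated here as required.
REQUIRED = ["USER_ID", "ITEM_ID", "TIMESTAMP", "EVENT_TYPE"]
# Local stand-ins for the schema types: long -> int, float -> float.
PARSERS = {"TIMESTAMP": int, "EVENT_VALUE": float}


def find_bad_rows(text):
    """Return (line_number, reason) pairs for rows that look non-conforming."""
    problems = []
    reader = csv.DictReader(io.StringIO(text))
    # Line 1 is the header; this numbering assumes no multi-line quoted fields.
    for line_no, row in enumerate(reader, start=2):
        for name in REQUIRED:
            if not (row.get(name) or "").strip():
                problems.append((line_no, f"{name} is empty"))
        for name, parse in PARSERS.items():
            value = (row.get(name) or "").strip()
            if value:
                try:
                    parse(value)
                except ValueError:
                    problems.append(
                        (line_no, f"{name}={value!r} is not a valid {parse.__name__}")
                    )
    return problems


# Made-up sample: the second data row has no USER_ID and a float TIMESTAMP.
sample = (
    "ITEM_ID,USER_ID,TIMESTAMP,EVENT_VALUE,EVENT_TYPE\n"
    "item1,user1,1546300800,5.0,click\n"
    "item2,,1546300900.0,3.0,click\n"
)
for line_no, reason in find_bad_rows(sample):
    print(f"line {line_no}: {reason}")
```

An empty result does not guarantee the import will succeed, but any rows it reports are good candidates to share when asking for further help.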
