
Questions tagged with Amazon SageMaker Pipelines


SageMaker Pipelines - Batch Transform job using generated predictions as input for the model

Hi all! So, we're trying to implement a very simple SageMaker Pipeline with 3 steps:

* **ETL:** for now it only runs a simple query
* **Batch transform:** uses the ETL's result and generates predictions with a batch transform job
* **Report:** generates an HTML report

The thing is, when running the batch transform job alone in the Pipeline, everything runs OK. But when we run all the steps together, the batch transform job fails. What we see in the logs is that the job takes the dataset generated in the ETL step, generates the predictions, and saves them correctly in S3 (this is where we would expect the job to stop), but it then resends those predictions to the endpoint as if they were a new input. The step fails because the model receives an array of 1 column, which mismatches the number of features it was trained with. There's not much info out there on this, and SageMaker is painfully hard to debug. Has anyone experienced anything like this?

Our model and transformer code:

```python
from sagemaker import get_execution_role
from sagemaker.inputs import TransformInput
from sagemaker.workflow.steps import TransformStep
from sagemaker.xgboost import XGBoostModel

model = XGBoostModel(
    model_data=f"s3://{BUCKET}/{MODEL_ARTIFACTS_PATH}/artifacts.gzip",
    role=get_execution_role(),
    entry_point="predict.py",
    framework_version="1.3-1",
)

transformer = model.transformer(
    instance_count=1,
    instance_type="ml.m5.large",
    output_path=f"s3://{BUCKET}/{PREDICTIONS_PATH}/",
    accept="text/csv",
)

step = TransformStep(
    name="Batch",
    transformer=transformer,
    inputs=TransformInput(
        data=etl_step.properties.ProcessingOutputConfig.Outputs[
            "dataset"
        ].S3Output.S3Uri,
        content_type="text/csv",
        split_type="Line",
    ),
    depends_on=[etl_step],
)
```

And our inference script:

```python
from io import StringIO

import pandas as pd


def input_fn(request_body, content_type):
    # Parse the incoming CSV payload into a NumPy array of features.
    return pd.read_csv(StringIO(request_body), header=None).values


def predict_fn(input_obj, model):
    """Take the result of input_fn and generate predictions."""
    return model.predict_proba(input_obj)[:, 1]


def output_fn(predictions, content_type):
    # Serialize the predictions back to CSV.
    return ",".join(str(pred) for pred in predictions)
```
1 answer · 0 votes · 34 views · asked a month ago
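One direction worth ruling out for the batch transform question above (an assumption inferred from how batch transform handles `S3Prefix` inputs, not a confirmed diagnosis): every object under the input prefix is treated as input, so if the generated predictions ever land under the same prefix the transform reads from, they get sent back to the model as a second, single-column dataset — which matches the symptom described. A minimal sketch that keeps the input and output prefixes clearly disjoint and assembles line-split output explicitly; `RUN_ID` is a hypothetical suffix, and `BUCKET`, `PREDICTIONS_PATH`, and `model` come from the question:

```python
# Sketch only, under the assumptions stated above.
transformer = model.transformer(
    instance_count=1,
    instance_type="ml.m5.large",
    # Hypothetical RUN_ID keeps each execution's output out of any input prefix.
    output_path=f"s3://{BUCKET}/{PREDICTIONS_PATH}/{RUN_ID}/",
    accept="text/csv",
    assemble_with="Line",  # pair with split_type="Line" so outputs reassemble per record
)
```

If `assemble_with="Line"` is used, it may also make sense for `output_fn` to join predictions with newlines rather than commas, so each input record maps to one output line.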

Invoking endpoint outputs empty prediction data

Hello, I am able to invoke my endpoint using the following command template:

> aws --profile 'insert_profile_name' sagemaker-runtime invoke-endpoint --endpoint-name 'insert_endpoint_name' --body fileb://'insert_image_file_path' --region 'insert_region' --content-type application/x-image output.txt

However, this produces an output text file that contains the following:

> {"prediction": []}

Also, this appears in the terminal after running the command:

> { "ContentType": "application/json", "InvokedProductionVariant": "variant-name-1" }

The image I used to invoke my endpoint was also used for training the model. Here is my training job configuration (values that I've modified or added):

> **Job Settings:**
> Algorithm - Object Detection | Input Mode - Pipe
>
> **Hyperparameters:**
> num_classes - 1 | mini_batch_size - 1 | num_training_samples - 1
>
> **Input data configuration:**
> *First channel:*
> Name - validation | Input Mode - Pipe | Content Type - application/x-recordio | Record Wrapper - RecordIO | S3 Data Type - AugmentedManifestFile | Attribute Names - source-ref, bounding-box
> *Second channel:*
> Name - train | Input Mode - Pipe | Content Type - application/x-recordio | Record Wrapper - RecordIO | S3 Data Type - AugmentedManifestFile | Attribute Names - source-ref, bounding-box

Any help would be appreciated. I can provide more information if needed. Thanks!
1 answer · 0 votes · 18 views · asked 3 months ago
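A note on the empty-prediction question above: an empty `prediction` array is a syntactically valid response from the built-in Object Detection algorithm, and it commonly means no detection cleared a confidence threshold — plausible for a model trained on a single sample (`num_training_samples - 1`). A hedged boto3 sketch of the same invocation, useful for inspecting the raw response; the endpoint name, image path, region, and the 0.2 threshold are all hypothetical:

```python
import json

import boto3

runtime = boto3.client("sagemaker-runtime", region_name="us-east-1")  # hypothetical region

# Hypothetical image file and endpoint name.
with open("test.jpg", "rb") as f:
    payload = f.read()

response = runtime.invoke_endpoint(
    EndpointName="my-object-detection-endpoint",
    ContentType="application/x-image",
    Body=payload,
)

# Built-in Object Detection returns
# {"prediction": [[class_index, score, xmin, ymin, xmax, ymax], ...]}.
result = json.loads(response["Body"].read())
detections = [d for d in result["prediction"] if d[1] >= 0.2]  # hypothetical score threshold
print(detections)
```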

How can I feed an outputted augmented manifest file as input to BlazingText in a pipeline?

I'm creating a pipeline with multiple steps: one to preprocess a dataset, and another that takes the preprocessed data as input to train a BlazingText model for classification.

My first `ProcessingStep` outputs augmented manifest files:

```python
step_process = ProcessingStep(
    name="Nab3Process",
    processor=sklearn_processor,
    inputs=[
        ProcessingInput(source=raw_input_data, destination=raw_dir),
        ProcessingInput(source=categories_input_data, destination=categories_dir),
    ],
    outputs=[
        ProcessingOutput(output_name="train", source=train_dir),
        ProcessingOutput(output_name="validation", source=validation_dir),
        ProcessingOutput(output_name="test", source=test_dir),
        ProcessingOutput(output_name="mlb_train", source=mlb_data_train_dir),
        ProcessingOutput(output_name="mlb_validation", source=mlb_data_validation_dir),
        ProcessingOutput(output_name="mlb_test", source=mlb_data_test_dir),
        ProcessingOutput(output_name="le_vectorizer", source=le_vectorizer_dir),
        ProcessingOutput(output_name="mlb_vectorizer", source=mlb_vectorizer_dir),
    ],
    code=preprocessing_dir,
)
```

But I'm having a hard time when I try to feed my `train` output as a `TrainingInput` to the training step:

```python
step_train = TrainingStep(
    name="Nab3Train",
    estimator=bt_train,
    inputs={
        "train": TrainingInput(
            step_process.properties.ProcessingOutputConfig.Outputs[
                "train"
            ].S3Output.S3Uri,
            distribution="FullyReplicated",
            content_type="application/x-recordio",
            s3_data_type="AugmentedManifestFile",
            attribute_names=["source", "label"],
            input_mode="Pipe",
            record_wrapping="RecordIO",
        ),
        "validation": TrainingInput(
            step_process.properties.ProcessingOutputConfig.Outputs[
                "validation"
            ].S3Output.S3Uri,
            distribution="FullyReplicated",
            content_type="application/x-recordio",
            s3_data_type="AugmentedManifestFile",
            attribute_names=["source", "label"],
            input_mode="Pipe",
            record_wrapping="RecordIO",
        ),
    },
)
```

And I'm getting the following error:

> 'FailureReason': 'ClientError: Could not download manifest file with S3 URL "s3://sagemaker-us-east-1-xxxxxxxxxx/Nab3Process-xxxxxxxxxx/output/train". Please ensure that the bucket exists in the selected region (us-east-1), that the manifest file exists at that S3 URL, and that the role "arn:aws:iam::xxxxxxxxxx:role/service-role/AmazonSageMakerServiceCatalogProductsUseRole" has "s3:GetObject" permissions on the manifest file. Error message from S3: The specified key does not exist.'

What should I do?
0 answers · 0 votes · 4 views · asked 4 months ago
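For the manifest question above, the error text suggests the training job is treating the output *prefix* (`.../output/train`) as if it were the manifest file itself; with `s3_data_type="AugmentedManifestFile"`, the S3 URI must point at the manifest object, not a directory-style prefix. A sketch (hedged: `train.manifest` is a hypothetical file name that must match whatever the processing script actually writes) that appends the file name at pipeline definition time with `Join`:

```python
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.functions import Join

# Build the full object URI from the step property plus a hypothetical file name.
train_manifest_uri = Join(
    on="/",
    values=[
        step_process.properties.ProcessingOutputConfig.Outputs["train"].S3Output.S3Uri,
        "train.manifest",  # hypothetical: must match the file the processing job writes
    ],
)

train_input = TrainingInput(
    train_manifest_uri,
    distribution="FullyReplicated",
    content_type="application/x-recordio",
    s3_data_type="AugmentedManifestFile",
    attribute_names=["source", "label"],
    input_mode="Pipe",
    record_wrapping="RecordIO",
)
```

The same `Join` pattern would apply to the validation channel. `Join` resolves at execution time, which matters here because the step property is a placeholder until the pipeline actually runs.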