Questions in Machine Learning & AI

ResourceLimitExceeded exception but I have Quota

I am not able to start a SageMaker notebook instance or a SageMaker training job with ml.c5.xlarge (or any other instance type). I checked in Service Quotas, and I clearly have quota for both tasks:

- 1 in "applied quota value" for "ml.c5.xlarge for notebook instance usage"
- 15 in "applied quota value" for "ml.c5.xlarge for training job usage"

Of course, I am checking in the same region I am trying to work in: us-east-1.

I have researched for several days, and every forum post suggests asking for a limit increase. However, I already have quota (limits) available. When I try to start the Jupyter notebook, it raises the exception:

`The account-level service limit 'ml.c5.xlarge for notebook instance usage' is 0 Instances, with current utilization of 0 Instances and a request delta of 1 Instances. Please contact AWS support to request an increase for this limit.`

It is strange because the exception says that I have a limit of 0 instances, while Service Quotas says I have 1. Here is the output of the command `service-quotas list-service-quotas`:

```
{
    "ServiceCode": "sagemaker",
    "ServiceName": "Amazon SageMaker",
    "QuotaArn": "arn:aws:servicequotas:us-east-1:631720213551:sagemaker/L-E2BB44FE",
    "QuotaCode": "L-E2BB44FE",
    "QuotaName": "ml.c5.xlarge for training job usage",
    "Value": 15.0,
    "Unit": "None",
    "Adjustable": true,
    "GlobalQuota": false
},
{
    "ServiceCode": "sagemaker",
    "ServiceName": "Amazon SageMaker",
    "QuotaArn": "arn:aws:servicequotas:us-east-1:631720213551:sagemaker/L-39F5FD98",
    "QuotaCode": "L-39F5FD98",
    "QuotaName": "ml.c5.xlarge for notebook instance usage",
    "Value": 1.0,
    "Unit": "None",
    "Adjustable": true,
    "GlobalQuota": false,
    "UsageMetric": {
        "MetricNamespace": "AWS/Usage",
        "MetricName": "ResourceCount",
        "MetricDimensions": {
            "Class": "None",
            "Resource": "notebook-instance/ml.c5.xlarge",
            "Service": "SageMaker",
            "Type": "Resource"
        },
        "MetricStatisticRecommendation": "Maximum"
    }
},
```

I strongly appreciate your help, because I have not been able to run a SageMaker training job for several days. Thanks.
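Update: to make the comparison concrete, the applied and default values of both quotas can also be read programmatically. A minimal boto3 sketch (the quota codes are taken from the output above; whether the applied value has actually propagated is exactly what is in question here):

```python
import boto3

sq = boto3.client("service-quotas", region_name="us-east-1")

for quota_code, label in [
    ("L-39F5FD98", "ml.c5.xlarge for notebook instance usage"),
    ("L-E2BB44FE", "ml.c5.xlarge for training job usage"),
]:
    # The account-level ("applied") value, as reported by list-service-quotas.
    applied = sq.get_service_quota(ServiceCode="sagemaker", QuotaCode=quota_code)
    # The AWS default value for the same quota, for comparison.
    default = sq.get_aws_default_service_quota(ServiceCode="sagemaker", QuotaCode=quota_code)
    print(label,
          "| applied:", applied["Quota"]["Value"],
          "| default:", default["Quota"]["Value"])
```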
0
answers
1
vote
13
views
asked a day ago

Human review flow not triggered with low value confidence

Hi, I'm using AWS Textract to extract information from a PNG. I want to trigger a human review workflow when Textract has low confidence in the value (not the key!), but the workflow doesn't trigger. My suspicion is that the AWS console doesn't write the JSON correctly.

My PNG:

![My test file](/media/postImages/original/IMjnuM0RZPSN6h9fwok8iSWg)

My Python call:

```
response = textract.analyze_document(
    Document={
        "S3Object": {
            "Bucket": "sagemakerawstextracttest",
            "Name": "test.png"
        }
    },
    HumanLoopConfig={
        "FlowDefinitionArn": "arn:aws:sagemaker:eu-central-1:392047662260:flow-definition/confunderv2",
        "HumanLoopName": "223456",
        "DataAttributes": {
            "ContentClassifiers": ["FreeOfPersonallyIdentifiableInformation", "FreeOfAdultContent"]
        }
    },
    FeatureTypes=["FORMS"])
```

The response:

```
{'DocumentMetadata': {'Pages': 1}, 'Blocks': [{'BlockType': 'PAGE', 'Geometry': {'BoundingBox': {'Width': 1.0, 'Height': 1.0, 'Left': 0.0, 'Top': 0.0}, 'Polygon': [{'X': 9.166517763292134e-17, 'Y': 0.0}, {'X': 1.0, 'Y': 1.6361280468230185e-16}, {'X': 1.0, 'Y': 1.0}, {'X': 0.0, 'Y': 1.0}]}, 'Id': '731d5fe7-4ef1-483d-940d-75eb7a113034', 'Relationships': [{'Type': 'CHILD', 'Ids': ['d88b4e1d-257e-4ce2-9d45-8e72b1849c35', '820a0720-e256-43c8-8608-c047c43e02fc', 'd39da0ab-5e3c-46df-a3d2-a96c8d7d62ca', '83b498a5-e833-43bb-9562-c785d668c438', '6849808f-f183-4535-92b1-20770cbdddd1', 'e6cb3e33-6c4a-4d8e-b51a-d8293b8ae487', '9d731496-0f47-471a-828e-63ffc536b2f8', '238a421e-8d4e-4927-a8e7-74c9b8e7c02b', 'afab5162-c4a8-462a-808e-9e73f2c199df']}]}, {'BlockType': 'LINE', 'Confidence': 99.37574005126953, 'Text': 'Name: Paul Spöring', 'Geometry': {'BoundingBox': {'Width': 0.5787070989608765, 'Height': 0.0934426411986351, 'Left': 0.09132613986730576, 'Top': 0.22022898495197296}, 'Polygon': [{'X': 0.09132613986730576, 'Y': 0.22022898495197296}, {'X': 0.6700332760810852, 'Y': 0.22022898495197296}, {'X': 0.6700332760810852, 'Y': 0.31367161870002747}, {'X': 0.09132613986730576, 'Y': 0.31367161870002747}]}, 'Id': 'd88b4e1d-257e-4ce2-9d45-8e72b1849c35', 'Relationships': [{'Type': 'CHILD', 'Ids': ['7a09146f-189a-4f63-9a1e-5b535161c73d', 'b3171404-53e3-4b73-9d32-848a26bb6b00', '71b86f1d-c6d7-443c-bd34-a1bb7dbf059d']}]}, {'BlockType': 'LINE', 'Confidence': 99.62093353271484, 'Text': 'Alter: 20', 'Geometry': {'BoundingBox': {'Width': 0.2573024034500122, 'Height': 0.07728662341833115, 'Left': 0.08906695991754532, 'Top': 0.3895558714866638}, 'Polygon': [{'X': 0.08906695991754532, 'Y': 0.3895558714866638}, {'X': 0.3463693857192993, 'Y': 0.3895558714866638}, {'X': 0.3463693857192993, 'Y': 0.46684250235557556}, {'X': 0.08906695991754532, 'Y': 0.46684250235557556}]}, 'Id': '820a0720-e256-43c8-8608-c047c43e02fc', 'Relationships': [{'Type': 'CHILD', 'Ids': ['e869172f-4479-43e9-9310-bcccb5039bd1', '22d76bef-3742-4aa1-9948-f10631f7be85']}]}, {'BlockType': 'LINE', 'Confidence': 82.59257507324219, 'Text': 'Blub: you', 'Geometry': {'BoundingBox': {'Width': 0.43706580996513367, 'Height': 0.11816448718309402, 'Left': 0.09128420799970627, 'Top': 0.560679018497467}, 'Polygon': [{'X': 0.09128420799970627, 'Y': 0.560679018497467}, {'X': 0.5283499956130981, 'Y': 0.560679018497467}, {'X': 0.5283499956130981, 'Y': 0.6788434982299805}, {'X': 0.09128420799970627, 'Y': 0.6788434982299805}]}, 'Id': 'd39da0ab-5e3c-46df-a3d2-a96c8d7d62ca', 'Relationships': [{'Type': 'CHILD', 'Ids': ['a9fd356f-1935-4f51-8053-02d1bf02f609', 'c657da1a-2441-4844-99bd-f301887eb5ed']}]}, {'BlockType': 'WORD', 'Confidence': 99.62495422363281, 'Text': 'Name:', 'TextType': 'PRINTED', 'Geometry': 
{'BoundingBox': {'Width': 0.19461767375469208, 'Height': 0.07422882318496704, 'Left': 0.09132613986730576, 'Top': 0.2239169031381607}, 'Polygon': [{'X': 0.09132613986730576, 'Y': 0.2239169031381607}, {'X': 0.28594380617141724, 'Y': 0.2239169031381607}, {'X': 0.28594380617141724, 'Y': 0.29814571142196655}, {'X': 0.09132613986730576, 'Y': 0.29814571142196655}]}, 'Id': '7a09146f-189a-4f63-9a1e-5b535161c73d'}, {'BlockType': 'WORD', 'Confidence': 98.95515441894531, 'Text': 'Paul', 'TextType': 'PRINTED', 'Geometry': {'BoundingBox': {'Width': 0.12700314819812775, 'Height': 0.07847694307565689, 'Left': 0.30101263523101807, 'Top': 0.22022898495197296}, 'Polygon': [{'X': 0.30101263523101807, 'Y': 0.22022898495197296}, {'X': 0.428015798330307, 'Y': 0.22022898495197296}, {'X': 0.428015798330307, 'Y': 0.29870593547821045}, {'X': 0.30101263523101807, 'Y': 0.29870593547821045}]}, 'Id': 'b3171404-53e3-4b73-9d32-848a26bb6b00'}, {'BlockType': 'WORD', 'Confidence': 99.54711151123047, 'Text': 'Spöring', 'TextType': 'PRINTED', 'Geometry': {'BoundingBox': {'Width': 0.23034201562404633, 'Height': 0.09066428989171982, 'Left': 0.4396912455558777, 'Top': 0.22300733625888824}, 'Polygon': [{'X': 0.4396912455558777, 'Y': 0.22300733625888824}, {'X': 0.6700332760810852, 'Y': 0.22300733625888824}, {'X': 0.6700332760810852, 'Y': 0.31367161870002747}, {'X': 0.4396912455558777, 'Y': 0.31367161870002747}]}, 'Id': '71b86f1d-c6d7-443c-bd34-a1bb7dbf059d'}, {'BlockType': 'WORD', 'Confidence': 99.47355651855469, 'Text': 'Alter:', 'TextType': 'PRINTED', 'Geometry': {'BoundingBox': {'Width': 0.16495245695114136, 'Height': 0.07728662341833115, 'Left': 0.08906695991754532, 'Top': 0.3895558714866638}, 'Polygon': [{'X': 0.08906695991754532, 'Y': 0.3895558714866638}, {'X': 0.2540194094181061, 'Y': 0.3895558714866638}, {'X': 0.2540194094181061, 'Y': 0.46684250235557556}, {'X': 0.08906695991754532, 'Y': 0.46684250235557556}]}, 'Id': 'e869172f-4479-43e9-9310-bcccb5039bd1'}, {'BlockType': 'WORD', 'Confidence': 99.76831817626953, 'Text': '20', 'TextType': 'PRINTED', 'Geometry': {'BoundingBox': {'Width': 0.07841669768095016, 'Height': 0.07549438625574112, 'Left': 0.26795268058776855, 'Top': 0.3907521665096283}, 'Polygon': [{'X': 0.26795268058776855, 'Y': 0.3907521665096283}, {'X': 0.3463693857192993, 'Y': 0.3907521665096283}, {'X': 0.3463693857192993, 'Y': 0.4662465453147888}, {'X': 0.26795268058776855, 'Y': 0.4662465453147888}]}, 'Id': '22d76bef-3742-4aa1-9948-f10631f7be85'}, {'BlockType': 'WORD', 'Confidence': 99.90613555908203, 'Text': 'Blub:', 'TextType': 'PRINTED', 'Geometry': {'BoundingBox': {'Width': 0.15235087275505066, 'Height': 0.07806506007909775, 'Left': 0.09128420799970627, 'Top': 0.560679018497467}, 'Polygon': [{'X': 0.09128420799970627, 'Y': 0.560679018497467}, {'X': 0.24363507330417633, 'Y': 0.560679018497467}, {'X': 0.24363507330417633, 'Y': 0.638744056224823}, {'X': 0.09128420799970627, 'Y': 0.638744056224823}]}, 'Id': 'a9fd356f-1935-4f51-8053-02d1bf02f609'}, {'BlockType': 'WORD', 'Confidence': 65.27902221679688, 'Text': 'you', 'TextType': 'HANDWRITING', 'Geometry': {'BoundingBox': {'Width': 0.2605644464492798, 'Height': 0.10577575862407684, 'Left': 0.26778554916381836, 'Top': 0.5730677247047424}, 'Polygon': [{'X': 0.26778554916381836, 'Y': 0.5730677247047424}, {'X': 0.5283499956130981, 'Y': 0.5730677247047424}, {'X': 0.5283499956130981, 'Y': 0.6788434982299805}, {'X': 0.26778554916381836, 'Y': 0.6788434982299805}]}, 'Id': 'c657da1a-2441-4844-99bd-f301887eb5ed'}, {'BlockType': 'KEY_VALUE_SET', 'Confidence': 95.0, 'Geometry': 
{'BoundingBox': {'Width': 0.14683577418327332, 'Height': 0.07679956406354904, 'Left': 0.09270352125167847, 'Top': 0.5600157380104065}, 'Polygon': [{'X': 0.09270352125167847, 'Y': 0.5600157380104065}, {'X': 0.23953929543495178, 'Y': 0.5600157380104065}, {'X': 0.23953929543495178, 'Y': 0.6368153095245361}, {'X': 0.09270352125167847, 'Y': 0.6368153095245361}]}, 'Id': '83b498a5-e833-43bb-9562-c785d668c438', 'Relationships': [{'Type': 'VALUE', 'Ids': ['6849808f-f183-4535-92b1-20770cbdddd1']}, {'Type': 'CHILD', 'Ids': ['a9fd356f-1935-4f51-8053-02d1bf02f609']}], 'EntityTypes': ['KEY']}, {'BlockType': 'KEY_VALUE_SET', 'Confidence': 95.0, 'Geometry': {'BoundingBox': {'Width': 0.25565099716186523, 'Height': 0.10435424745082855, 'Left': 0.27307748794555664, 'Top': 0.5714280009269714}, 'Polygon': [{'X': 0.27307748794555664, 'Y': 0.5714280009269714}, {'X': 0.5287284851074219, 'Y': 0.5714280009269714}, {'X': 0.5287284851074219, 'Y': 0.6757822036743164}, {'X': 0.27307748794555664, 'Y': 0.6757822036743164}]}, 'Id': '6849808f-f183-4535-92b1-20770cbdddd1', 'Relationships': [{'Type': 'CHILD', 'Ids': ['c657da1a-2441-4844-99bd-f301887eb5ed']}], 'EntityTypes': ['VALUE']}, {'BlockType': 'KEY_VALUE_SET', 'Confidence': 94.0, 'Geometry': {'BoundingBox': {'Width': 0.18979747593402863, 'Height': 0.06938383728265762, 'Left': 0.09253517538309097, 'Top': 0.22251036763191223}, 'Polygon': [{'X': 0.09253517538309097, 'Y': 0.22251036763191223}, {'X': 0.2823326289653778, 'Y': 0.22251036763191223}, {'X': 0.2823326289653778, 'Y': 0.29189419746398926}, {'X': 0.09253517538309097, 'Y': 0.29189419746398926}]}, 'Id': 'e6cb3e33-6c4a-4d8e-b51a-d8293b8ae487', 'Relationships': [{'Type': 'VALUE', 'Ids': ['9d731496-0f47-471a-828e-63ffc536b2f8']}, {'Type': 'CHILD', 'Ids': ['7a09146f-189a-4f63-9a1e-5b535161c73d']}], 'EntityTypes': ['KEY']}, {'BlockType': 'KEY_VALUE_SET', 'Confidence': 94.0, 'Geometry': {'BoundingBox': {'Width': 0.359325647354126, 'Height': 0.09458027780056, 'Left': 0.3059462904930115, 'Top': 0.21961979568004608}, 'Polygon': [{'X': 0.3059462904930115, 'Y': 0.21961979568004608}, {'X': 0.6652719378471375, 'Y': 0.21961979568004608}, {'X': 0.6652719378471375, 'Y': 0.3142000734806061}, {'X': 0.3059462904930115, 'Y': 0.3142000734806061}]}, 'Id': '9d731496-0f47-471a-828e-63ffc536b2f8', 'Relationships': [{'Type': 'CHILD', 'Ids': ['b3171404-53e3-4b73-9d32-848a26bb6b00', '71b86f1d-c6d7-443c-bd34-a1bb7dbf059d']}], 'EntityTypes': ['VALUE']}, {'BlockType': 'KEY_VALUE_SET', 'Confidence': 90.0, 'Geometry': {'BoundingBox': {'Width': 0.16694426536560059, 'Height': 0.076688751578331, 'Left': 0.08947249501943588, 'Top': 0.38823941349983215}, 'Polygon': [{'X': 0.08947249501943588, 'Y': 0.38823941349983215}, {'X': 0.25641676783561707, 'Y': 0.38823941349983215}, {'X': 0.25641676783561707, 'Y': 0.46492815017700195}, {'X': 0.08947249501943588, 'Y': 0.46492815017700195}]}, 'Id': '238a421e-8d4e-4927-a8e7-74c9b8e7c02b', 'Relationships': [{'Type': 'VALUE', 'Ids': ['afab5162-c4a8-462a-808e-9e73f2c199df']}, {'Type': 'CHILD', 'Ids': ['e869172f-4479-43e9-9310-bcccb5039bd1']}], 'EntityTypes': ['KEY']}, {'BlockType': 'KEY_VALUE_SET', 'Confidence': 90.0, 'Geometry': {'BoundingBox': {'Width': 0.0738547071814537, 'Height': 0.0703207403421402, 'Left': 0.26923850178718567, 'Top': 0.39165323972702026}, 'Polygon': [{'X': 0.26923850178718567, 'Y': 0.39165323972702026}, {'X': 0.34309321641921997, 'Y': 0.39165323972702026}, {'X': 0.34309321641921997, 'Y': 0.46197396516799927}, {'X': 0.26923850178718567, 'Y': 0.46197396516799927}]}, 'Id': 
'afab5162-c4a8-462a-808e-9e73f2c199df', 'Relationships': [{'Type': 'CHILD', 'Ids': ['22d76bef-3742-4aa1-9948-f10631f7be85']}], 'EntityTypes': ['VALUE']}], 'HumanLoopActivationOutput': {'HumanLoopActivationReasons': [], 'HumanLoopActivationConditionsEvaluationResults': '{"Conditions":[{"And":[{"ConditionType":"ImportantFormKeyConfidenceCheck","ConditionParameters":{"ImportantFormKey":"*","ImportantFormKeyAliases":[],"KeyValueBlockConfidenceLessThan":20.0,"WordBlockConfidenceLessThan":90.0},"EvaluationResult":false},{"ConditionType":"ImportantFormKeyConfidenceCheck","ConditionParameters":{"ImportantFormKey":"*","ImportantFormKeyAliases":[],"KeyValueBlockConfidenceGreaterThan":0.0,"WordBlockConfidenceGreaterThan":0.0},"EvaluationResult":true}],"EvaluationResult":false}]}'}, 'AnalyzeDocumentModelVersion': '1.0', 'ResponseMetadata': {'RequestId': '33e55167-74c8-461b-b83e-ccda0603c1a7', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '33e55167-74c8-461b-b83e-ccda0603c1a7', 'content-type': 'application/x-amz-json-1.1', 'content-length': '10018', 'date': 'Tue, 04 Oct 2022 12:41:57 GMT'}, 'RetryAttempts': 0}}
```

As you can see, the confidence of the key is quite high, but the confidence of the value of the key "Blub" is low.

My console configuration:

![Enter image description here](/media/postImages/original/IMDVvgans0R3ugMd--SrnY2Q)

And the JSON that the console generates from it:

```
{
  "Conditions": [
    {
      "And": [
        {
          "ConditionType": "ImportantFormKeyConfidenceCheck",
          "ConditionParameters": {
            "ImportantFormKey": "*",
            "KeyValueBlockConfidenceLessThan": 20,
            "WordBlockConfidenceLessThan": 90
          }
        },
        {
          "ConditionType": "ImportantFormKeyConfidenceCheck",
          "ConditionParameters": {
            "ImportantFormKey": "*",
            "KeyValueBlockConfidenceGreaterThan": 0,
            "WordBlockConfidenceGreaterThan": 0
          }
        }
      ]
    }
  ]
}
```

My question is: how do I configure this so that the human review flow is triggered when the confidence of the value (not the key, or the key and value together) is low?

Best regards,
Paul
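Update: in case the built-in activation conditions cannot target VALUE blocks independently of their keys, a fallback we are considering is to evaluate the response ourselves and start the loop through the Amazon A2I runtime directly. A sketch only (this may require a flow definition created for a custom task type rather than the built-in Textract integration; the 90.0 threshold and the loop name are our own placeholders):

```python
import json
import boto3

a2i = boto3.client("sagemaker-a2i-runtime", region_name="eu-central-1")

WORD_CONFIDENCE_THRESHOLD = 90.0  # placeholder threshold, tune as needed

# Index all blocks so CHILD relationships can be resolved.
blocks_by_id = {b["Id"]: b for b in response["Blocks"]}

def value_word_confidences(block):
    """Yield the OCR confidences of the WORD children of a VALUE block."""
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            for child_id in rel["Ids"]:
                child = blocks_by_id[child_id]
                if child["BlockType"] == "WORD":
                    yield child["Confidence"]

# VALUE blocks whose words were read with low confidence ("you" at 65.3 above).
low_values = [
    b for b in response["Blocks"]
    if b["BlockType"] == "KEY_VALUE_SET"
    and "VALUE" in b.get("EntityTypes", [])
    and any(c < WORD_CONFIDENCE_THRESHOLD for c in value_word_confidences(b))
]

if low_values:
    a2i.start_human_loop(
        HumanLoopName="223457",  # placeholder; must be unique per loop
        FlowDefinitionArn="arn:aws:sagemaker:eu-central-1:392047662260:flow-definition/confunderv2",
        HumanLoopInput={"InputContent": json.dumps({"lowConfidenceValues": low_values})},
    )
```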
0
answers
0
votes
4
views
asked 2 days ago

Using HuggingFace in Sagemaker Studio as part of a project

TLDR: if we are trying to use a HuggingFaceProcessor/Estimator in a SageMaker Studio project, what are the requirements for the `train.py` file in terms of how it refers to the assembled training data, and where it should store the results of the operations it performs (e.g. compiled model, data, etc.)?

----------------------- FULL DETAILS ------------------------

So our high-level goal is to be able to deploy some kind of non-XGB model from a SageMaker Studio project, given that the templates provided are all XGB. As outlined in [an earlier question](https://repost.aws/questions/QUdd2zOBY0Q4CEG1ZdbgNsgA/using-transformers-module-with-sagemaker-studio-project-module-not-found-error-no-module-named-transformers) we'd started with TensorFlow, but since our TensorFlow model wraps a HuggingFace model, we thought we'd try something even simpler: just a HuggingFace model using the HuggingFaceProcessor.

So, following the docs on [HuggingFaceProcessor](https://docs.aws.amazon.com/sagemaker/latest/dg/processing-job-frameworks-hugging-face.html) and a [HuggingFace Estimator](https://github.com/huggingface/notebooks/blob/main/sagemaker/02_getting_started_tensorflow/sagemaker-notebook.ipynb) example, we started to adjust the abalone (project template) pipeline.py to look like this (full code can be provided on request):

```
# processing step for feature engineering
hf_processor = HuggingFaceProcessor(
    role=role,
    instance_count=processing_instance_count,
    instance_type=processing_instance_type,
    transformers_version='4.4.2',
    pytorch_version='1.6.0',
    base_job_name=f"{base_job_prefix}/frameworkprocessor-hf",
    sagemaker_session=pipeline_session,
)

step_args = hf_processor.run(
    outputs=[
        ProcessingOutput(output_name="train", source="/opt/ml/processing/train"),
        ProcessingOutput(output_name="validation", source="/opt/ml/processing/validation"),
        ProcessingOutput(output_name="test", source="/opt/ml/processing/test"),
    ],
    code=os.path.join(BASE_DIR, "preprocess.py"),
    arguments=["--input-data", input_data],
)
step_process = ProcessingStep(
    name="PreprocessTopicData",
    step_args=step_args,
)

# training step for generating model artifacts
model_path = f"s3://{sagemaker_session.default_bucket()}/{base_job_prefix}/TopicTrain"
hf_train = HuggingFace(
    entry_point='train.py',
    source_dir=BASE_DIR,
    base_job_name='huggingface-sdk-extension',
    instance_type=processing_instance_type,
    instance_count=processing_instance_count,
    transformers_version='4.4',
    pytorch_version='1.6',
    py_version='py36',
    role=role,
)
hf_train.set_hyperparameters(
    epochs=3,
    train_batch_size=16,
    learning_rate=1.0e-5,
    model_name='distilbert-base-uncased',
)
step_args = hf_train.fit(
    inputs={
        "train": TrainingInput(
            s3_data=step_process.properties.ProcessingOutputConfig.Outputs[
                "train"
            ].S3Output.S3Uri,
            content_type="text/csv",
        ),
        "validation": TrainingInput(
            s3_data=step_process.properties.ProcessingOutputConfig.Outputs[
                "validation"
            ].S3Output.S3Uri,
            content_type="text/csv",
        ),
    },
)
```

Finding that pushing to master doesn't provide any feedback on issues arising from pipeline.py, we realised that getting the pipeline from a notebook was a better way of debugging these sorts of changes, provided one remembered to restart the kernel each time to ensure changes to the pipeline.py file were available to the notebook.
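As background to the TLDR question, our working assumption (a sketch only; we are asking for the authoritative documentation) is that each channel passed to `fit()` above is mounted into the training container and exposed through an `SM_CHANNEL_<NAME>` environment variable, and that anything `train.py` writes to the model directory is uploaded to S3 as the job's artifact:

```python
import os

# One environment variable per fit() channel ("train", "validation" above).
train_dir = os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train")
validation_dir = os.environ.get("SM_CHANNEL_VALIDATION", "/opt/ml/input/data/validation")

# Everything written here should be packaged as model.tar.gz and uploaded to S3.
model_dir = os.environ.get("SM_MODEL_DIR", "/opt/ml/model")
```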
So, using the following code in the notebook, we worked through a series of issues, trying to bash the code into shape such that it would compile:

```
from pipelines.topic.pipeline import get_pipeline

pipeline = get_pipeline(
    region=region,
    role=role,
    default_bucket=default_bucket,
    model_package_group_name=model_package_group_name,
    pipeline_name=pipeline_name,
)
```

We needed to change the default processing and training instance types to avoid a "cpu" unsupported issue:

```
processing_instance_type="ml.p3.xlarge",
training_instance_type="ml.p3.xlarge",
```

and add a train.py script:

```
from transformers import AutoTokenizer
from transformers import TFAutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = TFAutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=18)

import os

import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split
from transformers import (
    DistilBertTokenizerFast,
    TFDistilBertForSequenceClassification,
)

DATA_COLUMN = 'text'
LABEL_COLUMN = 'label'
MAX_SEQUENCE_LENGTH = 512
LEARNING_RATE = 5e-5
BATCH_SIZE = 16
NUM_EPOCHS = 3
NUM_LABELS = 15

if __name__ == "__main__":
    # --------------------------------------------------------------------------------
    # Tokenizer
    # --------------------------------------------------------------------------------
    tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-uncased')

    def tokenize(sentences, max_length=MAX_SEQUENCE_LENGTH, padding='max_length'):
        """Tokenize using the Huggingface tokenizer
        Args:
            sentences: String or list of string to tokenize
            padding: Padding method ['do_not_pad'|'longest'|'max_length']
        """
        return tokenizer(
            sentences,
            truncation=True,
            padding=padding,
            max_length=max_length,
            return_tensors="tf"
        )

    # --------------------------------------------------------------------------------
    # Load data
    # --------------------------------------------------------------------------------
    from keras.utils import to_categorical
    from sklearn.preprocessing import LabelEncoder

    # Assumption: the preprocessing step wrote train.csv into the "train" channel,
    # which SageMaker mounts at the path given by SM_CHANNEL_TRAIN.
    train_dir = os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train")
    train_data = pd.read_csv(os.path.join(train_dir, "train.csv"))

    labelencoder_Y_1 = LabelEncoder()
    yy = labelencoder_Y_1.fit_transform(train_data[LABEL_COLUMN].tolist())
    yy = to_categorical(yy)
    print(len(yy))
    print(yy.shape)

    train_dat, validation_dat, train_label, validation_label = train_test_split(
        train_data[DATA_COLUMN].tolist(),
        yy,
        test_size=0.2,
        shuffle=True
    )

    # --------------------------------------------------------------------------------
    # Prepare TF dataset
    # --------------------------------------------------------------------------------
    train_dataset = tf.data.Dataset.from_tensor_slices((
        dict(tokenize(train_dat)),  # Convert BatchEncoding instance to dictionary
        train_label
    )).shuffle(1000).batch(BATCH_SIZE).prefetch(1)

    validation_dataset = tf.data.Dataset.from_tensor_slices((
        dict(tokenize(validation_dat)),
        validation_label
    )).batch(BATCH_SIZE).prefetch(1)

    # --------------------------------------------------------------------------------
    # training
    # --------------------------------------------------------------------------------
    model = TFDistilBertForSequenceClassification.from_pretrained(
        'distilbert-base-uncased',
        num_labels=NUM_LABELS
    )
    optimizer = tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE)
    model.compile(
        optimizer=optimizer,
        loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
    )
```

However, we are now stuck on this error when trying to get the pipeline from a notebook.
```
TypeError                                 Traceback (most recent call last)
<ipython-input-3-be38b3dda75f> in <module>
      7     default_bucket=default_bucket,
      8     model_package_group_name=model_package_group_name,
----> 9     pipeline_name=pipeline_name,
     10 )
     11 # !conda list

~/topic-models-no-monitoring-p-rboparx6tdeg/sagemaker-topic-models-no-monitoring-p-rboparx6tdeg-modelbuild/pipelines/topic/pipeline.py in get_pipeline(region, sagemaker_project_arn, role, default_bucket, model_package_group_name, pipeline_name, base_job_prefix, processing_instance_type, training_instance_type)
    228                     "validation"
    229                 ].S3Output.S3Uri,
--> 230                 content_type="text/csv",
    231             ),
    232         },

/opt/conda/lib/python3.7/site-packages/sagemaker/workflow/pipeline_context.py in wrapper(*args, **kwargs)
    246             return self_instance.sagemaker_session.context
    247
--> 248         return run_func(*args, **kwargs)
    249
    250     return wrapper

/opt/conda/lib/python3.7/site-packages/sagemaker/estimator.py in fit(self, inputs, wait, logs, job_name, experiment_config)
   1059         self._prepare_for_training(job_name=job_name)
   1060
-> 1061         self.latest_training_job = _TrainingJob.start_new(self, inputs, experiment_config)
   1062         self.jobs.append(self.latest_training_job)
   1063         if wait:

/opt/conda/lib/python3.7/site-packages/sagemaker/estimator.py in start_new(cls, estimator, inputs, experiment_config)
   1956         train_args = cls._get_train_args(estimator, inputs, experiment_config)
   1957
-> 1958         estimator.sagemaker_session.train(**train_args)
   1959
   1960         return cls(estimator.sagemaker_session, estimator._current_job_name)

/opt/conda/lib/python3.7/site-packages/sagemaker/session.py in train(self, input_mode, input_config, role, job_name, output_config, resource_config, vpc_config, hyperparameters, stop_condition, tags, metric_definitions, enable_network_isolation, image_uri, algorithm_arn, encrypt_inter_container_traffic, use_spot_instances, checkpoint_s3_uri, checkpoint_local_path, experiment_config, debugger_rule_configs, debugger_hook_config, tensorboard_output_config, enable_sagemaker_metrics, profiler_rule_configs, profiler_config, environment, retry_strategy)
    611             self.sagemaker_client.create_training_job(**request)
    612
--> 613         self._intercept_create_request(train_request, submit, self.train.__name__)
    614
    615     def _get_train_request(  # noqa: C901

/opt/conda/lib/python3.7/site-packages/sagemaker/session.py in _intercept_create_request(self, request, create, func_name)
   4303             func_name (str): the name of the function needed intercepting
   4304         """
-> 4305         return create(request)
   4306
   4307

/opt/conda/lib/python3.7/site-packages/sagemaker/session.py in submit(request)
    608         def submit(request):
    609             LOGGER.info("Creating training-job with name: %s", job_name)
--> 610             LOGGER.debug("train request: %s", json.dumps(request, indent=4))
    611             self.sagemaker_client.create_training_job(**request)
    612

/opt/conda/lib/python3.7/json/__init__.py in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
    236         check_circular=check_circular, allow_nan=allow_nan, indent=indent,
    237         separators=separators, default=default, sort_keys=sort_keys,
--> 238         **kw).encode(obj)
    239
    240

/opt/conda/lib/python3.7/json/encoder.py in encode(self, o)
    199         chunks = self.iterencode(o, _one_shot=True)
    200         if not isinstance(chunks, (list, tuple)):
--> 201             chunks = list(chunks)
    202         return ''.join(chunks)
    203

/opt/conda/lib/python3.7/json/encoder.py in _iterencode(o, _current_indent_level)
    429             yield from _iterencode_list(o, _current_indent_level)
    430         elif isinstance(o, dict):
--> 431             yield from _iterencode_dict(o, _current_indent_level)
    432         else:
    433             if markers is not None:

/opt/conda/lib/python3.7/json/encoder.py in _iterencode_dict(dct, _current_indent_level)
    403             else:
    404                 chunks = _iterencode(value, _current_indent_level)
--> 405             yield from chunks
    406         if newline_indent is not None:
    407             _current_indent_level -= 1

/opt/conda/lib/python3.7/json/encoder.py in _iterencode_dict(dct, _current_indent_level)
    403             else:
    404                 chunks = _iterencode(value, _current_indent_level)
--> 405             yield from chunks
    406         if newline_indent is not None:
    407             _current_indent_level -= 1

/opt/conda/lib/python3.7/json/encoder.py in _iterencode(o, _current_indent_level)
    436                 raise ValueError("Circular reference detected")
    437             markers[markerid] = o
--> 438             o = _default(o)
    439             yield from _iterencode(o, _current_indent_level)
    440             if markers is not None:

/opt/conda/lib/python3.7/json/encoder.py in default(self, o)
    177
    178         """
--> 179         raise TypeError(f'Object of type {o.__class__.__name__} '
    180                         f'is not JSON serializable')
    181

TypeError: Object of type ParameterInteger is not JSON serializable
```

This is telling us that some aspect of the training job request is not JSON serializable, and it's not clear how to debug further.

What would be enormously helpful is project templates for SageMaker Studio showing the use of all the Processors, e.g. HuggingFace, TensorFlow and so on, but failing that we'd be most grateful if anyone could point us to documentation on what the requirements are for the `train.py` file that we need to specify for the HuggingFace Estimator.

Many thanks in advance.
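Update: one direction we are exploring (a sketch of a possible fix, not verified): unlike the HuggingFaceProcessor above, our HuggingFace estimator was never given `sagemaker_session=pipeline_session`, so `.fit()` appears to run eagerly and tries to JSON-serialize pipeline parameters such as ParameterInteger into a real CreateTrainingJob request. With the pipeline session attached, `.fit()` should instead return deferred step arguments for a TrainingStep (the step name "TrainTopicModel" is our own placeholder):

```python
from sagemaker.huggingface import HuggingFace
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.steps import TrainingStep

# Same estimator as in pipeline.py above, but with the pipeline session attached,
# so that .fit() returns deferred step arguments instead of launching a job.
hf_train = HuggingFace(
    entry_point="train.py",
    source_dir=BASE_DIR,
    base_job_name="huggingface-sdk-extension",
    instance_type=processing_instance_type,
    instance_count=processing_instance_count,
    transformers_version="4.4",
    pytorch_version="1.6",
    py_version="py36",
    role=role,
    sagemaker_session=pipeline_session,  # the missing piece, per our hypothesis
)

step_args = hf_train.fit(
    inputs={
        "train": TrainingInput(
            s3_data=step_process.properties.ProcessingOutputConfig.Outputs["train"].S3Output.S3Uri,
            content_type="text/csv",
        ),
        "validation": TrainingInput(
            s3_data=step_process.properties.ProcessingOutputConfig.Outputs["validation"].S3Output.S3Uri,
            content_type="text/csv",
        ),
    },
)
step_train = TrainingStep(name="TrainTopicModel", step_args=step_args)  # placeholder name
```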
0
answers
0
votes
11
views
asked 2 days ago

AWS PyTorch Neuron Compilation Error

I followed the user guide on updating torch-neuron and then started compiling the model to Neuron, but I got an error from which I can't tell what's wrong. The Neuron SDK documentation claims that all operations should compile; even unsupported ones should simply run on the CPU instead.

The error:

```
INFO:Neuron:All operators are compiled by neuron-cc (this does not guarantee that neuron-cc will successfully compile)
INFO:Neuron:Number of arithmetic operators (pre-compilation) before = 3345, fused = 3345, percent fused = 100.0%
INFO:Neuron:Number of neuron graph operations 8175 did not match traced graph 9652 - using heuristic matching of hierarchical information
INFO:Neuron:Compiling function _NeuronGraph$3362 with neuron-cc
INFO:Neuron:Compiling with command line: '/home/ubuntu/alias/neuron/neuron_env/bin/neuron-cc compile /tmp/tmpmp8qvhtb/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpmp8qvhtb/graph_def.neff --io-config {"inputs": {"0:0": [[1, 3, 768, 768], "float32"]}, "outputs": ["aten_sigmoid/Sigmoid:0"]} --verbose 35'
..............................................................................INFO:Neuron:Compile command returned: -9
WARNING:Neuron:torch.neuron.trace failed on _NeuronGraph$3362; falling back to native python function call
ERROR:Neuron:neuron-cc failed with the following command line call:
/home/ubuntu/alias/neuron/neuron_env/bin/neuron-cc compile /tmp/tmpmp8qvhtb/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpmp8qvhtb/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 3, 768, 768], "float32"]}, "outputs": ["aten_sigmoid/Sigmoid:0"]}' --verbose 35
Traceback (most recent call last):
  File "/home/ubuntu/alias/neuron/neuron_env/lib/python3.7/site-packages/torch_neuron/convert.py", line 382, in op_converter
    item, inputs, compiler_workdir=sg_workdir, **kwargs)
  File "/home/ubuntu/alias/neuron/neuron_env/lib/python3.7/site-packages/torch_neuron/decorators.py", line 220, in trace
    'neuron-cc failed with the following command line call:\n{}'.format(command))
subprocess.SubprocessError: neuron-cc failed with the following command line call:
/home/ubuntu/alias/neuron/neuron_env/bin/neuron-cc compile /tmp/tmpmp8qvhtb/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpmp8qvhtb/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 3, 768, 768], "float32"]}, "outputs": ["aten_sigmoid/Sigmoid:0"]}' --verbose 35
INFO:Neuron:Number of arithmetic operators (post-compilation) before = 3345, compiled = 0, percent compiled = 0.0%
INFO:Neuron:The neuron partitioner created 1 sub-graphs
INFO:Neuron:Neuron successfully compiled 0 sub-graphs, Total fused subgraphs = 1, Percent of model sub-graphs successfully compiled = 0.0%
INFO:Neuron:Compiled these operators (and operator counts) to Neuron:
INFO:Neuron:Not compiled operators (and operator counts) to Neuron:
INFO:Neuron: => aten::Int: 942 [supported]
INFO:Neuron: => aten::_convolution: 107 [supported]
INFO:Neuron: => aten::add: 104 [supported]
INFO:Neuron: => aten::batch_norm: 1 [supported]
INFO:Neuron: => aten::cat: 1 [supported]
INFO:Neuron: => aten::contiguous: 4 [supported]
INFO:Neuron: => aten::div: 104 [supported]
INFO:Neuron: => aten::dropout: 208 [supported]
INFO:Neuron: => aten::feature_dropout: 1 [supported]
INFO:Neuron: => aten::flatten: 60 [supported]
INFO:Neuron: => aten::gelu: 52 [supported]
INFO:Neuron: => aten::layer_norm: 161 [supported]
INFO:Neuron: => aten::linear: 264 [supported]
INFO:Neuron: => aten::matmul: 104 [supported]
INFO:Neuron: => aten::mul: 52 [supported]
INFO:Neuron: => aten::permute: 210 [supported]
INFO:Neuron: => aten::relu: 1 [supported]
INFO:Neuron: => aten::reshape: 262 [supported]
INFO:Neuron: => aten::select: 104 [supported]
INFO:Neuron: => aten::sigmoid: 1 [supported]
INFO:Neuron: => aten::size: 278 [supported]
INFO:Neuron: => aten::softmax: 52 [supported]
INFO:Neuron: => aten::transpose: 216 [supported]
INFO:Neuron: => aten::upsample_bilinear2d: 4 [supported]
INFO:Neuron: => aten::view: 52 [supported]
Traceback (most recent call last):
  File "to_neuron.py", line 14, in <module>
    model_neuron = torch.neuron.trace(model, example_inputs=[image.cuda()])
  File "/home/ubuntu/alias/neuron/neuron_env/lib/python3.7/site-packages/torch_neuron/convert.py", line 184, in trace
    cu.stats_post_compiler(neuron_graph)
  File "/home/ubuntu/alias/neuron/neuron_env/lib/python3.7/site-packages/torch_neuron/convert.py", line 493, in stats_post_compiler
    "No operations were successfully partitioned and compiled to neuron for this model - aborting trace!")
RuntimeError: No operations were successfully partitioned and compiled to neuron for this model - aborting trace!
```
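For context: a compile command returning -9 generally means the neuron-cc process was killed by a signal (SIGKILL, commonly the out-of-memory killer) rather than hitting an unsupported operator, so compiling on an instance with more RAM may behave differently. Also, tracing is normally done on CPU; a minimal sketch of the trace call without the `.cuda()` from the traceback (`model` is the model object from the post, and the input shape is copied from the `--io-config` in the log):

```python
import torch
import torch_neuron  # registers the torch.neuron namespace

model.eval()
model.cpu()  # assumption: trace with the model and example input on CPU

# Input shape taken from the compiler log: [1, 3, 768, 768], float32
example = torch.rand(1, 3, 768, 768)

model_neuron = torch.neuron.trace(model, example_inputs=[example])
model_neuron.save("model_neuron.pt")
```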
1
answer
0
votes
7
views
asked 3 days ago