Questions tagged with Amazon SageMaker Model Training

ResourceLimitExceeded exception but I have Quota

I cannot start a SageMaker notebook instance or a SageMaker training job with ml.c5.xlarge (or any other instance type). I checked Service Quotas, and I clearly have quota for both tasks:

- 1 in "applied quota value" for "ml.c5.xlarge for notebook instance usage".
- 15 in "applied quota value" for "ml.c5.xlarge for training job usage".

I am checking in the same region I am working in: "us-east-1". I have researched this for several days, and every forum suggests asking for a limit increase. However, I already have quota (limits) available. When I try to start the Jupyter notebook, it raises the exception `The account-level service limit 'ml.c5.xlarge for notebook instance usage' is 0 Instances, with current utilization of 0 Instances and a request delta of 1 Instances. Please contact AWS support to request an increase for this limit.` This is strange, because the exception says that I have a limit of 0 instances, while the Service Quotas list says I have 1.

Here is the output of the command `service-quotas list-service-quotas`:

```
{
    "ServiceCode": "sagemaker",
    "ServiceName": "Amazon SageMaker",
    "QuotaArn": "arn:aws:servicequotas:us-east-1:631720213551:sagemaker/L-E2BB44FE",
    "QuotaCode": "L-E2BB44FE",
    "QuotaName": "ml.c5.xlarge for training job usage",
    "Value": 15.0,
    "Unit": "None",
    "Adjustable": true,
    "GlobalQuota": false
},
{
    "ServiceCode": "sagemaker",
    "ServiceName": "Amazon SageMaker",
    "QuotaArn": "arn:aws:servicequotas:us-east-1:631720213551:sagemaker/L-39F5FD98",
    "QuotaCode": "L-39F5FD98",
    "QuotaName": "ml.c5.xlarge for notebook instance usage",
    "Value": 1.0,
    "Unit": "None",
    "Adjustable": true,
    "GlobalQuota": false,
    "UsageMetric": {
        "MetricNamespace": "AWS/Usage",
        "MetricName": "ResourceCount",
        "MetricDimensions": {
            "Class": "None",
            "Resource": "notebook-instance/ml.c5.xlarge",
            "Service": "SageMaker",
            "Type": "Resource"
        },
        "MetricStatisticRecommendation": "Maximum"
    }
},
```

I would greatly appreciate your help, because I have been unable to start a SageMaker training job for several days. Thanks.
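To make the discrepancy in this question explicit, the following is a minimal sketch that compares the limit quoted in the exception text with the value reported by `list-service-quotas`. It does not call AWS; `parse_limit_error` and `find_mismatch` are hypothetical helper names, and the regular expression assumes the exception wording shown in the question. It only confirms that the two numbers disagree and says nothing about why.

```python
import re

# Pattern for the exception wording quoted in the question (an assumption:
# other SageMaker limit errors may be phrased differently).
LIMIT_ERROR = re.compile(
    r"service limit '(?P<name>[^']+)' is (?P<limit>\d+) Instances, "
    r"with current utilization of (?P<used>\d+) Instances "
    r"and a request delta of (?P<delta>\d+) Instances"
)


def parse_limit_error(message):
    """Extract the quota name and the numbers from the exception text."""
    match = LIMIT_ERROR.search(message)
    if match is None:
        return None
    return {
        "name": match["name"],
        "limit": int(match["limit"]),
        "used": int(match["used"]),
        "delta": int(match["delta"]),
    }


def find_mismatch(message, quotas):
    """Return (enforced_limit, listed_value) if they differ, else None."""
    parsed = parse_limit_error(message)
    if parsed is None:
        return None
    for quota in quotas:
        if quota["QuotaName"] == parsed["name"] and quota["Value"] != parsed["limit"]:
            return parsed["limit"], quota["Value"]
    return None


# Values copied from the question.
message = (
    "The account-level service limit 'ml.c5.xlarge for notebook instance usage' "
    "is 0 Instances, with current utilization of 0 Instances and a request "
    "delta of 1 Instances."
)
quotas = [
    {"QuotaName": "ml.c5.xlarge for training job usage", "Value": 15.0},
    {"QuotaName": "ml.c5.xlarge for notebook instance usage", "Value": 1.0},
]
print(find_mismatch(message, quotas))
```

A non-`None` result shows that the limit enforced at launch differs from the listed quota value, which is a concrete detail to include when contacting AWS Support.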
1 answer · 1 vote · 38 views · asked by Gabriel 2 months ago

No such file or directory: '/opt/ml/input/data/test/revenue_train.csv' Sagemaker [SM_CHANNEL_TRAIN]

I am trying to deploy my RandomForestClassifier on Amazon SageMaker using the Python SDK. I have been following this example https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-script-mode/sagemaker-script-mode.ipynb but keep getting an error that the train file was not found. I think the files were not uploaded to the correct channel. When I run the script locally as follows, it works fine.

```
! python script_rf.py --model-dir ./ \
                      --train ./ \
                      --test ./ \
```

This is my script code:

```
# inference functions ---------------
def model_fn(model_dir):
    clf = joblib.load(os.path.join(model_dir, "model.joblib"))
    return clf


if __name__ =='__main__':

    print('extracting arguments')
    parser = argparse.ArgumentParser()

    # hyperparameters sent by the client are passed as command-line arguments to the script.
    parser.add_argument('--max_depth', type=int, default=2)
    parser.add_argument('--n_estimators', type=int, default=100)
    parser.add_argument('--random_state', type=int, default=0)

    # Data, model, and output directories
    parser.add_argument('--model-dir', type=str, default=os.environ.get('SM_MODEL_DIR'))
    parser.add_argument('--train', type=str, default=os.environ.get('SM_CHANNEL_TEST'))
    parser.add_argument('--test', type=str, default=os.environ.get('SM_CHANNEL_TEST'))
    parser.add_argument('--train-file', type=str, default='revenue_train.csv')
    parser.add_argument('--test-file', type=str, default='revenue_test.csv')

    args, _ = parser.parse_known_args()

    print('reading data')
    train_df = pd.read_csv(os.path.join(args.train, args.train_file))
    test_df = pd.read_csv(os.path.join(args.test, args.test_file))

    if len(train_df) == 0:
        raise ValueError(('There are no files in {}.\n').format(args.train, "train"))

    print('building training and testing datasets')
    attributes = ['available_minutes_100','ampido_slots_amount','ampido_slots_amount_100','ampido_slots_amount_200','ampido_slots_amount_300','min_dist_loc','count_event','min_dist_phouses','count_phouses','min_dist_stops','count_stops','min_dist_tickets','count_tickets','min_dist_google','min_dist_psa','count_psa']
    X_train = train_df[attributes]
    X_test = test_df[attributes]
    y_train = train_df['target']
    y_test = test_df['target']

    # train
    print('training model')
    model = RandomForestClassifier(
        max_depth =args.max_depth, n_estimators = args.n_estimators)

    model.fit(X_train, y_train)

    # persist model
    path = os.path.join(args.model_dir, "model_rf.joblib")
    joblib.dump(model, path)
    print('model persisted at ' + path)

    # print accuracy and confusion matrix
    print('validating model')
    y_pred=model.predict(X_test)
    print('Confusion Matrix:')
    result = confusion_matrix(y_test, y_pred)
    print(result)
    print('Accuracy:')
    result2 = accuracy_score(y_test, y_pred)
    print(result2)
```

The error is raised at the `train_df` line of the script (FileNotFoundError: [Errno 2] No such file or directory: '/opt/ml/input/data/test/revenue_train.csv').

I tried specifying the input parameters:

```
# change channel input dirs
inputs = {
    "train": "ampido-exports/production/revenue_train",
    "test": "ampido-exports/production/revenue_test",
}

from sagemaker.sklearn.estimator import SKLearn

enable_local_mode_training = False

hyperparameters = {"max_depth": 2, 'random_state':0, "n_estimators": 100}

if enable_local_mode_training:
    train_instance_type = "local"
    inputs = {"train": trainpath, "test": testpath}
else:
    train_instance_type = "ml.c5.xlarge"
    inputs = {"train": trainpath, "test": testpath}

estimator_parameters = {
    "entry_point": "script_rf.py",
    "framework_version": "1.0-1",
    "py_version": "py3",
    "instance_type": train_instance_type,
    "instance_count": 1,
    "hyperparameters": hyperparameters,
    "role": role,
    "base_job_name": "randomforestclassifier-model",
    'channel_input_dirs' : inputs
}

estimator = SKLearn(**estimator_parameters)
estimator.fit(inputs)
```

but I still get the error FileNotFoundError: [Errno 2] No such file or directory: '/opt/ml/input/data/test/revenue_train.csv'
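The path in the error, `/opt/ml/input/data/test/revenue_train.csv`, joins the test channel directory with the training file name. That is consistent with the script's `--train` argument defaulting to `SM_CHANNEL_TEST` rather than `SM_CHANNEL_TRAIN`. The following is a minimal sketch of the argument parsing with that default changed, assuming the channels passed to `estimator.fit()` are named `train` and `test` as in the question. `parse_data_args` is a hypothetical helper name, and the `setdefault` calls only simulate the environment variables that the training container would normally provide.

```python
import argparse
import os


def parse_data_args(argv=None):
    """Resolve the data directories from the SageMaker channel variables."""
    parser = argparse.ArgumentParser()
    # The "train" channel is read from SM_CHANNEL_TRAIN and the "test"
    # channel from SM_CHANNEL_TEST; the question's script used
    # SM_CHANNEL_TEST for both.
    parser.add_argument("--train", type=str, default=os.environ.get("SM_CHANNEL_TRAIN"))
    parser.add_argument("--test", type=str, default=os.environ.get("SM_CHANNEL_TEST"))
    parser.add_argument("--train-file", type=str, default="revenue_train.csv")
    parser.add_argument("--test-file", type=str, default="revenue_test.csv")
    args, _ = parser.parse_known_args(argv)
    return args


# Simulate the container environment for a local check (assumed paths,
# matching the directory layout visible in the error message).
os.environ.setdefault("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train")
os.environ.setdefault("SM_CHANNEL_TEST", "/opt/ml/input/data/test")

args = parse_data_args([])
print(os.path.join(args.train, args.train_file))
print(os.path.join(args.test, args.test_file))
```

Running the script locally with `--train ./ --test ./` hides the problem because the explicit arguments override both defaults, which would explain why the local run succeeds while the training job fails.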
1 answer · 0 votes · 53 views · asked 3 months ago