How do I troubleshoot scheduled notebook jobs in SageMaker Studio?
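I get errors when I run a scheduled notebook job in Amazon SageMaker Studio.
Short description
Two common errors can prevent scheduled notebook jobs from running in SageMaker Studio:
- AccessDenied errors
- UI errors when you try to update a job
Resolution
AccessDenied errors
AccessDenied errors usually involve one of the following:
- AWS Identity and Access Management (IAM) policies
- Virtual private cloud (VPC) endpoint policies
- Resource tag exceptions
IAM policy issues
AccessDenied errors are usually caused by missing or incorrect permissions. Follow the best practices for the IAM role that notebook jobs require. To establish the basic trust relationship, the role needs the following trust policy: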
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "sagemaker.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "events.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
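As a quick local check, the trust policy above can be compared against the two service principals it names. The following is a minimal sketch, assuming you have already copied the role's trust policy into a Python dictionary (for example, from the IAM console). The function name `missing_trusted_services` and the sample policy are hypothetical, and the check ignores conditions and Deny statements.

```python
import json

# Service principals taken from the trust policy shown above.
REQUIRED_SERVICES = {"sagemaker.amazonaws.com", "events.amazonaws.com"}


def missing_trusted_services(trust_policy: dict) -> set:
    """Return required service principals that no Allow statement trusts."""
    trusted = set()
    for stmt in trust_policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if "sts:AssumeRole" not in actions:
            continue
        principal = stmt.get("Principal", {})
        if not isinstance(principal, dict):
            continue  # for example "Principal": "*", which this sketch skips
        services = principal.get("Service", [])
        if isinstance(services, str):
            services = [services]
        trusted.update(services)
    return REQUIRED_SERVICES - trusted


# Hypothetical role that trusts SageMaker only.
sample = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {"Service": "sagemaker.amazonaws.com"},
      "Action": "sts:AssumeRole"
    }
  ]
}
""")

print(sorted(missing_trusted_services(sample)))  # ['events.amazonaws.com']
```

An empty result suggests that both services can assume the role. It does not prove that the role works, because the sketch does not evaluate conditions.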
Also, verify that your IAM role has the following permissions:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "arn:aws:iam::*:role/*",
      "Condition": {
        "StringLike": {
          "iam:PassedToService": [
            "sagemaker.amazonaws.com",
            "events.amazonaws.com"
          ]
        }
      }
    },
    {
      "Effect": "Allow",
      "Action": [
        "events:TagResource",
        "events:DeleteRule",
        "events:PutTargets",
        "events:DescribeRule",
        "events:PutRule",
        "events:RemoveTargets",
        "events:DisableRule",
        "events:EnableRule"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "aws:ResourceTag/sagemaker:is-scheduling-notebook-job": "true"
        }
      }
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:CreateBucket",
        "s3:PutBucketVersioning",
        "s3:PutEncryptionConfiguration"
      ],
      "Resource": "arn:aws:s3:::sagemaker-automated-execution-*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "sagemaker:ListTags"
      ],
      "Resource": [
        "arn:aws:sagemaker:*:*:user-profile/*",
        "arn:aws:sagemaker:*:*:space/*",
        "arn:aws:sagemaker:*:*:training-job/*",
        "arn:aws:sagemaker:*:*:pipeline/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "sagemaker:AddTags"
      ],
      "Resource": [
        "arn:aws:sagemaker:*:*:training-job/*",
        "arn:aws:sagemaker:*:*:pipeline/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:CreateNetworkInterface",
        "ec2:CreateNetworkInterfacePermission",
        "ec2:CreateVpcEndpoint",
        "ec2:DeleteNetworkInterface",
        "ec2:DeleteNetworkInterfacePermission",
        "ec2:DescribeDhcpOptions",
        "ec2:DescribeNetworkInterfaces",
        "ec2:DescribeRouteTables",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeSubnets",
        "ec2:DescribeVpcEndpoints",
        "ec2:DescribeVpcs",
        "ecr:BatchCheckLayerAvailability",
        "ecr:BatchGetImage",
        "ecr:GetDownloadUrlForLayer",
        "ecr:GetAuthorizationToken",
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:GetEncryptionConfiguration",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:GetObject",
        "sagemaker:DescribeDomain",
        "sagemaker:DescribeUserProfile",
        "sagemaker:DescribeSpace",
        "sagemaker:DescribeStudioLifecycleConfig",
        "sagemaker:DescribeImageVersion",
        "sagemaker:DescribeAppImageConfig",
        "sagemaker:CreateTrainingJob",
        "sagemaker:DescribeTrainingJob",
        "sagemaker:StopTrainingJob",
        "sagemaker:Search",
        "sagemaker:CreatePipeline",
        "sagemaker:DescribePipeline",
        "sagemaker:DeletePipeline",
        "sagemaker:StartPipelineExecution"
      ],
      "Resource": "*"
    }
  ]
}
For more information, see AWS managed policies for SageMaker notebooks.
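To spot a missing permission before you rerun a job, you can compare the actions that a policy document allows against the actions listed in the permissions policy for the role. The following is a rough sketch, not an IAM policy evaluator: it looks only at `Effect` and `Action`, and it ignores `Resource`, `Condition`, `NotAction`, and Deny statements. The function name `missing_actions`, the short `REQUIRED_SUBSET` list, and the sample policy are illustrative assumptions.

```python
from fnmatch import fnmatchcase

# A few of the actions from the permissions policy for the role.
# Extend this list with the rest of the actions as needed.
REQUIRED_SUBSET = [
    "iam:PassRole",
    "events:PutRule",
    "sagemaker:CreateTrainingJob",
    "sagemaker:CreatePipeline",
]


def missing_actions(policy: dict, required: list) -> list:
    """Return the required actions that no Allow statement mentions."""
    patterns = []
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        # IAM action names are case-insensitive, so compare in lowercase.
        patterns.extend(a.lower() for a in actions)
    return [
        action for action in required
        if not any(fnmatchcase(action.lower(), p) for p in patterns)
    ]


# Hypothetical policy that forgot the EventBridge permissions.
sample_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "iam:PassRole", "Resource": "*"},
        {"Effect": "Allow", "Action": ["sagemaker:*"], "Resource": "*"},
    ],
}

print(missing_actions(sample_policy, REQUIRED_SUBSET))  # ['events:PutRule']
```

Because conditions are ignored, an empty result means only that every action appears in some Allow statement. The tag condition on the EventBridge actions still has to match at run time.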
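VPC endpoint issues
If you launch notebook jobs through VPC endpoints, check the configuration and policy of each endpoint. Make sure that you follow the steps and best practices for the endpoints of the related services:
- Amazon Elastic Compute Cloud (Amazon EC2) VPC endpoints
- Amazon EventBridge endpoints
- SageMaker endpoints
- Amazon Simple Storage Service (Amazon S3) endpoints
For Amazon S3 VPC endpoints, the most common errors involve endpoints that are restricted to a single account. For example, the following policy restricts access to resources that are owned by the account with ID 111122223333: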
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowSpecificAccountsPermission",
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": "s3:*",
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "s3:ResourceAccount": "111122223333"
        }
      }
    }
  ]
}
In this case, you must also allow access to the following buckets:
{
  "Action": [
    "s3:*"
  ],
  "Resource": [
    "arn:aws:s3:::sagemakerheadlessexecution-prod-*",
    "arn:aws:s3:::sagemakerheadlessexecution-prod-*/*"
  ],
  "Effect": "Allow",
  "Sid": "SCTASK14554266"
}
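The effect of adding the bucket statement can be illustrated with a small local check. The following is a simplified sketch under stated assumptions: it counts only Allow statements that have no `Condition` block, so the account-restricted statement never matches, and it ignores `Action` and `Principal`. The function name and the bucket name `sagemakerheadlessexecution-prod-example` are hypothetical.

```python
from fnmatch import fnmatchcase


def unconditionally_allows(policy: dict, resource_arn: str) -> bool:
    """True if an Allow statement without conditions covers the ARN."""
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow" or "Condition" in stmt:
            continue
        resources = stmt.get("Resource", [])
        if isinstance(resources, str):
            resources = [resources]
        if any(fnmatchcase(resource_arn, pattern) for pattern in resources):
            return True
    return False


# The account-restricted endpoint policy from this section.
restricted = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowSpecificAccountsPermission",
            "Effect": "Allow",
            "Principal": {"AWS": "*"},
            "Action": "s3:*",
            "Resource": "*",
            "Condition": {
                "StringEquals": {"s3:ResourceAccount": "111122223333"}
            },
        }
    ],
}

# The additional statement for the service-owned buckets.
bucket_statement = {
    "Action": ["s3:*"],
    "Resource": [
        "arn:aws:s3:::sagemakerheadlessexecution-prod-*",
        "arn:aws:s3:::sagemakerheadlessexecution-prod-*/*",
    ],
    "Effect": "Allow",
    "Sid": "SCTASK14554266",
}

# Hypothetical bucket name, used only for illustration.
bucket_arn = "arn:aws:s3:::sagemakerheadlessexecution-prod-example"

extended = dict(restricted)
extended["Statement"] = restricted["Statement"] + [bucket_statement]

print(unconditionally_allows(restricted, bucket_arn))  # False
print(unconditionally_allows(extended, bucket_arn))    # True
```

A real endpoint policy statement usually also needs a `Principal` element. The sketch leaves that detail exactly as the snippets in this section show it.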
Resource tag exceptions
Make sure that your IAM policy includes the following permissions:
{
  "Effect": "Allow",
  "Action": [
    "events:TagResource",
    "events:DeleteRule",
    "events:PutTargets",
    "events:DescribeRule",
    "events:PutRule",
    "events:RemoveTargets",
    "events:DisableRule",
    "events:EnableRule"
  ],
  "Resource": "*",
  "Condition": {
    "StringEquals": {
      "aws:ResourceTag/sagemaker:is-scheduling-notebook-job": "true"
    }
  }
}
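The `StringEquals` condition in this statement matches only when the resource carries the tag with the exact value `true`. The following small sketch shows that comparison on a plain dictionary of tags. The function name is hypothetical, the tag dictionaries are made-up inputs, and the sketch simplifies by also looking up the tag key as an exact string.

```python
# Tag key taken from the condition in the statement above.
TAG_KEY = "sagemaker:is-scheduling-notebook-job"


def tag_condition_matches(resource_tags: dict) -> bool:
    """Mimic StringEquals on the tag value, which is case-sensitive."""
    return resource_tags.get(TAG_KEY) == "true"


print(tag_condition_matches({TAG_KEY: "true"}))  # True
print(tag_condition_matches({TAG_KEY: "True"}))  # False: the case differs
print(tag_condition_matches({}))                 # False: the tag is missing
```

If the tag on a rule is missing or spelled differently, the EventBridge actions in the statement are not allowed by it, which can surface as an AccessDenied error.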
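UI errors when you try to update a job
You might encounter UI errors when you try to create, describe, update, stop, or delete a notebook job. The same issue can occur with job definitions (scheduled jobs). To troubleshoot the issue, first note the error message that the UI displays. The message usually includes instructions or a suggested action that resolves the issue.
If you can't resolve the error, complete the following steps:
- Take a screenshot of the error, and then save it as an image file.
- Create an HTTP Archive (HAR) file to capture the network traffic at the time that the UI error occurs.
- Open the Jupyter server terminal in SageMaker Studio. Choose File, New, Terminal.
- Check the logs in /var/log/apps/app_container.log for exceptions, errors, or warnings that were logged when the UI error occurred.
- Contact AWS Support through the AWS Support Center. Attach the screenshot of the error, app_container.log, and the HAR file to your request.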
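Before you open a support case, it can help to skim the two files yourself. The following is a minimal sketch that filters log lines for the keywords named in the steps above and lists failed requests from a HAR file. It assumes the standard HAR layout (`log.entries[].request.url` and `log.entries[].response.status`). The function names and the sample data are made-up placeholders, not real Studio output.

```python
import json
import re

# Keywords from the steps above, plus "traceback" as an extra assumption.
KEYWORDS = re.compile(r"exception|error|warning|traceback", re.IGNORECASE)


def scan_log(lines):
    """Return (line_number, text) for every line that mentions a keyword."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if KEYWORDS.search(line)
    ]


def failed_har_requests(har: dict):
    """Return (status, url) for HAR entries with status 0 or 400 and above."""
    failed = []
    for entry in har.get("log", {}).get("entries", []):
        status = entry.get("response", {}).get("status", 0)
        if status == 0 or status >= 400:
            failed.append((status, entry.get("request", {}).get("url", "")))
    return failed


# Placeholder input that only demonstrates the two helpers.
sample_lines = ["started\n", "ERROR: placeholder message\n", "done\n"]
sample_har = json.loads("""
{
  "log": {
    "entries": [
      {"request": {"url": "https://example.com/ok"},
       "response": {"status": 200}},
      {"request": {"url": "https://example.com/denied"},
       "response": {"status": 403}}
    ]
  }
}
""")

print(scan_log(sample_lines))
print(failed_har_requests(sample_har))
```

To run it on real files, pass an open file handle to `scan_log` (for example, `open("/var/log/apps/app_container.log")` in the Jupyter server terminal) and the result of `json.load` on your HAR file to `failed_har_requests`.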
AWS OFFICIAL, updated 6 months ago