
Questions tagged with AWS Batch



Trying a SageMaker example but getting error: AttributeError: module 'sagemaker' has no attribute 'create_transform_job'

Hi, I keep getting this error: `AttributeError: module 'sagemaker' has no attribute 'create_transform_job'` when using a batch transform example that AWS graciously had in the notebook instances. Also, I updated SageMaker to the newest package and it's still not working. Code:

```
%%time
import time
from time import gmtime, strftime

batch_job_name = "Batch-Transform-" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
input_location = "s3://{}/{}/batch/{}".format(
    bucket, prefix, batch_file
)  # use input data without ID column
output_location = "s3://{}/{}/output/{}".format(bucket, prefix, batch_job_name)

request = {
    "TransformJobName": batch_job_name,
    "ModelName": "xgboost-parquet-example-training-2022-03-28-16-02-31-model",
    "TransformOutput": {
        "S3OutputPath": output_location,
        "Accept": "text/csv",
        "AssembleWith": "Line",
    },
    "TransformInput": {
        "DataSource": {"S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_location}},
        "ContentType": "text/csv",
        "SplitType": "Line",
        "CompressionType": "None",
    },
    "TransformResources": {"InstanceType": "ml.m4.xlarge", "InstanceCount": 1},
}

sagemaker.create_transform_job(**request)
print("Created Transform job with name: ", batch_job_name)

# Wait until the job finishes
try:
    sagemaker.get_waiter("transform_job_completed_or_stopped").wait(TransformJobName=batch_job_name)
finally:
    response = sagemaker.describe_transform_job(TransformJobName=batch_job_name)
    status = response["TransformJobStatus"]
    print("Transform job ended with status: " + status)
    if status == "Failed":
        message = response["FailureReason"]
        print("Transform failed with the following error: {}".format(message))
        raise Exception("Transform job failed")
```

Everything else is working well. I've had no luck with this on any other forum.
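A note on the error, since it shapes the code above: `create_transform_job` is an operation on the low-level SageMaker API, exposed through the boto3 `sagemaker` client rather than as a module-level function of the `sagemaker` Python SDK, so calling it on the imported SDK module raises exactly this `AttributeError`. Below is a minimal sketch of issuing the same request through a boto3 client; the bucket/prefix values and the model name are placeholders carried over from the question, not verified resources.

```
import boto3
from time import gmtime, strftime

# Low-level SageMaker API client; create_transform_job, describe_transform_job
# and the transform-job waiter live here, not on the `sagemaker` SDK module.
sm_client = boto3.client("sagemaker")

batch_job_name = "Batch-Transform-" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())

sm_client.create_transform_job(
    TransformJobName=batch_job_name,
    ModelName="xgboost-parquet-example-training-2022-03-28-16-02-31-model",  # placeholder from the question
    TransformOutput={"S3OutputPath": "s3://my-bucket/my-prefix/output/" + batch_job_name},  # placeholder bucket/prefix
    TransformInput={
        "DataSource": {"S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": "s3://my-bucket/my-prefix/batch/input.csv"}},
        "ContentType": "text/csv",
        "SplitType": "Line",
    },
    TransformResources={"InstanceType": "ml.m4.xlarge", "InstanceCount": 1},
)

# Block until the job finishes, then report its terminal status.
sm_client.get_waiter("transform_job_completed_or_stopped").wait(TransformJobName=batch_job_name)
status = sm_client.describe_transform_job(TransformJobName=batch_job_name)["TransformJobStatus"]
print("Transform job ended with status:", status)
```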
1 answer · 0 votes · 9 views · AWS-User-7732475 · asked 2 months ago

Set CPU and memory requirements for a Fargate AWS Batch job from an AWS CloudWatch event

I am trying to automate Fargate AWS Batch jobs by means of AWS CloudWatch Events. So far, so good. I am trying to run the same job definition with different configurations. I am able to set the Batch job as a CloudWatch event target, and I have learned how to use the Constant (JSON text) configuration to set a parameter of the job. Thus, I can set the name parameter successfully and the job runs. However, I am not able to also set the memory and CPU settings in the CloudWatch event. I would like to use a larger machine for a bigger port such as Singapore, without changing the job definition. As it stands, the job still uses the default vCPU and memory settings of the job definition.

```
{
  "Parameters": {"name": "wilhelmshaven"},
  "ContainerOverrides": {
    "Command": ["upload_to_day.py", "-port_name", "Ref::name"],
    "resourceRequirements": [
      {"type": "MEMORY", "value": "4096"},
      {"type": "VCPU", "value": "2"}
    ]
  }
}
```

Does anyone know how to set the Constant (JSON text) configuration or input transformer correctly?

Edit: If I try the same thing using the AWS CLI, I can achieve what I would like to do.

```
aws batch submit-job \
    --job-name "run-wilhelmshaven" \
    --job-queue "arn:aws:batch:eu-central-1:123666072061:job-queue/upload-raw-to-day-vtexplorer" \
    --job-definition "arn:aws:batch:eu-central-1:123666072061:job-definition/upload-to-day:2" \
    --container-overrides '{"command": ["upload_to_day.py", "-port_name", "wilhelmshaven"], "resourceRequirements": [{"value": "2", "type": "VCPU"}, {"value": "4096", "type": "MEMORY"}]}'
```
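For comparison with the CloudWatch Events attempt above, the working CLI call corresponds directly to the Batch `SubmitJob` API. Here is a minimal boto3 sketch of the same per-job override; the queue and job-definition ARNs are copied from the question and nothing else is assumed.

```
import boto3

batch = boto3.client("batch")

# Equivalent of the working `aws batch submit-job` call: vCPU and memory
# are overridden per job via containerOverrides.resourceRequirements.
batch.submit_job(
    jobName="run-wilhelmshaven",
    jobQueue="arn:aws:batch:eu-central-1:123666072061:job-queue/upload-raw-to-day-vtexplorer",
    jobDefinition="arn:aws:batch:eu-central-1:123666072061:job-definition/upload-to-day:2",
    containerOverrides={
        "command": ["upload_to_day.py", "-port_name", "wilhelmshaven"],
        "resourceRequirements": [
            {"type": "VCPU", "value": "2"},
            {"type": "MEMORY", "value": "4096"},
        ],
    },
)
```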
1 answer · 0 votes · 3 views · AWS-User-6786633 · asked 2 months ago

Recommended batch automated workflow for updating docker containers

How do I update the Docker image for a Batch Job Definition using the CLI or API? It looks like the `RegisterJobDefinition` API is "create only": you can't update a Job Definition from what I can tell from the documentation, so you can't change the reference to the Docker image.

The JobDefinition really wants to be defined in the CDK Constructs area (or CFT) because it ties in a bunch of stuff I already have in the CDK, such as databases and EFS and Secrets. That is fine, as all that stuff is fairly static. But Docker images are meant to change all the time, and quickly, as my devs iterate code. I really don't want to specify the final Docker image at creation time in CDK or CFT, but it looks like that's the only place to do it. I do not want to re-deploy a CDK/CFT instance just to change some code in a Docker container; that would be slow and bad practice.

Note: a [similar question was asked on the old forum](https://forums.aws.amazon.com/thread.jspa?threadID=257528&tstart=0), but didn't really get an answer. "Use :latest" isn't always the best answer for Docker version management. My devs need to be able to iterate quickly and not walk over each other. I would like my devs to be able to change to a new Docker image and then test a batch. How can they do this quickly and easily?

Note: Here's a [blog post](https://stevelasker.blog/2018/03/01/docker-tagging-best-practices-for-tagging-and-versioning-docker-images/) on `stable` tagging versus `unique` tagging. For deployments they recommend unique, which isn't supported with Batch, AFAICT.
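One workaround that is commonly used here (a sketch under assumptions, not an official update path): since `RegisterJobDefinition` is create-only but versioned, "updating" the image means reading the latest active revision, swapping in the new image tag, and registering a new revision; jobs submitted by name alone then resolve to that latest active revision. The job definition name and image URI below are hypothetical.

```
import boto3

batch = boto3.client("batch")

JOB_DEF_NAME = "my-worker"  # hypothetical job definition name
NEW_IMAGE = "123456789012.dkr.ecr.us-east-1.amazonaws.com/worker:build-42"  # hypothetical unique tag

# Fetch the latest active revision of the job definition.
revisions = batch.describe_job_definitions(
    jobDefinitionName=JOB_DEF_NAME, status="ACTIVE"
)["jobDefinitions"]
latest = max(revisions, key=lambda d: d["revision"])

# Re-register with only the image changed; Batch creates a new revision.
container_props = latest["containerProperties"]
container_props["image"] = NEW_IMAGE

batch.register_job_definition(
    jobDefinitionName=JOB_DEF_NAME,
    type="container",
    containerProperties=container_props,
)

# Jobs submitted with just the name ("my-worker") now pick up the new
# revision, so devs can push a uniquely tagged image and test a batch
# without touching the CDK/CFT stack.
```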
1 answer · 0 votes · 6 views · PaulSPNW · asked 2 months ago

Internal Error on CloudFormation + Batch JobDefinition + Tags

Whenever I try to update tags on an `AWS::Batch::JobDefinition`, the template deployment fails with "Internal Error". Without changing tags, no problem; with tags, internal error, whatever the tag name or value is. I'm using the latest AWS CLI. Here is such a template:

```
MyJobDefinition:
  Type: AWS::Batch::JobDefinition
  Properties:
    Type: "container"
    Parameters: {}
    RetryStrategy:
      Attempts: 2
    PropagateTags: true
    Tags:
      Version: "3.1.1"
    ContainerProperties:
      Command:
        - "python3"
        - "worker.py"
      JobRoleArn:
        Fn::ImportValue: !Sub "${IAMStackName}-EcsTaskExecutionRole"
      Image:
        Fn::Join:
          - ":"
          - - Fn::ImportValue: !Sub "${BaseStackName}-WorkerRepositoryUri"
            - !Ref DockerImageTag
      ResourceRequirements:
        - Type: VCPU
          Value: 16
        - Type: MEMORY
          Value: 16384
```

Steps to reproduce:

1. Take the example CFN template https://s3-us-east-2.amazonaws.com/cloudformation-templates-us-east-2/Managed_EC2_Batch_Environment.template
2. Add a Tag in the JobDefinition (e.g. below)
3. Deploy. All works
4. Change the JobDefinition Tag in the template
5. Deploy -> Internal Fail

```
"JobDefinition" : {
  "Type" : "AWS::Batch::JobDefinition",
  "Properties" : {
    "Type" : "container",
    "ContainerProperties" : {
      "Image" : {
        "Fn::Join": [ "", [ "137112412989.dkr.ecr.", { "Ref" : "AWS::Region" }, ".amazonaws.com/amazonlinux:latest" ]]
      },
      "Vcpus" : 2,
      "Memory" : 2000,
      "Command" : [ "echo", "Hello world" ]
    },
    "RetryStrategy" : {
      "Attempts" : 1
    },
    "Tags": {
      "svat-version": "3.1.1"
    }
  }
},
```
1 answer · 0 votes · 4 views · FabreLambeau · asked 3 months ago

Add temporary drive to Batch instance

I am trying to add a temporary volume to my Batch process for a machine learning application. I have created a launch template with a 104G drive and specified this, along with the appropriate Amazon ML Docker image, in the compute environment.

Instance types: optimal p3.2xlarge
EC2 configuration: ECS_AL2_NVIDIA

The instance launches well enough, and the launch template seems to be respected; however, I do not know how to use the additional drive.

`ls -al /dev`

```
drwxr-xr-x 5 root root 380 Jan 23 03:00 .
drwxr-xr-x 1 root root 101 Jan 23 03:00 ..
lrwxrwxrwx 1 root root 11 Jan 23 03:00 core -> /proc/kcore
lrwxrwxrwx 1 root root 13 Jan 23 03:00 fd -> /proc/self/fd
crw-rw-rw- 1 root root 1, 7 Jan 23 03:00 full
drwxrwxrwt 2 root root 40 Jan 23 03:00 mqueue
crw-rw-rw- 1 root root 1, 3 Jan 23 03:00 null
crw-rw-rw- 1 root root 195, 0 Jan 23 02:58 nvidia0
crw-rw-rw- 1 root root 195, 255 Jan 23 02:58 nvidiactl
lrwxrwxrwx 1 root root 8 Jan 23 03:00 ptmx -> pts/ptmx
drwxr-xr-x 2 root root 0 Jan 23 03:00 pts
crw-rw-rw- 1 root root 1, 8 Jan 23 03:00 random
drwxrwxrwt 2 root root 40 Jan 23 03:00 shm
lrwxrwxrwx 1 root root 15 Jan 23 03:00 stderr -> /proc/self/fd/2
lrwxrwxrwx 1 root root 15 Jan 23 03:00 stdin -> /proc/self/fd/0
lrwxrwxrwx 1 root root 15 Jan 23 03:00 stdout -> /proc/self/fd/1
crw-rw-rw- 1 root root 5, 0 Jan 23 03:00 tty
crw-rw-rw- 1 root root 1, 9 Jan 23 03:00 urandom
crw-rw-rw- 1 root root 1, 5 Jan 23 03:00 zero
```

`lsblk`

```
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
xvda    202:0    0   30G  0 disk
└─xvda1 202:1    0   30G  0 part /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.470.57.02
xvdb    202:16   0  104G  0 disk
```

The 104G disk shows up as "xvdb", but there is no associated device file, so I do not know how to use any of the mounting or formatting tools. I have tried changing the names and drive types in the launch template, but I get exactly the same problem. I have tried adding random names and sources to the job definition "Volumes configuration", but this also does not help. How do I access this drive? Or should I take a different approach?
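One approach that is often used in this situation (a sketch under assumptions, not a verified fix for this exact setup): the second EBS volume exists only on the host, and the container sees only what the job definition mounts into it, so the volume is typically formatted and mounted on the instance via launch template user data (AWS Batch requires the MIME multi-part wrapper for launch template user data) and then exposed to the job through a `volumes`/`mountPoints` pair in the job definition. The device name `/dev/xvdb` mirrors the `lsblk` output above but can differ by instance type (e.g. NVMe), and the mount path and template name are hypothetical.

```
import base64
import boto3

ec2 = boto3.client("ec2")

# MIME multi-part user data that formats the extra EBS volume on the host
# and mounts it at /scratch. Device name and mount path are assumptions.
user_data = """MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==BOUNDARY=="

--==BOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
mkfs -t xfs /dev/xvdb
mkdir -p /scratch
mount /dev/xvdb /scratch

--==BOUNDARY==--
"""

ec2.create_launch_template(
    LaunchTemplateName="batch-ml-scratch",  # hypothetical name
    LaunchTemplateData={
        "BlockDeviceMappings": [
            {"DeviceName": "/dev/xvdb", "Ebs": {"VolumeSize": 104, "VolumeType": "gp3"}}
        ],
        "UserData": base64.b64encode(user_data.encode()).decode(),
    },
)

# The job definition then maps the host path into the container, e.g.:
#   volumes:     [{"name": "scratch", "host": {"sourcePath": "/scratch"}}]
#   mountPoints: [{"sourceVolume": "scratch", "containerPath": "/scratch"}]
```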
1 answer · 0 votes · 4 views · AWS-User-7211613 · asked 4 months ago

OS in Batch compute environment suddenly changed?

Hi, I have multiple AWS accounts in which I use AWS Batch to run jobs. All Batch compute environments were created with the default EC2 AMI: ECS_AL1 (Amazon Linux). Here's part of the output from `aws batch describe-compute-environments` for one of my accounts:

```
"type": "MANAGED",
"state": "ENABLED",
"status": "VALID",
"statusReason": "ComputeEnvironment Healthy",
"computeResources": {
    "type": "EC2",
    "minvCpus": 0,
    "maxvCpus": 64,
    "desiredvCpus": 0,
    "instanceTypes": [
        "c4.large",
        "c4.xlarge",
        "c4.2xlarge",
        "c4.4xlarge",
        "c4.8xlarge"
    ],
    "subnets": [
        "...",
        "..."
    ],
    "securityGroupIds": [
        "..."
    ],
    "instanceRole": "arn:aws:iam::...",
    "tags": {},
    "ec2Configuration": [
        {
            "imageType": "ECS_AL1"
        }
    ]
```

In the past I could use yum, as you do in Amazon Linux, inside Batch jobs to install dependencies for my workloads. Yesterday I noticed that yum doesn't work anymore. I also noticed that the OS in my compute environments changed! Now it's Debian. Here's the output of `cat /etc/os-release` which I ran inside one of my Batch jobs:

```
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
```

Why did the OS inside my Batch jobs change and how can I make it Amazon Linux again? For more context, I don't start Batch jobs directly. Metaflow (<https://metaflow.org/>) in a SageMaker Notebook instance does that for me.
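One detail that may explain this (offered as context, not a confirmed diagnosis): `/etc/os-release` read inside a Batch job reports the OS of the container image the job runs in, not of the compute environment's host AMI, so a switch from Amazon Linux to Debian points at the job's container image (Metaflow's default Batch image is typically a Debian-based `python` image) rather than at the ECS_AL1 hosts. A sketch of pinning the image per step in Metaflow follows; the ECR image URI is hypothetical and assumed to be an Amazon Linux based image with Python 3 and yum available.

```
# Sketch: pin the container image per step so the job's userland is an
# Amazon Linux based image, independent of the compute environment AMI.
from metaflow import FlowSpec, batch, step


class YumFlow(FlowSpec):

    @batch(cpu=2, memory=4096,
           image="123456789012.dkr.ecr.eu-west-1.amazonaws.com/metaflow-al2:latest")  # hypothetical image
    @step
    def start(self):
        import subprocess
        # yum is available because the image is Amazon Linux based,
        # regardless of the host AMI (ECS_AL1 / ECS_AL2).
        subprocess.run(["yum", "--version"], check=True)
        self.next(self.end)

    @step
    def end(self):
        pass


if __name__ == "__main__":
    YumFlow()
```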
1 answer · 0 votes · 0 views · zeds · asked a year ago