Global outage event
If you're experiencing issues with your AWS services, then please refer to the AWS Health Dashboard. You can find the overall status of ongoing outages, the health of AWS services, and the latest updates from AWS engineers.
Choosing an S3 connector for ML training with S3 Express One Zone
A field guide for PyTorch, TensorFlow, Spark, and Kubernetes workloads reading training data from Amazon S3 Express One Zone directory buckets.
The problem
You're training an ML model on AWS and you've chosen Amazon S3 Express One Zone for its single-digit millisecond latency and high request rates. Your data is in a directory bucket. Now you need to connect your training framework to it.
This guide covers which connector to use, what Express-specific configuration each one requires, and the mistakes that will cost you time.
One universal requirement: every connector needs s3express:CreateSession in its IAM policy. Standard managed policies (AmazonSageMakerFullAccess, AmazonElasticMapReduceforEC2Role, etc.) do not include it. The IAM section has the full policy template.
Decision matrix
| Scenario | Connector | Notes |
|---|---|---|
| PyTorch training (EC2 or SageMaker) | S3 Connector for PyTorch | CRT handles CreateSession transparently |
| TensorFlow / JAX on EC2 | Mountpoint for S3 | Mount the directory bucket directly |
| Any framework on EKS | Mountpoint CSI driver v2+ | Set bucketName to directory bucket |
| Spark / Hive / Flink on EMR | S3A (not EMRFS) | Requires Express-specific cluster config; EMR 6.15+ Magic Committer enables writes |
| SageMaker training job | Native S3Uri | Role needs s3express:CreateSession |
| Custom pipelines | boto3 | Transparent, no code changes |
| Distributed checkpoints | S3 Connector for PyTorch or Mountpoint | Express sub-10ms writes reduce checkpoint overhead |
| Shared read cache across nodes | Mountpoint --cache-xz | Uses a separate directory bucket as cache |
The connectors
S3 Connector for PyTorch
The Amazon S3 Connector for PyTorch is a PyTorch-native library backed by the AWS Common Runtime (CRT). The CRT handles CreateSession for Express transparently, point s3_uri at the directory bucket and go.
This is the best throughput option for PyTorch. If you need the same data accessible to non-PyTorch tools, use Mountpoint instead.
Mountpoint for Amazon S3
Mountpoint for Amazon S3 is an open-source file client that mounts an S3 bucket as a Linux filesystem. It supports directory buckets directly, use the full bucket name with the --<az>--x-s3 suffix. Framework-agnostic: TensorFlow tf.data, JAX, HuggingFace datasets, NumPy, and OpenCV all work.
Two caching modes with Express:
--cache /mnt/local-cache-local disk cache on the instance--cache-xz <cache-directory-bucket>-shared cache backed by a separate Express directory bucket, useful for multi-node clusters reading the same data (objects ≤1 MB only)
Linux-only. Does not provide full POSIX semantics (no file locking, atomic rename, or hard links).
Mountpoint CSI driver (Amazon EKS)
The Mountpoint CSI driver is the Kubernetes version of Mountpoint, available as an Amazon EKS add-on. Set bucketName in volumeAttributes to the directory bucket name. The v2 driver uses EKS Pod Identity for IAM.
It's not supported on Fargate, EKS Hybrid Nodes, or Windows containers.
S3A (Apache Hadoop connector) on Amazon EMR
The only Hadoop-ecosystem connector that supports S3 Express. EMRFS does not support Express, if you're on Amazon EMR and want Express, you must use S3A with s3a:// URIs, even though EMRFS is the default.
Requires specific cluster configuration (see code snippet below) because Express ETags aren't MD5 checksums, region resolution doesn't auto-detect, and S3 Select isn't supported.
Writes: EMR 6.15+ uses Spark's MagicCommitProtocol (backed by MagicV2Committer) as the default FileCommitProtocol for s3a://. Magic Committer is rename-free — it publishes outputs via S3 multipart upload completion, which Express supports — so reads and writes to a directory bucket both work. Don't override the default committer. Older EMR releases, or self-managed Spark on Apache Hadoop older than 3.4.0, default to FileOutputCommitter, which can fail on Express with InvalidStorageClass.
Requires EMR 6.15+ (Spark) or 7.2+ (HBase/Flink/Hive). EMR 7.10+ ships the Amazon EMR S3A connector with native Express support by default.
Amazon SageMaker training jobs
Amazon SageMaker File mode, FastFile mode, and Pipe mode all support Express directory buckets as input. Point S3Uri at the directory bucket. The training role needs s3express:CreateSession because AmazonSageMakerFullAccess alone is not sufficient.
Encryption note: SageMaker output written to a directory bucket can only use SSE-S3. SSE-KMS is not supported for SageMaker output to Express. Input staging is unaffected.
AWS SDK (boto3)
Current SDK versions handle CreateSession transparently for directory buckets. Use the directory bucket name as the Bucket parameter, no other changes needed. Good for custom data pipelines and preprocessing/embedding jobs.
Code snippets
Every snippet below was tested against a live S3 Express One Zone directory bucket. They include fixes for API changes (pytorch_lightning 2.x) and required configuration not present in default setups (S3A on EMR, SageMaker IAM).
S3 Connector for PyTorch
from s3torchconnector import S3MapDataset from torch.utils.data import DataLoader dataset = S3MapDataset.from_prefix( s3_uri="s3://my-bucket--use1-az4--x-s3/imagenet/train/", region="us-east-1", ) loader = DataLoader( dataset, batch_size=256, num_workers=8, shuffle=True, collate_fn=lambda batch: batch, # S3Reader objects; decode to tensors in your transform ) for batch in loader: images = [item.read() for item in batch] # bytes; decode with PIL, cv2, etc. ...
Checkpoint saving with PyTorch Lightning 2.x:
from s3torchconnector.lightning import S3LightningCheckpoint # Lightning 2.x API—do NOT use plugins=[S3LightningCheckpoint(...)] # or default_root_dir="s3://..." (both fail in 2.x). trainer = pl.Trainer( default_root_dir="/tmp/checkpoints", # local path for trainer internals enable_checkpointing=False, # disable auto ModelCheckpoint max_epochs=10, ) trainer.strategy.checkpoint_io = S3LightningCheckpoint(region="us-east-1") trainer.fit(model) # Save explicitly to the directory bucket trainer.save_checkpoint("s3://my-bucket--use1-az4--x-s3/run-42/model.ckpt")
Mountpoint for S3
Mount a directory bucket:
sudo mount-s3 my-bucket--use1-az4--x-s3 /mnt/s3 \ --allow-delete \ --allow-overwrite \ --region us-east-1
Mount a general-purpose bucket with a shared Express cache:
# --cache-xz requires a SEPARATE directory bucket as the cache target. # Only objects <=1 MB are cached in the shared cache. sudo mount-s3 my-training-bucket /mnt/s3 \ --cache-xz my-cache-bucket--use1-az4--x-s3 \ --read-only \ --region us-east-1
For read-only training data, omit --allow-delete and --allow-overwrite. For write workloads (checkpointing), both flags are required—without them, writes return Operation not permitted even as root.
Mountpoint CSI driver on EKS (PVC)
apiVersion: v1 kind: PersistentVolume metadata: name: s3-express-training-data spec: capacity: storage: 1200Gi accessModes: - ReadOnlyMany # use ReadWriteMany for checkpoint writes mountOptions: - region us-east-1 # add allow-delete and allow-overwrite here for write workloads csi: driver: s3.csi.aws.com volumeHandle: s3-express-training-data volumeAttributes: bucketName: my-bucket--use1-az4--x-s3 # directory bucket name
The CSI driver's IAM role needs s3express:CreateSession on the directory bucket ARN. Use EKS Pod Identity (v2 driver) rather than IRSA.
S3A on EMR for S3 Express
[ { "Classification": "core-site", "Properties": { "fs.s3a.aws.credentials.provider": "software.amazon.awssdk.auth.credentials.InstanceProfileCredentialsProvider", "fs.s3a.change.detection.mode": "none", "fs.s3a.endpoint.region": "us-east-1", "fs.s3a.select.enabled": "false" } }, { "Classification": "spark-defaults", "Properties": { "spark.sql.sources.fastS3PartitionDiscovery.enabled": "false" } } ]
Then use s3a:// URIs in your Spark jobs:
df = spark.read.format("binaryFile").load("s3a://my-bucket--use1-az4--x-s3/dataset/shards/")
Why each setting matters:
change.detection.mode=none-Express ETags are not MD5 checksums; default change detection breaksendpoint.region-region resolution doesn't auto-detect for Expressselect.enabled=false-S3 Select is not supported on ExpressfastS3PartitionDiscovery.enabled=false-uses an S3 API parameter not supported by Express
SageMaker training job
import boto3 sm = boto3.client("sagemaker", region_name="us-east-1") sm.create_training_job( TrainingJobName="my-express-training-job", AlgorithmSpecification={ "TrainingImage": "...", "TrainingInputMode": "File", # or FastFile }, RoleArn="arn:aws:iam::123456789012:role/my-sagemaker-role", InputDataConfig=[{ "ChannelName": "training", "DataSource": { "S3DataSource": { "S3Uri": "s3://my-bucket--use1-az4--x-s3/dataset/", "S3DataType": "S3Prefix", "S3DataDistributionType": "FullyReplicated", } }, }], ... )
File mode copies the full dataset to local storage before training starts. Use it when your algorithm needs random access to the entire dataset or when the dataset fits comfortably on the instance volume. FastFile mode streams from S3 on demand with a file system interface. Use this for large datasets or when you want training to start immediately. FastFile has replaced Pipe mode, which is still supported but no longer recommended.
IAM requirements
Every connector needs s3express:CreateSession on the directory bucket ARN. The ARN format is:
arn:aws:s3express:<region>:<account-id>:bucket/<bucket-name>
Object-level operations (s3:GetObject, s3:PutObject, etc.) use standard IAM actions against the same ARN.
{ "Statement": [ { "Effect": "Allow", "Action": "s3express:CreateSession", "Resource": "arn:aws:s3express:us-east-1:123456789012:bucket/my-bucket--use1-az4--x-s3" }, { "Effect": "Allow", "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"], "Resource": "arn:aws:s3express:us-east-1:123456789012:bucket/my-bucket--use1-az4--x-s3/*" }, { "Effect": "Allow", "Action": "s3:ListBucket", "Resource": "arn:aws:s3express:us-east-1:123456789012:bucket/my-bucket--use1-az4--x-s3" } ] }
Common mistakes
Using EMRFS with Express. EMRFS does not support Express. On EMR, s3:// URIs without the S3A config route through EMRFS and fail or silently fall back to general-purpose S3 behavior.
Missing s3express:CreateSession. Standard managed policies don't include it. The IAM template above covers what you need.
Wrong AZ. Express is single-AZ. If your training instances are in a different AZ than the directory bucket, you reduce the latency benefit (~4ms in-AZ vs ~10ms cross-AZ). Cross-AZ data transfer within the same region is free for Express, so it's a latency cost but not a dollar cost. Check AZ IDs, not AZ names (names are account-specific, IDs are not).
--cache-xz with the same bucket. The cache target must be a separate directory bucket. Same bucket for data and cache is not supported.
--cache-xz size limit. Only objects ≤1 MB are cached in the shared Express cache. For large shards, use local disk caching (--cache) instead.
S3A writes returning InvalidStorageClass on older EMR or self-managed Spark. FileOutputCommitter performs a directory rename at job commit that Express rejects. On EMR 6.15+ this is handled automatically — Spark's default is MagicCommitProtocol → MagicV2Committer, which is rename-free. On pre-6.15 EMR or Apache Hadoop older than 3.4.0, upgrade; those releases add Express-aware committer support.
Lightning 2.x checkpoint API. The plugins=[S3LightningCheckpoint(...)] pattern and default_root_dir="s3://..." both fail in Lightning 2.x. See the Lightning snippet for the working 2.x pattern.
aws s3 sync with directory buckets. Not supported. Use aws s3api copy-object per object, or script a list-and-copy loop.
num_workers=0 in the PyTorch DataLoader. The S3 Connector benefits from multiple workers. Single-threaded loading bottlenecks regardless of connector throughput.
In conclusion: Access patterns matter more than connector choice
The biggest lever for ML training throughput isn't the connector, it's whether you read sequentially or randomly.
- Sequential reads (large sharded files, many samples per GET) are high throughput and latency-tolerant. All connectors perform well; Express's latency advantage is less pronounced.
- Random reads (millions of small files, one GET per sample), are latency-bound. Express's sub-10ms latency even across AZs helps significantly vs. general-purpose S3.
Two distinct patterns to keep in mind:
- Training data loading (sequential batch reads): shard your dataset into larger files (WebDataset tar shards, TFRecords, Parquet) before picking a connector. Millions of individual GETs, even at 4ms each, add up. Fewer, larger files mean fewer requests and higher throughput on Express.
- High-TPS random access (checkpoint writes, metadata lookups, small-object key-value patterns): this is where Express shines. Each request completes in ~4ms and a single directory bucket can handle hundreds of thousands of TPS without prefix partitioning. Don't shard these datasets, instead take the fast access to individual objects.
Happy training.
References
- Applying data loading best practices for ML training with Amazon S3 clients
- Amazon S3 Connector for PyTorch on GitHub
- Configuring and using Mountpoint
- Mountpoint CSI driver for EKS
- Mountpoint CSI driver v2 announcement
- S3A configuration for Express on EMR
- EMR Spark MagicCommitProtocol
- S3A MagicV2 Committer (EMR)
- SageMaker training data access
- Checkpoint storage for large-scale ML training
- S3 Express One Zone endpoints and AZ IDs
- Language
- English
Relevant content
- asked 2 years ago
- Accepted Answerasked 2 years ago
- Accepted Answerasked a year ago
