Hello.
As stated in the AI's automated response, I thought there would be no problem as long as the data was stored in "High-performance storage".
https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-files-performance.html#s3-files-performance-how
High-performance storage – The low-latency storage layer within your file system where actively used file data and metadata reside. S3 Files automatically manages this storage, copying data onto it when you access files and removing data that has not been read within a configurable expiration window. You pay a storage rate for data residing on the high-performance storage.
However, as you suspect, files of 1 MiB or larger are streamed from S3, so depending on when your application reads them, it might see data that is up to 60 seconds old.
This still needs to be verified, but consider a pipeline where job A writes a file and job B immediately retrieves that object via the S3 API: in that case, job B might read outdated data.
I tried it with my AWS account, but no errors occurred.
However, the result may differ for very large files, such as 10 GB or 100 GB.
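For anyone who wants to repeat this kind of check, here is a minimal sketch of a read-after-write probe. It only uses ordinary file I/O, so it makes no assumptions about S3 Files internals. The function names and the probe file name are hypothetical, and the temporary directory stands in for your own mount point.

```python
import hashlib
import os
import tempfile
from pathlib import Path

CHUNK = 1024 * 1024  # 1 MiB read/write chunk


def write_test_file(path: Path, size_bytes: int) -> str:
    """Job A: write random data, fsync it, and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "wb") as f:
        remaining = size_bytes
        while remaining > 0:
            block = os.urandom(min(CHUNK, remaining))
            f.write(block)
            digest.update(block)
            remaining -= len(block)
        f.flush()
        os.fsync(f.fileno())
    return digest.hexdigest()


def read_digest(path: Path) -> str:
    """Job B: read the file back and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            digest.update(block)
    return digest.hexdigest()


def check_read_after_write(directory: Path, size_bytes: int) -> bool:
    """Return True if an immediate read returns exactly what was written."""
    target = directory / f"probe_{size_bytes}.bin"
    expected = write_test_file(target, size_bytes)
    return read_digest(target) == expected


if __name__ == "__main__":
    # Hypothetical: replace the temporary directory with your S3 Files mount point.
    with tempfile.TemporaryDirectory() as tmp:
        for size in (512 * 1024, 4 * 1024 * 1024):
            print(size, check_read_after_write(Path(tmp), size))
```

To test the cross-node case, you would run `write_test_file` on one instance, pass the digest to the other instance, and call `read_digest` there right away. Running both on the same node only covers the same-node case.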
I think it would be better to use S3 event notifications to detect object creation before triggering subsequent actions.
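As a rough illustration of that approach, the sketch below extracts the created objects from an S3 event notification message (for example, one delivered through SQS). The function name and the sample bucket and key are hypothetical. How the message reaches your job and what you trigger afterwards are left out.

```python
import json
from urllib.parse import unquote_plus


def created_objects(message_body: str) -> list[tuple[str, str, int]]:
    """Return (bucket, key, size) for each ObjectCreated record in the message."""
    event = json.loads(message_body)
    results = []
    for record in event.get("Records", []):
        # Record-level event names look like "ObjectCreated:Put".
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        s3 = record["s3"]
        # Object keys in notifications are URL-encoded, with "+" for spaces.
        key = unquote_plus(s3["object"]["key"])
        results.append((s3["bucket"]["name"], key, s3["object"].get("size", 0)))
    return results


if __name__ == "__main__":
    # Hypothetical sample message, trimmed to the fields used above.
    sample = json.dumps({
        "Records": [{
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "jobs/run+1/output.bin", "size": 2097152},
            },
        }]
    })
    print(created_objects(sample))
```

Job B would start only after a record for the expected key arrives, rather than reading immediately after job A finishes.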
Based on the S3 Files architecture, the 60-second writeback delay should not cause problems for your use case, even when reading large files that were recently written by another node.
Here's why: When a file is written to S3 Files, the write goes immediately to the high-performance storage layer and is durable right away. The 60-second delay only applies to when those changes are copied back to the underlying S3 bucket. This batching mechanism is designed to aggregate successive changes and reduce S3 request costs.
For your specific questions:
- Reading >1MiB files written by another node: You should not encounter read errors. Even though the documentation states that large files (>1MiB) are streamed directly from S3, this applies when data is not stored in the file system's high-performance storage. However, when a file is actively written, it resides in the high-performance storage layer. Other nodes accessing the same S3 Files file system can read this data from the high-performance storage before it's synchronized back to S3. The file system maintains consistency across all connected compute instances.
- Reading/writing within the same node: This should work seamlessly. Since writes are immediately durable in the high-performance storage, subsequent reads from the same node will access the current data without any issues.
- Workarounds: Based on the architecture described, you shouldn't need workarounds for your use case. S3 Files is designed to handle multiple compute resources (up to 25,000) simultaneously accessing the same file system, which aligns well with your AWS Batch deployment pattern where multiple EC2 instances need to share data.
The key point is that S3 Files keeps the file system synchronized and consistent across all connected compute resources, with the 60-second window only affecting when changes propagate to the underlying S3 bucket, not when they're visible to other nodes through the file system interface.
Sources
Performance specifications - Amazon Simple Storage Service
How S3 Files is metered - Amazon Simple Storage Service
Amazon S3 Files
Working with Amazon S3 Files - Amazon Simple Storage Service

Hi, thanks for the quick response. My application is not typically deployed on AWS, so building more application logic to respond to AWS event notifications would be a significant obstacle to using S3 Files. This could work for us if there were a setting that made file system writes wait until S3 Files has finished uploading >1 MB files to the S3 bucket before completing, so that subsequent reads of a file that is streamed directly from S3 would be consistent.
To clarify, in my case both jobs would access the same S3 Files file system, not the S3 API directly. I was concerned because the documentation says that when large files are accessed through S3 Files, they are streamed from S3 "even when the data also resides on the file system's high-performance storage", which made me wonder whether it would still work as long as all readers of the files use S3 Files.
Could you go into more detail about what you tried? Was it the case that a write of a >1 MB file followed by a read of that file from S3 by a different process sometimes worked without issue?