Questions tagged with Amazon S3 Glacier
AWS S3 Glacier Flex
Suppose I create a lifecycle rule on my S3 bucket that moves data to S3 Glacier Flexible Retrieval after 2 days. 1. Will that rule move empty parent directories created by rsync to Glacier Flexible Retrieval, and will I then be charged when rsync fills those directories with new objects more than 2 days after the parent directory was created? 2. Or does the flat structure of an S3 bucket prevent that?
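For reference, a 2-day transition rule like the one described could be written as the following configuration. This is a minimal sketch, not the poster's actual policy: the rule ID and the empty prefix (meaning the whole bucket) are assumptions. Lifecycle rules select objects by key prefix, since a bucket has no real directories.

```python
def glacier_transition_rule(days: int = 2, prefix: str = "") -> dict:
    """Build a lifecycle configuration containing one transition rule.

    The rule ID and the default empty prefix are illustrative choices.
    """
    return {
        "Rules": [
            {
                "ID": f"to-glacier-after-{days}-days",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},  # "" matches every key in the bucket
                "Transitions": [
                    # "GLACIER" is the storage class name for Flexible Retrieval
                    {"Days": days, "StorageClass": "GLACIER"}
                ],
            }
        ]
    }


config = glacier_transition_rule(days=2)
```

With boto3 installed and credentials configured, a dict of this shape is what `put_bucket_lifecycle_configuration` expects as its `LifecycleConfiguration` argument, alongside the `Bucket` name.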
Cannot see files in S3/Glacier when using Synology Glacier app for backup
Hi, I'm using my new Synology NAS and its Glacier app to back up my files to AWS Glacier. Everything works: the Glacier vaults are in place and the files are in them. I was curious to see how Synology uses S3 Glacier, so I tried to find the corresponding S3 bucket used by the Synology app, but there is no bucket. Question: how can all my files be in Amazon Glacier without an S3 bucket containing them? Thanks
Does changing an S3 lifecycle policy change all older objects in an S3 bucket?
I have an S3 lifecycle policy using intelligent tiering where objects go to archive after 90 days and then transition to deep glacier after another 90 days. They then remain in deep glacier until the retention policy applies at 305 total days. I'd like to remove the deep glacier rule and let the objects that are currently in deep glacier age out. But if I remove the deep glacier transition, I'm concerned that all objects already in deep glacier will then match the archive rule, since technically they are over 90 days old, and will transition from deep glacier back to the archive storage tier. So the question is, what is the behavior: will only objects turning 90 days old transition to the archive tier while all deep glacier objects age out, or will ALL objects currently in deep glacier transition back to archive?
Right directory structure to optimize read throughput
We are trying to decide how to organize our S3 bucket to optimize read throughput for the following use case. A daily job writes a few million files (single-digit millions, each under 1 MB) to the bucket. Reads are spread throughout the day, with parallel requests that each read one different file. We are choosing between 3 options: 1. The write job creates a new directory in the bucket for each daily run and distributes the files within it across 36 sub-directories, hashed by the last character of the alphanumeric file name (26 letters + 10 digits). 2. The bucket contains 36 directories, and each of them contains a new directory for each daily run. 3. Manually create 36 partitions and find a way to randomly distribute the data (if that is possible at all). Which of these options will provide maximum throughput for randomly accessing each file? We are looking at the docs here: https://docs.aws.amazon.com/AmazonS3/latest/userguide/organizing-objects.html Keep in mind that read traffic on a given day is only interested in the data from a single day's run. So option 1 will route all daily read requests to a single directory, which will limit throughput to 5,500 requests per second, as mentioned in the best practices: https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance.html Which of these options will give us the highest throughput?
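To make options 1 and 2 concrete, here is a small sketch of the object keys each layout would produce. It assumes the shard is simply the last character of the file name (lowercased) and that the daily run is labelled with an ISO date; both function names are hypothetical. It only illustrates the key shapes and does not by itself settle which layout performs better.

```python
from datetime import date


def key_day_first(run_date: date, filename: str) -> str:
    """Option 1: <day>/<shard>/<file>."""
    shard = filename[-1].lower()  # assumed shard: last character of the name
    return f"{run_date.isoformat()}/{shard}/{filename}"


def key_shard_first(run_date: date, filename: str) -> str:
    """Option 2: <shard>/<day>/<file>."""
    shard = filename[-1].lower()
    return f"{shard}/{run_date.isoformat()}/{filename}"


run = date(2022, 1, 28)
print(key_day_first(run, "ab12x"))    # 2022-01-28/x/ab12x
print(key_shard_first(run, "ab12x"))  # x/2022-01-28/ab12x
```

Both layouts yield 36 distinct key prefixes for a given day (`2022-01-28/x/` versus `x/2022-01-28/`); they differ only in which component comes first.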
How to batch delta JSON files from S3 following the standard Kinesis Firehose partition?
Hello, I am using Kinesis Firehose and saving the raw streamed data as JSON files in S3. I am using the standard Firehose partition <stream_name>/YYYY/MM/DD/HH. For data that is really urgent, a Lambda function is triggered as soon as the file is saved to S3 to process the data in the file. Other data doesn't have the same urgency, so we can process it in batches every 5 or 10 minutes. My question is about the data that can be processed in batches. I don't know what processing strategy or methodology I should implement so that every time the batch runs it only processes the JSON files that have not been processed before. For example, it is 2022-01-28 14:15:00 and we have 2 files in the same partition. My process runs and loads those 2 files. Then at 2022-01-28 14:25:00 the process runs again and there are 3 files in the partition. The previous batch already processed 2 of those files, so the new batch should only process one file. How can I know which files were already processed so I don't read them again in my next batch? I was planning to use Airflow to schedule some Spark jobs to do the batch processing. What tool or technology would you recommend for this kind of batch processing?
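One common way to handle the scenario described is to keep a checkpoint of the keys that earlier runs have already handled. The sketch below assumes the checkpoint is a local JSON file and that the list of keys in the partition is obtained elsewhere (for example from an S3 listing); `run_batch`, `load_processed`, and the `process` callback are hypothetical names, not part of Firehose, Airflow, or Spark.

```python
import json
from pathlib import Path


def load_processed(checkpoint: Path) -> set:
    """Return the set of keys recorded by earlier batch runs."""
    if checkpoint.exists():
        return set(json.loads(checkpoint.read_text()))
    return set()


def run_batch(listed_keys, checkpoint: Path, process) -> list:
    """Process only keys not seen before, then record them in the checkpoint."""
    done = load_processed(checkpoint)
    new_keys = sorted(k for k in listed_keys if k not in done)
    for key in new_keys:
        process(key)  # hypothetical per-file handler
    # Persist old and new keys so the next run skips all of them.
    checkpoint.write_text(json.dumps(sorted(done | set(new_keys))))
    return new_keys
```

In the example from the question, the 14:15 run would return both files, and the 14:25 run would return only the third one. A production version would likely store the checkpoint somewhere shared and durable rather than on local disk, and would need to decide what happens if `process` fails partway through a batch.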
Your AWS CloudShell data is scheduled for deletion
I received an email with this title. It continues: "Some users of this account haven’t used AWS CloudShell for over 110 days in the...". I have no idea what this means or what I need to do. I use Amazon Glacier for backing up my NAS and make no other use of AWS. Is this a scam of some sort?
S3 Glacier (2013) Flexible running without Bucket, want to move to Glacier Deep Archive
I have several legacy S3 Glacier (2013, now Flexible) vaults running without any apparent buckets attached. I want to move to S3 Glacier Deep Archive without having to offload and reload all of my data. Is there a way to accomplish this? It seems that today Glacier must run via a bucket? Yes, I am a newbie. I access my data via the FastGlacier application.
Local Zip Files Transferred to S3
I have a number of local files in .zip format sitting on a Windows 10 desktop that I need to upload to an S3 bucket. I am comfortable with Python (my preferred language) and would be open to learning Node.js or JS for part of the solution. Guidance on this is appreciated.
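Since Python is the preferred language here, a minimal sketch of such an upload might look like the following. It assumes the client passed in is a boto3 S3 client (whose `upload_file(Filename, Bucket, Key)` method is used); the function name `upload_zips` and the optional key prefix are hypothetical.

```python
from pathlib import Path


def upload_zips(s3_client, folder, bucket: str, prefix: str = "") -> list:
    """Upload every .zip file in `folder` to `bucket`; return the keys used.

    `s3_client` is assumed to expose boto3's upload_file(Filename, Bucket, Key).
    """
    uploaded = []
    for zip_path in sorted(Path(folder).glob("*.zip")):
        key = f"{prefix}{zip_path.name}"  # e.g. "zips/archive.zip"
        s3_client.upload_file(str(zip_path), bucket, key)
        uploaded.append(key)
    return uploaded
```

With boto3 installed and AWS credentials configured, this could be called as `upload_zips(boto3.client("s3"), r"C:\Users\me\Desktop", "my-bucket", "zips/")`, where the folder path, bucket name, and prefix are placeholders.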
Small Scale VOD Streaming - Am I looking in the right place?
Hi, I am learning AWS as I go... I work for a small company that would like to sell video subscriptions. I am looking into MediaConvert, MediaPackage and MediaStore with the plan to embed the output on our website in a secure way (users unable to download). If I had to guess, I would say we have about 20 hours of videos and would stream to about 30 users a month. Am I looking in the right spot or is this overkill?
glacier.us-east-1.amazonaws.com timeouts during retrieval
I'm trying to download files from Glacier vaults in us-east-1. The retrieval job is submitted successfully, but after 4 hours, when I try to download the files, HTTPS and HTTP connections to glacier.us-east-1.amazonaws.com time out after about 60 seconds. Retrieval and download jobs in us-west-1 work flawlessly. Sample request below:
GET /-/vaults/vaultname/jobs/XXX/output HTTP/1.1
TE: deflate,gzip;q=0.3
Connection: TE, close
Authorization: AWS4-HMAC-SHA256 Credential=X/20220414/us-east-1/glacier/aws4_request, SignedHeaders=host;x-amz-date;x-amz-glacier-version, Signature=X
Host: glacier.us-east-1.amazonaws.com
User-Agent: agent string
X-Amz-Date: 20220414T205439Z
X-Amz-Glacier-Version: 2012-06-01