Hello,
You can refer to the FAQ entry and the verification options below to choose the right option for your environment.
- FAQ
- Q: How does AWS DataSync ensure my data is copied correctly?
- A: As AWS DataSync transfers and stores data, it performs integrity checks to ensure the data written to the destination matches the data read from the source. Additionally, an optional verification check can be performed to compare source and destination at the end of the transfer. DataSync will calculate and compare full-file checksums of the data stored in the source and in the destination. You can check either the entire dataset or just the files or objects that DataSync transferred.
- Verification options
- During a transfer, AWS DataSync always checks the integrity of your data, but you can specify how and when this verification happens with the following options:
- Verify only the data transferred (recommended) – DataSync calculates the checksum of transferred files and metadata at the source location. At the end of the transfer, DataSync then compares this checksum to the checksum calculated on those files at the destination. We recommend this option when transferring to S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive storage classes. For more information, see Storage class considerations with Amazon S3 locations.
- Verify all data in the destination – At the end of the transfer, DataSync scans the entire source and destination to verify that both locations are fully synchronized. You can't use this option when transferring to S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive storage classes. For more information, see Storage class considerations with Amazon S3 locations.
- Check integrity during transfer – DataSync doesn't run additional verification at the end of the transfer. All data transmissions are still integrity-checked with checksum verification during the transfer.
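To show how the two end-of-transfer scopes differ, here is a minimal local sketch that compares full-file checksums between a source directory and a destination directory. It is only an illustration, not how DataSync is implemented. SHA-256 is an assumption, since the FAQ doesn't say which checksum DataSync uses, and `file_checksum` and `verify` are hypothetical helper names.

```python
import hashlib
from pathlib import Path
from typing import Iterable, List, Optional


def file_checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return a full-file SHA-256 digest, reading the file in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(source: Path, dest: Path,
           transferred: Optional[Iterable[str]] = None) -> List[str]:
    """Compare full-file checksums between source and dest.

    transferred=None  -> scan every file on both sides (like "verify all data
                         in the destination"); files only in dest are flagged too.
    transferred=[...] -> check only those relative paths (like "verify only
                         the data transferred").
    Returns the relative paths that are missing on one side or whose checksums differ.
    """
    if transferred is None:
        rel_paths = {p.relative_to(source).as_posix() for p in source.rglob("*") if p.is_file()}
        rel_paths |= {p.relative_to(dest).as_posix() for p in dest.rglob("*") if p.is_file()}
    else:
        rel_paths = set(transferred)

    mismatches = []
    for rel in sorted(rel_paths):
        src, dst = source / rel, dest / rel
        if not (src.is_file() and dst.is_file()) or file_checksum(src) != file_checksum(dst):
            mismatches.append(rel)
    return mismatches
```

Scoping the check to the transferred files avoids reading every object in the destination. That fits the FAQ's advice to use that option for archive storage classes, where reading objects back is impractical.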
Hello, as you already know, robocopy's "/L" option only compares the list of files between the source and destination locations.
/L :: List only - don't copy, timestamp or delete any files. (quoted from the robocopy help page)
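A list-only pass like this needs only directory listings and per-file metadata, never file contents. Here is a rough Python sketch of such a comparison. It assumes, as this thread does, that the comparison is based on existence, size and modification time. The `snapshot` and `list_only_diff` names and the "new"/"extra"/"changed" labels are hypothetical and do not reproduce robocopy's exact rules or output.

```python
import os
from pathlib import Path
from typing import Dict, List, Tuple


def snapshot(root: Path) -> Dict[str, Tuple[int, int]]:
    """Map each relative file path to (size, mtime in whole seconds).

    Only metadata from os.stat is collected; file contents are never read.
    """
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = Path(dirpath, name)
            st = full.stat()
            result[full.relative_to(root).as_posix()] = (st.st_size, int(st.st_mtime))
    return result


def list_only_diff(source: Path, dest: Path) -> Dict[str, List[str]]:
    """Classify files by existence and (size, mtime) without copying anything."""
    src, dst = snapshot(source), snapshot(dest)
    return {
        "new": sorted(src.keys() - dst.keys()),      # only in source
        "extra": sorted(dst.keys() - src.keys()),    # only in destination
        "changed": sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k]),
    }
```

Because nothing is read beyond metadata, two files with the same size and timestamp but different contents are reported as unchanged. That is the trade-off that makes a list-only pass cheap compared with checksum-based verification.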
So I think you should consider adjusting the AWS DataSync task options to get a similar result.
https://docs.aws.amazon.com/datasync/latest/userguide/API_Options.html
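In the Options object documented at that URL, the verification choice is the `VerifyMode` field. Its documented values are `ONLY_FILES_TRANSFERRED`, `POINT_IN_TIME_CONSISTENT` and `NONE`, which correspond to the three options quoted earlier. The sketch below maps those choices to an options dict and rejects full verification for Glacier Flexible Retrieval (`GLACIER`) and Glacier Deep Archive (`DEEP_ARCHIVE`) destinations. The choice labels and the `build_override_options` helper are hypothetical. `VerifyMode` governs verification at the end of the transfer. It is not a switch for the preparation phase.

```python
from typing import Dict

# Hypothetical labels for the three choices, mapped to documented VerifyMode values.
VERIFY_MODES: Dict[str, str] = {
    "verify_transferred": "ONLY_FILES_TRANSFERRED",   # recommended
    "verify_all": "POINT_IN_TIME_CONSISTENT",         # scan entire source and destination
    "during_transfer_only": "NONE",                   # no extra check at the end
}

# S3 storage-class values for Glacier Flexible Retrieval and Glacier Deep Archive.
GLACIER_CLASSES = {"GLACIER", "DEEP_ARCHIVE"}


def build_override_options(choice: str, dest_storage_class: str = "STANDARD") -> Dict[str, str]:
    """Return an Options dict, refusing full verification for Glacier destinations."""
    mode = VERIFY_MODES[choice]
    if mode == "POINT_IN_TIME_CONSISTENT" and dest_storage_class in GLACIER_CLASSES:
        raise ValueError(
            f"Full-destination verification isn't supported for {dest_storage_class}; "
            "use ONLY_FILES_TRANSFERRED instead."
        )
    return {"VerifyMode": mode}
```

With boto3 installed, the result could be passed as `OverrideOptions` for a single run, e.g. `boto3.client("datasync").start_task_execution(TaskArn=task_arn, OverrideOptions=build_override_options("verify_transferred"))`, where `task_arn` is your own task's ARN.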
Regards, SeungYong
Thanks, but DataSync does NOT offer an option to disable content/checksum verification during the Preparation phase. So what you suggest is currently NOT possible.
Rephrasing my question:
Why does the AWS DataSync Preparation phase take SIX times longer than RoboCopy /L to do the same thing, i.e. when both only do an existence and metadata comparison of the same source and destination?
- DataSync Preparation phase: 13-14 hours
- RoboCopy /L: 2 hours 10 minutes
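As a quick sanity check on the "SIX times" figure, the reported durations work out as follows. The 75 million file count comes from SeungYong's reply in this thread. The per-file rates are simple averages over the whole run, and the helper names are illustrative only.

```python
ROBOCOPY_MINUTES = 2 * 60 + 10  # 2 hours 10 minutes = 130 minutes
FILE_COUNT = 75_000_000         # file count mentioned in this thread


def speed_ratio(datasync_hours: float, robocopy_minutes: float) -> float:
    """How many times longer the DataSync preparation phase took."""
    return datasync_hours * 60 / robocopy_minutes


def files_per_second(files: int, seconds: float) -> float:
    """Average number of files compared per second over the whole run."""
    return files / seconds


for hours in (13, 14):
    print(f"{hours} h: {speed_ratio(hours, ROBOCOPY_MINUTES):.2f}x slower, "
          f"{files_per_second(FILE_COUNT, hours * 3600):,.0f} files/s")
print(f"RoboCopy /L: {files_per_second(FILE_COUNT, ROBOCOPY_MINUTES * 60):,.0f} files/s")
```

The ratio comes out at 6.0x (13 hours) to about 6.5x (14 hours). That is roughly 1,500-1,600 files/s for the DataSync preparation phase versus about 9,600 files/s for RoboCopy /L.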
Hello,
Did you check the DataSync agent requirements? With 75 million files, I think you should consider using multiple agents.
https://docs.aws.amazon.com/datasync/latest/userguide/agent-requirements.html
Virtual machine requirements
When deploying a DataSync agent on-premises, the agent VM requires the following resources:
- Virtual processors: Four virtual processors assigned to the VM.
- Disk space: 80 GB of disk space for installation of VM image and system data.
- RAM: Depending on your transfer scenario, choose one of the following:
- 32 GB of RAM assigned to the VM for tasks that transfer up to 20 million files.
- 64 GB of RAM assigned to the VM for tasks that transfer more than 20 million files.
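The RAM rule above depends only on how many files a single task transfers. Here is a small sketch of that rule. It also includes a hypothetical helper that counts how many equally sized tasks would keep each one at or below the 20 million file threshold. Splitting the dataset that way is an assumption for illustration. This thread doesn't establish whether it would shorten the preparation phase.

```python
import math

RAM_THRESHOLD_FILES = 20_000_000  # threshold from the agent requirements quoted above


def recommended_agent_ram_gb(files_in_task: int) -> int:
    """32 GB for tasks transferring up to 20 million files, 64 GB above that."""
    return 32 if files_in_task <= RAM_THRESHOLD_FILES else 64


def tasks_to_stay_under_threshold(total_files: int) -> int:
    """Hypothetical: number of equal-sized tasks keeping each at or below the threshold."""
    return max(1, math.ceil(total_files / RAM_THRESHOLD_FILES))
```

For the 75 million files in this thread, a single task falls in the 64 GB tier. Split into 4 tasks of 18.75 million files each, every task would fall in the 32 GB tier.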
Regards,
SeungYong
Thanks for responding SeungYong,
However, this information still doesn't answer my question. Maybe I should phrase it better:
Why does the AWS DataSync Preparation phase take SIX times longer than RoboCopy /L to do the same thing, i.e. when both only do an existence and metadata comparison of the same source and destination?
- DataSync Preparation phase: 13-14 hours
- RoboCopy /L: 2 hours 10 minutes
Regards, Nick