Why is the DataSync preparation phase slow compared to RoboCopy?


RoboCopy can do a full /L comparison between a Windows UNC source and an AWS FSx destination, involving 75 million files that mostly exist at both ends, in 131 minutes.

Why does AWS DataSync need 13-14 hours to do the same? Does it do a content comparison, byte-for-byte or checksum? If so, how can we configure DataSync to only do a metadata comparison based on filename, date, size, just like RoboCopy does?

-------------------------------------------------------------------------------
   ROBOCOPY     ::     Robust File Copy for Windows                              
-------------------------------------------------------------------------------

  Started : Monday, January 2, 2023 2:01:21 PM
   Source : V:\FileServer\FileStore_Prod\NL\Tier3\ADM\
     Dest : \\<IP number edited>\share\NL\Tier3\ADM\

    Files : *.*
	    
  Options : *.* /TS /FP /NDL /L /S /E /DCOPY:DA /COPY:DAT /R:1000000 /W:30 
...
------------------------------------------------------------------------------

               Total    Copied   Skipped  Mismatch    FAILED    Extras
    Dirs :     57934       281         0         0         0         0
   Files :  74729131    599103  74130028         0         0       298
   Bytes :  10.706 t  92.138 g  10.616 t         0         0   76.77 m
   Times :   2:11:03   0:00:00                       0:00:00   2:11:03
   Ended : Monday, January 2, 2023 4:12:25 PM

asked a year ago, 926 views
4 Answers
0

Hello,

You can refer to the FAQ and the verification options below to choose the proper option for your environment.

  1. FAQ
  • Q: How does AWS DataSync ensure my data is copied correctly?
  • A: As AWS DataSync transfers and stores data, it performs integrity checks to ensure the data written to the destination matches the data read from the source. Additionally, an optional verification check can be performed to compare source and destination at the end of the transfer. DataSync will calculate and compare full-file checksums of the data stored in the source and in the destination. You can check either the entire dataset or just the files or objects that DataSync transferred.
  2. Verification options
  • During a transfer, AWS DataSync always checks the integrity of your data, but you can specify how and when this verification happens with the following options:
  • Verify only the data transferred (recommended) – DataSync calculates the checksum of transferred files and metadata at the source location. At the end of the transfer, DataSync then compares this checksum to the checksum calculated on those files at the destination. We recommend this option when transferring to S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive storage classes. For more information, see Storage class considerations with Amazon S3 locations.
  • Verify all data in the destination – At the end of the transfer, DataSync scans the entire source and destination to verify that both locations are fully synchronized. You can't use this option when transferring to S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive storage classes. For more information, see Storage class considerations with Amazon S3 locations.
  • Check integrity during transfer – DataSync doesn't run additional verification at the end of the transfer. All data transmissions are still integrity-checked with checksum verification during the transfer.
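As a rough illustration of how the three verification options above could be set programmatically, here is a minimal Python sketch. It assumes the `VerifyMode` values of the DataSync `Options` API (`ONLY_FILES_TRANSFERRED`, `POINT_IN_TIME_CONSISTENT`, `NONE`) and boto3's `update_task` call; the helper names `build_options` and `apply_options` are hypothetical. These options govern the verification step, and nothing in this thread establishes that they change the preparation phase.

```python
# Console labels quoted above, mapped to VerifyMode values of the
# DataSync Options API (mapping assumed from the API documentation).
VERIFY_MODES = {
    "Verify only the data transferred": "ONLY_FILES_TRANSFERRED",
    "Verify all data in the destination": "POINT_IN_TIME_CONSISTENT",
    "Check integrity during transfer": "NONE",
}


def build_options(console_label):
    """Return an Options dict for the given console verification label."""
    return {"VerifyMode": VERIFY_MODES[console_label]}


def apply_options(task_arn, console_label):
    """Update an existing task (hypothetical helper).

    Requires boto3 and valid AWS credentials; task_arn is a placeholder
    for the ARN of your own DataSync task.
    """
    import boto3  # imported lazily so build_options works without boto3

    client = boto3.client("datasync")
    client.update_task(TaskArn=task_arn, Options=build_options(console_label))
```

For example, `build_options("Check integrity during transfer")` yields the dict that skips the extra verification pass at the end of the transfer.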
AWS
answered a year ago
  • Thanks for responding SeungYong,

    However, this information still doesn't answer my question. Maybe I should phrase it better:

    Why does the Preparation Phase of the AWS DataSync take SIX times longer than RoboCopy /L to do the same? I.e. where both only do an existence and metadata comparison of the same source and destination?

    DataSync Preparation Phase: 13-14 hours
    RoboCopy /L: 2 hours 10 minutes

    Regards, Nick

0

Hello, as you already know, robocopy's "/L" option only compares the file listings of the source and destination locations; it does not copy anything.

/L :: List only - don't copy, timestamp or delete any files. (quoted from the robocopy help page)
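To make concrete what a list-only, metadata-based comparison involves, here is a minimal Python sketch. It is an illustration only, not RoboCopy's or DataSync's actual implementation: the function names and the `mtime_tolerance` parameter are hypothetical, and it classifies files purely by relative path, size, and modification time without reading any file contents.

```python
import os


def index_tree(root):
    """Map each file's path relative to root to its (size, mtime)."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            index[os.path.relpath(full, root)] = (st.st_size, st.st_mtime)
    return index


def list_only_compare(source, dest, mtime_tolerance=0.0):
    """Classify files by name, size and timestamp; nothing is copied."""
    src, dst = index_tree(source), index_tree(dest)
    to_copy, same = [], []
    for rel, (size, mtime) in src.items():
        other = dst.get(rel)
        if (
            other is not None
            and other[0] == size
            and abs(other[1] - mtime) <= mtime_tolerance
        ):
            same.append(rel)      # present at both ends with matching metadata
        else:
            to_copy.append(rel)   # missing at the destination, or metadata differs
    extras = [rel for rel in dst if rel not in src]  # only at the destination
    return sorted(to_copy), sorted(same), sorted(extras)
```

The three returned lists loosely correspond to the "Copied", "Skipped", and "Extras" columns of a list-only run.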

So, I think you should consider adjusting the AWS DataSync options to get a similar result.

https://docs.aws.amazon.com/datasync/latest/userguide/API_Options.html

Regards, SeungYong

AWS
answered a year ago
  • Thanks, but DataSync does NOT offer an option to disable content/checksum verification during the Preparation phase. So what you suggest is currently NOT possible.

0

Rephrasing my question:

Why does the Preparation Phase of the AWS DataSync take SIX times longer than RoboCopy /L to do the same? I.e. where both only do an existence and metadata comparison of the same source and destination?

DataSync Preparation Phase: 13-14 hours
RoboCopy /L: 2 hours 10 minutes
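The gap can be put in files-per-second terms with a small calculation. The sketch below uses only the figures reported in this thread (the file total and elapsed time from the RoboCopy summary, and the 13-14 hour DataSync preparation time); the function name is hypothetical, and it assumes both tools enumerated the same number of files.

```python
def files_per_second(file_count, hours=0, minutes=0, seconds=0):
    """Average enumeration rate over the given elapsed time."""
    elapsed = hours * 3600 + minutes * 60 + seconds
    return file_count / elapsed


FILES = 74_729_131  # "Files : Total" from the RoboCopy summary

robocopy_rate = files_per_second(FILES, hours=2, minutes=11, seconds=3)
datasync_fast = files_per_second(FILES, hours=13)  # best case reported
datasync_slow = files_per_second(FILES, hours=14)  # worst case reported

print(f"RoboCopy /L      : {robocopy_rate:,.0f} files/s")
print(f"DataSync (13 h)  : {datasync_fast:,.0f} files/s")
print(f"DataSync (14 h)  : {datasync_slow:,.0f} files/s")
print(
    f"Slowdown         : {robocopy_rate / datasync_fast:.1f}x "
    f"to {robocopy_rate / datasync_slow:.1f}x"
)
```

The resulting slowdown factor is consistent with the "six times longer" figure quoted above.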

answered a year ago
0

Hello,

Did you check the requirements of the DataSync agent? I think you should consider using multiple agents for 75 million files.

https://docs.aws.amazon.com/datasync/latest/userguide/agent-requirements.html

Virtual machine requirements

When deploying a DataSync agent on-premises, the agent VM requires the following resources:

  • Virtual processors: Four virtual processors assigned to the VM.
  • Disk space: 80 GB of disk space for installation of VM image and system data.
  • RAM: Depending on your transfer scenario, choose one of the following:
    • 32 GB of RAM assigned to the VM for tasks that transfer up to 20 million files.
    • 64 GB of RAM assigned to the VM for tasks that transfer more than 20 million files.
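The RAM rule quoted above can be expressed as a one-line check. This is a minimal sketch; the function name is hypothetical and the 20 million threshold is taken directly from the quoted requirements.

```python
def recommended_agent_ram_gb(file_count):
    """RAM tier (in GB) for an agent VM, per the quoted requirements."""
    threshold = 20_000_000  # "up to 20 million files" uses the smaller tier
    return 32 if file_count <= threshold else 64


# The task in this question involves roughly 75 million files.
print(recommended_agent_ram_gb(75_000_000))
```

For a task of this size, the rule selects the larger tier.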

Regards,

SeungYong

AWS
answered a year ago
