Feature Requirement: Automatically sync when folder content changes

Hello,

Does anyone know if a future DataSync update will allow the agent to be 'always on' or automatically run tasks? This will help decide if it fits my use case.

Thanks,
Carter

Edited by: Carter11 on Feb 7, 2019 5:36 PM

Edited by: Carter11 on Feb 7, 2019 5:37 PM

asked 5 years ago
273 views
3 Answers

Hi Carter,
Thanks for reaching out. DataSync does not currently re-run tasks automatically. However, you can use the built-in cron functionality in CloudWatch Events (https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/ScheduledEvents.html) to run a DataSync task periodically. Each run is incremental and copies only new or modified files. Feel free to contact support or send me a private message if you'd like more help with this. I've also noted your suggestion, and we'll consider it as we plan future features for the service.
Regards,
Olga Kogan
AWS DataSync
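
As an illustration of the scheduled approach described in this answer, here is a minimal sketch, not an official recipe. It assumes a scheduled CloudWatch Events rule (for example with the schedule expression `rate(1 hour)`) invokes a small Lambda function, which then calls the DataSync `StartTaskExecution` API through boto3. The Lambda in the middle, the function names, and the `DATASYNC_TASK_ARN` environment variable are assumptions made for this example and do not come from the answer itself.

```python
import os


def start_sync(datasync_client, task_arn):
    """Start one execution of an existing DataSync task and return its ARN.

    `datasync_client` is expected to behave like boto3.client("datasync").
    """
    response = datasync_client.start_task_execution(TaskArn=task_arn)
    return response["TaskExecutionArn"]


def lambda_handler(event, context):
    """Entry point invoked by the scheduled rule (hypothetical wiring)."""
    # Imported here so start_sync can be exercised without boto3 installed.
    import boto3

    # Hypothetical variable name; set it in the Lambda configuration.
    task_arn = os.environ["DATASYNC_TASK_ARN"]
    client = boto3.client("datasync")
    return {"taskExecutionArn": start_sync(client, task_arn)}
```

Because each DataSync run is incremental, a setup like this only copies files that are new or changed since the previous run. The schedule interval is set on the rule, not in the function.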

answered 5 years ago

Just adding my +1 here. Real-time, continuous replication of on-premises file data to the cloud for processing is our core use case and seems to be what you propose with AWS DataSync on your home page for the service:

Data processing for hybrid workloads
If you have on-premises systems generating or using data that needs to move into or out of AWS for processing, you can use DataSync to accelerate and schedule the transfers. It can help speed up critical hybrid cloud workflows in industries that need to move active files into AWS quickly, including ... machine learning in life science.

We need to process lab instrumentation data in real time with machine learning, and it's disappointing to find this use case not well supported: we would have to trigger the transfer task every minute via a separate AWS cron-like service. It would be much better to have this managed by the agent as part of the task configuration.

dtlhlbs
answered 5 years ago

Thanks for your feedback.

Can you elaborate on what "real time" means to you? What would the impact on your process be of, say, a 1-minute delay? Bear in mind that there will always be some delay due to the time taken to transfer data from your lab instruments over a network to AWS.

How would you expect DataSync to identify the data being generated by your lab instruments? Where do they put this data today? Or would you prefer to "push" this data into DataSync?

How do your machine learning processes run? How would you like to start them? Would a CloudWatch Event be sufficient?

Thanks
Paul

AWS
Paul_R
answered 5 years ago
