Questions tagged with AWS DataSync
Hello All,
Today I was trying to copy a directory from one location to another, and was using the following command to execute my copy.
aws s3 cp s3://bucketname/directory/ s3://bucketname/directory/subdirectory/ --recursive
The copy ran overnight because the data was 16.4TB in size, and when I got into work the next day it was done, or at least it had finished running.
But when I do a compare between the two locations I get the following
bucketname/directory/ 103,690 objects - 16.4TB
bucketname/directory/subdirectory/ 103,650 objects - 16.4TB
So there is a 40 object difference between the source location and the destination location.
I tried using the following command to copy over the files that were missing
aws s3 sync s3://bucketname/directory/ s3://bucket/directory/subdirectory/
which returned no results. It sat there for maybe two minutes or so and then just returned to the next line.
I am at my wits' end trying to copy over the missing objects, and my boss thinks that I lost the data, so I need to figure out a way to get the difference between the source and destination copied over.
If anyone could help me with this, I would REALLY appreciate it. I am a newbie with AWS, so I may not understand everything that I am told, but I will try anything to get this resolved. I am running all of the commands with the AWS CLI on an EC2 instance that I SSH into.
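In case it helps, this is roughly how I have been trying to list which objects exist in the source but not in the destination (the bucket and prefix names below are placeholders for my real ones, and it assumes the keys contain no spaces):
```
# List the keys under each prefix, strip the prefix itself, sort, and compare.
aws s3 ls s3://bucketname/directory/ --recursive \
  | awk '{print $4}' | sed 's|^directory/||' | grep -v '^subdirectory/' | sort > source_keys.txt

aws s3 ls s3://bucketname/directory/subdirectory/ --recursive \
  | awk '{print $4}' | sed 's|^directory/subdirectory/||' | sort > dest_keys.txt

# Keys present in the source but missing from the destination.
comm -23 source_keys.txt dest_keys.txt > missing_keys.txt
wc -l missing_keys.txt
```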
Thanks to anyone who might be able to help me.
Take care,
-Tired & Frustrated :)
We want to create a 2-way replication between AWS FSx and an on-premises Windows file server. Can we do this using AWS DataSync or DFS Replication?
The purpose is to use AWS FSx as an active-active DR solution for an on-premises Windows file server.
From what I have found so far, AWS DataSync does not support bi-directional replication. As for DFS Replication, it does not support file locking, so multiple people can't work on the same files simultaneously.
After using DataSync to copy files from S3 to EFS, we can see that the files have the following permissions:
-rwxr-xr-x 1 nfsnobody 65534 (when Copy ownership and Copy permissions are enabled)
or
-rw-r--r-- 1 root root (when Copy ownership and Copy permissions are disabled)
From Lambda we can see our current user is:
sbx_user1051 with uid: 993, gid: 990
Because of this we're unable to delete any files copied over by DataSync. Even when we change our access point's POSIX user, we still get a permission error when trying to delete files.
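For context, this is roughly how we have been creating the access point that Lambda mounts (the filesystem ID, uid/gid and path below are placeholders for the values we are actually testing with):
```
# Create an EFS access point that forces a specific POSIX user and owns its root directory.
aws efs create-access-point \
  --file-system-id fs-12345678 \
  --posix-user Uid=1000,Gid=1000 \
  --root-directory 'Path=/datasync-drop,CreationInfo={OwnerUid=1000,OwnerGid=1000,Permissions=0775}'
```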
Is there a better way to programmatically clear files on EFS?
I am looking to transfer 200 TB of on-prem files to EFS IA directly, or in the most optimized way. A few questions:
1) Is Snowball into S3 cheaper than DataSync?
2) Can Snowball be used to copy files directly to EFS?
3) What is the data journey for the most optimized approach? For example: from on-prem to S3, then from S3 to EFS using DataSync (rough sketch below).
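For question 3, this is the rough shape of the S3-to-EFS leg I was picturing with DataSync (every ARN and ID below is a placeholder):
```
# Source: the S3 bucket the data lands in first.
SRC=$(aws datasync create-location-s3 \
  --s3-bucket-arn arn:aws:s3:::my-landing-bucket \
  --s3-config BucketAccessRoleArn=arn:aws:iam::111122223333:role/datasync-s3-role \
  --query LocationArn --output text)

# Destination: the EFS filesystem, reached through a subnet/security group in the same VPC.
DST=$(aws datasync create-location-efs \
  --efs-filesystem-arn arn:aws:elasticfilesystem:us-east-1:111122223333:file-system/fs-12345678 \
  --ec2-config SubnetArn=arn:aws:ec2:us-east-1:111122223333:subnet/subnet-abcd1234,SecurityGroupArns=arn:aws:ec2:us-east-1:111122223333:security-group/sg-abcd1234 \
  --query LocationArn --output text)

# Create and run the transfer task.
TASK=$(aws datasync create-task \
  --source-location-arn "$SRC" \
  --destination-location-arn "$DST" \
  --query TaskArn --output text)
aws datasync start-task-execution --task-arn "$TASK"
```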
Thanks
Hi -
Attempting to test a local environment with AWS DataSync.
Running the AWS DataSync agent in Windows Hyper-V.
Created the agent record in the AWS console and registered the agent ID.
NFS is running locally using haneWIN NFS Server (https://www.hanewin.net/doc/nfs/nfsd.htm).
I have some NFS shares that I can mount successfully from a Windows cmd prompt.
Every time I create a task, I get a Task Status of "Unavailable"
*mount.nfs: Connection timed out*
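For what it's worth, these are the checks I was planning to run from a Linux box on the same network to rule out basic reachability (192.168.1.50 is a placeholder for the Windows host running haneWIN, and /share for the export):
```
# Ask the NFS server which exports it advertises.
showmount -e 192.168.1.50

# Check that the portmapper and NFS ports the agent needs are reachable.
nc -zv 192.168.1.50 111
nc -zv 192.168.1.50 2049

# Try mounting the export with an explicit NFS version.
sudo mkdir -p /mnt/nfs-test
sudo mount -t nfs -o nfsvers=3 192.168.1.50:/share /mnt/nfs-test
```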
How can I fix this?
If DataSync is configured with an on-prem NFS mount as the source, and EFS as the target, will changes to the EFS volume be reflected on-prem?
Hi Experts,
I need to change an EFS filesystem from the General Purpose performance mode to Max I/O. The only way to do this is to create a new filesystem and use DataSync to copy the data. EFS is the underlying storage for my EKS cluster.
My approach is to:
1. Create the new FS (rough CLI sketch after this list).
2. Copy the data with DataSync.
3. Stop services on the EKS cluster.
4. Change the mount points on the CSI driver.
5. Restart application services.
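For step 1, this is roughly what I had in mind (the creation token and tag are placeholders; mount targets in the cluster's subnets would still need to be created before DataSync or the CSI driver can reach the new filesystem):
```
# Create the replacement filesystem with the Max I/O performance mode
# (the performance mode can only be set at creation time).
aws efs create-file-system \
  --creation-token eks-efs-maxio \
  --performance-mode maxIO \
  --tags Key=Name,Value=eks-efs-maxio

# Confirm the mode before pointing DataSync or the CSI driver at it.
aws efs describe-file-systems \
  --creation-token eks-efs-maxio \
  --query 'FileSystems[0].[FileSystemId,PerformanceMode]' --output table
```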
Is this the best approach? Or are there other documented steps for changing EFS filesystems with an EKS cluster?
Thanks in advance.
Customer wants to migrate 900 TB of unstructured data (file data), and this data continues to change/update.
We are thinking of using Snowball for the initial data migration and DataSync to sync the incremental data after the migration.
Can we use DataSync to transfer incremental changes after the migration (similar to how DMS handles CDC when migrating databases)? What are the other options for syncing the incremental changes?
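For reference, this is the rough shape of the scheduled incremental task we were picturing with DataSync once the Snowball import is done (the location ARNs and the schedule are placeholders):
```
# A recurring task that only transfers data that changed since the last run.
aws datasync create-task \
  --source-location-arn arn:aws:datasync:us-east-1:111122223333:location/loc-source000001 \
  --destination-location-arn arn:aws:datasync:us-east-1:111122223333:location/loc-dest0000002 \
  --options TransferMode=CHANGED,VerifyMode=ONLY_FILES_TRANSFERRED \
  --schedule ScheduleExpression="rate(12 hours)"
```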
Hi Everyone,
I tried to do a remote reindex (both domains are on OpenSearch 1.1) and received the error message below.
{
  "error" : {
    "root_cause" : [
      {
        "type" : "null_pointer_exception",
        "reason" : null
      }
    ],
    "type" : "null_pointer_exception",
    "reason" : null
  },
  "status" : 500
}
Originally I thought _source was set to false in my indexes, so I tested with new indexes whose _source was explicitly set to true, but I still received the same error message.
PUT my-index-000002
{
  "mappings": {
    "_source": {
      "enabled": true
    }
  }
}
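For reference, the remote reindex request I am sending looks roughly like this (the domain endpoints, credentials and index names are placeholders; the -u credentials are for the target domain, the ones in the body for the source domain):
```
curl -XPOST "https://target-domain.us-east-1.es.amazonaws.com/_reindex" \
  -H 'Content-Type: application/json' \
  -u 'admin:placeholder-password' \
  -d '{
    "source": {
      "remote": {
        "host": "https://source-domain.us-east-1.es.amazonaws.com:443",
        "username": "admin",
        "password": "placeholder-password"
      },
      "index": "my-index-000001"
    },
    "dest": {
      "index": "my-index-000002"
    }
  }'
```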
I would appreciate it if you could help me with this.
Thanks
Aspet
I have HDFS set up on an Azure VM and the DataSync agent set up on a different Azure VM, with communication enabled between the two VMs. My ultimate goal is to transfer data from HDFS to Amazon S3. I have configured and activated the DataSync agent and connected it using AWS public endpoints. I have tested network connectivity to the public endpoints and to the self-managed storage (HDFS here); the connectivity showed PASSED for both. But when I create a task using the activated agent, with HDFS as the source and S3 as the destination, it just throws the error "input/output error cannot read source file". Can you please let me know how I can fix this issue?
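In case it is relevant, this is roughly how I have been sanity-checking that the files are readable from the HDFS side with the user configured in the DataSync HDFS location (the path and file name are placeholders):
```
# List the directory the task points at.
hdfs dfs -ls /data/export

# Confirm an individual file can actually be read, not just listed.
hdfs dfs -cat /data/export/sample-file.txt | head

# Look for missing-block or corrupt-file issues that can surface as I/O errors.
hdfs fsck /data/export -files -blocks | tail -n 20
```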
I'm trying to clone a number of buckets between two AWS accounts. Of the 6 buckets that are being copied, 2 consistently fail to sync with the error:
```
The endpoint has already opened a volume for this session
```
The sync task starts and goes into a "Preparing" state; it then sits there for up to a few hours and eventually just errors.
The two buckets that fail are the largest buckets from this batch (one is 10 TB, the other is 1 TB), but the object count looks like it shouldn't be exceeding the quota for a task (12.3 million and 7.6 million respectively).
I've tried various combinations of only synchronising some directories and turning off the option to only synchronise changes; neither has helped. I've also tried completely recreating the synchronisation locations and tasks, and this also didn't make any impact.
Looking at the logs, I see a normal startup sequence:
```
[INFO] Request to start task-xxx.
[INFO] Execution exec-xxx started.
[INFO] Started logging in destination hostId: host-xxx for Execution exec-xxx
[INFO] Started logging in destination hostId: host-xxx for Execution exec-xxx
```
Then no further log messages are recorded.
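In case it helps, this is roughly how I have been pulling the execution result for more detail than the log gives (the ARNs below are placeholders):
```
# List the executions for the failing task, then inspect the most recent one.
aws datasync list-task-executions \
  --task-arn arn:aws:datasync:eu-west-1:111122223333:task/task-0abc123def4567890

aws datasync describe-task-execution \
  --task-execution-arn arn:aws:datasync:eu-west-1:111122223333:task/task-0abc123def4567890/execution/exec-0123456789abcdef0 \
  --query '[Status,Result.ErrorCode,Result.ErrorDetail]'
```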
Has anyone had any similar experiences with S3-to-S3 sync, or any ideas about what could be going wrong?
I am consistently unable to start a DataSync task to transfer from a local NAS (Synology) to S3.
I have tried dozens of times with slight changes (NFS versions, squash options, agent reboots, etc.) with no success.
The tasks all end up with the "Unavailable" status, with the error message "mount.nfs: access denied by server while mounting...".
Setup:
- DataSync agent running as a KVM VM on ubuntu 20.04 LTS
- DataSync agent connected
- NFS share on a Synology NAS on the local network
- NFS share permissions provided for the agent IP (and even for the local network)
- DataSync agent connectivity and NFS connectivity tests are all successful
- mounting the NFS share on Ubuntu 20.04 LTS works perfectly (see the commands below)
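These are the mount checks I run from the Ubuntu host (the NAS IP and export path are placeholders for my real ones):
```
# Mount with an explicit NFS version to match what the DataSync location is configured for.
sudo mkdir -p /mnt/nas-test
sudo mount -t nfs -o nfsvers=4.1 192.168.1.20:/volume1/share /mnt/nas-test
ls /mnt/nas-test
sudo umount /mnt/nas-test

# Also tried NFSv3 for comparison.
sudo mount -t nfs -o nfsvers=3,nolock 192.168.1.20:/volume1/share /mnt/nas-test
```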
Could you please help? What is going on?