
Optimizing AWS DataSync Costs: A Strategic Guide to Transfer Modes and Amazon S3 Storage Classes

19 minute read
Content level: Advanced

AWS DataSync is a powerful migration tool, but understanding its cost implications is crucial. By selecting the right transfer modes, configuring verification options, choosing appropriate S3 storage classes, and optimizing sync frequency, you can reduce costs while maintaining data integrity. Match your configuration to your needs—dataset size, file count, change rate, and performance requirements—to stay within budget.


by Judith Mettoudi and Noam Lichtblau


Organizations worldwide are increasingly adopting AWS DataSync to migrate large volumes of data from on-premises environments to Amazon S3. With the right configuration and preparation, you can ensure a smooth, cost-efficient migration and get the most value out of your investment.

In this blog post, we'll walk you through several best practices to optimize your DataSync migration and keep your costs under control from day one.

Introduction

AWS DataSync is widely adopted by customers for migrating on-premises data to Amazon S3. When planning a migration, it can be an invaluable tool for efficiently transferring large datasets. However, poor configuration of DataSync can trigger unplanned additional costs—sometimes to the point where the migration itself costs more than the storage.

To prevent this, it's important to understand how DataSync interacts with Amazon S3 pricing, particularly around API requests, storage classes, and lifecycle policies. A well-planned configuration can make a significant difference in your overall migration cost. Let's break down the key areas to focus on.

A clear understanding of the use case, combined with proper configuration, is key to avoiding unnecessary expenses. This post explores the different phases of the DataSync migration process and shows how to choose the right Amazon S3 storage class for your data requirements, considering factors such as the number of objects, their size, and their access patterns, all of which can significantly impact both transfer and storage costs.


Understanding AWS DataSync Transfer Modes

AWS DataSync offers two distinct transfer modes, each with different cost implications:

Transfer All Data

DataSync copies everything in the source to the destination without comparing differences between the locations. This mode is ideal for initial migrations.

However, by default, DataSync is configured to overwrite existing objects at the destination (OverwriteMode = ALWAYS). Running this mode multiple times will therefore generate a new PutObject call for every source object — even if the object already exists and hasn't changed. If Amazon S3 Versioning is enabled on the bucket, each execution creates a new object version rather than replacing the existing one, resulting in multiple copies of the same data. In both cases — with or without versioning — repeated executions result in redundant API calls (PUT, multipart uploads) and additional data transfer costs. For objects stored in storage classes with minimum storage duration requirements (such as S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive), overwriting an object also triggers early deletion charges on the previous object, as the overwrite is treated as a delete-and-recreate operation.

Important: DataSync also supports an OverwriteMode = NEVER option, which prevents any modification to existing objects at the destination. While this avoids redundant writes and early deletion charges, it also means that modified files at the source will not be updated at the destination — even if the content has changed. Since "Transfer All Data" mode does not compare source and destination, DataSync has no way to detect that an existing object is outdated. Use this option only when your goal is to add new objects without ever modifying existing ones.
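To make the overwrite behavior concrete, here is a toy Python model of repeated "Transfer All Data" runs. It is a simplification under stated assumptions: the `Destination` class and `transfer_all` function are hypothetical illustrations, not DataSync internals, and a versioned bucket is modeled as a list of stored versions per key.

```python
from dataclasses import dataclass, field


@dataclass
class Destination:
    """Toy stand-in for an S3 bucket (hypothetical, for illustration only)."""
    versioning: bool
    objects: dict = field(default_factory=dict)  # key -> list of stored versions
    put_requests: int = 0

    def put(self, key: str, content: str) -> None:
        self.put_requests += 1  # every write is a billable PUT-class request
        if self.versioning:
            self.objects.setdefault(key, []).append(content)  # new version kept
        else:
            self.objects[key] = [content]  # overwrite replaces the only copy


def transfer_all(source: dict, dest: Destination, overwrite_mode: str) -> None:
    """One "Transfer All Data" run: no source/destination comparison."""
    for key, content in source.items():
        if overwrite_mode == "NEVER" and key in dest.objects:
            continue  # existing object skipped, even if the source copy changed
        dest.put(key, content)


source = {"a.txt": "v1", "b.txt": "v1"}
always = Destination(versioning=True)
never = Destination(versioning=True)
for _ in range(3):
    transfer_all(source, always, "ALWAYS")
    transfer_all(source, never, "NEVER")

source["a.txt"] = "v2"  # modify a file at the source
transfer_all(source, never, "NEVER")

print(always.put_requests, len(always.objects["a.txt"]))  # 6 3
print(never.put_requests, never.objects["a.txt"][-1])  # 2 v1
```

With ALWAYS, three runs over an unchanged two-file dataset cost six PUTs and leave three versions of each object. With NEVER, writes stop after the first run, but the modified `a.txt` is never updated at the destination.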

Transfer Only Data That Has Changed

After your initial full transfer, DataSync copies only the data and metadata that differs between the source and destination location. While this reduces data transfer costs by not copying existing content, it requires additional API calls for comparison operations (ListObjectsV2, HeadObject), which can be costly, especially for large datasets with many files.

Behind the Scenes: DataSync executes various AWS API calls such as ListObjectsV2 (to list objects in an Amazon S3 bucket) and HeadObject (to retrieve metadata like object size and modification dates). These calls, necessary for comparing source and destination, increase the overall cost of your migration. This becomes especially significant as your dataset grows, since API costs scale linearly with the number of objects in your dataset.
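Because the scan touches every object, its request count can be estimated from the object count alone. The sketch below assumes, as a simplification, one ListObjectsV2 page per 1,000 keys and one HeadObject per object at the S3 location. DataSync's actual request pattern is internal and may differ, and `scan_requests` is a hypothetical helper.

```python
import math

LIST_PAGE_SIZE = 1_000  # ListObjectsV2 returns at most 1,000 keys per response


def scan_requests(num_objects: int) -> dict:
    """Estimate S3 requests for one comparison scan of an S3 location.

    Assumption: one ListObjectsV2 call per 1,000 keys (at least one call,
    even for an empty bucket) plus one HeadObject call per object.
    """
    list_calls = max(1, math.ceil(num_objects / LIST_PAGE_SIZE))
    return {"ListObjectsV2": list_calls, "HeadObject": num_objects}


for n in (100_000, 1_000_000, 10_000_000):
    print(n, scan_requests(n))
# 100000 {'ListObjectsV2': 100, 'HeadObject': 100000}
# 1000000 {'ListObjectsV2': 1000, 'HeadObject': 1000000}
# 10000000 {'ListObjectsV2': 10000, 'HeadObject': 10000000}
```

Both counts grow linearly with the number of objects, independent of how much data actually changed between executions.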

For a detailed walkthrough of how DataSync behaves when synchronizing data to S3 buckets — including scenarios where objects were written by a utility other than DataSync — see Synchronizing your data to Amazon S3 using AWS DataSync by David DeLuca.


Task Modes: Enhanced Mode vs Basic Mode

DataSync offers two task modes with different capabilities and cost implications:

Enhanced Mode

  • Performance: Transfer virtually unlimited numbers of objects with higher performance than Basic mode
  • Processing: Lists, prepares, transfers, and verifies data in parallel
  • Limitations: Does not support "Verify all data" option
  • Verification: Only supports "Verify only transferred data" option
  • Cost: Higher per-GB transfer fee than Basic mode, plus an additional fee per task execution

Basic Mode

  • Performance: Lower performance than Enhanced mode, as processing runs sequentially rather than in parallel
  • Processing: Processes data sequentially (prepare → transfer → verify)
  • Limitations: Subject to quotas on the number of files and objects in a dataset
  • Verification: Supports all verification options including "Verify all data"
  • Cost: Lower per-GB transfer fee than Enhanced mode, with no per-task-execution fee

Breaking Down the DataSync Task Execution Phases

AWS DataSync operates in three main phases, each with its own cost considerations:

DataSync Task Execution Phases

With "Transfer All Data", DataSync begins transferring immediately (no preparation phase).

1. Preparing Phase

During the Preparing Phase, DataSync scans both the source and destination locations to identify the data that needs to be transferred. This phase occurs only when using the "Transfer Only Data That Has Changed" mode. DataSync performs API calls like ListObjectsV2 and HeadObject to compare metadata between the locations. These calls add up quickly for large datasets with many files, because each object requires an individual metadata check.

Additionally, the task mode can impact the overall job duration. In Enhanced mode, DataSync starts transferring files before the preparation phase completes, reducing the total execution time. In Basic mode, the entire preparation must complete before the transfer begins.

2. Transferring Phase

The Transferring Phase is when DataSync actually moves your data from the source to the destination.

This phase generates two distinct cost components that scale differently:

  • DataSync transfer fee: charged per GB of data copied. Scales with data volume, not object count. Tiered pricing decreases as volume increases.
  • S3 API request charges: each object written generates PutObject calls billed as standard Amazon S3 requests. Scales with number of objects, not data volume.

For large objects exceeding the multipart upload threshold, DataSync automatically splits the file into multiple parts — each part generating a separate API call. This means large files produce more API requests than a single PutObject, but still far fewer than an equivalent volume of small files.

The DataSync transfer fee is based on data volume (GB), so it remains the same regardless of how many objects you transfer. However, each object generates its own S3 PutObject API call — so transferring 10 million small files costs significantly more in S3 API requests than transferring a few large files of the same total size.
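A rough request count shows the effect. In this sketch, a file below a multipart threshold is written with one PutObject, and a larger file with CreateMultipartUpload, one UploadPart per part, and CompleteMultipartUpload, all billed in Amazon S3's PUT, COPY, POST, LIST request tier. The 64 MiB threshold and part size are hypothetical values chosen for illustration, not DataSync's documented settings.

```python
import math

MiB = 1024 ** 2
GiB = 1024 ** 3


def requests_per_object(size: int, threshold: int = 64 * MiB, part_size: int = 64 * MiB) -> int:
    """PUT-class requests to write one object of `size` bytes (hypothetical sizes)."""
    if size < threshold:
        return 1  # single PutObject
    parts = math.ceil(size / part_size)
    return parts + 2  # CreateMultipartUpload + one UploadPart per part + CompleteMultipartUpload


def write_requests(groups) -> int:
    """`groups` is an iterable of (file_count, file_size_bytes) pairs."""
    return sum(count * requests_per_object(size) for count, size in groups)


# Two layouts of the same 1 TiB of data:
small_files = write_requests([(1_048_576, 1 * MiB)])  # 1,048,576 files of 1 MiB
large_files = write_requests([(1_024, 1 * GiB)])  # 1,024 files of 1 GiB
print(small_files, large_files)  # 1048576 18432
```

The DataSync per-GB fee is identical for both layouts, but under these assumptions the small-file layout needs roughly 57× more write requests.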

3. Verifying Phase

DataSync performs data integrity checks to ensure that everything has transferred successfully. You have three verification options:

  • Don't verify data after transfer: DataSync performs data integrity checks only during the transfer itself. No additional verification occurs at the end.
  • Verify only transferred data (recommended): DataSync calculates checksums of transferred data at the source and compares them with checksums at the destination. This option is supported for all storage classes, including S3 Glacier Flexible Retrieval and S3 Glacier Deep Archive.
  • Verify all data: DataSync performs GetObject API calls across the entire destination dataset to recalculate checksums and verify full synchronization with the source. This option is not supported in Enhanced mode, nor when the destination uses S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive — because GetObject cannot be performed on archived objects without a prior restore operation. For objects in S3 Standard-IA, S3 One Zone-IA, or S3 Glacier Instant Retrieval, this option will incur data retrieval costs in addition to request charges.

In Enhanced mode, DataSync verifies each object as it is transferred to the destination (inline verification). In Basic mode, verification happens at the end of the transfer, which can take significant time for large datasets.
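The cost of "Verify all data" can be approximated as one GetObject per destination object plus, for classes that charge retrieval fees, a per-GB retrieval charge on the full dataset. The function below is a hypothetical sketch: the storage class labels are illustrative, and both prices are inputs you would take from the current Amazon S3 pricing page, not values defined here.

```python
# Classes where reading objects incurs a per-GB retrieval fee (per the text above).
RETRIEVAL_FEE_CLASSES = {"STANDARD_IA", "ONEZONE_IA", "GLACIER_INSTANT_RETRIEVAL"}
# Classes where "Verify all data" is not supported at all.
UNSUPPORTED_CLASSES = {"GLACIER", "DEEP_ARCHIVE"}


def verify_all_data_cost(num_objects: int, total_gb: float, storage_class: str,
                         get_price_per_1000: float, retrieval_price_per_gb: float) -> float:
    """Estimate one "Verify all data" pass over the destination (hypothetical model)."""
    if storage_class in UNSUPPORTED_CLASSES:
        raise ValueError(f"'Verify all data' is not supported for {storage_class}")
    request_cost = num_objects / 1000 * get_price_per_1000  # one GetObject per object
    retrieval_cost = 0.0
    if storage_class in RETRIEVAL_FEE_CLASSES:
        retrieval_cost = total_gb * retrieval_price_per_gb  # every byte is read back
    return request_cost + retrieval_cost


# 2 million objects, 500 GB, placeholder prices (not published rates):
cost = verify_all_data_cost(2_000_000, 500, "STANDARD_IA",
                            get_price_per_1000=0.001, retrieval_price_per_gb=0.01)
print(round(cost, 2))  # 7.0
```

With these placeholder prices, most of the cost comes from retrieval, which is charged on the whole dataset on every verification pass, not just on the data that changed.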


How Object Count and Dataset Characteristics Affect Cost

The number of objects and their size are the two main factors that influence the overall cost of your migration — but they impact different cost components.

When evaluating the cost of a DataSync migration, it's important to look beyond DataSync pricing itself. Each DataSync task execution triggers Amazon S3 API calls (such as ListObjectsV2, HeadObject, GetObject and PutObject) that are billed as standard Amazon S3 request charges — separate from both the DataSync transfer fee and your Amazon S3 storage costs. These API charges are easy to overlook because they appear under S3 on your AWS bill, not under DataSync. For small datasets, the impact is negligible. But at scale — millions of objects with daily sync cycles running over weeks or months — these hidden API costs can accumulate significantly and even exceed the cost of both the data transfer and the storage itself.

Number of Objects Impact

Each object in your dataset generates its own set of Amazon S3 API calls: HeadObject for metadata comparison, ListObjectsV2 for enumeration (paginated at 1,000 objects per response), PutObject for the actual transfer, and GetObject during the verification phase. These are standard Amazon S3 request charges, triggered internally by DataSync as part of the transfer process, and they appear as regular Amazon S3 usage on your bill. A dataset with millions of small files can incur significantly more in API costs than a dataset with fewer, larger files, even if the total data volume is the same.

Note: HeadObject requests are billed based on the storage class of the destination objects. For objects stored in S3 Standard-IA, S3 One Zone-IA, or S3 Glacier Instant Retrieval, HEAD requests are more expensive than for S3 Standard — and these costs apply on every sync execution, even when no data is transferred.

File Size Consideration

For larger objects, DataSync automatically uses the Amazon S3 multipart upload feature, splitting files into multiple parts for transfer. Each part generates a separate API call, so the number of Amazon S3 requests is proportional to the number of parts uploaded, not the number of objects transferred. That said, large files still produce far fewer total API requests than an equivalent data volume composed of many small files, where each file requires its own individual PutObject call.

API Pagination

The Amazon S3 ListObjectsV2 API returns a maximum of 1,000 objects per response. For buckets with more than 1,000 objects, DataSync must make multiple paginated calls using continuation tokens to enumerate the full dataset.
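The continuation-token loop looks roughly like this. `iter_keys` is a hypothetical helper that accepts any callable shaped like boto3's `list_objects_v2` (which takes `Bucket` and `ContinuationToken` and returns `Contents`, `IsTruncated`, and `NextContinuationToken`). `FakeS3` is an in-memory stand-in so the sketch runs without AWS credentials; its token is just an offset, unlike the opaque tokens S3 returns.

```python
from typing import Callable, Iterator, Optional


def iter_keys(list_page: Callable[..., dict], bucket: str) -> Iterator[str]:
    """Yield every key in a bucket by following continuation tokens."""
    kwargs = {"Bucket": bucket}
    while True:
        page = list_page(**kwargs)  # one billable LIST request per page
        for obj in page.get("Contents", []):
            yield obj["Key"]
        if not page.get("IsTruncated"):
            return
        kwargs["ContinuationToken"] = page["NextContinuationToken"]


class FakeS3:
    """In-memory stand-in that pages results 1,000 keys at a time, like S3."""

    def __init__(self, num_keys: int):
        self.keys = [f"file-{i:07d}" for i in range(num_keys)]
        self.calls = 0

    def list_objects_v2(self, Bucket: str, ContinuationToken: Optional[str] = None,
                        MaxKeys: int = 1000) -> dict:
        self.calls += 1
        start = int(ContinuationToken or 0)  # fake token: offset into the key list
        chunk = self.keys[start:start + MaxKeys]
        end = start + len(chunk)
        page = {"Contents": [{"Key": k} for k in chunk], "IsTruncated": end < len(self.keys)}
        if page["IsTruncated"]:
            page["NextContinuationToken"] = str(end)
        return page


fake = FakeS3(2_500)
keys = list(iter_keys(fake.list_objects_v2, "example-bucket"))
print(len(keys), fake.calls)  # 2500 3
```

Against a real bucket you would pass `boto3.client("s3").list_objects_v2` instead of the fake; boto3's `get_paginator("list_objects_v2")` runs the same loop for you.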


Choosing the Right Amazon S3 Storage Class and Transfer Mode

Selecting the appropriate Amazon S3 storage class is a critical factor in controlling migration costs. Beyond storage and retrieval pricing, S3 API request charges also differ significantly across storage classes. At the time of writing, PUT requests to S3 Glacier Flexible Retrieval cost approximately 6× more, and to S3 Glacier Deep Archive approximately 10× more, than the same operations to S3 Standard. LIST requests, however, are always billed at the S3 Standard rate regardless of the destination storage class, while HEAD requests are billed based on the storage class of the destination objects.

Since every DataSync execution generates PutObject calls, and additionally ListObjectsV2 and HeadObject calls depending on the transfer mode, the choice of storage class directly impacts your API costs, not just your storage costs.
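Using the relative PUT rates given in this post (S3 Standard as 1×, S3 Glacier Flexible Retrieval about 6×, S3 Glacier Deep Archive about 10×), the write-side cost of one execution can be sketched as follows. The S3 Standard base price is a placeholder parameter; take the actual figure for your Region from the Amazon S3 pricing page.

```python
# Approximate PUT price relative to S3 Standard, as stated in this post.
PUT_MULTIPLIER = {
    "STANDARD": 1,
    "GLACIER": 6,  # S3 Glacier Flexible Retrieval
    "DEEP_ARCHIVE": 10,  # S3 Glacier Deep Archive
}


def put_cost(put_requests: int, storage_class: str, standard_put_price_per_1000: float) -> float:
    """Cost of PUT requests written to `storage_class` (simplified model)."""
    return put_requests / 1000 * standard_put_price_per_1000 * PUT_MULTIPLIER[storage_class]


# One million PUTs at a placeholder base price of $0.005 per 1,000 requests:
for storage_class in PUT_MULTIPLIER:
    print(storage_class, round(put_cost(1_000_000, storage_class, 0.005), 2))
# STANDARD 5.0
# GLACIER 30.0
# DEEP_ARCHIVE 50.0
```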

Understanding Storage Class Access Patterns

Some Amazon S3 storage classes do not provide immediate access to your data. S3 Standard, S3 Standard-IA, S3 One Zone-IA, S3 Intelligent-Tiering, and S3 Glacier Instant Retrieval all provide direct, millisecond access to objects. However, S3 Glacier Flexible Retrieval and S3 Glacier Deep Archive require a restore operation before objects can be accessed — which can take minutes to hours depending on the retrieval option selected.

While these classes are best suited for long-term archiving requirements, they directly impact DataSync migrations: objects stored in these archival classes cannot be read by DataSync without a prior restore, and verification options that require reading destination objects may incur additional retrieval costs and delays.

Choosing the right storage class is therefore not only a storage cost decision, but also a factor that directly influences your DataSync migration costs and operational workflow.

Let's look at a real-world customer example...


Real-World Cost Insights: When API Calls Outweigh Storage Costs

A customer used AWS DataSync to securely migrate large volumes of data from their on-premises environments to Amazon S3. To ensure data integrity throughout the migration, they chose the "Transfer Only Data That Has Changed" mode. This decision was deliberate: the goal was to guarantee that every file was accounted for and that all deltas between source and destination were captured, even across multiple migration cycles.

Because the migration involved critical legal documents, the team prioritized completeness and accuracy over cost. They ran multiple DataSync tasks over a 5-month period (May–September 2025), with each execution scanning the entire dataset to identify deltas and verify consistency.

Let's break down the DataSync transfer costs and Amazon S3 API operation costs over this 5-month period to better understand the spending pattern.

DataSync and S3 Cost

June — Peak Migration Month

  • Amazon S3 API calls ($23,097) cost 9.4× more than the storage of 142.76 TB itself ($2,449)
  • Amazon S3 API calls represent 86% of the total migration cost for June ($23,097 out of $26,734) for 142.76 TB transferred, while storage and DataSync transfer combined account for 14% ($3,637)

August — Migration Nearly Complete

  • By August, PutObject charges dropped to just $27 (0.12 TB transferred), showing a direct correlation between the volume of data transferred and the PutObject API calls, confirming that transfer costs decrease naturally as the migration completes.
  • However, ListObjectsV2 ($2,142) and HeadObject ($2,175) remained significant at $4,317 — because these costs are not driven by the amount of data transferred, but by the total number of objects in the dataset. Every sync execution scans the entire dataset across all objects to detect deltas, regardless of how little has actually changed.

This demonstrates how API costs can exceed the combined cost of data transfer and storage: in June, S3 API charges were more than six times the $3,637 spent on storage and DataSync transfer combined.
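The ratios quoted for this customer can be recomputed directly from the billed amounts. The short script below uses only the figures reported above.

```python
june = {"s3_api": 23_097, "storage": 2_449, "total": 26_734}
august = {"list": 2_142, "head": 2_175, "put": 27}

api_vs_storage = june["s3_api"] / june["storage"]
api_share = june["s3_api"] / june["total"]
storage_and_transfer = june["total"] - june["s3_api"]
scan_cost = august["list"] + august["head"]
scan_vs_put = scan_cost / august["put"]

print(f"June: API calls = {api_vs_storage:.1f}x storage")  # 9.4x
print(f"June: API share = {api_share:.0%}, storage + transfer = ${storage_and_transfer:,}")  # 86%, $3,637
print(f"August: scan = ${scan_cost:,}, {scan_vs_put:.0f}x PutObject")  # $4,317, 160x
```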

In fact, every time DataSync runs in "Transfer Only Data That Has Changed" mode, it performs a full scan of both source and destination locations:

  • ListObjectsV2: enumerates every object in the bucket to build an inventory. With millions of objects and Amazon S3's limit of 1,000 objects per API response, this requires thousands of paginated calls.
  • HeadObject: retrieves metadata (size, modification date) for each individual object to compare source and destination. With millions of objects, this means millions of individual API calls.

These operations ran behind the scenes with every task execution throughout the month. Even when only a small number of files had changed, the full dataset was scanned each time.

In this case, the customer accepted the API costs as a necessary trade-off to guarantee that no legal document would be lost or duplicated.


Best Practices for AWS DataSync Configuration

1. Choose Your Transfer Mode Strategically

Transfer All Data

  • Use this mode for your initial migration. It skips the preparation phase and immediately begins transferring — the most efficient option when copying to an empty destination.
  • Use this mode if you only want to add new objects to the destination without modifying existing ones. By setting OverwriteMode to NEVER, DataSync will skip any object that already exists at the destination, even if it has been modified at the source.
  • Avoid running this mode repeatedly. Each execution re-transfers everything, meaning you pay PutObject requests and DataSync per-GB transfer fees on the full data volume every time. This is especially costly when the destination uses archival storage classes, where PUT requests are significantly more expensive than S3 Standard: approximately 6× for S3 Glacier Flexible Retrieval and 10× for S3 Glacier Deep Archive. Repeated executions also trigger early deletion charges on overwritten objects in Glacier classes, and create additional object versions if S3 Versioning is enabled.

Transfer Only Data That Has Changed

  • Reserve this mode for subsequent synchronizations. Each execution scans all objects at both source and destination regardless of how many files changed. While this scan cost is significantly lower than re-transferring the entire dataset, it can still be substantial for large datasets with millions of objects — especially when sync executions are frequent.
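A back-of-the-envelope comparison helps decide between re-running "Transfer All Data" and switching to "Transfer Only Data That Has Changed". The model below is a simplification: it counts only destination-side ListObjectsV2, HeadObject, and PutObject requests plus the DataSync per-GB fee, treats each object as a single PUT, and uses placeholder prices rather than published rates. Both function names are hypothetical.

```python
import math


def rerun_all_cost(num_objects, total_gb, put_per_1000, fee_per_gb, put_multiplier=1):
    """Re-running "Transfer All Data": every object is written again."""
    return num_objects / 1000 * put_per_1000 * put_multiplier + total_gb * fee_per_gb


def changed_only_cost(num_objects, changed_objects, changed_gb, list_per_1000,
                      head_per_1000, put_per_1000, fee_per_gb, put_multiplier=1):
    """Changed-only mode: scan every object, then write only the deltas."""
    list_calls = max(1, math.ceil(num_objects / 1000))  # 1,000 keys per LIST page
    scan = list_calls / 1000 * list_per_1000 + num_objects / 1000 * head_per_1000
    writes = changed_objects / 1000 * put_per_1000 * put_multiplier + changed_gb * fee_per_gb
    return scan + writes


prices = dict(put_per_1000=0.005, fee_per_gb=0.0125)  # placeholders only
full = rerun_all_cost(1_000_000, 1_000, **prices)
delta = changed_only_cost(1_000_000, 10_000, 10,
                          list_per_1000=0.005, head_per_1000=0.0004, **prices)
print(round(full, 2), round(delta, 2))  # 17.5 0.58
```

In this toy setup with 1% of objects changing, a full re-run costs roughly 30× more than a delta run; raising `put_multiplier` for an archival destination widens the gap further.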

2. Select Appropriate Task Mode

  • Enhanced Mode: Choose for higher performance — DataSync lists, prepares, transfers, and verifies data in parallel, with support for virtually unlimited numbers of objects. Note that this mode only supports "Verify only transferred data".
  • Basic Mode: Choose this mode when you need "Verify all data" verification (available only for non-archival storage classes), bandwidth limiting, or sequential processing control.

3. Optimize Verification Settings

  • Verify only transferred data (recommended): Best balance of data integrity and cost. The only option available in Enhanced mode, and the only option supported when the destination is S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive.
  • Verify all data: Use only when you need complete synchronization verification between source and destination. This option requires Basic mode and is not supported for S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive destinations. Be aware that it generates GetObject calls across the entire destination dataset, and incurs data retrieval fees for objects in S3 Standard-IA, S3 One Zone-IA, or S3 Glacier Instant Retrieval.
  • Don't verify data after transfer: For trusted environments where speed is prioritized. DataSync still performs integrity checks during the transfer itself.
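The task-mode and verification rules in sections 2 and 3 can be encoded as a pre-flight check before creating a task. `config_problems` and `build_create_task_kwargs` are hypothetical helpers. The option values used (`TransferMode` `CHANGED`/`ALL`, `VerifyMode` `ONLY_FILES_TRANSFERRED`/`POINT_IN_TIME_CONSISTENT`/`NONE`, `OverwriteMode` `ALWAYS`/`NEVER`, `TaskMode` `BASIC`/`ENHANCED`) are the ones we understand the DataSync `CreateTask` API to accept; check them against the current API reference. The destination storage class is not a task option (it is set on the S3 location), so it is passed here only for validation.

```python
ARCHIVAL_CLASSES = {"GLACIER", "DEEP_ARCHIVE"}


def config_problems(task_mode: str, verify_mode: str, dest_storage_class: str) -> list:
    """Return rule violations, based on the constraints described in this post."""
    problems = []
    if task_mode == "ENHANCED" and verify_mode != "ONLY_FILES_TRANSFERRED":
        problems.append("Enhanced mode only supports ONLY_FILES_TRANSFERRED")
    if verify_mode == "POINT_IN_TIME_CONSISTENT" and dest_storage_class in ARCHIVAL_CLASSES:
        problems.append("'Verify all data' is not supported for archival destinations")
    return problems


def build_create_task_kwargs(source_arn: str, dest_arn: str, *, task_mode: str = "BASIC",
                             transfer_mode: str = "CHANGED",
                             verify_mode: str = "ONLY_FILES_TRANSFERRED",
                             overwrite_mode: str = "ALWAYS",
                             dest_storage_class: str = "STANDARD") -> dict:
    """Validate the settings, then build keyword arguments for CreateTask."""
    problems = config_problems(task_mode, verify_mode, dest_storage_class)
    if problems:
        raise ValueError("; ".join(problems))
    return {
        "SourceLocationArn": source_arn,
        "DestinationLocationArn": dest_arn,
        "TaskMode": task_mode,
        "Options": {
            "TransferMode": transfer_mode,
            "VerifyMode": verify_mode,
            "OverwriteMode": overwrite_mode,
        },
    }


print(config_problems("ENHANCED", "POINT_IN_TIME_CONSISTENT", "STANDARD"))
# ['Enhanced mode only supports ONLY_FILES_TRANSFERRED']
```

With valid settings, the returned dictionary could be passed to `boto3.client("datasync").create_task(**kwargs)`, provided your SDK version supports the `TaskMode` parameter.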

4. Plan for Amazon S3 API Costs

  • Estimate API call volumes based on your dataset characteristics — the number of objects is the primary cost driver, not just the data size.
  • Consider consolidating smaller files into compressed bundles (e.g., ZIP or TAR) before transfer to reduce the number of objects and associated API overhead.
  • Monitor your AWS bill during initial transfers to understand cost patterns. Note that Amazon S3 API charges triggered by DataSync appear under Amazon S3 on your bill, not under DataSync.
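As one way to apply the consolidation advice, the sketch below packs the files under a directory into gzip-compressed TAR bundles with Python's standard `tarfile` module, so DataSync sees a few large objects instead of many small ones. `bundle_small_files` and the 10,000-files-per-bundle default are illustrative choices, not an AWS recommendation.

```python
import tarfile
from pathlib import Path


def bundle_small_files(src_dir, out_dir, max_files_per_bundle=10_000):
    """Pack files under src_dir into .tar.gz bundles of at most max_files_per_bundle files.

    Returns the bundle paths. Keep out_dir outside src_dir so that later runs
    do not pick up earlier bundles as input.
    """
    src, out = Path(src_dir), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    files = sorted(p for p in src.rglob("*") if p.is_file())  # deterministic order
    bundles = []
    for start in range(0, len(files), max_files_per_bundle):
        bundle = out / f"bundle-{start // max_files_per_bundle:05d}.tar.gz"
        with tarfile.open(bundle, "w:gz") as tar:
            for path in files[start:start + max_files_per_bundle]:
                # Store paths relative to src_dir so the tree can be restored on extract.
                tar.add(path, arcname=path.relative_to(src).as_posix())
        bundles.append(bundle)
    return bundles
```

Bundling one million small files this way would produce 100 objects, so each later scan issues 100 HeadObject calls instead of one million. The trade-off is that individual files can no longer be read directly from Amazon S3 without downloading and extracting their bundle, so this suits data that is rarely accessed file by file.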

5. Optimize Sync Frequency

When using "Transfer Only Data That Has Changed" mode, every sync execution scans all objects at both source and destination to detect deltas, generating ListObjectsV2 and HeadObject calls across the entire dataset. As shown in our customer example, once the migration was nearly complete and very few files remained to be transferred, scanning costs reached roughly 160× the PutObject charges (August: $4,317 in ListObjectsV2 and HeadObject calls vs. $27 in PutObject charges).

  • If continuous sync is not required, reduce execution frequency — running once a month instead of daily can reduce API costs by ~30×.
  • Plan a cutover strategy based on the rate of change in your data. If changes accumulate significantly between syncs, consider a two-step approach: run a large sync a few days before cutover to reduce the delta, then run a final short sync during the cutover window to capture only the remaining changes. This minimizes the cutover duration and reduces the risk of data inconsistency.
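The ~30× figure follows from the scan being a fixed cost per execution. Assuming, as a simplification, one ListObjectsV2 call per 1,000 objects plus one HeadObject per object on each run, the monthly scan volume is that per-run count multiplied by the number of executions. `monthly_scan_requests` is a hypothetical helper.

```python
import math


def monthly_scan_requests(num_objects: int, executions_per_month: int) -> int:
    """Scan requests per month in changed-only mode (simplified model)."""
    per_run = math.ceil(num_objects / 1000) + num_objects  # LIST pages + HEAD calls
    return executions_per_month * per_run


objects = 5_000_000
daily = monthly_scan_requests(objects, 30)
weekly = monthly_scan_requests(objects, 4)
monthly = monthly_scan_requests(objects, 1)
print(daily, weekly, monthly, daily // monthly)  # 150150000 20020000 5005000 30
```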

6. Choose Your Destination Storage Class Wisely

When using "Transfer Only Data That Has Changed" mode, the destination storage class impacts your PutObject costs on every sync execution. PUT requests cost approximately 6× more for S3 Glacier Flexible Retrieval and 10× more for S3 Glacier Deep Archive compared to S3 Standard. LIST requests are always billed at the S3 Standard rate regardless of the destination storage class, but HEAD requests (scan) and GetObject calls (verification) are billed based on the storage class of the destination objects.

The right approach depends on how much data changes between sync executions:

  • Low rate of change (< 5% per sync): the scan cost dominates each execution, and the PUT premium on Glacier has minimal impact. Transferring directly to your target archival storage class is generally more cost-effective than adding a Lifecycle transition step.
  • High rate of change with multiple sync executions: when a significant portion of your dataset changes between syncs, the PUT cost dominates each execution, and the approximately 10× PUT premium on S3 Glacier Deep Archive compounds with every additional sync. Migrating to Amazon S3 Standard first and transitioning via Lifecycle policies after cutover becomes significantly cheaper, often from the second or third sync onward for datasets with very high change rates.

Recommendation: Evaluate the expected rate of change in your dataset before choosing your destination storage class. For migrations with frequent syncs and a high volume of changes, the two-phase approach (S3 Standard during migration, then Lifecycle transition to archival tier) can significantly reduce API costs compared to migrating directly to S3 Glacier Deep Archive.
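To estimate where the two-phase approach starts to pay off, the sketch below compares only request charges: PUTs for the initial load and for each sync's changed objects, plus one Lifecycle transition request per object in the two-phase case. It deliberately leaves out scan costs, early deletion charges, and the storage charges incurred while data sits in S3 Standard (which you can pass in as `extra_storage_cost`). All prices are placeholders, the 10× multiplier is the approximate Deep Archive PUT premium quoted in this post, and every function name is hypothetical.

```python
def direct_to_archive_cost(initial_objects, changed_per_sync, syncs, std_put_per_1000,
                           archive_put_multiplier=10):
    """Write every object and every change straight to S3 Glacier Deep Archive."""
    writes = initial_objects + changed_per_sync * syncs
    return writes / 1000 * std_put_per_1000 * archive_put_multiplier


def two_phase_cost(initial_objects, changed_per_sync, syncs, std_put_per_1000,
                   transition_per_1000, extra_storage_cost=0.0):
    """Write to S3 Standard, then transition each object once after cutover."""
    writes = initial_objects + changed_per_sync * syncs
    transitions = initial_objects  # assumes the object count stays constant
    return (writes / 1000 * std_put_per_1000
            + transitions / 1000 * transition_per_1000
            + extra_storage_cost)


def first_sync_where_two_phase_wins(initial_objects, changed_per_sync, std_put_per_1000,
                                    transition_per_1000, extra_storage_cost=0.0, max_syncs=100):
    """Smallest number of syncs at which the two-phase approach is cheaper, or None."""
    for syncs in range(max_syncs + 1):
        direct = direct_to_archive_cost(initial_objects, changed_per_sync, syncs,
                                        std_put_per_1000)
        staged = two_phase_cost(initial_objects, changed_per_sync, syncs, std_put_per_1000,
                                transition_per_1000, extra_storage_cost)
        if staged < direct:
            return syncs
    return None


# 10 million objects, 5% changing per sync, placeholder prices per 1,000 requests.
print(first_sync_where_two_phase_wins(10_000_000, 500_000, 0.005, 0.05))  # 3
```

Under these assumptions, the two-phase approach wins from the third sync at a 5% change rate, and from the first sync if half the objects change each time. A realistic `extra_storage_cost` for the migration window pushes the break-even point later.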


Conclusion

AWS DataSync is a powerful tool for data migration, but understanding its cost implications is crucial for successful implementations. By carefully selecting transfer modes, configuring verification options appropriately, choosing the right Amazon S3 storage classes, and optimizing sync frequency based on your rate of change, you can significantly reduce migration costs while maintaining data integrity.

The key is to match your DataSync configuration to your specific migration needs, considering factors like dataset size, file count, rate of change between syncs, the number of planned sync executions, and performance requirements.

With proper planning and configuration, you can avoid costly surprises and keep your migration project within budget.

The key takeaway: for large-scale migrations, the number of objects — not the volume of data — is the primary cost driver. Plan your DataSync configuration accordingly.

For more information about AWS DataSync configuration and best practices, visit the AWS DataSync documentation.


Authors

Judith Mettoudi Technical Account Manager

Noam Lichtblau Principal Technical Account Manager


Resources