Migrating from on-prem Redis to Amazon DynamoDB


Customers love Redis and Amazon DynamoDB. Redis is a very fast in-memory data store that provides sub-millisecond latency. It is available as a managed service through Amazon ElastiCache for Redis and Amazon MemoryDB for Redis. Amazon DynamoDB is a fully managed, serverless, key-value NoSQL database designed to run high-performance applications at any scale.

In this post I show you how to migrate a Redis cluster to Amazon DynamoDB.

Reasons to migrate from Redis to Amazon DynamoDB

Here are three reasons why migrating from Redis to DynamoDB can make sense for your application:

  • Migrating from an on-premises Redis or from an alternative cloud service provider to an AWS environment. While Amazon ElastiCache and Amazon MemoryDB are available and the best options for Redis workloads on AWS, your application's data access patterns might be better suited to Amazon DynamoDB, which is not available in other environments.

  • Evolving requirements. While your application might have started with the need for an in-memory data store or a particular Redis feature, this might no longer be the case. Depending on your data and access patterns, Amazon DynamoDB can be more cost effective, especially if you have a self-managed Redis cluster.

  • New features required. DynamoDB offers performance at scale, point-in-time recovery, document data models, ACID transactions, and broad, deep integration with other AWS services such as Amazon Kinesis Data Streams, Amazon S3, AWS Glue, and Amazon Athena. It is fully managed and offers built-in security, continuous backups, automated multi-Region replication, in-memory caching, and data import and export tools.

Types of migrations and migration process

Should you decide to migrate from Redis to DynamoDB, you have different options for how to perform the migration. Database migrations typically happen in one of these scenarios:

  1. Offline migration: We migrate the data all at once, while there are no writes on the source database. This process is typically performed over a weekend or at night, and might require a window during which new writes are not allowed.

  2. Staged migration: To ease the burden of migrating multiple clients that access a database, some teams choose to keep both databases in replica mode for a period of time, progressively migrating read operations from the source database to the target. This is similar to having a replica database in a different database engine. Eventually all read and write operations will have moved from the source database to the target, at which point we can simply stop the source database.

  3. Zero-downtime migration: In this scenario, the database migration happens all at once, but the requirement is that there is no downtime.

In our example we explain how to perform an offline migration, focusing on the data migration aspects and leaving the query patterns to the reader. Note that this process depends heavily on your data access patterns, since these are different technologies.

These are the general steps for an offline migration from Redis to DynamoDB (assuming our application code is already able to access data from DynamoDB):

  1. Getting a copy of the original Redis database. You can accomplish this by querying the database (a slower, more resource-intensive process) or by converting the Redis dump files so they can be imported into DynamoDB.

  2. Transforming the data to fit the DynamoDB import format. The choices for mapping data are not straightforward, and this step can be time consuming depending on the variety of data stored in Redis. We also need to adapt the queries, since both databases have different ways of querying the information.

  3. Importing the transformed data into Amazon DynamoDB. DynamoDB allows importing data through regular write operations or using Amazon S3 Import to DynamoDB. If we have large amounts of data and a new table, we recommend importing from S3 since it is more cost effective. If we are doing a live replication, we need to use the regular write API, since importing from S3 currently only works with a new table.

  4. Monitoring, debugging and verification.

Data mapping between Redis and DynamoDB

Redis and DynamoDB both use the key-value concept, with some differences:

  • Keys. Redis is centered around the notion of key-value, where the key acts as an index for the information. This maps easily to a partition key in DynamoDB. Note that DynamoDB offers additional features, such as sort keys and global secondary indexes.

  • Values. In Redis each key usually stores a single value (which can be of different types), while in DynamoDB each key can store many attribute values, allowing for more compact representations of the same data.

Let's go through each of the basic types supported in Redis and see how they would map to DynamoDB (a short serialization sketch follows this list):

  • Strings, Lists, Hashes, Sets. These types are easily mapped between Redis and DynamoDB since both engines support them out of the box. Note that there are nuances, as there are differences between the two.

  • Sorted sets. These Redis types can be mapped to lists since sets in DynamoDB do not preserve order.

  • JSON. Unlike Redis, DynamoDB supports nested JSON representations natively, without any plugin. You can accomplish this with a document type.

  • Geospatial, HyperLogLog, Bitmap and Bitfield. These types are not supported out of the box by DynamoDB. Consider other services such as Amazon OpenSearch Service (for geospatial) or implement them at the application level (for the rest).

  • Indexes. While Redis does not have the concept of an index, indexes are very often implemented using key-value matches. This opens up the opportunity to use global secondary indexes in the new DynamoDB table.
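
To make these mappings concrete, here is a minimal, illustrative sketch of how a few Redis values, once read into native Python types, serialize to DynamoDB attribute values using boto3's TypeSerializer (the same helper used in the extraction code later in this post). The values are hypothetical, and the sorted set is simply flattened into an ordered list, as suggested above.

from decimal import Decimal
from boto3.dynamodb.types import TypeSerializer

serializer = TypeSerializer()

# Redis string -> DynamoDB string (S)
print(serializer.serialize("hello"))                    # {'S': 'hello'}

# Redis hash, read as a Python dict -> DynamoDB map (M)
print(serializer.serialize({"age": Decimal("30")}))     # {'M': {'age': {'N': '30'}}}

# Redis set -> DynamoDB string set (SS); DynamoDB sets are unordered
print(serializer.serialize({"a", "b"}))                 # {'SS': ['a', 'b']} (order not guaranteed)

# Redis sorted set -> keep only the members, in score order, as a DynamoDB list (L)
zset = [("low", 1.0), ("high", 2.0)]                    # hypothetical (member, score) pairs
print(serializer.serialize([member for member, _ in zset]))  # {'L': [{'S': 'low'}, {'S': 'high'}]}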

Migrating 1 billion records from Redis to DynamoDB

In our example, we have a Redis installation that has grown very fast, to over 1 billion key-value pairs and 100+ GB of data. Over time we realized that the latency requirements were not as strict as originally thought, and that scale was much more important for us than latency. We decided to migrate our self-managed Redis database to Amazon DynamoDB using the offline scenario.

We follow this process:

  1. We read the data from Redis and transform it into a valid DynamoDB S3 import format. In our example we chose DynamoDB JSON. Other valid formats include CSV and Amazon Ion.

  2. We save the DynamoDB JSON files in an S3 bucket. Since we have quite a lot of data, we chose to use gzip compression.

  3. Using the S3 import to DynamoDB feature, we create a new table with the relevant files, keys, indexes and capacity.

  4. After importing, we can create a DynamoDB Accelerator (DAX) cluster and redirect queries to the new DynamoDB table.


Decide how to map our Redis structure to DynamoDB

In our Redis database we had a collection of hashes, each indexed with a key representing an IP address. Here is an example entry:

>redis-cli -h <MY_REDIS_HOST> --tls -p 6379                                                                                        
MY_REDIS_HOST:6379> hgetall i:119.205.177.176
1) "name"
2) "John Burns"
3) "age"
4) "6867"
5) "longitute"
6) "-48.711598"
7) "email"
8) "davidlee@delacruz.com"

We first create an example item manually using the Amazon DynamoDB console to see how it would be represented. The following screenshot shows how we decided to map it in DynamoDB:

[Screenshot: the example item as represented in the DynamoDB console]

Once the example format fits our requirements, we can choose "JSON view" in the top-right corner, keep the "View DynamoDB JSON" selector enabled, and see the following:

{
 "key": {
  "S": "i:119.205.177.176"
 },
 "value": {
  "M": {
   "age": {
    "N": "6867"
   },
   "email": {
    "S": "davidlee@delacruz.com"
   },
   "longitute": {
    "N": "-48.711598"
   },
   "name": {
    "S": "John Burns"
   }
  }
 }
}

If we convert this to a single line and add the necessary "Item" key, it looks like this, which is the format each line of our S3 files needs to have:

{ "Item": { "key": {"S": "i:119.205.177.176" }, "value": {"M": { "age": {"N": "6867" }, "email": {"S": "davidlee@delacruz.com" }, "longitute": {"N": "-48.711598" }, "name": {"S": "John Burns" }} }} }

Note that there are different ways to represent values across databases and this will depend on our preferences and requirements.
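
As a quick sanity check on the format, here is a minimal sketch that reproduces the single-line representation above from the example item, again using boto3's TypeSerializer (the extraction job below follows the same approach at scale):

import json
from decimal import Decimal
from boto3.dynamodb.types import TypeSerializer

serializer = TypeSerializer()

# the example item above, already converted to native Python types
item = {
    "key": "i:119.205.177.176",
    "value": {
        "age": Decimal("6867"),
        "email": "davidlee@delacruz.com",
        "longitute": Decimal("-48.711598"),
        "name": "John Burns",
    },
}

# serialize each top-level attribute, wrap the result in an "Item" key,
# and dump it as a single line, which is what each line of the S3 files must contain
line = json.dumps({"Item": {k: serializer.serialize(v) for k, v in item.items()}})
print(line)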

Extraction job

We use Python together with Ray to make the extraction process concurrent.

pip install boto3 redis-py-cluster pyyaml "ray[default]"

Code sample

The following code reads the data from a Redis cluster, transforms it, and uploads it to Amazon S3, ready to be imported into DynamoDB. Ray is used to allow for higher concurrency; you can also use regular multiprocessing, or run the process serially if you don't have time constraints.

These are the main blocks of our solution:

  • The `__main__` section gets all the keys in the Redis cluster and calls process_keys() in a separate process for each list of keys

  • process_keys() gathers all the values of the given list of keys, transforms them from Redis format to Python types, and then from Python to DynamoDB JSON. Each process will save the results in a different file in Amazon S3.

  • redis_to_python() transforms a dictionary returned by Redis into the most appropriate Python native types (for example casting bytes into integers, decimals or strings)

  • python_to_dynamodb() transforms a dictionary with Python types (integers, strings, decimals, ...) into JSON DynamoDB format using TypeSerializer from boto3.

import ray
from rediscluster import RedisCluster
from boto3.dynamodb.types import TypeSerializer
import boto3
import gzip
import json
import yaml
from decimal import Decimal

# ray will gather the number of cpus from the current machine/environment
ray.init() 

HOST='<YOUR_REDIS_CLUSTER_HOST>'

def redis_to_python(redis_dict: dict) -> dict:
    """
    Replace redis format with python native types, using yaml as a serializer.
    """
    python_dict = {}
    for k, v in redis_dict.items():
        value = yaml.safe_load(v.decode())
        # we encode all numbers as Decimal, which is preferred by DynamoDB 
        if isinstance(value, float) or isinstance(value, int):
            value = Decimal(v.decode())
        python_dict[k.decode()] = value
    return python_dict

def python_to_dynamodb(python_obj: dict) -> dict:
    """
    Convert a python object (usually a dictionary) into the equivalent dynamoDB json.
    """    
    serializer = TypeSerializer()
    dynamodb_obj = {}
    for k, values in python_obj.items():
        value = serializer.serialize(values)
        dynamodb_obj[k] = value
    return dynamodb_obj
    
@ray.remote
def process_keys(keys: list, index: int):
    """ 
    Each process_keys() execution will run independently in a separate worker.
    This function will :
      1. read the values of the given keys, 
      2. transform them in DynamoDB format 
      3. store them in a file
      4. Upload the file to S3.
    """
    print("Process ", index, " starting")
    redis_db = RedisCluster(host=HOST, port=6379, ssl=True, skip_full_coverage_check=True)

    # 1. get all values for the given keys using a pipeline
    # the S3 bucket where the transformed batch files will be uploaded
    BUCKET_NAME = "redis2dynamodb"
    
    pipe = redis_db.pipeline()
    for key in keys:
        pipe.hgetall(key)
    results = pipe.execute()

    # print("results: ", results)
    # 2. process each of the values, adapting the format to DynamoDB JSON format
    to_write = []
    for key, value in zip(keys, results):
        python_item = {}
        python_item["key"] = key
        python_item["value"] = redis_to_python(value)

        # before appending we need to transform them to DynamoDB and add "Item" as a key
        to_write.append({"Item": python_to_dynamodb(python_item)})
        
    # print("to_write:", to_write)
    # 3. write the results to a compressed temp file
    filename = "dbb"+str(index)+".json.gz"
    with gzip.open("/tmp/"+filename, 'wt') as f:
        for entry in to_write:
            # json.dumps is used to keep the entries in double quotes as expected in DynamoDB JSON
            f.write(json.dumps(entry) + "\n")

    # 4. upload the transformed file from /tmp to s3://BUCKET_NAME/redis2ddb/
    s3 = boto3.resource('s3')
    s3.meta.client.upload_file("/tmp/"+filename, BUCKET_NAME, "redis2ddb/"+filename)
    print("File ",index, " uploaded")


if __name__ == "__main__":
    # We will iterate through all keys in the database and call a worker for each batch of keys
    redis_db = RedisCluster(host=HOST, port=6379, ssl=True, skip_full_coverage_check=True)
    
    # split in 100 files
    NUM_FILES = 100
    # dbsize() returns a per-node dict on a cluster, so we sum across all nodes
    BATCH_SIZE = sum(redis_db.dbsize().values()) // NUM_FILES
    
    print("Processing ", NUM_FILES*BATCH_SIZE, " in ", NUM_FILES, " parts.")
    
    futures = []
    index = 0
    batch = []
    for key in redis_db.scan_iter(count=BATCH_SIZE):
        if len(batch) < BATCH_SIZE:
            batch.append(key.decode())
        else:
            # pass this batch of keys to a worker to process the values
            f = process_keys.remote(batch, index)
            # keep the promises
            futures.append(f)
            index += 1
            # start the next batch with the current key so it is not dropped
            batch = [key.decode()]

    # add the last batch
    f = process_keys.remote(batch,index)
    futures.append(f)
    
    # wait for all workers to finish
    ray.get(futures)

Import to DynamoDB from S3

Once we have executed the code above, we will have an S3 bucket with all the files ready to be imported with the Amazon DynamoDB S3 import feature.

Below you can see the options we used for our example:

[Screenshot: S3 import to DynamoDB options used in this example]
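
If you prefer to script the import instead of using the console, the same operation is available through the ImportTable API. Here is a minimal sketch with boto3, assuming the bucket and prefix used by the extraction job above and a hypothetical table name; adjust the key schema, billing mode and other parameters to your own requirements:

import boto3

dynamodb = boto3.client("dynamodb")

# start an import from the gzip-compressed DynamoDB JSON files written by the extraction job
response = dynamodb.import_table(
    S3BucketSource={
        "S3Bucket": "redis2dynamodb",   # bucket used by the extraction job
        "S3KeyPrefix": "redis2ddb/",    # prefix used by the extraction job
    },
    InputFormat="DYNAMODB_JSON",
    InputCompressionType="GZIP",
    TableCreationParameters={
        "TableName": "redis2ddb",       # hypothetical name for the new table
        "AttributeDefinitions": [{"AttributeName": "key", "AttributeType": "S"}],
        "KeySchema": [{"AttributeName": "key", "KeyType": "HASH"}],
        "BillingMode": "PAY_PER_REQUEST",
    },
)

import_arn = response["ImportTableDescription"]["ImportArn"]
print("Import started:", import_arn)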

Monitoring, verification and DAX creation (optional)

In the following screenshot you can see that we were able to import 1 billion items (115 GB of data) in less than 2 hours. This cost us just $16.
[Screenshot: completed import of 1 billion items]

My recommendation is to test the import with a few files first and to check the Amazon CloudWatch logs for errors. If you are like me, you may need a few tries to get the format right. One option I used was to manually create a similar table with a few entries, export it to S3, and compare the format I generated with the one generated by DynamoDB.
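
You can also monitor the import programmatically by polling the DescribeImport API. Here is a minimal sketch; the import ARN is the one returned when the import was started, shown here as a placeholder:

import time
import boto3

dynamodb = boto3.client("dynamodb")
IMPORT_ARN = "<YOUR_IMPORT_ARN>"

while True:
    desc = dynamodb.describe_import(ImportArn=IMPORT_ARN)["ImportTableDescription"]
    status = desc["ImportStatus"]  # IN_PROGRESS, COMPLETED, FAILED, CANCELLING or CANCELLED
    print(status, "- processed:", desc.get("ProcessedItemCount"), "imported:", desc.get("ImportedItemCount"))
    if status != "IN_PROGRESS":
        # on failure, the CloudWatch log group referenced here contains the per-item errors
        print("Errors:", desc.get("ErrorCount"), "- logs:", desc.get("CloudWatchLogGroupArn"))
        break
    time.sleep(60)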

DAX is optional, but I found it especially relevant when coming from a low-latency engine such as Redis. In the following screenshot you can see the creation options I used:
[Screenshot: DAX cluster creation options]
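
For verification, a simple spot check is to read the same key from both databases and compare the values. Here is a minimal sketch assuming the table name and sample key used in this example; with DAX in place, you would route the reads through the DAX client instead of boto3:

import boto3
from rediscluster import RedisCluster

# hypothetical names from this example; adjust to your environment
REDIS_HOST = "<YOUR_REDIS_CLUSTER_HOST>"
TABLE_NAME = "redis2ddb"
SAMPLE_KEY = "i:119.205.177.176"

redis_db = RedisCluster(host=REDIS_HOST, port=6379, ssl=True, skip_full_coverage_check=True)
table = boto3.resource("dynamodb").Table(TABLE_NAME)

# read the original hash from Redis and decode bytes to strings
redis_value = {k.decode(): v.decode() for k, v in redis_db.hgetall(SAMPLE_KEY).items()}

# read the migrated item from DynamoDB (numbers come back as Decimal)
dynamodb_value = table.get_item(Key={"key": SAMPLE_KEY}).get("Item", {}).get("value", {})

print("Redis:   ", redis_value)
print("DynamoDB:", dynamodb_value)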

Conclusion

In this post I showed you how to migrate a Redis cluster to Amazon DynamoDB. While migrations are expensive, I believe there is not a single best database for all jobs. Check out our managed Amazon ElastiCache for Redis and Amazon MemoryDB for Redis if you prefer to modernize your existing Redis installation. Features such as data tiering and auto scaling help customers make the most of them.

What has been your experience on migrating Redis to DynamoDB? Let me know in the comments section.
