How To: Ongoing replication of tables containing LOB fields from Oracle to Redshift

0

Per the documentation (Setting LOB support for source databases in an AWS DMS task) there are three modes: Full LOB Mode, Limited LOB Mode, Inline LOB Mode. We are using Redshift as the target so that appears to limit us to Limited LOB Mode.

From the Task Settings you have two options; Don't include LOB columns, and Limited LOB mode. If you select "Don't include LOB columns" you get this error when trying to save the task:

"Replication Task settings limited lob mode should be enabled for Redshift target engine type and LobMaxSize should have value between 1 and 63 KB"

The task fails to transfer any data in the table containing LOB data, and as best I can tell this is because there are rows that have rows containing more than 63Kb of data in the LOB column. Again, per the documentation, the value of the the Max LOB size (K) is 102400, which equates to 100Mb. For our needs this is more than enough. However, when attempting to set the value of this setting to anything greater than 63, we get the same error as above.

Thus, in practice it seems you are forced to transfer LOB columns, and no values in the LOB column can exceed 63Kb, which makes this entirely useless and prevents us from migrating the table at all. Very frustrating, and so broken I assume I must be doing something wrong.

Any suggestions for what boneheaded mistake I may be making would be greatly appreciated.

1 Answer
1
Accepted Answer

Hi there.

There is no error on your part. This is a documented limitation when using Amazon Redshift as a target. The documentation reads:

Amazon Redshift doesn't support VARCHARs larger than 64 KB. LOBs from traditional databases can't be stored in Amazon Redshift.

As a workaround, I recommend storing this particular table in a different database engine, for example Aurora MySQL, then use federated queries in Amazon Redshift.

By using federated queries in Amazon Redshift, you can query and analyze data across operational databases, data warehouses, and data lakes. With the Federated Query feature, you can integrate queries from Amazon Redshift on live data in external databases with queries across your Amazon Redshift and Amazon S3 environments. Federated queries can work with external databases in Amazon RDS for PostgreSQL, Amazon Aurora PostgreSQL-Compatible Edition, Amazon RDS for MySQL, and Amazon Aurora MySQL-Compatible Edition.

I hope this helps.

profile pictureAWS
EXPERT
answered a year ago
profile picture
EXPERT
reviewed 3 months ago
profile picture
EXPERT
reviewed 6 months ago
  • Ah, I missed that in the documentation. I thought I read somewhere else something that suggested it would work with chunks larger than 64K, but only in inline mode.

    If I understand your workaround correctly, synchronize specific tables with a database supported by federated queries (which cannot work with Oracle directly) then expose the data in Redshift via a federated query. Interesting, if not convoluted. I'll keep that in my back-pocket as a possible solution.

    Thank you for the response!

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions