AWS Glue to Azure Data Lake

0

Hello All,

We need to do a small POC. In it, we need to pick up data from Salesforce and push it to Azure Data Lake using Glue. Can we connect to Azure Data Lake from Glue?

  • Hi, @Purnima.

    What do you want to connect to in Azure Data Lake?
    Is it Azure Data Lake Storage Gen2?

  • Hi @iwasa,

    Yes, it is Azure Data Lake Storage Gen2.

Purnima
asked 2 years ago, 569 views
3 Answers
1

Hi, @Purnima.

From Glue, you may only be able to connect over JDBC, either through Azure Synapse or with a third-party product such as CData.
However, that route is probably intended for reading data into Glue, so it is probably not suitable for writing.

So, for Glue, I think you'll need to write a custom script that sends objects directly to Azure Storage using an Azure authentication token, or handle the write workflow with something like Lambda or Step Functions.
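As a rough illustration of the custom-script idea, here is a minimal Python sketch that only builds the three REST calls ADLS Gen2 uses to write a file through its `dfs` endpoint (create the path, append the bytes, flush). It is not a tested Glue job. The function names, the `x-ms-version` value, and the assumption that you already hold an Azure AD bearer token with write access to the container are mine, not something confirmed in this thread.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumption: any storage REST API version that supports OAuth would do here.
API_VERSION = "2021-08-06"


def build_adls_upload_requests(account, filesystem, path, data, token):
    """Build the create/append/flush requests for one file (hypothetical helper)."""
    base = (
        f"https://{account}.dfs.core.windows.net/"
        f"{quote(filesystem)}/{quote(path)}"
    )
    headers = {
        "Authorization": f"Bearer {token}",  # token is obtained elsewhere
        "x-ms-version": API_VERSION,
    }
    return [
        # 1. Create an empty file at the target path.
        Request(f"{base}?resource=file", method="PUT", headers=headers),
        # 2. Append the payload starting at offset 0.
        Request(
            f"{base}?action=append&position=0",
            data=data,
            method="PATCH",
            headers=headers,
        ),
        # 3. Flush, committing everything up to the final length.
        Request(
            f"{base}?action=flush&position={len(data)}",
            method="PATCH",
            headers=headers,
        ),
    ]


def upload(requests):
    """Send the requests in order; urlopen raises HTTPError on a failed call."""
    for req in requests:
        with urlopen(req) as resp:
            resp.read()
```

In a Glue Python shell or Spark job you would serialize the Salesforce extract to bytes, call `build_adls_upload_requests(...)`, and pass the result to `upload(...)`. For large files the append step would need to be split into chunks, which this sketch does not do.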

In this case, I think you'd be better off using Azure Data Factory (ADF), the ETL service on Microsoft Azure.
ADF also supports Salesforce as a source.

https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-overview

EXPERT
iwasa
answered 2 years ago
0

Hi,

I understand that you need to build an ETL pipeline that copies data files from Salesforce and pushes them to Azure Data Lake using the AWS Glue service, and that you would like to know how to connect to Azure Data Lake.

I looked into your question, and unfortunately I could not find any official documentation for connecting Glue to an Azure Data Lake Storage Gen2 (ADLS Gen2) container, nor any official JDBC drivers for it. However, I found an official article [1] explaining how to access and analyze on-premises data stores using AWS Glue. Although it doesn't cover your use case specifically, it may give you an idea of how to set up the connection. Please refer to article [2] for details on setting up a JDBC connection and the additional properties that can be configured.

I have also found a third-party article [3] which explains how to connect to Azure Data Lake Storage data in AWS Glue jobs using JDBC. Although this is not an official document, I suggest giving it a read to see if it helps.

[1] https://aws.amazon.com/blogs/big-data/how-to-access-and-analyze-on-premises-data-stores-using-aws-glue/
[2] https://docs.aws.amazon.com/glue/latest/dg/connection-properties.html#connection-properties-jdbc
[3] https://www.cdata.com/kb/tech/azuredatalake-jdbc-aws-glue.rst

Thank you.

AWS
SUPPORT ENGINEER
answered 2 years ago
0

I'm not getting the option to accept an answer, but I have upvoted the answer.

answered 2 years ago
