How to connect On-premise Oracle database from Glue without using Crawler

0

Hello All, I have some doubts regarding designing glue job .

  1. Do we always need catalog and crawler to connect to any database ?
  2. Can we connect to database directly either using spark read or dynamic_frame_from_options by passing DB details ?
  3. I was going though https://aws.amazon.com/blogs/big-data/how-to-access-and-analyze-on-premises-data-stores-using-aws-glue/ but in this we are connecting to database via catelog . cant we connect directly ?
  4. If answer to 1 and 3 is yes then do we need to attach VPC to Glue , if yes then how ?I dont see any option in console to attach VPC directly ?
  5. In security group do we need database ip address mentioned in outbound rules ?
질문됨 2년 전4023회 조회
1개 답변
0

1. Do we always need catalog and crawler to connect to any database ?

Ans:- If you would like to leverage the advantages of Glue Dynamic Frames then it is mandatory to have a Glue data catalog table else in case your requirement is to make use of spark read and store data in data frames then Glue Catalog is not required. The use of Crawlers is for the purpose of creation of Glue data catalog. Instead of Crawlers, you can manually create a catalog table as well.

2. Can we connect to database directly either using spark read or dynamic_frame_from_options by passing DB details ?

Ans:- Provided that you attach a new or existing JDBC connection to the Glue job, you can connect to the database directly. The JDBC connection shall contain the details about your on-premise Oracle database along with the VPC and other details that are to be used by the Glue job. Again, if you want to use dynamic_frame_from_options then Glue catalog must be present.

3. I was going though https://aws.amazon.com/blogs/big-data/how-to-access-and-analyze-on-premises-data-stores-using-aws-glue/ but in this we are connecting to database via catelog . cant we connect directly ?

Ans:- As I mentioned in my previous answer, if you want to use dynamic frame then creation of data catalog is necessary. The method followed in the link that you have mentioned, is making use of dynamic frame and thus Glue Catalog is used.

4. If answer to 1 and 3 is yes then do we need to attach VPC to Glue , if yes then how ?I dont see any option in console to attach VPC directly ?

Ans:- While creating a JDBC connection, you will be prompted to give the details of VPC, subnet and security group. So, when the Glue job is run, the backend resources of Glue shall make use of this VPC. There is no option to attach a VPC directly to the Glue job. This is done only through connection.

5. In security group do we need database ip address mentioned in outbound rules ?

Ans:- In case your security group is restrictive then, you would have to add an outbound rule mentioning the IP address. But on the other hand if there is no restriction on outbound traffic to flow out of the security group then it is not necessary to add any outbound rule. By default, the security group allows all the outgoing traffic. Please do note that an inbound self referencing rule must be added to the security group of Glue job such that it allows all TCP traffic.

profile pictureAWS
지원 엔지니어
Chaitu
답변함 2년 전
  • Hello Chaitu Thanks for sharing knowledge . I have one more doubt after reading your response :

    1. As per my knowledge while doing local development on docker using AWS glue image I think we cannot create Catelog locally . Do we need to create aws catelog using console and then use it dynamic_frame_from_options within docker .

    2. Also I would like to ask is there way to connect to On premise Database without using catelog ?

    Also can you please help me with https://repost.aws/questions/QU74cjzPDLRE2FZy12yjmBGA/aws-glue-connect-with-on-premise-db

로그인하지 않았습니다. 로그인해야 답변을 게시할 수 있습니다.

좋은 답변은 질문에 명확하게 답하고 건설적인 피드백을 제공하며 질문자의 전문적인 성장을 장려합니다.

질문 답변하기에 대한 가이드라인

관련 콘텐츠