I am not sure you would get the benefits of Glue's Spark engine if the job mainly calls an API. Plain Python code in the job script runs on the driver, so the driver would have to handle the API requests while the executors would not be using their compute power to call the API in parallel. Given that, I believe you would not really use the power of Glue until you process data with PySpark/DynamicFrames. It may be more efficient (and less expensive) to orchestrate a Lambda function that reads from S3, calls the API, does the transformation, and writes the result back to S3, and then use a Glue job to process/transform that data for your ETL.
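As a rough illustration of the Lambda route described above, here is a minimal sketch of such a handler. The event fields (`bucket`, `input_key`, `output_key`, `api_url`), the `transform` logic, and the JSON-in/JSON-out API are all assumptions made for the example, not something taken from your setup. The extra `s3` and `api` parameters exist only so the handler can be exercised without AWS or network access.

```python
import json
import urllib.request


def transform(record):
    """Hypothetical transformation: keep two fields and normalise the name."""
    return {"id": record["id"], "name": record["name"].strip().lower()}


def call_api(url, payload, timeout=10):
    """POST a JSON payload to the (assumed) API and return the decoded JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))


def lambda_handler(event, context, s3=None, api=call_api):
    """Read a JSON list from S3, call the API per record, write the result to S3."""
    if s3 is None:
        import boto3  # imported lazily so the handler can be tested without AWS

        s3 = boto3.client("s3")

    # Assumed input: a JSON array of objects stored at bucket/input_key.
    body = s3.get_object(Bucket=event["bucket"], Key=event["input_key"])["Body"].read()
    records = json.loads(body)

    # Transform each record and attach whatever the API returned for it.
    enriched = [
        dict(transform(record), api_result=api(event["api_url"], record))
        for record in records
    ]

    s3.put_object(
        Bucket=event["bucket"],
        Key=event["output_key"],
        Body=json.dumps(enriched).encode("utf-8"),
    )
    return {"records_written": len(enriched)}
```

A Glue job could then read the object written to `output_key` and do the heavier Spark processing from there.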
That said, there may be a use case for what you want to implement. If you want to try calling an API from Glue using Python code, you could try the following code, which invokes a Lambda function from the job.
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
import boto3 ## Library for invoking Lambda
## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
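## Your ETL logic prior to invoking Lambda
## Once the ETL completes, invoke the Lambda function
## ('LambdaName' is a placeholder for your function's name or ARN)
lambda_client = boto3.client('lambda')
response = lambda_client.invoke(FunctionName='LambdaName')
## Your ETL code after invoking Lambda
job.commit()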
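The `invoke` call above is synchronous by default (`InvocationType='RequestResponse'`), so the job can check whether the function succeeded before it continues. A minimal sketch of that check follows, where `parse_invoke_response` is a hypothetical helper name and the Lambda function is assumed to return JSON:

```python
import json


def parse_invoke_response(response):
    """Return the decoded payload of a synchronous Lambda invoke, or raise on failure."""
    # 'Payload' is a streaming body; read it once and decode it.
    payload = response["Payload"].read().decode("utf-8")

    # 'FunctionError' is only present when the function itself raised an error.
    if response.get("FunctionError"):
        raise RuntimeError(
            f"Lambda reported a {response['FunctionError']} error: {payload}"
        )

    # A synchronous invocation is expected to come back with HTTP status 200.
    if response["StatusCode"] != 200:
        raise RuntimeError(f"Unexpected status code {response['StatusCode']}")

    return json.loads(payload) if payload else None


# In the Glue job this might be used as:
#   response = lambda_client.invoke(FunctionName='LambdaName')
#   result = parse_invoke_response(response)
```

Raising here makes the Glue job run fail visibly instead of continuing with the rest of the ETL after a failed Lambda call.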
If you want to call an external API, you need to install the requests module using the --additional-python-modules job parameter and then use the code below:
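import requests
url = "https://example.com/api/jobs/test"
response = requests.post(url, timeout=30)  # timeout so the job does not hang on a slow API
print(response.text)  # response body (text/HTML)
print(response.status_code, response.reason)  # HTTP status code and reason phrase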
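Since a single failed request would otherwise fail the whole job run, you may want to retry transient errors. The following is only a sketch under my own assumptions: `call_with_retries` is a hypothetical helper, "transient" is taken to mean a connection error or a 5xx status, and the attempt count and backoff values are arbitrary.

```python
import time


def call_with_retries(send, attempts=3, backoff_seconds=1.0, sleep=time.sleep):
    """Call send() until it returns a response whose status_code is below 500.

    `send` is any zero-argument callable returning an object with a
    `status_code` attribute, e.g. lambda: requests.post(url, timeout=30).
    """
    last_error = None
    for attempt in range(attempts):
        try:
            response = send()
            if response.status_code < 500:
                return response
            last_error = RuntimeError(f"server error {response.status_code}")
        except OSError as exc:
            # requests' exceptions derive from IOError/OSError, as do the
            # built-in connection and timeout errors.
            last_error = exc
        if attempt < attempts - 1:
            # Exponential backoff: 1x, 2x, 4x ... the base delay.
            sleep(backoff_seconds * 2 ** attempt)
    raise last_error


# With the code above this might be used as:
#   response = call_with_retries(lambda: requests.post(url, timeout=30))
```

Client errors (4xx) are returned rather than retried, because repeating the same request usually will not fix them.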