Unable to retrieve Salesforce Bulk API Job ID using AWS Glue

Hi Team,

We are trying to insert records into Salesforce Health Cloud edition using AWS Glue (version 5), but we are unable to retrieve the success and failure results after the job completes in Salesforce. We also cannot click "View Request" and "View Result" on the "Bulk Data Load Page" in the Salesforce UI, because the AWS Glue native Salesforce connector uses Bulk API V2.

We know that Salesforce provides the two APIs below for retrieving success and failure results as CSV files, but because the AWS Glue native connector does not return the Salesforce Bulk API Job ID, we cannot call these APIs:

https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/get_job_failed_results.htm

https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/get_job_successful_results.htm

We are using the Glue script (Spark code) below to insert records into our Salesforce org:

```python
import logging
import traceback

from awsglue.dynamicframe import DynamicFrame

logger = logging.getLogger(__name__)
API_VERSION = "v60.0"


def insert_to_salesforce(df, glue_context, entity_name, connection_name, api_version=API_VERSION):
    # Skip the write entirely when the DataFrame has no rows.
    if df.limit(1).count() == 0:
        logger.info("No rows to insert.")
        return
    logger.info("Trying to load data into Salesforce (INSERT) ...")
    try:
        ddf = DynamicFrame.fromDF(df, glue_context, f"insert_{entity_name}")
        glue_context.write_dynamic_frame.from_options(
            frame=ddf,
            connection_type="salesforce",
            connection_options={
                "connectionName": connection_name,
                "ENTITY_NAME": entity_name,
                "API_VERSION": api_version,  # was hard-coded "v60.0"; now honors the parameter
                "WRITE_OPERATION": "INSERT",
                "TRANSFER_MODE": "ASYNC",
                # "bulkApi": "true",
                # "bulkApiVersion": "1.0",
                # "errorMode": "PERMISSIVE",
                # "errorOutputPath": "s3://cld-aws-jnj-wme-migration-non-prod-dev-raw-masked-bucket/dev/withme/error_archived/",
                "FAIL_ON_FIRST_ERROR": "false",
            },
        )
        logger.info(f"Insert requested for {df.count()} rows into {entity_name}")
    except Exception as e:
        # The error is logged but not re-raised, so the Glue run still ends as succeeded.
        logger.error(f"Error writing to Salesforce (INSERT): {str(e)}")
        logger.error(traceback.format_exc())
```

Thanks, Hardik

Asked 3 months ago · 76 views
1 answer

Based on the AWS Glue documentation [1][5], the native Salesforce connector for AWS Glue does not provide a mechanism to retrieve the Salesforce Bulk API Job ID after job completion. This is a known limitation of the current connector implementation, not a configuration issue. The connector handles the bulk data operations internally and abstracts away the underlying Salesforce Bulk API details, including the Job ID. This means you cannot directly access the Salesforce Bulk API Job ID through the connector to call Salesforce's success/failure result APIs.

Additionally, it is important to note that the AWS Glue Salesforce connector has several other known limitations [5], including:

  • Lack of support for job bookmarks
  • The requirement to use Spark SQL rather than Salesforce's SOQL

These constraints are part of the current connector design. The AWS Glue service team is aware of these limitations, and a feature request is in place to address the Job ID limitation.

To work around this known limitation, consider the following recommendations and apply those that fit your use case:

=> Enable "FAIL_ON_FIRST_ERROR": "true": Your current script sets this option to "false". Setting it to "true" [1] can help minimize partial writes by failing the Glue job when the first error is encountered on the Salesforce side. Note, however, that some records may still be written before the first error occurs, so partial data in Salesforce is still possible. The option narrows the scope of unknown failures rather than eliminating them.
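
As a minimal sketch, the helper below returns the same connection options as the question's script with `FAIL_ON_FIRST_ERROR` switched to `"true"` (the commented-out entries are omitted). The function name `build_insert_options` and the connection name in the demo are hypothetical; the option keys are copied from the question, so confirm them against [1] before relying on them.

```python
def build_insert_options(
    connection_name: str,
    entity_name: str,
    api_version: str = "v60.0",
    fail_on_first_error: bool = True,
) -> dict:
    """Build connection_options for an INSERT through the Glue Salesforce connector.

    Keys mirror the question's script; verify them against the AWS Glue
    "Writing to Salesforce" documentation.
    """
    return {
        "connectionName": connection_name,
        "ENTITY_NAME": entity_name,
        "API_VERSION": api_version,
        "WRITE_OPERATION": "INSERT",
        "TRANSFER_MODE": "ASYNC",
        # The question's script passes booleans as the strings "true"/"false".
        "FAIL_ON_FIRST_ERROR": "true" if fail_on_first_error else "false",
    }


if __name__ == "__main__":
    # "my-salesforce-connection" is a hypothetical Glue connection name.
    opts = build_insert_options("my-salesforce-connection", "Account")
    print(opts["FAIL_ON_FIRST_ERROR"])  # true
```

Pass the returned dictionary as `connection_options` to `glue_context.write_dynamic_frame.from_options(...)`, exactly where the inline dictionary sits in the original script.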

=> Error Handling in AWS Glue: Although the connector does not return the Salesforce Job ID, you can implement error handling within your Glue job to capture and log any failures that occur during the write operation. Monitor the job execution logs in Amazon CloudWatch [6] for any errors related to the Salesforce write operation.
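
The `except` block in the question logs the error but does not re-raise it, so a failed write can still end in a succeeded Glue run. A minimal sketch of the alternative, assuming you want the run itself to fail: wrap the write in a helper (the name `write_or_fail` is hypothetical) that logs the full traceback to stderr, where it should show up in the job's CloudWatch logs, and then re-raises.

```python
import logging
from typing import Callable

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("salesforce_write")


def write_or_fail(write_fn: Callable[[], None], entity_name: str) -> None:
    """Run a Salesforce write; log any failure with its traceback and re-raise it."""
    try:
        write_fn()
        logger.info("Write to %s returned without a connector error.", entity_name)
    except Exception:
        # logger.exception logs at ERROR level and appends the current traceback.
        logger.exception("Salesforce write to %s failed.", entity_name)
        raise  # propagate so the Glue job run is marked as failed


if __name__ == "__main__":
    def failing_write() -> None:
        raise RuntimeError("simulated connector error")

    try:
        write_or_fail(failing_write, "Account")
    except RuntimeError as err:
        print(f"Job would fail with: {err}")
```

In the Glue script you would pass a lambda that calls `glue_context.write_dynamic_frame.from_options(...)` with your existing arguments.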

=> Retrieve the Job ID from the logs and call the Salesforce APIs externally: The Salesforce Bulk API V2 Job ID does appear in the job execution logs. You can extract the Job ID from the logs and then call Salesforce's native Bulk API V2 endpoints to retrieve the job's results. Note, however, that this cannot be done within the Glue Spark script itself.
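
Once you have the log lines (for example, exported from the job's CloudWatch log stream), a regular expression can pull out candidate Job IDs. This sketch assumes, without confirmation from the connector documentation, that the ID is logged as a standalone 15- or 18-character Salesforce ID with the `750` key prefix used by Bulk API jobs. The function name and the sample log lines are hypothetical; adjust the pattern after inspecting a real log entry.

```python
import re
from typing import Iterable, List

# ASSUMPTION: the job ID appears as a standalone 15- or 18-character Salesforce ID
# starting with the Bulk API job key prefix "750". Verify against a real log line.
JOB_ID_PATTERN = re.compile(r"\b750[A-Za-z0-9]{12}(?:[A-Za-z0-9]{3})?\b")


def extract_job_ids(log_lines: Iterable[str]) -> List[str]:
    """Return candidate Bulk API job IDs in first-seen order, without duplicates."""
    job_ids: List[str] = []
    for line in log_lines:
        for candidate in JOB_ID_PATTERN.findall(line):
            if candidate not in job_ids:
                job_ids.append(candidate)
    return job_ids


if __name__ == "__main__":
    # Hypothetical log lines; the real connector output will look different.
    sample_logs = [
        "INFO Created ingest job 750R0000000zlh9IAA for Account",
        "INFO Job 750R0000000zlh9IAA reached state JobComplete",
    ]
    print(extract_job_ids(sample_logs))  # ['750R0000000zlh9IAA']
```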

The following three Salesforce Bulk API V2 endpoints (available in API version 41.0 and later) can be used to retrieve results:

Get Job Failed Record Results [2]: GET /services/data/vXX.X/jobs/ingest/{jobId}/failedResults/ returns a CSV file containing all records that encountered an error during processing, including the error code and message (sf__Error) and the record ID (sf__Id).

Get Job Successful Record Results [3]: GET /services/data/vXX.X/jobs/ingest/{jobId}/successfulResults/ returns a CSV file containing all records that were successfully processed, including whether the record was created (sf__Created) and the record ID (sf__Id).

Get Job Unprocessed Record Results [4]: GET /services/data/vXX.X/jobs/ingest/{jobId}/unprocessedrecords/ returns a CSV file containing all records that were not processed by the job. This applies to jobs that were interrupted or otherwise failed to complete. Note that unprocessed records are not the same as failed records: failed records were processed but encountered an error during processing, while unprocessed records were never processed.
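
The three endpoints can be called from any process outside the Glue Spark script. A minimal standard-library sketch, assuming you already have an OAuth access token and the org's instance URL (both hypothetical inputs here) and that the job used the default comma column delimiter: `results_url` builds the endpoint paths listed above, `fetch_results` downloads one result set and parses it with `csv.DictReader`, and `count_errors` tallies the `sf__Error` column of the failed results. All three function names are illustrative.

```python
import csv
import io
import urllib.request
from collections import Counter
from typing import Dict, List

# Path suffixes of the Bulk API 2.0 result endpoints [2][3][4].
RESULT_PATHS = {
    "failed": "failedResults/",
    "successful": "successfulResults/",
    "unprocessed": "unprocessedrecords/",
}


def results_url(instance_url: str, api_version: str, job_id: str, kind: str) -> str:
    """Build a result URL; api_version is the bare number, e.g. "60.0"."""
    base = instance_url.rstrip("/")
    return f"{base}/services/data/v{api_version}/jobs/ingest/{job_id}/{RESULT_PATHS[kind]}"


def fetch_results(instance_url: str, access_token: str, api_version: str,
                  job_id: str, kind: str) -> List[Dict[str, str]]:
    """Download one result set and parse the CSV body into one dict per record."""
    request = urllib.request.Request(
        results_url(instance_url, api_version, job_id, kind),
        headers={"Authorization": f"Bearer {access_token}", "Accept": "text/csv"},
    )
    with urllib.request.urlopen(request) as response:
        body = response.read().decode("utf-8")
    # ASSUMPTION: the job used the default comma column delimiter.
    return list(csv.DictReader(io.StringIO(body, newline="")))


def count_errors(failed_rows: List[Dict[str, str]]) -> Dict[str, int]:
    """Tally failed records by the sf__Error value Salesforce returns."""
    return dict(Counter(row["sf__Error"] for row in failed_rows))
```

For example, `count_errors(fetch_results(instance_url, token, "60.0", job_id, "failed"))` groups the failed records of one job by error message, and calling `fetch_results` with `"unprocessed"` tells you whether any records were never attempted.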

I hope the above information is useful and clarifies why the AWS Glue Salesforce connector does not return the Bulk API V2 Job ID.

There is an existing feature request with the AWS Glue service team to support returning the Salesforce Bulk API V2 Job ID and response data through the native connector. As a Support Engineer, I cannot provide an ETA for when this will be implemented, but the service team is aware of this limitation.

We recommend monitoring the AWS announcements and blog pages linked below for updates on new features and releases. [+] https://aws.amazon.com/new/ [+] https://aws.amazon.com/blogs/aws/

===Reference(s)===

[1] https://docs.aws.amazon.com/glue/latest/dg/salesforce-writing-to.html

[2] https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/get_job_failed_results.htm

[3] https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/get_job_successful_results.htm

[4] https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/get_job_unprocessed_results.htm

[5] https://docs.aws.amazon.com/glue/latest/dg/salesforce-connector-limitations.html

Answered 3 months ago by an AWS Support Engineer (edited 3 months ago)
