Not able to retrieve Salesforce Bulk API Job ID using AWS Glue

Hi Team,

We are trying to insert records into Salesforce Health Cloud using AWS Glue (version 5), but we cannot retrieve the success and failure results after the job completes in Salesforce. We also cannot click "View Request" or "View Result" on the "Bulk Data Load Page" in the Salesforce UI, because the AWS Glue native Salesforce connector uses Bulk API V2.

We know that Salesforce provides the two APIs below for retrieving success and failure results as CSV files. However, because the AWS Glue native connector does not return the Salesforce Bulk API Job ID, we cannot call them:

https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/get_job_failed_results.htm

https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/get_job_successful_results.htm

We are using the Glue script (Spark code) below to insert records into the Salesforce org:

def insert_to_salesforce(df, glue_context, entity_name, connection_name, api_version=API_VERSION):
    if df.limit(1).count() == 0:
        logger.info("No rows to insert.")
        return
    logger.info(f"Trying to load data into salesforce (INSERT) ..................")
    try:
        ddf = DynamicFrame.fromDF(df, glue_context, f"insert_{entity_name}")
        glue_context.write_dynamic_frame.from_options(
            frame=ddf,
            connection_type="salesforce",
            connection_options={
                "connectionName": connection_name,
                "ENTITY_NAME": entity_name,
                "API_VERSION": "v60.0",
                "WRITE_OPERATION": "INSERT",
                "TRANSFER_MODE": "ASYNC",
                #"bulkApi": "true",
                #"bulkApiVersion": "1.0",
                #"errorMode": "PERMISSIVE",
                #"errorOutputPath": "s3://cld-aws-jnj-wme-migration-non-prod-dev-raw-masked-bucket/dev/withme/error_archived/",
                "FAIL_ON_FIRST_ERROR": "false",
            },
        )
        logger.info(f"Insert requested for {df.count()} rows into {entity_name}")
    except Exception as e:
        logger.error(f"Error writing to Salesforce (INSERT): {str(e)}")
        logger.error(traceback.format_exc())

Thanks, Hardik

asked 2 months ago, 71 views
1 Answer
Based on the AWS Glue documentation [1][5], the native Salesforce connector for AWS Glue does not provide a mechanism to retrieve the Salesforce Bulk API Job ID after job completion. This is a known limitation of the current connector implementation, not a configuration issue. The connector handles the bulk data operations internally and abstracts away the underlying Salesforce Bulk API details, including the Job ID. This means you cannot directly access the Salesforce Bulk API Job ID through the connector to call Salesforce's success/failure result APIs.

Additionally, it is important to note that the AWS Glue Salesforce connector has several other known limitations [5], including:

  • Lack of support for job bookmarks
  • The requirement to use Spark SQL rather than Salesforce's SOQL

These constraints are part of the current connector design. The AWS Glue service team is aware of them, and a feature request is in place to address the Job ID limitation.

To work around this limitation, we recommend the following. Please evaluate each option against your use case:

=> Set "FAIL_ON_FIRST_ERROR" to "true": I noticed that your current script sets this to "false". Setting it to "true" [1] fails the Glue job when the first error is encountered on the Salesforce side, which helps minimize partial writes and reduces the scope of unknown failures. Note that some records may still be written before the first error occurs, so partial data in Salesforce is still possible.

=> Error handling in AWS Glue: While the connector doesn't return the Salesforce Job ID, you can implement error handling within your Glue job to capture and log any failures that occur during the write operation. Monitor the job execution logs in Amazon CloudWatch for any errors related to the Salesforce write operation.
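As an illustration of the error-handling point, here is a minimal sketch, not a definitive implementation. It assumes the connector write is wrapped in a zero-argument callable; the names write_with_error_capture and write_fn are hypothetical and do not come from the Glue API. Unlike the script in the question, it re-raises after logging, so the job run fails visibly instead of finishing as successful with the error only in the logs.

```python
import logging
import traceback

logger = logging.getLogger(__name__)


def write_with_error_capture(write_fn, entity_name):
    """Run a write callable, log any failure, then re-raise it.

    write_fn is a hypothetical zero-argument callable that wraps the
    glue_context.write_dynamic_frame.from_options(...) call.
    """
    try:
        write_fn()
    except Exception as exc:
        # Log the message and the full stack trace for later inspection.
        logger.error("Error writing to Salesforce (%s): %s", entity_name, exc)
        logger.error(traceback.format_exc())
        # Re-raise so the failure is not swallowed by the script.
        raise
    logger.info("Write to %s finished without a connector error", entity_name)
```

In the original script this would wrap the existing call, for example write_with_error_capture(lambda: glue_context.write_dynamic_frame.from_options(...), entity_name).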

=> Retrieve Job ID from logs and use Salesforce APIs externally: The Salesforce Bulk API V2 Job ID does appear in the execution logs. You can extract this Job ID from the logs and then call Salesforce's native Bulk API V2 endpoints to retrieve the results of the job. However, please note that this cannot be done within the Glue Spark script itself.
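The log-extraction step could be sketched as follows, to be run outside the Glue job on log text you have already exported. This rests on two assumptions you should check against your own logs: that the Job ID appears as a 15- or 18-character Salesforce ID, and that Bulk job IDs start with the prefix 750. The exact log message format is not documented here, so the sample line in the comment is hypothetical.

```python
import re

# Assumption: Bulk API job IDs are 15- or 18-character alphanumeric
# Salesforce IDs beginning with "750". Verify against your own logs.
JOB_ID_PATTERN = re.compile(r"\b750[A-Za-z0-9]{12}(?:[A-Za-z0-9]{3})?\b")


def extract_job_ids(log_text):
    """Return candidate job IDs found in log_text, de-duplicated, in order."""
    seen = []
    for match in JOB_ID_PATTERN.finditer(log_text):
        job_id = match.group(0)
        if job_id not in seen:
            seen.append(job_id)
    return seen


# Hypothetical log line, for illustration only:
# "INFO Created ingest job id=7505fEXAMPLE4C2AAM state=Open"
```

Treat the matches as candidates only. Other 15- or 18-character tokens that start with 750 would also match, so confirm each ID against the Bulk Data Load page in Salesforce before using it.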

The following three Salesforce Bulk API V2 endpoints (available in API version 41.0 and later) can be used to retrieve results:

Get Job Failed Record Results [2]: GET /services/data/vXX.X/jobs/ingest/{jobId}/failedResults/ — Returns a CSV file containing all records that encountered an error during processing, including the error code and message (sf__Error) and the record ID (sf__Id).

Get Job Successful Record Results [3]: GET /services/data/vXX.X/jobs/ingest/{jobId}/successfulResults/ — Returns a CSV file containing all records that were successfully processed, including whether the record was created (sf__Created) and the record ID (sf__Id).

Get Job Unprocessed Record Results [4]: GET /services/data/vXX.X/jobs/ingest/{jobId}/unprocessedrecords/ — Returns a CSV file containing all records that were not processed by the job. This applies to jobs that were interrupted or otherwise failed to complete. Note that unprocessed records are not the same as failed records — failed records were processed but encountered an error during processing, while unprocessed records were not processed.
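Once you have a Job ID, the three endpoints above can be called from any client outside the Glue script. The sketch below uses only the Python standard library and makes several assumptions: you already hold a valid OAuth access token and know your org's instance URL (both obtained separately), and the function names result_url, fetch_results and parse_results are illustrative rather than part of any Salesforce SDK.

```python
import csv
import io
import urllib.request

# Path segment for each result type, as listed in the endpoints above.
RESULT_PATHS = {
    "failed": "failedResults",
    "successful": "successfulResults",
    "unprocessed": "unprocessedrecords",
}


def result_url(instance_url, api_version, job_id, kind):
    """Build the result URL, e.g. api_version="v60.0", kind="failed"."""
    base = instance_url.rstrip("/")
    return f"{base}/services/data/{api_version}/jobs/ingest/{job_id}/{RESULT_PATHS[kind]}/"


def parse_results(csv_text):
    """Parse the returned CSV into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def fetch_results(instance_url, api_version, job_id, kind, access_token):
    """Download and parse one result set (performs a network call)."""
    request = urllib.request.Request(
        result_url(instance_url, api_version, job_id, kind),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept": "text/csv",
        },
    )
    with urllib.request.urlopen(request) as response:
        return parse_results(response.read().decode("utf-8"))
```

For example, fetch_results(instance_url, "v60.0", job_id, "failed", token) would return one dict per failed record, with the sf__Error and sf__Id columns available as keys.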

I hope the information above is useful and clarifies the situation you are encountering with the AWS Glue Salesforce connector not returning the Bulk API V2 Job ID.

Please note that there is an existing feature request with the AWS Glue service team to support returning the Salesforce Bulk API V2 Job ID and response data through the native connector. As a Support Engineer, I cannot provide an ETA for when this will be implemented, but the service team is aware of this limitation.

We recommend monitoring the AWS blogs and announcements pages linked below for updates on new features and releases:

[+] https://aws.amazon.com/new/

[+] https://aws.amazon.com/blogs/aws/

===Reference(s)===

[1] https://docs.aws.amazon.com/glue/latest/dg/salesforce-writing-to.html

[2] https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/get_job_failed_results.htm

[3] https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/get_job_successful_results.htm

[4] https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/get_job_unprocessed_results.htm

[5] https://docs.aws.amazon.com/glue/latest/dg/salesforce-connector-limitations.html

answered 2 months ago
AWS
SUPPORT ENGINEER
revised 2 months ago