1 Answer
I'd rather increase the batch size to reduce the overall number of requests to OpenSearch. You may also want to increase the refresh interval: https://aws.amazon.com/ru/premiumsupport/knowledge-center/opensearch-indexing-performance/ On the other point, a t3.small cluster is really small, so you might need to use a different instance type.
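For illustration, a rough sketch of what those settings could look like in a Glue/PySpark job that writes through the elasticsearch-hadoop connector (the endpoint, index name, credentials and values below are placeholders, not taken from your question):

import requests
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("opensearch-bulk-write").getOrCreate()
df = spark.createDataFrame([(1, "alpha"), (2, "beta")], ["id", "label"])  # stand-in for the real data

ENDPOINT = "https://my-domain.eu-central-1.es.amazonaws.com"  # placeholder domain endpoint

# Optionally relax the refresh interval while bulk loading (restore it afterwards).
requests.put(f"{ENDPOINT}/my-index/_settings",
             json={"index": {"refresh_interval": "60s"}},
             auth=("user", "password"))  # placeholder credentials

(df.write
   .format("org.elasticsearch.spark.sql")
   .option("es.nodes", ENDPOINT)
   .option("es.port", "443")
   .option("es.nodes.wan.only", "true")           # connect only via the domain endpoint
   .option("es.batch.size.entries", "10000")      # larger bulk batches -> fewer requests
   .option("es.batch.size.bytes", "10mb")
   .option("es.batch.write.retry.count", "6")     # back off and retry instead of failing on 429
   .option("es.batch.write.retry.wait", "30s")
   .mode("append")
   .save("my-index"))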
answered a month ago
Hi, thanks for your quick reply @Alex_T.
Upgraded the cluster from t3.small to larger instances, and the indexing of the 1.2 million records with AWS Glue immediately succeeded, in 7 minutes. :) So upscaling the cluster instances was a useful hint.
For those who are interested: I also asked the fine people at Elastic about this:
https://discuss.elastic.co/t/aws-es-hadoop-and-429/310124
The guys over there mentioned that the option "es.batch.size.entries" is not respected in every situation. In my case, for example, I had enabled PySpark's overwrite mode in AWS Glue:
df.write.mode('overwrite')...
Before the new documents are indexed, the index is first emptied. It turns out there is no config option in the elasticsearch-hadoop module for that initial delete, so I always saw "1000 records" in my logs. Maybe this will help someone later.
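In case it is useful, a sketch of one way around it (endpoint, index name and credentials are placeholders, not my real job): empty and recreate the index yourself, then write in append mode, so the connector never runs its own delete phase and "es.batch.size.entries" applies to all the bulk requests it sends.

import requests
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("reindex-without-overwrite").getOrCreate()
df = spark.createDataFrame([(1, "alpha"), (2, "beta")], ["id", "label"])  # stand-in for the real data

ENDPOINT = "https://my-domain.eu-central-1.es.amazonaws.com"  # placeholder domain endpoint
INDEX = "my-index"
AUTH = ("user", "password")  # placeholder credentials

# Empty the index ourselves instead of letting overwrite mode do it.
requests.delete(f"{ENDPOINT}/{INDEX}", auth=AUTH)   # returns 404 if the index does not exist yet
requests.put(f"{ENDPOINT}/{INDEX}", auth=AUTH)      # recreate it with default settings

# An append write only issues bulk index requests, all sized by es.batch.size.entries.
(df.write
   .format("org.elasticsearch.spark.sql")
   .option("es.nodes", ENDPOINT)
   .option("es.port", "443")
   .option("es.nodes.wan.only", "true")
   .option("es.batch.size.entries", "10000")
   .mode("append")
   .save(INDEX))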
Thanks again.
Best, Matthias