Can't read DataFrame from DocumentDB: Partitioner calling collStats command failed through the Spark MongoDB connector


I am trying to connect to my DocumentDB cluster through the MongoDB Spark connector, but it looks like DocumentDB does not support collStats. How can I disable the collStats command so I can do my transformations with Spark?

dataFrame = spark.read.format("mongodb").option("spark.mongodb.database","testdb").option("spark.mongodb.collection", "collection1").load()

dataFrame.show()

It gives the following error:

Py4JJavaError: An error occurred while calling o83.showString. : com.mongodb.spark.sql.connector.exceptions.MongoSparkException: Partitioning failed. Partitioner calling collStats command failed

However, dataFrame.printSchema() does return the schema. I already found out that collStats is not supported on DocumentDB, but how can I turn this function off in the MongoDB connector for Spark?

Asked 5 months ago, 382 views
1 Answer

Hello,

Yes, $collStats behaves differently in Amazon DocumentDB and, according to the documentation, is unsupported. If you are using the default partitioning settings, the connector is probably calling the partitioner helper because no partitioning values are passed. If you specify explicit partitioning values, it might skip the partitioner helper and never call $collStats (as per this logic). I recommend testing this on your end, as I haven't tested it myself.
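A minimal sketch of the "pass explicit partitioning values" idea, assuming the 10.x connector (the `com.mongodb.spark.sql.connector` package in the stack trace points to 10.x). In 10.x the default read partitioner is `SamplePartitioner`, and the `partitioner` read option takes a fully qualified class name. The helper `documentdb_read_options` is hypothetical, and the assumption that `SinglePartitionPartitioner` avoids the $collStats call should be verified against your connector version and cluster.

```python
# Assumption: MongoDB Spark Connector 10.x read options "database",
# "collection" and "partitioner". The helper name is hypothetical.
SINGLE_PARTITIONER = (
    "com.mongodb.spark.sql.connector.read.partitioner.SinglePartitionPartitioner"
)


def documentdb_read_options(database, collection, partitioner=SINGLE_PARTITIONER):
    """Build options for spark.read.format("mongodb").options(**opts)."""
    return {
        "database": database,
        "collection": collection,
        # Name the partitioner explicitly instead of relying on the default
        # SamplePartitioner (assumed here to be what triggers $collStats).
        "partitioner": partitioner,
    }


opts = documentdb_read_options("testdb", "collection1")
print(opts["partitioner"])
```

You would then read with `dataFrame = spark.read.format("mongodb").options(**opts).load()`. The trade-off is that a single partition means one task reads the whole collection, so you give up read parallelism in exchange for skipping the statistics lookup.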

Alternatively, you might use a different partitioner, such as the HashPartitioner, or disable partitioning by setting the spark.mongodb.input.partitioner configuration property to none.
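If you would rather set the partitioner once for the whole Spark session, a sketch along these lines might work. Note that `spark.mongodb.input.*` keys belong to the older 3.x connector; with `format("mongodb")` (10.x) the corresponding prefix is `spark.mongodb.read.*`. The 10.x documentation does not appear to list a `none` value, so this sketch assumes `SinglePartitionPartitioner` as the closest equivalent. The helper `documentdb_session_conf` and the connection string are hypothetical placeholders.

```python
# Assumption: 10.x session-level keys use the "spark.mongodb.read." prefix.
# Helper name and URI below are placeholders, not values from the question.
def documentdb_session_conf(connection_uri, partitioner_class):
    """Return Spark conf entries that apply to every MongoDB read."""
    return {
        "spark.mongodb.read.connection.uri": connection_uri,
        "spark.mongodb.read.partitioner": partitioner_class,
    }


conf = documentdb_session_conf(
    # Placeholder DocumentDB endpoint; substitute your own credentials/host.
    "mongodb://<user>:<password>@<cluster-endpoint>:27017/?tls=true&retryWrites=false",
    "com.mongodb.spark.sql.connector.read.partitioner.SinglePartitionPartitioner",
)
```

To apply it, call `builder = builder.config(key, value)` for each entry in `conf` on a `SparkSession.builder` before `getOrCreate()`.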

AWS
SUPPORT ENGINEER
answered 5 months ago
