Can't read a DataFrame from DocumentDB: "Partitioner calling collStats command failed" with the MongoDB Spark connector

I am trying to connect to my DocumentDB cluster through the MongoDB Spark connector, but it looks like DocumentDB does not support collStats. How can I disable the collStats command so that I can run my transformations with Spark?

dataFrame = spark.read.format("mongodb").option("spark.mongodb.database","testdb").option("spark.mongodb.collection", "collection1").load()

dataFrame.show()

It gives the following error:

Py4JJavaError: An error occurred while calling o83.showString. : com.mongodb.spark.sql.connector.exceptions.MongoSparkException: Partitioning failed. Partitioner calling collStats command failed

However, dataFrame.printSchema() works and returns the schema. I have already found out that collStats is not supported on DocumentDB, but how can I turn this call off in the MongoDB connector for Spark?

asked 5 months ago, 382 views
1 Answer
Hello,

Yes, collStats is not supported in Amazon DocumentDB, as per the doc. If you are using the default partitioning values, the connector is probably going through the partitioner helper because no values are being passed. If you supply specific partitioning values, it might skip the partitioner helper and not call $collStats (as per this logic). I recommend testing this on your end, as I haven't tested it myself.

Alternatively, you might use a different partitioner, such as the HashPartitioner, or disable partitioning by setting the spark.mongodb.input.partitioner configuration property to none.
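A minimal sketch of the "different partitioner" idea, under a few assumptions that are not from the original posts: the code in the question uses format("mongodb"), which suggests connector 10.x, where the read-side key is spark.mongodb.read.partitioner rather than spark.mongodb.input.partitioner, and where SinglePartitionPartitioner is one of the bundled partitioners. As far as I can tell it does not need collection statistics, but that is untested against DocumentDB here. The helper names docdb_read_options and load_collection are hypothetical.

```python
# Assumption: connector 10.x class name; check it against your connector version.
SINGLE_PARTITIONER = (
    "com.mongodb.spark.sql.connector.read.partitioner.SinglePartitionPartitioner"
)


def docdb_read_options(database, collection, partitioner=SINGLE_PARTITIONER):
    """Build reader options that name the partitioner explicitly (hypothetical helper)."""
    return {
        # Same option keys as in the question.
        "spark.mongodb.database": database,
        "spark.mongodb.collection": collection,
        # Assumption: read-side partitioner key for connector 10.x.
        "spark.mongodb.read.partitioner": partitioner,
    }


def load_collection(spark, database, collection):
    """Read a collection as a DataFrame using the options above.

    `spark` is an existing SparkSession whose connection URI already
    points at the DocumentDB cluster (assumed, not shown here).
    """
    options = docdb_read_options(database, collection)
    return spark.read.format("mongodb").options(**options).load()
```

Usage would be dataFrame = load_collection(spark, "testdb", "collection1") followed by dataFrame.show(). The trade-off is that the whole collection is read as a single partition, so the read itself is not parallel; calling dataFrame.repartition(n) afterwards spreads the later transformations across executors.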

AWS
SUPPORT ENGINEER
answered 5 months ago