IXSCAN slow on $regex search including "."

0

I have a simple collection, sites with ~10m obejcts. It contains two fields, domain and reverse_domain, both having a unique index.

Now finding documents with a domain starting with something works like a charm db.sites.find( { domain: { $regex:/^amazon/} } ).hint({domain:1})

exectionStats shows: executionStats: { executionSuccess: true, executionTimeMillis: '5.023', planningTimeMillis: '0.448', executionStages: { stage: 'SUBSCAN', nReturned: '597', executionTimeMillisEstimate: '4.506', inputStage: { stage: 'SORT', nReturned: '597', executionTimeMillisEstimate: '4.326', sortPattern: { domain: 1 }, inputStage: { stage: 'FETCH', nReturned: '597', executionTimeMillisEstimate: '2.837', inputStage: { stage: 'IXOR', nReturned: '0', executionTimeMillisEstimate: '0.327', inputStages: [ { stage: 'IXSCAN', nReturned: '600', executionTimeMillisEstimate: '0.318', indexName: 'domain_1' }, { stage: 'IXSCAN', nReturned: '0', executionTimeMillisEstimate: '0.007', indexName: 'domain_1' } ] } } } } }

The problem is that using it on the reverse_domain it scans it all.

db.sites.find( { reverse_domain: { $regex:/^com\.amazon/} } ).hint({reverse_domain:1})

The stats are: executionStats: { executionSuccess: true, executionTimeMillis: '27047.026', planningTimeMillis: '0.366', executionStages: { stage: 'SUBSCAN', nReturned: '345', executionTimeMillisEstimate: '27046.556', inputStage: { stage: 'IXSCAN', nReturned: '345', executionTimeMillisEstimate: '27046.456', indexName: 'reverse_domain_1', direction: 'forward' } } }

The reverse_domain index, works, as it is used when trying: db.sites.find({reverse_domain:'com.amazon.aws'})

with stats: executionStats: { executionSuccess: true, executionTimeMillis: '0.184', planningTimeMillis: '0.137', executionStages: { stage: 'IXSCAN', nReturned: '1', executionTimeMillisEstimate: '0.027', indexName: 'reverse_domain_1', direction: 'forward' } }

My gut feeling tells me that it has something to do with the affinity of the "com." start. Searching using a different superdomain fx. "shop." gives much better results. Trying searching for "com111.example" is ultra fast while searching for "com.111" is slow, which is weird as "com" and "com." should have roughly the same matches in a huge db of domains.

My question is if anyone can explain what is going on, and why the "." matters so much.

Cheers

Kezzel
preguntada hace 7 meses41 visualizaciones
No hay respuestas

No has iniciado sesión. Iniciar sesión para publicar una respuesta.

Una buena respuesta responde claramente a la pregunta, proporciona comentarios constructivos y fomenta el crecimiento profesional en la persona que hace la pregunta.

Pautas para responder preguntas