IXSCAN slow on $regex search including "."

0

I have a simple collection, sites with ~10m obejcts. It contains two fields, domain and reverse_domain, both having a unique index.

Now finding documents with a domain starting with something works like a charm db.sites.find( { domain: { $regex:/^amazon/} } ).hint({domain:1})

exectionStats shows: executionStats: { executionSuccess: true, executionTimeMillis: '5.023', planningTimeMillis: '0.448', executionStages: { stage: 'SUBSCAN', nReturned: '597', executionTimeMillisEstimate: '4.506', inputStage: { stage: 'SORT', nReturned: '597', executionTimeMillisEstimate: '4.326', sortPattern: { domain: 1 }, inputStage: { stage: 'FETCH', nReturned: '597', executionTimeMillisEstimate: '2.837', inputStage: { stage: 'IXOR', nReturned: '0', executionTimeMillisEstimate: '0.327', inputStages: [ { stage: 'IXSCAN', nReturned: '600', executionTimeMillisEstimate: '0.318', indexName: 'domain_1' }, { stage: 'IXSCAN', nReturned: '0', executionTimeMillisEstimate: '0.007', indexName: 'domain_1' } ] } } } } }

The problem is that using it on the reverse_domain it scans it all.

db.sites.find( { reverse_domain: { $regex:/^com\.amazon/} } ).hint({reverse_domain:1})

The stats are: executionStats: { executionSuccess: true, executionTimeMillis: '27047.026', planningTimeMillis: '0.366', executionStages: { stage: 'SUBSCAN', nReturned: '345', executionTimeMillisEstimate: '27046.556', inputStage: { stage: 'IXSCAN', nReturned: '345', executionTimeMillisEstimate: '27046.456', indexName: 'reverse_domain_1', direction: 'forward' } } }

The reverse_domain index, works, as it is used when trying: db.sites.find({reverse_domain:'com.amazon.aws'})

with stats: executionStats: { executionSuccess: true, executionTimeMillis: '0.184', planningTimeMillis: '0.137', executionStages: { stage: 'IXSCAN', nReturned: '1', executionTimeMillisEstimate: '0.027', indexName: 'reverse_domain_1', direction: 'forward' } }

My gut feeling tells me that it has something to do with the affinity of the "com." start. Searching using a different superdomain fx. "shop." gives much better results. Trying searching for "com111.example" is ultra fast while searching for "com.111" is slow, which is weird as "com" and "com." should have roughly the same matches in a huge db of domains.

My question is if anyone can explain what is going on, and why the "." matters so much.

Cheers

Kezzel
posta 7 mesi fa41 visualizzazioni
Nessuna risposta

Accesso non effettuato. Accedi per postare una risposta.

Una buona risposta soddisfa chiaramente la domanda, fornisce un feedback costruttivo e incoraggia la crescita professionale del richiedente.

Linee guida per rispondere alle domande