IXSCAN slow on $regex search including "."

0

I have a simple collection, sites with ~10m obejcts. It contains two fields, domain and reverse_domain, both having a unique index.

Now finding documents with a domain starting with something works like a charm db.sites.find( { domain: { $regex:/^amazon/} } ).hint({domain:1})

exectionStats shows: executionStats: { executionSuccess: true, executionTimeMillis: '5.023', planningTimeMillis: '0.448', executionStages: { stage: 'SUBSCAN', nReturned: '597', executionTimeMillisEstimate: '4.506', inputStage: { stage: 'SORT', nReturned: '597', executionTimeMillisEstimate: '4.326', sortPattern: { domain: 1 }, inputStage: { stage: 'FETCH', nReturned: '597', executionTimeMillisEstimate: '2.837', inputStage: { stage: 'IXOR', nReturned: '0', executionTimeMillisEstimate: '0.327', inputStages: [ { stage: 'IXSCAN', nReturned: '600', executionTimeMillisEstimate: '0.318', indexName: 'domain_1' }, { stage: 'IXSCAN', nReturned: '0', executionTimeMillisEstimate: '0.007', indexName: 'domain_1' } ] } } } } }

The problem is that using it on the reverse_domain it scans it all.

db.sites.find( { reverse_domain: { $regex:/^com\.amazon/} } ).hint({reverse_domain:1})

The stats are: executionStats: { executionSuccess: true, executionTimeMillis: '27047.026', planningTimeMillis: '0.366', executionStages: { stage: 'SUBSCAN', nReturned: '345', executionTimeMillisEstimate: '27046.556', inputStage: { stage: 'IXSCAN', nReturned: '345', executionTimeMillisEstimate: '27046.456', indexName: 'reverse_domain_1', direction: 'forward' } } }

The reverse_domain index, works, as it is used when trying: db.sites.find({reverse_domain:'com.amazon.aws'})

with stats: executionStats: { executionSuccess: true, executionTimeMillis: '0.184', planningTimeMillis: '0.137', executionStages: { stage: 'IXSCAN', nReturned: '1', executionTimeMillisEstimate: '0.027', indexName: 'reverse_domain_1', direction: 'forward' } }

My gut feeling tells me that it has something to do with the affinity of the "com." start. Searching using a different superdomain fx. "shop." gives much better results. Trying searching for "com111.example" is ultra fast while searching for "com.111" is slow, which is weird as "com" and "com." should have roughly the same matches in a huge db of domains.

My question is if anyone can explain what is going on, and why the "." matters so much.

Cheers

Kezzel
asked 7 months ago39 views
No Answers

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions