IXSCAN slow on $regex search including "."

0

I have a simple collection, sites with ~10m obejcts. It contains two fields, domain and reverse_domain, both having a unique index.

Now finding documents with a domain starting with something works like a charm db.sites.find( { domain: { $regex:/^amazon/} } ).hint({domain:1})

exectionStats shows: executionStats: { executionSuccess: true, executionTimeMillis: '5.023', planningTimeMillis: '0.448', executionStages: { stage: 'SUBSCAN', nReturned: '597', executionTimeMillisEstimate: '4.506', inputStage: { stage: 'SORT', nReturned: '597', executionTimeMillisEstimate: '4.326', sortPattern: { domain: 1 }, inputStage: { stage: 'FETCH', nReturned: '597', executionTimeMillisEstimate: '2.837', inputStage: { stage: 'IXOR', nReturned: '0', executionTimeMillisEstimate: '0.327', inputStages: [ { stage: 'IXSCAN', nReturned: '600', executionTimeMillisEstimate: '0.318', indexName: 'domain_1' }, { stage: 'IXSCAN', nReturned: '0', executionTimeMillisEstimate: '0.007', indexName: 'domain_1' } ] } } } } }

The problem is that using it on the reverse_domain it scans it all.

db.sites.find( { reverse_domain: { $regex:/^com\.amazon/} } ).hint({reverse_domain:1})

The stats are: executionStats: { executionSuccess: true, executionTimeMillis: '27047.026', planningTimeMillis: '0.366', executionStages: { stage: 'SUBSCAN', nReturned: '345', executionTimeMillisEstimate: '27046.556', inputStage: { stage: 'IXSCAN', nReturned: '345', executionTimeMillisEstimate: '27046.456', indexName: 'reverse_domain_1', direction: 'forward' } } }

The reverse_domain index, works, as it is used when trying: db.sites.find({reverse_domain:'com.amazon.aws'})

with stats: executionStats: { executionSuccess: true, executionTimeMillis: '0.184', planningTimeMillis: '0.137', executionStages: { stage: 'IXSCAN', nReturned: '1', executionTimeMillisEstimate: '0.027', indexName: 'reverse_domain_1', direction: 'forward' } }

My gut feeling tells me that it has something to do with the affinity of the "com." start. Searching using a different superdomain fx. "shop." gives much better results. Trying searching for "com111.example" is ultra fast while searching for "com.111" is slow, which is weird as "com" and "com." should have roughly the same matches in a huge db of domains.

My question is if anyone can explain what is going on, and why the "." matters so much.

Cheers

Kezzel
已提問 7 個月前檢視次數 41 次
沒有答案

您尚未登入。 登入 去張貼答案。

一個好的回答可以清楚地回答問題並提供建設性的意見回饋,同時有助於提問者的專業成長。

回答問題指南