#9058: slow performance of 'SELECT count(_id) ...' on ADS

rkshakya reported 2013-05-21T08:24:01Z · last modified 2013-05-25T04:53:03Z

slow performance of 'SELECT count(_id) ...' on ADS

Dev	emil.goicovici
QA	tariqrahiman

Priority	Minor
Complexity	Unknown

Component	MongoSQL
Version	14.0

Observed on ADS 14-alpha-16.

On a collection with 5 million docs, when we run following query on ADS, it takes about 11 secs to return the result (Pls refer to the attached screenshot)

select count(_id) from hupla

whereas same operation run via mongo shell returns the result within a second.

db.hupla.find().count()

Given that _id field is indexed and it is count operation, the time taken on ADS seems high.

Could there be any optimization tweaks done on ADS end to make these run faster?

2 attachments

slow_execution.png

2013-05-21T08:24:01Z

28 KB
slowcount.png

2013-05-22T12:35:27Z

72 KB

All Comments (7) Change History

emil.goicovici 2013-05-22T11:18:51Z

The correct MongoShell command for this query

select count(_id) from hupla

db.hupla.count({_id: {$exists: true}})

which also takes a lot of time under MongoShell as the check has to be performed on each document (row) to see if the given field really exist. Of course, this check is redundant for the _id field as this field always exist on a MongoDB collection, but for other fields , this check make a difference on the results (considering that any collection has an unstructured schema and some field may exist in one document and be absent in another).

In order to support complex queries (e.g. select count(_id), sum(_id), avg(_id) from hupla ), the count() function is currently implemented inside MongoSQL using the $sum aggregation operator (Aggregation Framework).

What I can improve is the query below:

select count(*) from hupla

i.e. implement it directly, without using the $sum operator and thus making it equivalent to this MongoShell command:

db.hupla.count()

The correct MongoShell command for this query

select count(_id) from hupla

db.hupla.count({_id: {$exists: true}})

What I can improve is the query below:

select count(*) from hupla

i.e. implement it directly, without using the $sum operator and thus making it equivalent to this MongoShell command:

db.hupla.count()

rkshakya 2013-05-22T12:34:59Z

Emil, i checked query select count(*) from hupla against same collection with 5 million docs and it took even more time (23 s) (Pls refer to image slowcount.png).

whereas executing db.hupla.find().count() on mongo shell returns result in less than 1 sec.

So any optimization you could do will help I believe. Thank you

emil.goicovici 2013-05-22T19:19:24Z

OK, I'll make the changes for the count(*) .

I would like to know how long it takes to execute on your collection the MongoShell command below:

> db.hupla.count({_id: {$exists: true}})

then try the same command on a non-indexed field of the "hupla" collection.

I emphasize this kind of MongoShell query because it is different and it takes much more time than the

> db.hupla.count()

which executes almost instantly, because it just count the documents, without performing any operation on them (i.e. test if field exists etc).

So, please tell me how long it takes under MongoShell each of the two queries above.

OK, I'll make the changes for the count(*) .

I would like to know how long it takes to execute on your collection the MongoShell command below:

> db.hupla.count({_id: {$exists: true}})

then try the same command on a non-indexed field of the "hupla" collection.

I emphasize this kind of MongoShell query because it is different and it takes much more time than the

> db.hupla.count()

which executes almost instantly, because it just count the documents, without performing any operation on them (i.e. test if field exists etc).

So, please tell me how long it takes under MongoShell each of the two queries above.

NielsGron 2013-05-22T21:02:35Z

MongoDB 'count()' is very slow. How do we refine/work around with it?

rkshakya 2013-05-23T03:04:56Z · (edited)

Hi Emil,

Following are the metrics(measured with profiler on) for the query execution on collection hupla (5 million docs) (ran it on primary of the repl set):

1) db.hupla.count({_id : {$exists:true}}) took about 2730 millis.

2) db.hupla.count({number : {$exists:true}}) where number is non-indexed field took about 4968 millis.

Thanks

emil.goicovici 2013-05-23T12:47:47Z

Optimized the COUNT() aggregate function when it is the only aggregate function present on the select query.

The following queries should execute now in less time:

select count(*) from hupla

select count(_id) from hupla

select count(number) from hupla

Please let me know if you notice performance improvements running them with mongo-jdbc 1.1.7.

Optimized the COUNT() aggregate function when it is the only aggregate function present on the select query.

The following queries should execute now in less time:

select count(*) from hupla

select count(_id) from hupla

select count(number) from hupla

Please let me know if you notice performance improvements running them with mongo-jdbc 1.1.7.

rkshakya 2013-05-25T04:53:03Z

Emil,

Noticed good perf. improvement in running following queries

1) select count(*) from hupla
go
--2040 millis

2) select count(_id) from hupla
go
--2665 millis

3) select count(number) from hupla
go
--3800 millis

Search Tips

Aqua Data Studio / nhilam

slow performance of 'SELECT count(_id) ...' on ADS

2 attachments

Issue #9058

Completion

1 issue link

Issue #9971

Search Tips

Aqua Data Studio / nhilam

Title

slow performance of 'SELECT count(_id) ...' on ADS

2 attachments

Issue #9058

Completion

1 issue link

Issue #9971