Observed on ADS 14-alpha-16.
On a collection with 5 million docs, when we run following query on ADS, it takes about 11 secs to return the result (Pls refer to the attached screenshot)
select count(_id) from hupla
whereas same operation run via mongo shell returns the result within a second.
db.hupla.find().count()
Given that _id field is indexed and it is count operation, the time taken on ADS seems high.
Could there be any optimization tweaks done on ADS end to make these run faster?
|
28 KB
|
72 KB
The correct MongoShell command for this query
select count(_id) from hupla
is
db.hupla.count({_id: {$exists: true}})
which also takes a lot of time under MongoShell as the check has to be performed on each document (row) to see if the given field really exist. Of course, this check is redundant for the _id field as this field always exist on a MongoDB collection, but for other fields , this check make a difference on the results (considering that any collection has an unstructured schema and some field may exist in one document and be absent in another).
In order to support complex queries (e.g. select count(_id), sum(_id), avg(_id) from hupla
), the count() function is currently implemented inside MongoSQL using the $sum aggregation operator (Aggregation Framework).
What I can improve is the query below:
select count(*) from hupla
i.e. implement it directly, without using the $sum operator and thus making it equivalent to this MongoShell command:
db.hupla.count()
Emil, i checked query select count(*) from hupla against same collection with 5 million docs and it took even more time (23 s) (Pls refer to image slowcount.png).
whereas executing db.hupla.find().count() on mongo shell returns result in less than 1 sec.
So any optimization you could do will help I believe. Thank you
Emil, i checked query select count(*) from hupla against same collection with 5 million docs and it took even more time (23 s) (Pls refer to image slowcount.png).
whereas executing db.hupla.find().count() on mongo shell returns result in less than 1 sec.
So any optimization you could do will help I believe. Thank you
OK, I'll make the changes for the count(*)
.
I would like to know how long it takes to execute on your collection the MongoShell command below:
> db.hupla.count({_id: {$exists: true}})
then try the same command on a non-indexed field of the "hupla" collection.
I emphasize this kind of MongoShell query because it is different and it takes much more time than the
> db.hupla.count()
which executes almost instantly, because it just count the documents, without performing any operation on them (i.e. test if field exists etc).
So, please tell me how long it takes under MongoShell each of the two queries above.
OK, I'll make the changes for the count(*)
.
I would like to know how long it takes to execute on your collection the MongoShell command below:
> db.hupla.count({_id: {$exists: true}})
then try the same command on a non-indexed field of the "hupla" collection.
I emphasize this kind of MongoShell query because it is different and it takes much more time than the
> db.hupla.count()
which executes almost instantly, because it just count the documents, without performing any operation on them (i.e. test if field exists etc).
So, please tell me how long it takes under MongoShell each of the two queries above.
Hi Emil,
Following are the metrics(measured with profiler on) for the query execution on collection hupla (5 million docs) (ran it on primary of the repl set):
1) db.hupla.count({_id : {$exists:true}}) took about 2730 millis.
2) db.hupla.count({number : {$exists:true}}) where number is non-indexed field took about 4968 millis.
Thanks
Hi Emil,
Following are the metrics(measured with profiler on) for the query execution on collection hupla (5 million docs) (ran it on primary of the repl set):
1) db.hupla.count({_id : {$exists:true}}) took about 2730 millis.
2) db.hupla.count({number : {$exists:true}}) where number is non-indexed field took about 4968 millis.
Thanks
Optimized the COUNT() aggregate function when it is the only aggregate function present on the select query.
The following queries should execute now in less time:
select count(*) from hupla select count(_id) from hupla select count(number) from hupla
Please let me know if you notice performance improvements running them with mongo-jdbc 1.1.7.
Optimized the COUNT() aggregate function when it is the only aggregate function present on the select query.
The following queries should execute now in less time:
select count(*) from hupla select count(_id) from hupla select count(number) from hupla
Please let me know if you notice performance improvements running them with mongo-jdbc 1.1.7.
Emil,
Noticed good perf. improvement in running following queries
1) select count(*) from hupla
go
--2040 millis
2) select count(_id) from hupla
go
--2665 millis
3) select count(number) from hupla
go
--3800 millis
Emil,
Noticed good perf. improvement in running following queries
1) select count(*) from hupla
go
--2040 millis
2) select count(_id) from hupla
go
--2665 millis
3) select count(number) from hupla
go
--3800 millis
Issue #9058 |
Closed |
Fixed |
Resolved |
Completion |
No due date |
Fixed Build ADS 14.0.0-alpha-23 (mongo-jdbc 1.1.7) |
No time estimate |
1 issue link |
relates to #9971
Issue #9971Test of COUNT() performance |
The correct MongoShell command for this query
is
which also takes a lot of time under MongoShell as the check has to be performed on each document (row) to see if the given field really exist. Of course, this check is redundant for the _id field as this field always exist on a MongoDB collection, but for other fields , this check make a difference on the results (considering that any collection has an unstructured schema and some field may exist in one document and be absent in another).
In order to support complex queries (e.g.
select count(_id), sum(_id), avg(_id) from hupla
), the count() function is currently implemented inside MongoSQL using the $sum aggregation operator (Aggregation Framework).What I can improve is the query below:
i.e. implement it directly, without using the $sum operator and thus making it equivalent to this MongoShell command: