select count(a) as cnt from arrnull
select count(a) as cnt,a from arrnull group by a
select a,b from arrnull where a is null group by a,b
select a,b from arrnull where a is not null group by a,b
select count(a[0]) as cnt from arrnull
select a[0] as a0,count(a[0]) as cnt from arrnull group by a0
select a[0] ,b from arrnull where a[0] is null
select count(a[0]) from arrnull where a[0] is null
|
86 KB
1)select count(a) as cnt from arrnull
--wrong from SQL point of view--cnt--1go
db.arrnull.count({$and:[ { "a" : { "$exists" : true, $not:{$type:10} } }]})
3)select a,b from arrnull where a is null group by a,b
--wrong from SQL point of view( see result of count)--a b---- -----[ 1, null ] 1--[ null , 3 ] 2--[ null , null ] 3--(null) 5go
db.arrnull.aggregate({ "$match" : { "a" : { "$type" : 10}}}, { "$group" : { "_id" : { "a" : "$a" , "b" : "$b"}}}, { "$project" : { "_id" : 0 , "a" : "$_id.a" , "b" : "$_id.b"}})
4)select a,b from arrnull where a is not null group by a,b
--wrong from SQL point of view( see result of count)--a b---- ------[ ] 4go
db.arrnull.aggregate({ "$match" : { "a" : {"$exists":true, "$not": {"$type" : 10}}}}, { "$group" : { "_id" : { "a" : "$a" , "b" : "$b"}}}, { "$project" : { "_id" : 0 , "a" : "$_id.a" , "b" : "$_id.b"}})
5)select count(a[0]) as cnt from arrnull
--cnt--0
6)select a[0] as a0,count(a[0]) as cnt from arrnull group by a0
--a0 cnt--(null) 0--[ ] 4go
db.arrnull.aggregate({ "$group" : { "_id" : "$a.0" , "cnt" : { "$sum" : { "$cond" : [ { "$or" : [ "$a.0" , { "$eq" : [ "$a.0" , 0]}]} , 1 , 0]}}}}, { "$project" : { "_id" : 0 , "a0" : "$_id" , "cnt" : 1}})
8) --but count wrongselect count(a[0]) from arrnull where a[0] is null
--cnt------0--should be 2go
I see that I should give more clarifications and separate problems/solutions:
Problems with elements of arrays: 1. Currently program mostly realizes IS NOT NULL through $ne:null. This does not work for elements of array( see (5)). So seems the easiest solution is to change all over the program $ne:null on $not:{$type:10}. ( I mean where it is already used for the realization of IS NOT NULL for any types).
2. In aggregation framework program uses $cond statement for the COUNT( ) which does not work for the elements of array( see (6)). This can be fixed by using {$match:{$and:[ { "a.0" : { "$exists" : true, $not:{$type:10} } }]}} instead $cond
3.
Problems with arrays itself: Mongo DB interprests array as NULL if any of its element is null ( see output of (3)). This is opposite to the SQL point of view which is currently realized in the program for COUNT( ) of array by means of $cond statement in aggregation framework (see output of (2)).
1. But program does not use this way for the COUNT( ) of array when aggreagation framework is not used ( see (1)). This can be solved by using aggregation framework in these case too with $cond statement.
2. Realization of IS NULL/IS NOT NULL for arrays do not use SQL way( $cond statement) ( see (3) and (4))
3. COUNT( ), IS NULL/IS NOT NULL do not work for array when DISTINCT is used. ( See ticket 9904). Necessary use ADDTOSET in aggregation framework and $cond statement.
If these 3 conditions for array hard to do then probably we can refuse for SQL point of view for arrays and just use {$type:10} instead $cond and thus accept Mongo DB point of view for NULL array
I see that I should give more clarifications and separate problems/solutions:
Problems with elements of arrays: 1. Currently program mostly realizes IS NOT NULL through $ne:null. This does not work for elements of array( see (5)). So seems the easiest solution is to change all over the program $ne:null on $not:{$type:10}. ( I mean where it is already used for the realization of IS NOT NULL for any types).
2. In aggregation framework program uses $cond statement for the COUNT( ) which does not work for the elements of array( see (6)). This can be fixed by using {$match:{$and:[ { "a.0" : { "$exists" : true, $not:{$type:10} } }]}} instead $cond
3.
Problems with arrays itself: Mongo DB interprests array as NULL if any of its element is null ( see output of (3)). This is opposite to the SQL point of view which is currently realized in the program for COUNT( ) of array by means of $cond statement in aggregation framework (see output of (2)).
1. But program does not use this way for the COUNT( ) of array when aggreagation framework is not used ( see (1)). This can be solved by using aggregation framework in these case too with $cond statement.
2. Realization of IS NULL/IS NOT NULL for arrays do not use SQL way( $cond statement) ( see (3) and (4))
3. COUNT( ), IS NULL/IS NOT NULL do not work for array when DISTINCT is used. ( See ticket 9904). Necessary use ADDTOSET in aggregation framework and $cond statement.
If these 3 conditions for array hard to do then probably we can refuse for SQL point of view for arrays and just use {$type:10} instead $cond and thus accept Mongo DB point of view for NULL array
Problems with elements of arrays: 1. Currently program mostly realizes IS NOT NULL through $ne:null. This does not work for elements of array( see (5)). So seems the easiest solution is to change all over the program $ne:null on $not:{$type:10}. ( I mean where it is already used for the realization of IS NOT NULL for any types).
Yes, I've replaced the implementation for the IS NOT NULL from the former {$ne:null} to the {$not: {$type:10}} and this will be reflected on the mongo-jdbc 1.3.3 will be integrated into ADS.
2. In aggregation framework program uses $cond statement for the COUNT( ) which does not work for the elements of array( see (6)). This can be fixed by using {$match:{$and:[ { "a.0" : { "$exists" : true, $not:{$type:10} } }]}} instead $cond
The $cond trick is used as the implementation for the COUNT() function when Aggregation Framework is involved and this way we are able to count only existing, non-null fields. I can't proceed with your proposal to use a $match filter before counting the occurences of that field using { cnt: {$sum: 1} } because it would break other aggregate functions when used in the same query.
e.g.
select count(a[0]), sum(b) from arrnull
if the JDBC implementation would use the {$match:{$and:[ { "a.0" : { "$exists" : true, $not:{$type:10} } }]}} filter as proposed, the sum(b) won't consider the last document (the {b:6} one), because the field "a" doesn't exist inside this document.
Problems with arrays itself: Mongo DB interprests array as NULL if any of its element is null ( see output of (3)). This is opposite to the SQL point of view which is currently realized in the program for COUNT( ) of array by means of $cond statement in aggregation framework (see output of (2)).
1. But program does not use this way for the COUNT( ) of array when aggreagation framework is not used ( see (1)). This can be solved by using aggregation framework in these case too with $cond statement.
We can't simply use the $cond approach for every scenario where the COUNT() function is involved. There is an important performance penalty that is paid when Aggregation Framework is used (see issue #9058).
2. Realization of IS NULL/IS NOT NULL for arrays do not use SQL way( $cond statement) ( see (3) and (4))
3. COUNT( ), IS NULL/IS NOT NULL do not work for array when DISTINCT is used. ( See ticket 9904). Necessary use ADDTOSET in aggregation framework and $cond statement.
Let me check these complaints on the JDBC implementation and see how can be fixed.
If these 3 conditions for array hard to do then probably we can refuse for SQL point of view for arrays and just use {$type:10} instead $cond and thus accept Mongo DB point of view for NULL array.
I think we cannot get rid of the $cond approach when Aggregation Framework is used, because an aggregate function (the COUNT() in this case) can occur along with other projection fields (other aggregate functions etc) as explained above with the count(a[0]), sum(b) example.
I know it is important to have consistent behaviour for the COUNT() function, but this seems impossible considering the various translations (SQL -> MongoShell equiv) that are made for each case, the MongoSQL flexibility that allows several aggregate functions in the same query and the MongoDB Server point of view for interpreting null elements inside arrays.
Problems with elements of arrays: 1. Currently program mostly realizes IS NOT NULL through $ne:null. This does not work for elements of array( see (5)). So seems the easiest solution is to change all over the program $ne:null on $not:{$type:10}. ( I mean where it is already used for the realization of IS NOT NULL for any types).
Yes, I've replaced the implementation for the IS NOT NULL from the former {$ne:null} to the {$not: {$type:10}} and this will be reflected on the mongo-jdbc 1.3.3 will be integrated into ADS.
2. In aggregation framework program uses $cond statement for the COUNT( ) which does not work for the elements of array( see (6)). This can be fixed by using {$match:{$and:[ { "a.0" : { "$exists" : true, $not:{$type:10} } }]}} instead $cond
The $cond trick is used as the implementation for the COUNT() function when Aggregation Framework is involved and this way we are able to count only existing, non-null fields. I can't proceed with your proposal to use a $match filter before counting the occurences of that field using { cnt: {$sum: 1} } because it would break other aggregate functions when used in the same query.
e.g.
select count(a[0]), sum(b) from arrnull
if the JDBC implementation would use the {$match:{$and:[ { "a.0" : { "$exists" : true, $not:{$type:10} } }]}} filter as proposed, the sum(b) won't consider the last document (the {b:6} one), because the field "a" doesn't exist inside this document.
Problems with arrays itself: Mongo DB interprests array as NULL if any of its element is null ( see output of (3)). This is opposite to the SQL point of view which is currently realized in the program for COUNT( ) of array by means of $cond statement in aggregation framework (see output of (2)).
1. But program does not use this way for the COUNT( ) of array when aggreagation framework is not used ( see (1)). This can be solved by using aggregation framework in these case too with $cond statement.
We can't simply use the $cond approach for every scenario where the COUNT() function is involved. There is an important performance penalty that is paid when Aggregation Framework is used (see issue #9058).
2. Realization of IS NULL/IS NOT NULL for arrays do not use SQL way( $cond statement) ( see (3) and (4))
3. COUNT( ), IS NULL/IS NOT NULL do not work for array when DISTINCT is used. ( See ticket 9904). Necessary use ADDTOSET in aggregation framework and $cond statement.
Let me check these complaints on the JDBC implementation and see how can be fixed.
If these 3 conditions for array hard to do then probably we can refuse for SQL point of view for arrays and just use {$type:10} instead $cond and thus accept Mongo DB point of view for NULL array.
I think we cannot get rid of the $cond approach when Aggregation Framework is used, because an aggregate function (the COUNT() in this case) can occur along with other projection fields (other aggregate functions etc) as explained above with the count(a[0]), sum(b) example.
I know it is important to have consistent behaviour for the COUNT() function, but this seems impossible considering the various translations (SQL -> MongoShell equiv) that are made for each case, the MongoSQL flexibility that allows several aggregate functions in the same query and the MongoDB Server point of view for interpreting null elements inside arrays.
1. Performance of single COUNT( ) is faster through aggregation framework( I open new ticket 9971 and link with ticket 9058) so we can perform all COUNT( ) through aggreagation framework.
By the way better to change trick { "$sum" : { "$cond" : [ { "$or" : [ "$a.0" , { "$eq" : [ "$a.0" , 0]}]} , 1 , 0]}}}}
with using ifNULL( )
{"$sum" : {"$cond" : [{ "$eq" : [ {$ifNull:["$a",null]},null]},0, 1]}}
it is more correct and works the same
1. Performance of single COUNT( ) is faster through aggregation framework( I open new ticket 9971 and link with ticket 9058) so we can perform all COUNT( ) through aggreagation framework.
By the way better to change trick { "$sum" : { "$cond" : [ { "$or" : [ "$a.0" , { "$eq" : [ "$a.0" , 0]}]} , 1 , 0]}}}}
with using ifNULL( )
{"$sum" : {"$cond" : [{ "$eq" : [ {$ifNull:["$a",null]},null]},0, 1]}}
it is more correct and works the same
Scenario 5) has been fixed on the mongo-jdbc 1.3.3 drop. Fixes for the other scenarios are postponed for ADS v16.
Scenario 5) has been fixed on the mongo-jdbc 1.3.3 drop. Fixes for the other scenarios are postponed for ADS v16.
Given the introduction of MongoJS, we won't be resolving this issue unless we get a customer request to do so.
@qa: pls ensure scenarios work in MongoJS
Given the introduction of MongoJS, we won't be resolving this issue unless we get a customer request to do so.
@qa: pls ensure scenarios work in MongoJS
Issue #9905 |
Closed |
Won't Fix |
Resolved |
Completion |
No due date |
No fixed build |
No time estimate |
1 issue link |
relates to #9971
Issue #9971Test of COUNT() performance |
db.arrnull.count({$and:[ { "a" : { "$exists" : true, $not:{$type:10} } }]})
db.arrnull.aggregate({ "$match" : { "a" : { "$type" : 10}}}, { "$group" : { "_id" : { "a" : "$a" , "b" : "$b"}}}, { "$project" : { "_id" : 0 , "a" : "$_id.a" , "b" : "$_id.b"}})
db.arrnull.aggregate({ "$match" : { "a" : {"$exists":true, "$not": {"$type" : 10}}}}, { "$group" : { "_id" : { "a" : "$a" , "b" : "$b"}}}, { "$project" : { "_id" : 0 , "a" : "$_id.a" , "b" : "$_id.b"}})
gives the same result.
db.arrnull.aggregate({ "$group" : { "_id" : "$a.0" , "cnt" : { "$sum" : { "$cond" : [ { "$or" : [ "$a.0" , { "$eq" : [ "$a.0" , 0]}]} , 1 , 0]}}}}, { "$project" : { "_id" : 0 , "a0" : "$_id" , "cnt" : 1}})