Tested Aqua Data Studio Aqua Data Studio 14.0.0-beta-74 Build #: 33340 on Ubuntu 12.04( Mongo DB 2.4.3) and Windows XP ( Mongo DB 2.0.9).
Run next scriptset and see results:
The DISTINCT clause is implemented in MongoSQL by direct calling the MongoDB native distinct command, which compares the unexpanded field "a" from each document. The flattening mechanism is implemented at the JDBC driver level (MongoDB API has no support for this on a reqular db.collection.find() call, only the Aggregation Framework has something similar: the $unwind pipeline operator, but I don't think it suits our requirements for flexibility on MongoSQL queries). That's why we get the unexpected results when these clauses are used together.
Rather than implementing the DISTINCT document selection mechanism at the JDBC driver level, we should mention in the documentation that DISTINCT is not compatible with the FLATTEN | FLATTEN_ARRAY flags.
For ADS v15, we could try to implement the FLATTEN_ARRAY DISTINCT clause by using Aggregation Framework: the DISTINCT a
clause will be emulated using GROUP BY a
then the array flattening will be performed by the $unwind operation. However, there are two concerns on this approach:
Jonathan, please update the MongoSQL Query Reference. Add 2 notes in the "DISTINCT | ALL option" section:
- For an array field, MongoDB's distinct command flattens the array values first and then returns the distinct values.
- DISTINCT is not compatible with the FLATTEN_ARRAY output option for an array of array field.
Jonathan, please update the MongoSQL Query Reference. Add 2 notes in the "DISTINCT | ALL option" section:
- For an array field, MongoDB's distinct command flattens the array values first and then returns the distinct values.
- DISTINCT is not compatible with the FLATTEN_ARRAY output option for an array of array field.
Once documentation is updated, this issue can be closed
Once documentation is updated, this issue can be closed
I see that this is not a problem with FLATTEN_ARRAY. It is problem with DISTINCT itself. DISTINCT of<documents with subdocuments> flattens arrays by istself first( extracts elements of array):
select a from eqcoll
You can see that DISTINCT first "flattens" elements from array and then removes duplicates(. rows 3 and 4 )
So either necessary to add note that DISTINCT flattens documents with subdocuments or programmaticaly realize DISTINCT in "ANSI SQL" way.
I see that this is not a problem with FLATTEN_ARRAY. It is problem with DISTINCT itself. DISTINCT of<documents with subdocuments> flattens arrays by istself first( extracts elements of array):
select a from eqcoll
You can see that DISTINCT first "flattens" elements from array and then removes duplicates(. rows 3 and 4 )
So either necessary to add note that DISTINCT flattens documents with subdocuments or programmaticaly realize DISTINCT in "ANSI SQL" way.
Slava, you have a valid point that the select distinct a from eqcoll query flattens the array first and then removes duplicates. However, this is how MongoDB behaves.
Using the MongoDB shell to execute the "distinct" command:
> db.runCommand({distinct:"eqcoll", key:"a"})
{
"values" : [
1,
2,
3,
4,
[
1,
2
],
[
2,
1
]
],
...
}
> db.eqcoll.distinct('a')
[ 1, 2, 3, 4, [ 1, 2 ], [ 2, 1 ] ]
We'll add a note in our documentation that MongoDB flattens an array field when processing the DISTINCT command.
Slava, you have a valid point that the select distinct a from eqcoll query flattens the array first and then removes duplicates. However, this is how MongoDB behaves.
Using the MongoDB shell to execute the "distinct" command:
> db.runCommand({distinct:"eqcoll", key:"a"})
{
"values" : [
1,
2,
3,
4,
[
1,
2
],
[
2,
1
]
],
...
}
> db.eqcoll.distinct('a')
[ 1, 2, 3, 4, [ 1, 2 ], [ 2, 1 ] ]
We'll add a note in our documentation that MongoDB flattens an array field when processing the DISTINCT command.
a note has been added
a note has been added
Verified in Aqua Data Studio Aqua Data Studio 14.0.0-beta-79 Build #: 33402 on Ubuntu 12.04( Mongo DB 2.4.3) and Windows XP ( Mongo DB 2.0.9).
Verified in Aqua Data Studio Aqua Data Studio 14.0.0-beta-79 Build #: 33402 on Ubuntu 12.04( Mongo DB 2.4.3) and Windows XP ( Mongo DB 2.0.9).
Issue #9676 |
Closed |
Fixed |
Resolved |
Completion |
No due date |
No fixed build |
No time estimate |
1 issue link |
relates to #14359
Issue #14359Strange result of DISTINCT from FLATTEN_ARRAY |
The DISTINCT clause is implemented in MongoSQL by direct calling the MongoDB native distinct command, which compares the unexpanded field "a" from each document. The flattening mechanism is implemented at the JDBC driver level (MongoDB API has no support for this on a reqular db.collection.find() call, only the Aggregation Framework has something similar: the $unwind pipeline operator, but I don't think it suits our requirements for flexibility on MongoSQL queries). That's why we get the unexpected results when these clauses are used together.
Rather than implementing the DISTINCT document selection mechanism at the JDBC driver level, we should mention in the documentation that DISTINCT is not compatible with the FLATTEN | FLATTEN_ARRAY flags.
For ADS v15, we could try to implement the FLATTEN_ARRAY DISTINCT clause by using Aggregation Framework: the
DISTINCT a
clause will be emulated usingGROUP BY a
then the array flattening will be performed by the $unwind operation. However, there are two concerns on this approach: