Tested in Aqua Data Studio 14.0.0-rc-44 Build #: 34314 on Ubuntu 12.04( Mongo DB 2.4.3) and Windows XP ( Mongo DB 2.0.9).
Emil, it looks like the "null" record from insert into baseball(valuation,city,mangerName,stats,colors) values (null,null,null,null,null)
is causing the issue.
Well, this issue is not related to null values, but actually occurs when flattening unstructured collections. I think we should focus more on the specification rules for flattening documents (esp. for unstructured data) and establish how it should behave. I've linked this issue to #8424; we have there a similar scenario which can be reduced to the following:
INSERT INTO test1 VALUES ({ "Irrigated land": "NA (2008)" }) go INSERT INTO test1 VALUES ({ "Irrigated land": { "quantity": 1.3, "unit": "sq km" } }) go select * from test1 // we get the flattened results, as expected // // _id Irrigated land Irrigated land.quantity Irrigated land.unit // ------------------------ -------------------------------------- -------------------------- ---------------------- // 525fe30c44aeee909e14c084 NA (2008) (null) (null) // 525fe30c44aeee909e14c085 { "quantity" : 1.3, "unit" : 'sq km' } 1.3 sq km go select "Irrigated land" from test1 // resultset is not flattened // // Irrigated land // -------------------------------------- // NA (2008) // { "quantity" : 1.3, "unit" : 'sq km' }
I can fix the JDBC driver implementation so that for the above second scenario we get the same flattened results as we get for the first scenario (excepting the _id column that won't be displayed). This means that we will have an extra column named "Irrigated land" that will contain a JSON formatted cell on the second row (see the value highlighted with yellow). But that's the price we pay for flattening unstructured documents; otherwise we would loose the "NA (2008)" value which won't be displayed if we display columns only for the subfields (i.e just the "Irrigated land.quantity" and the "Irrigated land.unit" columns).
By applying this fix we also solve the inconsistency reported on the current issue (#10444). Jenny, please tell me if you have other suggestions regarding the way we should flatten unstructured documents.
Well, this issue is not related to null values, but actually occurs when flattening unstructured collections. I think we should focus more on the specification rules for flattening documents (esp. for unstructured data) and establish how it should behave. I've linked this issue to #8424; we have there a similar scenario which can be reduced to the following:
INSERT INTO test1 VALUES ({ "Irrigated land": "NA (2008)" }) go INSERT INTO test1 VALUES ({ "Irrigated land": { "quantity": 1.3, "unit": "sq km" } }) go select * from test1 // we get the flattened results, as expected // // _id Irrigated land Irrigated land.quantity Irrigated land.unit // ------------------------ -------------------------------------- -------------------------- ---------------------- // 525fe30c44aeee909e14c084 NA (2008) (null) (null) // 525fe30c44aeee909e14c085 { "quantity" : 1.3, "unit" : 'sq km' } 1.3 sq km go select "Irrigated land" from test1 // resultset is not flattened // // Irrigated land // -------------------------------------- // NA (2008) // { "quantity" : 1.3, "unit" : 'sq km' }
I can fix the JDBC driver implementation so that for the above second scenario we get the same flattened results as we get for the first scenario (excepting the _id column that won't be displayed). This means that we will have an extra column named "Irrigated land" that will contain a JSON formatted cell on the second row (see the value highlighted with yellow). But that's the price we pay for flattening unstructured documents; otherwise we would loose the "NA (2008)" value which won't be displayed if we display columns only for the subfields (i.e just the "Irrigated land.quantity" and the "Irrigated land.unit" columns).
By applying this fix we also solve the inconsistency reported on the current issue (#10444). Jenny, please tell me if you have other suggestions regarding the way we should flatten unstructured documents.
Emil, I agree that the second query in your example should produce the same results as the first query.
Emil, I agree that the second query in your example should produce the same results as the first query.
OK, so do you consider that this side effect (the JSON output cell highlighted with yellow) to be an acceptable behaviour? This side effect already occurs for the select flatten * from test1
query with ADS v14. I don't see any other viable approach to flatten unstructured documents.
OK, so do you consider that this side effect (the JSON output cell highlighted with yellow) to be an acceptable behaviour? This side effect already occurs for the select flatten * from test1
query with ADS v14. I don't see any other viable approach to flatten unstructured documents.
This ticket mainly about format difference with and without alias. And it is real bug.
This ticket mainly about format difference with and without alias. And it is real bug.
Slava, maybe it seems I was a bit off-topic, but this inconsistency is caused by the way how the flattening mechanism is handling unstructured collections (i.e. documents having different schema). When there is a fixed schema that is common for all the documents of a given collection, the flattening rules are simple and the implementation is straightforward.
What I want to standardize is how should we flatten documents having different schema (i.e. different number of fields and different hierarchy) to display the results in a tabular format. This is not so obvious and requires some trade off for the flattening rules.
Slava, maybe it seems I was a bit off-topic, but this inconsistency is caused by the way how the flattening mechanism is handling unstructured collections (i.e. documents having different schema). When there is a fixed schema that is common for all the documents of a given collection, the flattening rules are simple and the implementation is straightforward.
What I want to standardize is how should we flatten documents having different schema (i.e. different number of fields and different hierarchy) to display the results in a tabular format. This is not so obvious and requires some trade off for the flattening rules.
OK, so do you consider that this side effect (the JSON output cell highlighted with yellow) to be an acceptable behaviour?
Yes, it is acceptable.
OK, so do you consider that this side effect (the JSON output cell highlighted with yellow) to be an acceptable behaviour?
Yes, it is acceptable.
Verified in 14.0.3-11. @slava can you verify and close once you get the latest build ?
Verified in 14.0.3-11. @slava can you verify and close once you get the latest build ?
Tested in Aqua Data Studio 14.0.3-14 Build #: 34647 on Ubuntu 12.04
As I mentioned - query with alias and without alias should produce same result set. But
select flatten_array stats from baseball group by stats
select flatten_array stats from baseball
Tested in Aqua Data Studio 14.0.3-14 Build #: 34647 on Ubuntu 12.04
As I mentioned - query with alias and without alias should produce same result set. But
select flatten_array stats from baseball group by stats
select flatten_array stats from baseball
Pls log new issue with any remaining defects
Pls log new issue with any remaining defects
Issue #10444 |
Closed |
Fixed |
Resolved |
Completion |
No due date |
Fixed Build ADS 14.0.3-11 (mongo-jdbc 1.3.9) |
No time estimate |
3 issue links |
relates to #10699
Issue #10699FLATTEN_ARRAY with/without alias in the aggregation framework returns different number of columns |
relates to #9675
Issue #9675Issues with using alsiases and GROUP BY in FLATTEN_ARRAY |
relates to #8424
Issue #8424ResultSet.getObject error when field is primitive type for one document and sub-document for another document |
Emil, it looks like the "null" record from
insert into baseball(valuation,city,mangerName,stats,colors) values (null,null,null,null,null)
is causing the issue.