#10444: Different format of FLATTEN_ARRAY when aliases used

slavakiev reported 2013-10-15T09:58:59Z · last modified 2013-11-06T19:21:06Z

Dev	emil.goicovici
QA	tariqrahiman

Priority	Low
Complexity	Moderate

Component	MongoSQL
Version	14.0

Tested in Aqua Data Studio 14.0.0-rc-44 Build #: 34314 on Ubuntu 12.04 (MongoDB 2.4.3) and Windows XP (MongoDB 2.0.9).

This is linked to closed ticket #9675. When the data is changed, the same errors appear.

drop collection baseball

INSERT INTO baseball
VALUES (
{
    "teamName": "Cubs-2",
    "city": "Chicago",
    "valuation": 10,
    "managerName": { "first": "John", "last": "Zimmer" },
    "colors": [ "blue", "gray" ],
    "stats":
    [
        {
            "year": 1904,
            "wins": 100,
            "mostRbis": 1000,
            "grade": "A",
            "battingAvg": 0.300
        },
        {
            "year": 1987,
            "wins": 80,
            "mostRbis": 200,
            "grade": "A",
            "battingAvg": 0.267
        }
    ]
} )

INSERT INTO baseball
VALUES (
{
    "teamName": "Cubs",
    "city": "Luis",
    "valuation": 100,
    "managerName": { "first": "Dale", "last": "Sveum" },
    "colors": [ "blue", "green", "third" ],
    "stats":
    [
        {
            "year": 2008,
            "wins": 60
        },
        {
            "year": 1997,
            "wins": 100
        }
    ]
} )

INSERT INTO baseball
VALUES (
{
    "teamName": "Cubs-2",
    "city": "Chicago",
    "valuation": 10,
    "managerName": { "first": "John", "last": "Belamy" },
    "colors": [ "black", "grey" ],
    "stats":
    [
        {
            "year": 1008,
            "wins": 0
        },
        {
            "year": 1965,
            "wins": 2
        }
    ]
} )

insert into baseball(valuation,city,mangerName,stats,colors) values (null,null,null,null,null)

select flatten_array stats from baseball group by stats
--only 1 column

select flatten_array stats as sts from baseball group by stats
--many columns

select flatten_array min(stats) from baseball

select flatten_array min(stats) from baseball group by stats

select flatten_array stats from baseball
--one column

select flatten_array stats as sts from baseball
--one column

select flatten_array first(stats) from baseball

select flatten_array distinct stats from baseball
--1 column only

select flatten_array distinct stats as sts from baseball
--many columns

JennyNishimura 2013-10-15T16:25:35Z

Emil, it looks like the "null" record from insert into baseball(valuation,city,mangerName,stats,colors) values (null,null,null,null,null) is causing the issue.

emil.goicovici 2013-10-17T13:32:57Z

Well, this issue is not related to null values; it actually occurs when flattening unstructured collections. I think we should focus more on the specification rules for flattening documents (especially for unstructured data) and establish how it should behave. I've linked this issue to #8424, where we have a similar scenario that can be reduced to the following:

INSERT INTO test1 VALUES ({
    "Irrigated land": "NA (2008)"
})
go

INSERT INTO test1 VALUES ({
    "Irrigated land": {
            "quantity": 1.3,
            "unit": "sq km"
    }
})
go

select * from test1
// we get the flattened results, as expected
//
// _id                       Irrigated land                          Irrigated land.quantity     Irrigated land.unit    
// ------------------------  --------------------------------------  --------------------------  ----------------------
// 525fe30c44aeee909e14c084  NA (2008)                               (null)                      (null)                 
// 525fe30c44aeee909e14c085  { "quantity" : 1.3, "unit" : 'sq km' }  1.3                         sq km                

go
select "Irrigated land" from test1
// resultset is not flattened 
//
// Irrigated land                         
// --------------------------------------
// NA (2008)                              
// { "quantity" : 1.3, "unit" : 'sq km' }

I can fix the JDBC driver implementation so that the second scenario above returns the same flattened results as the first one (except for the _id column, which won't be displayed). This means that we will have an extra column named "Irrigated land" that contains a JSON-formatted cell on the second row (see the value highlighted with yellow). But that's the price we pay for flattening unstructured documents; otherwise we would lose the "NA (2008)" value, which would not be displayed if we showed columns only for the subfields (i.e. just the "Irrigated land.quantity" and "Irrigated land.unit" columns).

By applying this fix we also solve the inconsistency reported on the current issue (#10444). Jenny, please tell me if you have other suggestions regarding the way we should flatten unstructured documents.
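To make the proposed rule concrete, here is a minimal sketch in Python (not the actual JDBC driver code). The helper names flatten_field and flatten_rows are hypothetical, and the sketch assumes only one level of nesting: a subdocument keeps its parent column as a JSON string and adds one "parent.child" column per subfield, while a scalar fills only the parent column.

```python
import json


def flatten_field(name, value):
    """Flatten one field into {column name: cell value} (hypothetical helper)."""
    if isinstance(value, dict):
        # Keep the parent column, holding the whole subdocument as JSON text...
        row = {name: json.dumps(value)}
        # ...and add one "parent.child" column per subfield.
        for key, sub in value.items():
            row[f"{name}.{key}"] = sub
        return row
    # Scalars (and nulls) only populate the parent column.
    return {name: value}


def flatten_rows(docs, field):
    """Return (columns, table) for one selected field across all documents."""
    rows = [flatten_field(field, doc.get(field)) for doc in docs]
    # The column set is the union over all rows, in first-seen order.
    columns = []
    for row in rows:
        for col in row:
            if col not in columns:
                columns.append(col)
    # Cells missing from a row become None, i.e. "(null)" in the result grid.
    table = [[row.get(col) for col in columns] for row in rows]
    return columns, table


docs = [
    {"Irrigated land": "NA (2008)"},
    {"Irrigated land": {"quantity": 1.3, "unit": "sq km"}},
]
columns, table = flatten_rows(docs, "Irrigated land")
for line in [columns] + table:
    print(line)
```

Dropping the parent column in this sketch would discard the "NA (2008)" cell from the first row, which is the trade-off described above.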

JennyNishimura 2013-10-17T16:54:21Z

Emil, I agree that the second query in your example should produce the same results as the first query.

emil.goicovici 2013-10-17T17:50:39Z · (edited)

OK, so do you consider this side effect (the JSON output cell highlighted with yellow) to be acceptable behaviour? This side effect already occurs for the select flatten * from test1 query in ADS v14. I don't see any other viable approach to flattening unstructured documents.

slavakiev 2013-10-17T18:04:10Z

This ticket is mainly about the format difference between queries with and without an alias, and it is a real bug.

emil.goicovici 2013-10-17T18:28:33Z

Slava, it may seem I was a bit off-topic, but this inconsistency is caused by the way the flattening mechanism handles unstructured collections (i.e. documents with different schemas). When there is a fixed schema common to all the documents of a given collection, the flattening rules are simple and the implementation is straightforward.

What I want to standardize is how we should flatten documents with different schemas (i.e. a different number of fields and a different hierarchy) to display the results in a tabular format. This is not so obvious and requires some trade-offs in the flattening rules.

JennyNishimura 2013-10-17T19:40:57Z

"OK, so do you consider this side effect (the JSON output cell highlighted with yellow) to be acceptable behaviour?"

Yes, it is acceptable.

tariqrahiman 2013-10-30T20:51:31Z

Verified in 14.0.3-11. @slava, can you verify and close once you get the latest build?

slavakiev 2013-11-01T11:13:40Z

Tested in Aqua Data Studio 14.0.3-14 Build #: 34647 on Ubuntu 12.04

As I mentioned, queries with and without an alias should produce the same result set. But

select flatten_array stats from baseball group by stats

--outputs 7 columns; the first column is "stats"

select flatten_array stats as sts from baseball group by stats

--outputs only 6 columns; "sts" is missing as the first column

while without GROUP BY both queries return 7 columns:

select flatten_array stats from baseball

--outputs 7 columns; the first column is "stats"

select flatten_array stats as sts from baseball

--outputs 7 columns; the first column is "sts"
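The expectation above (an alias changes only the column names, never the number of columns) can be written as a small check. This is a hedged Python sketch, not the MongoSQL implementation: the function flatten_array_columns is hypothetical, the sample documents are abbreviated, and the assumption that subfield columns pick up the alias as a prefix is mine, not something this ticket confirms.

```python
def flatten_array_columns(docs, field, alias=None):
    """Column headers for a flattened array field (hypothetical helper).

    Assumption: the alias only replaces the root name; it never removes
    the root column or changes how many columns are produced.
    """
    root = alias or field
    columns = [root]
    for doc in docs:
        # A null or missing array contributes no subfield columns.
        for element in doc.get(field) or []:
            if isinstance(element, dict):
                for key in element:
                    name = f"{root}.{key}"
                    if name not in columns:
                        columns.append(name)
    return columns


docs = [
    {"stats": [{"year": 1904, "wins": 100, "grade": "A"},
               {"year": 1987, "wins": 80}]},
    {"stats": None},  # stands in for the all-null record
]
plain = flatten_array_columns(docs, "stats")
aliased = flatten_array_columns(docs, "stats", alias="sts")
print(plain)
print(aliased)
```

Under these assumptions, the aliased and unaliased header lists always have the same length and differ only in the root name, which is the property the GROUP BY query with an alias currently violates.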

SachinPrakash 2013-11-06T19:21:06Z

Please log a new issue for any remaining defects.

3 issue links

Issue #10699

Issue #9675

Issue #8424