Tested on Aqua Data Studio 14.0.0-beta-32 Build #: 32908 on Ubuntu 12.04( Mongo DB 2.4.3) and Windows XP ( Mongo DB 2.0.9).
Any select with using FROM(SELCT leaves in the DB temporary collection with name "tmp_from_subquery0".(see attached screenshot) E.g.
select valuation from (select valuation from baseball)
|
50 KB
Emil, we should not create any temporary data in customers' databases. In what situations do we create a temporary collection to process the query? We should consider storing the data in memory, and if that is not feasible, let's not support those cases.
IMHO it is possible to use tmp.mr namespace that already used for mapreduce commands which create temporary collections there. E.g. tmp.mr.mapreduce_1289483912_1
IMHO it is possible to use tmp.mr namespace that already used for mapreduce commands which create temporary collections there. E.g. tmp.mr.mapreduce_1289483912_1
Slava, thank you for the suggestion. The MapReduce commands are part of the MongoDB server. They can create temporary data.
ADS and the MongoJDBC Driver are not part of the MongoDB server. Ideally, we should not create temporary data in the database itself.
Slava, thank you for the suggestion. The MapReduce commands are part of the MongoDB server. They can create temporary data.
ADS and the MongoJDBC Driver are not part of the MongoDB server. Ideally, we should not create temporary data in the database itself.
Emil, we should not create any temporary data in customers' databases. In what situations do we create a temporary collection to process the query?
Temporary collections are created whenever a sub-query is used on a FROM clause. The resultset returned as the result the subquery execution is stored on a new collection (e.g " tmp_from_subquery0 " ) , thus providing the flexibility required by the outer query (e.g. aggregate columns, sort the result etc).
It would be difficult to provide this querying flexibility without storing the resultset of the inner query on a new collection: all aggregation & conversion functions etc should be implemented at JDBC driver level on this case.
A good example of subquery usage is the scenario described on issue #9371.
Either we introduce a new JDBC parameter that enables sub-queries on a FROM clause (i.e. disable them by default, thus providing the possibility of running non read-only queries if those users really need them) or we completely remove support for them.
A safer alternative for scenarios involving sub-queries (like the ones from issue #9371) would be that the user have to issue two statements (in a Query Analyzer Window) instead of a SELECT ... FROM SELECT .... query: an INSERT INTO <new_collection_name> SELECT ... FROM ... followed by a SELECT ... FROM <new_collection_name> .
e.g. for this scenario from #9371
select T.state, avg(T.citypop) from ( select state, city, sum(pop) as citypop from zips group by state, city ) as T group by state order by state
the alternative would be
insert into new_coll select state, city, sum(pop) as citypop from zips group by state, city GO select T.state, avg(T.citypop) from new_coll as T group by state order by state
i.e. use the same steps that the JDBC driver already performs for queries that contain subqueries on the FROM clause. By requiring the user to issue the two statements, we ensure that unwanted writes are not made on the database. The current implementation of using "temporary" collections for sub-queries also have the disadvantage that interleaving can occur if two different users issue such queries at the same time.
We should consider storing the data in memory
This is not possible, because unlike the sub-queries that occur on a WHERE filter (e.g. WHERE age IN SELECT ....
or WHERE age=SELECT MAX(age) FROM ...
etc) which eventually return just a value (singleton or array) that is used when building the filter, the sub-queries from a FROM clause involves higher flexibility , because of the GROUP BY, ORDER BY , aggregate functions etc that can be performed on the outer query.
This means that the resultset of the sub-query needs to be further processed by the rules defined on the outer query, and this processing is difficult to be performed by the JDBC driver. That's why those "temporary" collections were introduced.
Emil, we should not create any temporary data in customers' databases. In what situations do we create a temporary collection to process the query?
Temporary collections are created whenever a sub-query is used on a FROM clause. The resultset returned as the result the subquery execution is stored on a new collection (e.g " tmp_from_subquery0 " ) , thus providing the flexibility required by the outer query (e.g. aggregate columns, sort the result etc).
It would be difficult to provide this querying flexibility without storing the resultset of the inner query on a new collection: all aggregation & conversion functions etc should be implemented at JDBC driver level on this case.
A good example of subquery usage is the scenario described on issue #9371.
Either we introduce a new JDBC parameter that enables sub-queries on a FROM clause (i.e. disable them by default, thus providing the possibility of running non read-only queries if those users really need them) or we completely remove support for them.
A safer alternative for scenarios involving sub-queries (like the ones from issue #9371) would be that the user have to issue two statements (in a Query Analyzer Window) instead of a SELECT ... FROM SELECT .... query: an INSERT INTO <new_collection_name> SELECT ... FROM ... followed by a SELECT ... FROM <new_collection_name> .
e.g. for this scenario from #9371
select T.state, avg(T.citypop) from ( select state, city, sum(pop) as citypop from zips group by state, city ) as T group by state order by state
the alternative would be
insert into new_coll select state, city, sum(pop) as citypop from zips group by state, city GO select T.state, avg(T.citypop) from new_coll as T group by state order by state
i.e. use the same steps that the JDBC driver already performs for queries that contain subqueries on the FROM clause. By requiring the user to issue the two statements, we ensure that unwanted writes are not made on the database. The current implementation of using "temporary" collections for sub-queries also have the disadvantage that interleaving can occur if two different users issue such queries at the same time.
We should consider storing the data in memory
This is not possible, because unlike the sub-queries that occur on a WHERE filter (e.g. WHERE age IN SELECT ....
or WHERE age=SELECT MAX(age) FROM ...
etc) which eventually return just a value (singleton or array) that is used when building the filter, the sub-queries from a FROM clause involves higher flexibility , because of the GROUP BY, ORDER BY , aggregate functions etc that can be performed on the outer query.
This means that the resultset of the sub-query needs to be further processed by the rules defined on the outer query, and this processing is difficult to be performed by the JDBC driver. That's why those "temporary" collections were introduced.
>> The current implementation of using "temporary" collections for sub-queries also have the
>> disadvantage that interleaving can occur if two different users issue such queries at the same time.
I would not call this a disadvantage. This is a design flaw. We can't have this in our driver. ADS is not used by 1 guy working on his local database box. ADS is used by 500 users querying a central production server.
At this point, maybe the best thing to do is to disable support for subqueries and log a feature request for version 15.0 to support subqueries. I'll discuss this idea with Sachin on Thursday.
Emil, would it be easy enough to disable subquery support? And would this remove all temporary table usage?
>> The current implementation of using "temporary" collections for sub-queries also have the
>> disadvantage that interleaving can occur if two different users issue such queries at the same time.
I would not call this a disadvantage. This is a design flaw. We can't have this in our driver. ADS is not used by 1 guy working on his local database box. ADS is used by 500 users querying a central production server.
At this point, maybe the best thing to do is to disable support for subqueries and log a feature request for version 15.0 to support subqueries. I'll discuss this idea with Sachin on Thursday.
Emil, would it be easy enough to disable subquery support? And would this remove all temporary table usage?
We should have automated scripts for SELECT statements running with a user account that only has read-only access.
We should have automated scripts for SELECT statements running with a user account that only has read-only access.
Another problem with creating a temporary collection for the subquery is that it prevents users with read-only access from executing queries. Currently, if I log in as a read-only user, I'll get "db assertion failure" error when trying to use a subquery.
Another problem with creating a temporary collection for the subquery is that it prevents users with read-only access from executing queries. Currently, if I log in as a read-only user, I'll get "db assertion failure" error when trying to use a subquery.
Another problem with creating a temporary collection for the subquery is that it prevents users with read-only access from executing queries. Currently, if I log in as a read-only user, I'll get "db assertion failure" error when trying to use a subquery.
You're right, that's another problem with this approach.
Emil, would it be easy enough to disable subquery support? And would this remove all temporary table usage?
Yes, it is easy enough to make this change. I'll disable the subquery support for the next drop.
We should have automated scripts for SELECT statements running with a user account that only has read-only access.
I don't understand this request.
Another problem with creating a temporary collection for the subquery is that it prevents users with read-only access from executing queries. Currently, if I log in as a read-only user, I'll get "db assertion failure" error when trying to use a subquery.
You're right, that's another problem with this approach.
Emil, would it be easy enough to disable subquery support? And would this remove all temporary table usage?
Yes, it is easy enough to make this change. I'll disable the subquery support for the next drop.
We should have automated scripts for SELECT statements running with a user account that only has read-only access.
I don't understand this request.
>> We should have automated scripts for SELECT statements running with a
>> user account that only has read-only access.
Sachin will work with our QA team to do this. Once we put this in our automated tests, then any changes to the driver which will require write permission for a SELECT statement will fail.
>> We should have automated scripts for SELECT statements running with a
>> user account that only has read-only access.
Sachin will work with our QA team to do this. Once we put this in our automated tests, then any changes to the driver which will require write permission for a SELECT statement will fail.
Sachin will work with our QA team to do this. Once we put this in our automated tests, then any changes to the driver which will require write permission for a SELECT statement will fail.
OK, it's about regression tests run under a read-only account. Now I understand. Thanks.
Sachin will work with our QA team to do this. Once we put this in our automated tests, then any changes to the driver which will require write permission for a SELECT statement will fail.
OK, it's about regression tests run under a read-only account. Now I understand. Thanks.
The support for sub-queries on the FROM clause has been removed. Now an exception is thrown for this kind of queries (i.e. queries that don't have the SELECT ... FROM <collection_name>
syntax, but the SELECT ... FROM ( SELECT ... )
one ).
The support for sub-queries on the FROM clause has been removed. Now an exception is thrown for this kind of queries (i.e. queries that don't have the SELECT ... FROM <collection_name>
syntax, but the SELECT ... FROM ( SELECT ... )
one ).
The Query Reference wiki page should be updated by removing the sub-select syntax on the FROM clause.
e.g.
[FROM table_name1|(subselect1)]
should become
[FROM table_name1]
The Query Reference wiki page should be updated by removing the sub-select syntax on the FROM clause.
e.g.
[FROM table_name1|(subselect1)]
should become
[FROM table_name1]
The Query Reference document has been updated.
Subqueries are not allowed in the FROM clause, but they are still allowed in the WHERE clause. I tried the following queries using a read-only user and no temporary collection was created.
select * from baseball where worldChampionships between (select worldChampionships from baseball where teamName='Cubs') and (select worldChampionships from baseball where teamName='Yankees') go select * from baseball where city in (select city from baseball where teamName='Cubs') go
The Query Reference document has been updated.
Subqueries are not allowed in the FROM clause, but they are still allowed in the WHERE clause. I tried the following queries using a read-only user and no temporary collection was created.
select * from baseball where worldChampionships between (select worldChampionships from baseball where teamName='Cubs') and (select worldChampionships from baseball where teamName='Yankees') go select * from baseball where city in (select city from baseball where teamName='Cubs') go
Verified in Aqua Data Studio Aqua Data Studio 14.0.0-beta-64 Build #: 33188
Verified in Aqua Data Studio Aqua Data Studio 14.0.0-beta-64 Build #: 33188
Verified in Aqua Data Studio Aqua Data Studio 14.0.0-beta-68 Build #: 33238 on Ubuntu 12.04( Mongo DB 2.4.3) and Windows XP ( Mongo DB 2.0.9).
Verified in Aqua Data Studio Aqua Data Studio 14.0.0-beta-68 Build #: 33238 on Ubuntu 12.04( Mongo DB 2.4.3) and Windows XP ( Mongo DB 2.0.9).
Issue #9420 |
Closed |
Fixed |
Resolved |
Completion |
No due date |
Fixed Build ADS 14.0.0-beta-39 (mongo-jdbc 1.2.2) |
No time estimate |
6 issue links |
breaks #9371
Issue #9371Aggregation related queries no longer giving correct results on ADS |
breaks #8878
Issue #8878union all returns distinct rows if used in derived table |
breaks #8909
Issue #8909Issues with FROM of SELECT statement |
breaks #8647
Issue #8647select * from (select from .. ) as temp displays an extra _id column |
breaks #8728
Issue #8728Errors with the Union Clause in mongodb |
breaks #9295
Issue #9295Exception com.mongodb.MongoException$DuplicateKey |
Emil, we should not create any temporary data in customers' databases. In what situations do we create a temporary collection to process the query? We should consider storing the data in memory, and if that is not feasible, let's not support those cases.