I'm trying to create a Visual Analytics workbook from a query that is currently returning around two million rows. Currently you have to execute a query in either the Query Builder or the Query Analyzer before you can create a Visual Analytics workbook. I suspect this has a large overhead given the display of the data set as what I've observed is that Data Studio looks to be able to complete the "execution" of the query (which with a MySql driver also means pulling all the data down into a ResultSet) but yet runs out of memory when it's working on the display (ie when it says it's processing rows X, etc).
I'd like to be able to skip that display and extract that data directly into a VA packaged workbook.
@kin-hong: as you perform your memory analysis, can you look into this aspect & see how much time is actually spent on the rendering? I suspect that most of the time is spent on building the data set internally which is needed by VA as well as Query Analyzer.
By default, ADS does not make use of MySQL's JDBC ResultSet streaming and does not enable the useCursorFetch property. As a result, during ADS's query execution, the entire ResultSet raw data is first retrieved into memory, then we loop through each row and create the table model for rendering. This effectively doubles the runtime memory requirement when querying data.
If the user is using MySQL > 5.0.2, adding "useCursorFetch=true" to the driver parameters should inform the driver to use a fetch size of 5000, which will greatly reduce the maximum memory needed for querying and rendering data.
References:
JDBC API Implementation Notes (go to ResultSet section)
By default, ADS does not make use of MySQL's JDBC ResultSet streaming and does not enable the useCursorFetch property. As a result, during ADS's query execution, the entire ResultSet raw data is first retrieved into memory, then we loop through each row and create the table model for rendering. This effectively doubles the runtime memory requirement when querying data.
If the user is using MySQL > 5.0.2, adding "useCursorFetch=true" to the driver parameters should inform the driver to use a fetch size of 5000, which will greatly reduce the maximum memory needed for querying and rendering data.
References:
JDBC API Implementation Notes (go to ResultSet section)
@jasonk: let us know how well this configuration works for you.
@jasonk: let us know how well this configuration works for you.
I think we got a little sidetracked... I'll certainly test the useCursorFetch option but I'm not sure, in the context of this ticket, what that does for us? The scope of this ticket was to be able to directly extract a data set for use in VA. My assumption was that some level of memory being used is in building the display of the data in the Query Analyzer (along with the work of just doing that display) that isn't really needed given that VA doesn't display data in the same way.
And so if a query is being executed with the goal to go right into VA, I'm wonder if we can avoid the display (processing time and memory usage)?
Note, see my comments on the below ticket.. due to the memory leak I'm unable to test to see just how much memory is being used by the display :(
https://www.aquaclusters.com/app/home/project/public/aquadatastudio/issue/13369
I think we got a little sidetracked... I'll certainly test the useCursorFetch option but I'm not sure, in the context of this ticket, what that does for us? The scope of this ticket was to be able to directly extract a data set for use in VA. My assumption was that some level of memory being used is in building the display of the data in the Query Analyzer (along with the work of just doing that display) that isn't really needed given that VA doesn't display data in the same way.
And so if a query is being executed with the goal to go right into VA, I'm wonder if we can avoid the display (processing time and memory usage)?
Note, see my comments on the below ticket.. due to the memory leak I'm unable to test to see just how much memory is being used by the display :(
https://www.aquaclusters.com/app/home/project/public/aquadatastudio/issue/13369
Hi Jason,
You mentioned that you are using MySQL and you're pulling 2 million rows from the server. This means that the MySQL driver is first loading all 2 million rows inside the JDBC driver & then our result set is traversing those 2 million rows and creating our own representation. By using the suggestions above, you could possibly see your memory consumption be reduced by half.
>> I suspect this has a large overhead given the display of the data set
We don't believe its the display of the data set that causes the large memory overhead or wait time. Its the building of the result set which is the most likely culprit. Both VA & Query Analyzer share the same result set. Hence, when you launch VA there is no additional large memory allocation required.
To verify this, you could run the following test after you've modified the MySQL driver parameter & confirmed that the "useCursorFetch=true" is having the desired effect. In Query Analyzer, execute your query but instead of returning all 2 million records, set a limit of 1 record. Then launch VA and right click on the data source & choose "Edit Query". Remove the 1 row limit and re-execute the query. In this scenario, we are not displaying the result set. Let me know what you observe.
Hi Jason,
You mentioned that you are using MySQL and you're pulling 2 million rows from the server. This means that the MySQL driver is first loading all 2 million rows inside the JDBC driver & then our result set is traversing those 2 million rows and creating our own representation. By using the suggestions above, you could possibly see your memory consumption be reduced by half.
>> I suspect this has a large overhead given the display of the data set
We don't believe its the display of the data set that causes the large memory overhead or wait time. Its the building of the result set which is the most likely culprit. Both VA & Query Analyzer share the same result set. Hence, when you launch VA there is no additional large memory allocation required.
To verify this, you could run the following test after you've modified the MySQL driver parameter & confirmed that the "useCursorFetch=true" is having the desired effect. In Query Analyzer, execute your query but instead of returning all 2 million records, set a limit of 1 record. Then launch VA and right click on the data source & choose "Edit Query". Remove the 1 row limit and re-execute the query. In this scenario, we are not displaying the result set. Let me know what you observe.
Issue #13368 |
Closed |
Fixed |
Resolved |
Completion |
No due date |
No fixed build |
No time estimate |
1 issue link |
relates to #13370
Issue #13370Support huge data sets in Visual Analytics |
@kin-hong: as you perform your memory analysis, can you look into this aspect & see how much time is actually spent on the rendering? I suspect that most of the time is spent on building the data set internally which is needed by VA as well as Query Analyzer.