#13368: Extract Data Set for VA

JasonK reported 2015-06-16T13:37:34Z · last modified 2015-08-18T19:21:52Z

Extract Data Set for VA

Dev	kin-hong
QA	tariqrahiman

Priority	Low
Complexity	Unknown

Component	Visual Analytics
Version	17.0

I'm trying to create a Visual Analytics workbook from a query that is currently returning around two million rows. Currently you have to execute a query in either the Query Builder or the Query Analyzer before you can create a Visual Analytics workbook. I suspect this has a large overhead given the display of the data set as what I've observed is that Data Studio looks to be able to complete the "execution" of the query (which with a MySql driver also means pulling all the data down into a ResultSet) but yet runs out of memory when it's working on the display (ie when it says it's processing rows X, etc).

I'd like to be able to skip that display and extract that data directly into a VA packaged workbook.

All Comments (6) Change History

SachinPrakash 2015-06-18T21:00:21Z

@kin-hong: as you perform your memory analysis, can you look into this aspect & see how much time is actually spent on the rendering? I suspect that most of the time is spent on building the data set internally which is needed by VA as well as Query Analyzer.

kin-hong 2015-07-09T21:40:10Z

By default, ADS does not make use of MySQL's JDBC ResultSet streaming and does not enable the useCursorFetch property. As a result, during ADS's query execution, the entire ResultSet raw data is first retrieved into memory, then we loop through each row and create the table model for rendering. This effectively doubles the runtime memory requirement when querying data.

If the user is using MySQL > 5.0.2, adding "useCursorFetch=true" to the driver parameters should inform the driver to use a fetch size of 5000, which will greatly reduce the maximum memory needed for querying and rendering data.

References:

JDBC API Implementation Notes (go to ResultSet section)
URL Syntax and Configuration Properties for Connector/J

SachinPrakash 2015-07-09T22:02:57Z

@jasonk: let us know how well this configuration works for you.

JasonK 2015-07-23T22:22:54Z · (edited)

I think we got a little sidetracked... I'll certainly test the useCursorFetch option but I'm not sure, in the context of this ticket, what that does for us? The scope of this ticket was to be able to directly extract a data set for use in VA. My assumption was that some level of memory being used is in building the display of the data in the Query Analyzer (along with the work of just doing that display) that isn't really needed given that VA doesn't display data in the same way.

And so if a query is being executed with the goal to go right into VA, I'm wonder if we can avoid the display (processing time and memory usage)?

Note, see my comments on the below ticket.. due to the memory leak I'm unable to test to see just how much memory is being used by the display :(

https://www.aquaclusters.com/app/home/project/public/aquadatastudio/issue/13369

SachinPrakash 2015-07-24T02:10:05Z

Hi Jason,

You mentioned that you are using MySQL and you're pulling 2 million rows from the server. This means that the MySQL driver is first loading all 2 million rows inside the JDBC driver & then our result set is traversing those 2 million rows and creating our own representation. By using the suggestions above, you could possibly see your memory consumption be reduced by half.

>> I suspect this has a large overhead given the display of the data set

We don't believe its the display of the data set that causes the large memory overhead or wait time. Its the building of the result set which is the most likely culprit. Both VA & Query Analyzer share the same result set. Hence, when you launch VA there is no additional large memory allocation required.

To verify this, you could run the following test after you've modified the MySQL driver parameter & confirmed that the "useCursorFetch=true" is having the desired effect. In Query Analyzer, execute your query but instead of returning all 2 million records, set a limit of 1 record. Then launch VA and right click on the data source & choose "Edit Query". Remove the 1 row limit and re-execute the query. In this scenario, we are not displaying the result set. Let me know what you observe.

JasonK 2015-07-24T14:51:58Z

Ah, correct (on the MySql useCursorFetch). I've also got a large RedShift instance so I'll just run the test you suggested against that.

Search Tips

Aqua Data Studio / nhilam

Extract Data Set for VA

Issue #13368

Completion

1 issue link

Issue #13370

Search Tips

Aqua Data Studio / nhilam

Title

Extract Data Set for VA

Issue #13368

Completion

1 issue link

Issue #13370