#8369: Query Analyzer: high memory consumption

funfun reported 2013-02-21T00:33:25Z · last modified 2013-02-27T22:14:47Z

Query Analyzer: high memory consumption

Dev	funfun
QA	SachinPrakash

Priority	Low
Complexity	Unknown

Component	App - General
Version	14.0

Query Analyzer seems to consume a lot of memory, Sachin requested an investigation on this.

Test data:
   The test cases described below re-use the test table defined in issue #8187, i.e. a table consists of a few rows and each row containing N-MB data (in my test case, N is 10).

Test-Case-1 (retrieving one row from the test table):
(a1) Launch ADS with -Xmx768M and make sure no tabs opened.
(a2) On ADS' toolbar, check all of the following icons:
        'Show Text', 'Show Text History', 'Show DBMS_OUTPUT', 'Show Grid', 'Show Pivot Grid', 'Show Form', 'Show Execution Plan', 'Show Client Statistics'.
(a3) Open a Query Analyzer tab on the DB server where test table resides.
(a4) In Query panel, enter a select statement to retrieve a row from the test table, e.g.
        SELECT * FROM tableName LIMIT 1
        (in my test case, the size of each row is 10-MB)
     The 'Used memory:' displayed in the lower-right corner usually is ~55 MB at this moment.
(a5) Press Control-E to execute the query.
     The status line under the editor pane will show 'Executing...'.
     The value of 'Used memory:' remains unchanged.
(a6) After a while, 'Executing...' is changed to 'Script executed - No Errors',
     The value of 'Used memory:' changes to ~300 MB.
(a7) Wait until all of the following tabs are displayed:
        Text, Text History, Grid, Pivot Grid, Form, Execution Plan, Client Statistics
     The value of 'Used memory:' changes to ~690 MB.
(a8) Click the GC icon a few times.
     The value of 'Used memory:' changes to ~600 MB.
(a9) At the moment, the first tab is selected, i.e. the Text tab.
     Click each tab from left to right till the last tab is reached.
     The value of 'Used memory:' changes to ~690 MB when the 'Grid' tab is clicked.
     Click each tab from right to left till the first tab is reached.
     The value of 'Used memory:' remains unchanged, ~690 MB.
(a10) Click the GC icon a few times.
     The value of 'Used memory:' changes to ~600 MB.

From steps above, it seems that, to handle 10 MB of data,
(1) 300-55=245 MB is used (difference between step a4 and step a6) to retrieve 10 MB data from server into ADS.
(2) 690-300=395 MB is used (difference between step a6 and step a7) to display 10 MB data in the selected tabs.
(3) 690-600=90 MB is used (difference between steps a7, a8, a9 and a10) for the Grid tab to handle 10 MB data.

Test-Case-2 (retrieving two rows from the test table):
(b1) same as step a1 above.
(b2) same as step a2 above.
(b3) same as step a3 above.
(b4) In Query panel, enter a select statement to retrieve 2 rows from the test table, e.g.
        SELECT * FROM tableName LIMIT 2
        (in my test case, the size of each row is 10-MB)
     The 'Used memory:' displayed in the lower-right corner usually is ~55 MB at this moment.
(b5) Press Control-E to execute the query.
     The status line under the editor pane will show 'Executing...'.
     The value of 'Used memory:' remains unchanged.
(b6) After a while, 'Executing...' is changed to 'Script executed - No Errors',
     The value of 'Used memory:' changes to ~372 MB.
(b7) Wait until all of the following tabs are displayed:
        Text, Text History, Grid, Pivot Grid, Form, Execution Plan, Client Statistics
     The value of 'Used memory:' changes to ~716 MB.
(b8) Click the GC icon a few times.
     The value of 'Used memory:' changes to ~600 MB.
(b9) At the moment, the first tab is selected, i.e. the Text tab.
     Click each tab from left to right till the last tab is reached.
     An OutOfMemoryError exception is thrown after the Grid tab is clicked.

Test-Case-3 (retrieving three rows from the test table):
(c1) same as step a1 above.
(c2) same as step a2 above.
(c3) same as step a3 above.
(c4) In Query panel, enter a select statement to retrieve 3 rows from the test table, e.g.
        SELECT * FROM tableName LIMIT 3
        (in my test case, the size of each row is 10-MB)
     The 'Used memory:' displayed in the lower-right corner usually is ~55 MB at this moment.
(c5) Press Control-E to execute the query.
     The status line under the editor pane will show 'Executing...'.
     The value of 'Used memory:' remains unchanged.
(c6) After a while, 'Executing...' is changed to 'Script executed - No Errors',
     The value of 'Used memory:' changes to ~460 MB.
(c7) ADS won't be able to create tabs for showing output, runs out of memory.

24 attachments

MAT-Memory-Text-Tab

2013-02-21T06:46:29Z

130 KB
YJP-Memory-Before-Exec-SQL

2013-02-21T06:46:29Z

94 KB
YJP-Memory-Text-Tab

2013-02-21T06:46:29Z

103 KB
YJP-Memory-After-Exec-SQL

2013-02-21T06:46:29Z

90 KB
Text-Tab-1-Begin

2013-02-22T06:02:05Z

114 KB
Text-Tab-2-Column-Header

2013-02-22T06:02:06Z

115 KB
Text-Tab-3-To-Document

2013-02-22T06:02:06Z

117 KB
Text-Tab-5-After-Visible

2013-02-22T06:02:07Z

143 KB
Text-Tab-4-Before-Visible

2013-02-22T06:02:07Z

122 KB
Text-10MB-4-Before-Visible

2013-02-23T06:29:54Z

113 KB
Text-10MB-2-Begin

2013-02-23T06:29:54Z

106 KB
Text-10MB-5-After-Visible

2013-02-23T06:29:54Z

119 KB
Text-10MB-1-Test-Data

2013-02-23T06:29:54Z

85 KB
Text-10MB-3-To-Document

2013-02-23T06:29:54Z

113 KB
Text-30MB-5-After-Visible

2013-02-23T06:38:31Z

120 KB
Text-30MB-2-Begin

2013-02-23T06:38:31Z

94 KB
Text-30MB-4-Before-Visible

2013-02-23T06:38:31Z

106 KB
Text-30MB-1-Test-Data

2013-02-23T06:38:31Z

99 KB
Text-30MB-3-To-Document

2013-02-23T06:38:31Z

109 KB
Text-50MB-Summary

2013-02-23T22:13:06Z

141 KB
Grid-100K-1MB-Summary

2013-02-23T22:13:06Z

128 KB
Grid-50MB-Summary

2013-02-23T22:13:07Z

158 KB
CPU-500K-All-Threads

2013-02-27T02:09:46Z

134 KB
CPU-500K-EDT

2013-02-27T02:09:46Z

144 KB

All Comments (17) Change History

funfun 2013-02-21T06:46:29Z

I downloaded an evaluation copy of Java Profiler from http://www.yourkit.com/ and captured some memory snapshots based on the test cases described in this issue. To simplify the profiling output, I only turned on the 'Text' tab for the opened Query panel.

Attached please find the following memory snapshots produced by the YourKit's Java Profiler:
(1) YJP-Memory-Before-Exec-SQL: this is the memory snapshot before executing the SQL statement.
(2) YJP-Memory-After-Exec-SQL: this is the memory snapshot after executing the SQL statement and data is retrieved into memory.
(3) YJP-Memory-Text-Tab: this is the memory snapshot after the Text tab is presented.
and
(4) MAT-Memory-Text-Tab: this is the same as (3) but is generated from Eclipse Memory Analyzer Tool.

It seems that most of memory is consumed by netbeans related objects.

emil.goicovici 2013-02-21T15:18:58Z

The biggest memory consumer in your tests is the Text tab output and I'll explain below why. "Show Form" and "Show Grid" also require about 90 MB each to display a 10 MB table cell and even more memory is allocated when this big-content cell is selected with a mouse click. But I think these memory allocations for the "Show Form" and "Show Grid" are legitimate and required to properly render its GUI components (table cell renderers etc).

However, my tests indicate that even if Text Tab is the only output type activated on a Query Analyzer Window, about 550 MB is required to render a 14 MB string coming from a single longtext cell. The problem is that for this long string, the second row of the Text tab (i.e. the one after the first row that contains the column names) has about 14 million occurrences of the "-" char. The length of this huge "----" separator is probably established by the sum of each column's content length.

Text editors are not very happy with huge lines (i.e. lines containing too many chars). On my tests, Gedit under Ubuntu for example is messed up (overlaps the rendered chars etc) while handling a line having over 1M chars. While Netbeans editor renders a 1M chars line correctly, the GUI becomes a little bit less responsive (few seconds refresh delay) when handling a line containing 10M chars. I remember we had an issue on AquaClusters on the past regarding huge lines present on a text editor.

The ADS algorithm that computes the length of the "----" separator row can be probably enhanced by taking into account the new lines that occur on a text column. The Text tab displays the longtext cell content by going to the next line every time a \n is encountered on the input string and thus this single cell is spread on about 40k rows. The biggest line from the longtext.sql file attached on #8187 has only 681 chars. Therefore, the "----" separator corresponding to this column should have 681 chars, not 14 million chars (i.e. the length of the biggest line). By making the computation algorithm of this separator aware of the new line chars, the memory allocation will be probably greatly improved.

However, this issue will still occur on big longtext columns that don't contain new lines in their content. I've tested another scenario with a 10MB longtext column that doesn't have new line chars inside it. In this case, two 10m chars rows (one for the "---" separator and one for the content itself) are displayed on the Text tab output. There is a big memory spike (from 60MB goes up to about 620 MB used memory).

I think that the longtext column should be truncated in this case (displaying a "(truncated)" suffix or something similar). There is no reason to display the entire content as the 10M chars line is obviously not human readable. If the user wants to perform text search on longtext columns, I guess SQL WHERE filters such as LIKE can be used to perform this search. A text editor containing rows with 10M chars are difficult to operate and can easily trigger OutOfMemory errors. Just establish a reasonable threshold for the content length of each table cell that's going to be displayed on the Text tab and truncate the remaining text.

In order to avoid being too restrictive, I think the two enhancements proposed above (make the computation of the "---" separator aware of the new line chars and truncating continuous lines -- i.e. a text line that doesn't contain \n chars) can be combined. This way the user can still see the full content of longtext columns, if these cells contain enough \n separators.

funfun 2013-02-22T06:02:07Z

> I think that the longtext column should be truncated in this case (displaying a "(truncated)" suffix or something similar).
The maximum display length can be configured by this option:
   File -> Options -> Results -> Text Results -> Maximum column length

I have the following option checked in my ADS:
   File -> Options -> Results -> Text Results -> Display column headers
Hence, in my ADS, the total number of characters displayed in the Text tab actually is 30 MB:
   (a) 10 MB - column name header (column name with white spaces padded)
   (b) 10 MB - column separator (the line containing bunch of '-' characters)
   (c) 10 MB - the content

I have attached some more memory dumps generated from Java Profiler to explain how memory is consumed for creating a Text tab:
(1) Text-Tab-1-Begin: this memory dump is taken after the 10 MB data is retrieved from the server and right before passing the data to the object that is responsible for creating the Text tab.
    Retained memory size: 103 MB
(2) Text-Tab-2-Column-Header: this memory dump is taken after column header and separator are created, (a) and (b) above.
    168 MB memory is allocated for creating (a) and (b).
    Retained memory size: 284 MB
(3) Text-Tab-3-To-Document: this memory dump is taken after 30 MB data, (a)+(b)+(c), is saved to the Document object associated with the editor pane.
    Retained memory size: 351 MB
(4) Text-Tab-4-Before-Visible: this memory dump is taken after the Text tab is added to the container component but is not visible yet.
    Retained memory size: 422 MB
(5) Text-Tab-5-After-Visible: this memory dump is taken after the Text tab is visible.
    Some new objects show up here, likely has something to do with presenting a Document in an editor pane.
    I don't have access to netbeans source, and am not able to provide more information here.
    Retained memory size: 546 MB

emil.goicovici 2013-02-22T06:59:37Z · (edited)

I have the following option checked in my ADS:
   File -> Options -> Results -> Text Results -> Display column headers
Hence, in my ADS, the total number of characters displayed in the Text tab actually is 30 MB:
   (a) 10 MB - column name header (column name with white spaces padded)
   (b) 10 MB - column separator (the line containing bunch of '-' characters)
   (c) 10 MB - the content

It is not about the total document length. What it is important is the column length of a single line (10 MB on this case) and this cause the huge memory allocation.

Note that the content from (c) is not displayed on a single row, but split (due to the new line chars) on 30k-40k rows. Therefore, the 10MB length of the two rows that compose the table header --- (a) and (b) --- is unnecessary.

5) Text-Tab-5-After-Visible: this memory dump is taken after the Text tab is visible.
    Some new objects show up here, likely has something to do with presenting a Document in an editor pane.
    I don't have access to netbeans source, and am not able to provide more information here.
    Retained memory size: 546 MB

The objects you've highlighted in red on this screenshot are actually the GUI components built by Netbeans (ParagraphView etc) in order to render the document on the screen.

I state once again: text editors (generally) are not optimized for huge lines (e.g. 10MB columns), but are rather optimized for documents having a huge number of lines with reasonable length on each line.

I don't see why would we stick on rows having a length of 10MB columns, as it is difficult to horizontally scroll on such lines and Netbeans' editor and other editors clearly indicates me that they are not optimized for this kind of displaying a text content (probably because text files containing 10MB chars on each line are not very usual).

I'll play with the "Maximum column length" setting to see if it applies to the Text tab.

emil.goicovici 2013-02-22T08:43:53Z · (edited)

> I think that the longtext column should be truncated in this case (displaying a "(truncated)" suffix or something similar).
The maximum display length can be configured by this option:
File -> Options -> Results -> Text Results -> Maximum column length

Well, I've played with this setting and it seems it is correctly applied (i.e. truncates to the desired length) only to the column names header and the "---" separator. The content of the table cell itself is not correctly truncated, but rather the chars following the N-th position are replaced with spaces, but the string length remains unchanged. See issue #8374 for more details.

emil.goicovici 2013-02-22T08:24:11Z

I've performed another test scenario: I've unchecked the File -> Options -> Results -> Text Results -> Display column headers option and set the Maximum column length one to unlimited. Then I've run the same query over a 14M longtext content (obtained from the input file attached tot #8187).

There is a memory spike in this case, too (probably caused by the instantiation of 40k ParagraphView objects, one for each row), but after the results are displayed, these objects can be successfully released by calling the GC (the amount of used memory goes down to about 130 MB).

This demonstrates once again that the problem here is not the 10MB content length, but how it is structured. If it is split over several text lines and each line has a reasonable length, then there is no memory issue. But if there is a huge line (1M, 10M etc) on the output, then the allocated memory increase drastically and can cause out of memory errors.

emilianbold 2013-02-22T13:09:05Z

I won't go into profiling details but I'd like to add my 2c to the discussion.

The editor is written and has evolved according to some generic usecase, which is basically developers editing source code.

Source code is primordially human-readable so it must have an acceptable 'shape'. A 20MB file with a 10M char single line doesn't really qualify, so it's no wonder performance isn't great there.

I doubt there is much we can patch.

Initially NetBeans used a textarea for the terminal and output window too. But they noticed that the editor wasn't really good for that usecase (I guess, in particular, rapid scrolling lines that fill the buffer) so they introduced the terminal emulator which has a custom UI to display text.

It's good that we are learning the limits of the editor. But in the end we will have to accept the performance as-is or replace the component with something else in those places where it doesn't perform.

With the FluidShell/VT100 experience I wonder if it wouldn't be an good experiment to replace text results with a terminal window?

emil.goicovici 2013-02-22T13:54:47Z

With the FluidShell/VT100 experience I wonder if it wouldn't be an good experiment to replace text results with a terminal window?

Replacing the vincaed GUI component of the Text Results tab with a Terminal Emulator one (i.e. the GUI renderer used for FluidShell and VT100) also means that you'll loose the side bar decorators provided by the vincaed editor (error stripes displayed when SQL errors/warnings are present, line numbers, status bar etc). Some of them could be emulated with proper implementation, of course, but we should ask if it worth such effort.

I think the main question to be answered is to establish what is the main role (i.e. its daily basis usability) for the Text Results output tab? Then we should see what are the key behaviours expected from it and what limitations are considered acceptable. Having the answers to these questions, we can then go forward to see which GUI component would suit best.

SachinPrakash 2013-02-22T17:39:23Z

The primary role for the Text Results output tab is not in displaying 10MB long column strings. In addition, we recommend to users that if they have very large result sets (lots of rows), using grid is much more memory efficient than Text Results.

I state once again: text editors (generally) are not optimized for huge lines (e.g. 10MB columns), but are rather optimized for documents having a huge number of lines with reasonable length on each line.

Our typical use case is having large number of lines with reasonable length on each line.

When I initially saw the defect of a 10MB string consuming hundreds of MB, I was concerned that we had a very commonly occurring memory issue. But after this investigation, it seems that the large memory consumption only occurs if a particular cell value has a very large string (e.g. 10MB). This is certainly an edge case which we do not need to optimize for.

Separate question: lets says we execute a DB query which returns 100k rows. How much memory is consumed by grid vs text results? (No 10MB cell values.)

funfun 2013-02-23T06:29:54Z

I did test for the single Text tab case using 100K rows, the text grid size is 10MB. Everything seems to be fine. See following screenshots for details:
(1) Text-10MB-1-Test-Data: this screenshot describes the definition of test data and how it is presented in the Text tab.
(2) Text-10MB-2-Begin: this memory dump is taken at the time right after the test data is retrieved from the server and before the creation of the Text tab.
(3) Text-10MB-3-To-Document: this memory dump is taken at the time after the grid data is saved to the Document object associated with the editor pane.
(4) Text-10MB-4-Before-Visible: this memory dump is taken at the time after the Text tab is added to the container component but yet to be visible.
(5) Text-10MB-5-After-Visible: this memory dump is taken at the time after the Text tab is visible.

funfun 2013-02-23T06:38:31Z

I also did similar test for the single Text tab case using 100K rows, with 30MB text grid, looks good, too. The only difference between 10MB text grid and 30MB text grid is 30MB text grid takes much much longer time to create. See following screenshots for details:
(1) Text-30MB-1-Test-Data: this screenshot describes the definition of test data and how it is presented in the Text tab.
(2) Text-30MB-2-Begin: this memory dump is taken at the time right after the test data is retrieved from the server and before the creation of the Text tab.
(3) Text-30MB-3-To-Document: this memory dump is taken at the time after the grid data is saved to the Document object associated with the editor pane.
(4) Text-30MB-4-Before-Visible: this memory dump is taken at the time after the Text tab is added to the container component but yet to be visible.
(5) Text-30MB-5-After-Visible: this memory dump is taken at the time after the Text tab is visible.

funfun 2013-02-23T22:13:07Z

> Separate question: lets says we execute a DB query which returns 100k rows. How much memory is consumed by grid vs text results? (No 10MB cell values.)
Did test for Grid using 100K rows without 10 MB cell values.
The size of data retrieved from the server is 1 MB; the retained memory size is 111 MB after the Grid tab is visible.
See Grid-100K-1MB-Summary attachment for details.

Also did test for 'Text vs. Grid' using 10 MB cell values, see following memory dumps for more info:
(A) Text-50MB-Summary:
Test case: only the Text tab is activated, retrieving 3 rows of test data, each row size is 10 MB (there is a 10 MB cell).
The overall text grid size is 50 MB: 30 MB data + 10 MB column names header + 10 MB column separator.
This memory dump is taken at the time after the Text tab is made visible, the retained memory size is 644 MB.
With ADS configured as -Xmx768M, I could only execute the SELECT SQL a couple of times before running out of heap space.

(B) Grid-50MB-Summary:
Test case: only the Grid tab is activated, retrieving 5 rows of test data, each row size is 10 MB (there is a 10 MB cell).
The overall data size to be displayed in the Grid tab is 50 MB.
This memory dump is taken at the time after the Grid tab is made visible.
The retained memory size is 342 MB, and about 200 MB is ready for garbage collection (objects marked [Unreachable]).
With ADS configured as -Xmx768M, was able to execute the SELECT SQL quite many times without any problem.

To summarize: Grid tab is not only much more memory efficient, but also takes much less time to construct than Text tab.

funfun 2013-02-26T04:39:49Z · (edited)

Also tried this test case:
Repeat executing (12+ times) a SELECT statement on the tabs listed below with Grid tab:
   Text History, Pivot Grid, Form, Client Statistics
The SELECT statement will return one of the followings:
   (a) 5 rows, row size is 10 MB - overall data size is 50 MB
   (b) 102400 rows, row size is 10 bytes - overall data size is about 1 MB

All of tabs listed above seem to work fine except Pivot Grid on the case (b).
Pivot Grid seems to have a serious memory leak on case (b), logged as issue #8387.

funfun 2013-02-26T04:57:21Z

> (3) 690-600=90 MB is used (difference between steps a7, a8, a9 and a10) for the Grid tab to handle 10 MB data.

Query panel's TabbedPane component has a custom ChangeListener defined; this custom change listener always forces a repaint.

emil.goicovici 2013-02-26T19:15:27Z · (edited)

The primary role for the Text Results output tab is not in displaying 10MB long column strings. In addition, we recommend to users that if they have very large result sets (lots of rows), using grid is much more memory efficient than Text Results.
Separate question: lets says we execute a DB query which returns 100k rows. How much memory is consumed by grid vs text results? (No 10MB cell values.)

The amount of memory used is not the only concern regarding the Text Results component. ADS GUI freezes for a certain amount of time when Text Results GUI is going to display a large amount of output text, as follows:

Output size (sum of all rows)	DB Retrieve + Text Results render time (i.e. GUI freeze)
1 MB	< 1 sec
5 MB	about 5 seconds
12 MB	about 30 seconds
24 MB	aprox. 4 minutes

These tests were run over a column having 24 bytes on each row (i.e. 500k rows for the 12 MB test, 1M rows for the 24 MB test).

As you can see, there is no linear relationship between the total output size and the time the GUI is frozen up, but rather exponential. This time is mostly spent by the lexer for syntax highlighting etc. It freezes the GUI probably because this is done in EDT. Note that the "Cancel" action cannot be triggered during this time (as the EDT is unavailable).

It's a good thing to recommend users to use the Grid output for large resultsets, but I think we should avoid accidental / unexpected GUI freezes (e.g. if the Text Results is left enabled and the user runs a query returning a 50 MB resultset). We could avoid such freezes by displaying an info message instead of the whole resultset inside the Text results component, when the resultset is above a threshold (e.g. 5 MB). By seeing this info message, user can then disable the Text results, enable the Grid one and run query again.

funfun 2013-02-27T02:09:46Z

In the current ADS implementation, the SELECT statement is executed in a non-EDT thread. The result of the query is passed to the EDT in which the text grid is constructed. Attached is a CPU sampling from executing 'SELECT * from tableName limit 500000' (500K rows, 12 MB data).
(1) CPU-500K-All-Threads: this screenshot shows all of threads involved.
(2) CPU-500K-EDT: this screenshot shows details of the EDT.
In this 12MB/500K-row case, 90% time is spent in (2) on creating text grid; since it is in EDT, GUI won't respond until the Text tab is shown.

SachinPrakash 2013-02-27T22:14:47Z

We don't need to address the limitations of rendering data in the Text results tab. By default, text is not enabled, and we recommend to our users not to use text results when dealing with very large result sets.

This entire exercise has been very valuable in characterizing ADS behavior as well as fixing some significant leaks.

Search Tips

Aqua Data Studio / nhilam

Query Analyzer: high memory consumption

24 attachments

Issue #8369

Completion

3 issue links

Issue #8387

Issue #9217

Issue #8374

Search Tips

Aqua Data Studio / nhilam

Title

Query Analyzer: high memory consumption

24 attachments

Issue #8369

Completion

3 issue links

Issue #8387

Issue #9217

Issue #8374