I have a text file which is 50 MB in size (a SQL dump of a DB). This file has only 2205 rows, so each row is quite large on average.
o If I do a tail filename.sql -n 50 | grep asdfasfasdf, where 'asdfasfasdf' does not match any data in the file, the tail operation returns very quickly.
o If I do a tail filename.sql -n 50 | grep sql, where 'sql' matches multiple data sets, the command takes several minutes before displaying the output. The amount of data output by grep is ~5 MB of character data.
o If I do a tail filename.sql -n 50 | grep sql > output.txt, the operation is extremely fast & the 5 MB of data is very quickly written to output.txt.
Please investigate why performance is so slow when outputting to the GUI.
A few notes:
[1] This slow behavior seems to be caused by passing extremely long lines (e.g. 5 MB of characters on one line) to the JTextPane. JTextPane attempts to perform line wrapping on these long lines, which apparently is very slow & inefficient.
[2] A possible solution is to break up these long lines into strings based on the width of the editor (in characters). These lines could then be appended to the editor without the component having to perform any line-wrapping logic (see the sketch after these notes).
[3] A good example of this behavior is the bash terminal on Ubuntu. If you cat a file containing lines that are a few megabytes long, they are added to the terminal as lines with a fixed length based on the current terminal width. If you resize the terminal, the previously cat'd lines stay at the old width & don't re-wrap. If you cat the file again, the new lines will obey the new terminal width.
[4] Steps to fix this:
[5] Some of the relevant methods are ShellEditor.trim(), ShellPipe.writeColor() and ShellPanelView.refreshResults()
[6] Normal user keyboard input might not encounter this problem, but the user could copy & paste a 1MB line of text into the editor.
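To make note [2] concrete, here is a minimal sketch of the idea, assuming a fixed editor width in columns; the class and method names are illustrative, not the actual ShellEditor/ShellPipe code:

    import javax.swing.JTextPane;
    import javax.swing.text.BadLocationException;
    import javax.swing.text.Document;

    // Sketch of note [2], NOT the actual ShellEditor/ShellPipe code:
    // hard-wrap a long line into editor-width chunks before appending, so
    // the JTextPane never runs its wrapping logic over a multi-MB line.
    // The column width is an assumption; it should come from the editor.
    final class HardWrapAppender {
        private final JTextPane pane;
        private final int columns;

        HardWrapAppender(JTextPane pane, int columns) {
            this.pane = pane;
            this.columns = columns;
        }

        void append(String line) throws BadLocationException {
            Document doc = pane.getDocument();
            for (int start = 0; start < line.length(); start += columns) {
                int end = Math.min(start + columns, line.length());
                doc.insertString(doc.getLength(), line.substring(start, end) + "\n", null);
            }
        }
    }

Like the bash behavior in note [3], lines wrapped this way keep their width if the editor is later resized, because the breaks are hard newlines in the document.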
VT100 (SSH) has a special component specifically to work around these kinds of problems. Matt's idea of wrapping lines manually is probably what we should do short-term.
A few notes on Matt's comment (based on my perspective from developing the SSH Terminal):
>> [3] A good example of this behavior is the bash terminal on Ubuntu. If you cat a file containing lines that are a few megabytes long, they are added to the terminal as lines with a fixed length based on the current terminal width. If you resize the terminal, the previously cat'd lines stay at the old width & don't re-wrap. If you cat the file again, the new lines will obey the new terminal width.
Yes, commands under these terminals make use of the SIGWINCH signal (delivered when the window is resized) to learn the current window size and hard-wrap output based on it. Interactive commands such as aptitude, mc, top, etc. use SIGWINCH to redraw their content to fit the new window size.
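For illustration only, a JVM process can observe the same signal via the JDK-internal sun.misc.Signal API (non-portable; real terminal programs re-query the size natively via ioctl(TIOCGWINSZ) rather than printing a message):

    import sun.misc.Signal;

    public class WinchDemo {
        public static void main(String[] args) throws InterruptedException {
            // Install a handler for SIGWINCH, delivered when the controlling
            // terminal window is resized.
            Signal.handle(new Signal("WINCH"), sig -> {
                // A real program would re-read the terminal size here and
                // hard-wrap subsequent output to the new width.
                System.out.println("terminal resized; re-query width");
            });
            Thread.sleep(60_000); // stay alive so resizes can be observed
        }
    }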
>> [2] A possible solution is to break up these long lines into strings based on the width of the editor (in characters). These lines could then be appended to the editor without the component having to perform any line-wrapping logic.
Hard-wrapping the output of FluidShell commands such as \cat etc. is desirable; attention should be paid to multi-byte characters such as Asian ones, because these usually occupy more than one cell of width.
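A rough sketch of what cell-aware wrapping has to track; treating a handful of CJK blocks as double-width is a simplifying assumption (a complete solution would consult the Unicode East Asian Width data):

    // Assumption: only a few common CJK blocks are double-width. Real
    // terminals use the full Unicode East Asian Width property.
    static boolean isDoubleWidth(int codePoint) {
        Character.UnicodeBlock block = Character.UnicodeBlock.of(codePoint);
        return block == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS
            || block == Character.UnicodeBlock.HIRAGANA
            || block == Character.UnicodeBlock.KATAKANA
            || block == Character.UnicodeBlock.HANGUL_SYLLABLES;
    }

    // Count display cells instead of chars, so a wrap loop can break when
    // the cell count (not the character count) reaches the terminal width.
    static int cellWidth(String s) {
        int cells = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            cells += isDoubleWidth(cp) ? 2 : 1;
            i += Character.charCount(cp);
        }
        return cells;
    }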
Keeping only the last N lines in memory (e.g. the last 1000 rows; this should be an adjustable property) is also welcome and is used by most terminals. If one wants to see/edit the entire content of that big text file, the \open command can be used to open it in a new editor. As far as I can see, that editor pane doesn't perform any wrapping, so the problem shouldn't appear there.
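A minimal sketch of such trimming against a Swing Document, in the spirit of ShellEditor.trim() (the method name and the way the limit is supplied are assumptions):

    import javax.swing.text.BadLocationException;
    import javax.swing.text.Document;
    import javax.swing.text.Element;

    // Keep only the last maxLines lines of the document; older content is
    // removed from the top. maxLines would be the adjustable property.
    static void trimToLastLines(Document doc, int maxLines) throws BadLocationException {
        Element root = doc.getDefaultRootElement();
        int excess = root.getElementCount() - maxLines;
        if (excess > 0) {
            // Cut everything before the start of the first line we keep.
            int cut = root.getElement(excess).getStartOffset();
            doc.remove(0, cut);
        }
    }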
>> [6] Normal user keyboard input might not encounter this problem, but the user could copy & paste a 1 MB line of text into the editor.
I guess the same hard-wrapping of content can be applied inside the Paste action, too.
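One hedged way to do that in Swing is a DocumentFilter, which sees pasted text before it reaches the document (the column width is again an assumption):

    import javax.swing.text.AttributeSet;
    import javax.swing.text.BadLocationException;
    import javax.swing.text.DocumentFilter;

    // Sketch: hard-wrap any inserted text (including pastes) at a fixed
    // column width before it enters the document, so note [6]'s 1 MB
    // pasted line never reaches the JTextPane in one piece.
    final class HardWrapFilter extends DocumentFilter {
        private final int columns;

        HardWrapFilter(int columns) { this.columns = columns; }

        @Override
        public void insertString(FilterBypass fb, int offset, String text,
                                 AttributeSet attrs) throws BadLocationException {
            super.insertString(fb, offset, wrap(text), attrs);
        }

        @Override
        public void replace(FilterBypass fb, int offset, int length, String text,
                            AttributeSet attrs) throws BadLocationException {
            super.replace(fb, offset, length, wrap(text), attrs);
        }

        private String wrap(String text) {
            if (text == null || text.length() <= columns) return text;
            StringBuilder out = new StringBuilder(text.length() + text.length() / columns);
            int cells = 0;
            for (int i = 0; i < text.length(); i++) {
                char c = text.charAt(i);
                if (c == '\n') {              // existing breaks reset the count
                    out.append(c);
                    cells = 0;
                    continue;
                }
                if (cells == columns) {       // insert a hard break at the width
                    out.append('\n');
                    cells = 0;
                }
                out.append(c);
                cells++;
            }
            return out.toString();
        }
    }

It can be installed with ((javax.swing.text.AbstractDocument) pane.getDocument()).setDocumentFilter(new HardWrapFilter(120)).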
>> If I do a tail filename.sql -n 50 | grep sql, where 'sql' matches multiple data sets, the command takes several minutes before displaying the output. The amount of data output by grep is ~5 MB of character data.
For this scenario, the total time taken is approximately equivalent to the bash shell. The big difference is that in bash the output appears almost immediately, while in FluidShell one has to wait ~20 seconds before seeing any output.
>> If I do a tail filename.sql -n 50 | grep sql > output.txt, the operation is extremely fast & the 5 MB of data is very quickly written to output.txt.
A large regression: it takes almost 20 seconds to create output.txt in the latest build. It used to take a few seconds; in bash, it takes ~1 second.
>> Large regression. It takes almost 20 seconds to create output.txt in latest build. It used to take a few seconds.
I spent some time on this performance problem yesterday. It traces back to SVN r28100, which fixed issue #7174 (\more, \tail and \exec with output redirection using binary files). r28100 changed how \tail handles I/O (from String to byte[]). The root cause is that \tail uses a static utility method to read in a line (as byte[]), and that utility method reads only one byte at a time. We need to replace it with an object that reads a chunk of bytes at a time, performs buffered I/O, and offers a read-line method that returns a byte[]. That utility method is used by some other commands as well, not just \tail. I will revisit this issue later this week.
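A minimal sketch of such an object, assuming '\n' line endings (the class and method names are illustrative, not the actual fix):

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.InputStream;

    // Buffers chunks of the underlying stream and exposes a readLine()
    // that returns raw byte[], instead of pulling one byte per read() call.
    final class BufferedByteLineReader {
        private final InputStream in;
        private final byte[] buf;
        private int pos, limit;

        BufferedByteLineReader(InputStream in, int bufferSize) {
            this.in = in;
            this.buf = new byte[bufferSize];
        }

        // Returns the next line without its trailing '\n' (CR handling
        // omitted for brevity), or null at end of stream.
        byte[] readLine() throws IOException {
            ByteArrayOutputStream line = new ByteArrayOutputStream();
            boolean sawAny = false;
            while (true) {
                if (pos >= limit) {              // refill with one bulk read
                    limit = in.read(buf);
                    pos = 0;
                    if (limit < 0) {             // end of stream
                        limit = 0;
                        return sawAny ? line.toByteArray() : null;
                    }
                }
                byte b = buf[pos++];
                sawAny = true;
                if (b == '\n') return line.toByteArray();
                line.write(b);
            }
        }
    }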
Resolved this performance problem in SVN r29256.
Below is a performance comparison among different versions of \tail on my desktop:
SVN r28099: \tail reads file contents into a String with buffered I/O
SVN r28100: \tail reads file contents into a byte[] without buffered I/O
SVN r29256: \tail reads file contents into a byte[] with buffered I/O
(1) Time to see the very first output from executing 'tail filename.sql -n 50 | grep sql'
SVN r28099: ~1 to 2 seconds
SVN r28100: ~55 seconds
SVN r29256: ~1 second
(2) Time to execute 'tail filename.sql -n 50 | grep sql > output.txt'
SVN r28099: ~4 to 5 seconds
SVN r28100: ~58 seconds
SVN r29256: ~3 to 4 seconds
This performance problem applies to \cat, \grep and \more as well; issue #7586 has been logged.