I have a text file which is 50 MB in size (a SQL dump of a DB). This file has only 2205 rows, so each row is quite large on average.
o If I do a tail filename.sql -n 50 | grep asdfasfasdf, where 'asdfasfasdf' does not match any data in the file, the tail operation returns very quickly.
o If I do a tail filename.sql -n 50 | grep sql, where 'sql' matches multiple data sets, the command takes several minutes before displaying the output. The amount of data output by grep is ~5 MB of character data.
o If I do a tail filename.sql -n 50 | grep sql > output.txt, the operation is extremely fast & the 5 MB of data is very quickly written to output.txt.
Please investigate why performance is so slow when outputting to the GUI.
A few notes:
[1] This slow behavior seems to be caused by passing extremely long lines (e.g. 5 MB of characters on one line) to the JTextPane. JTextPane attempts to perform line wrapping on these long lines, which apparently is very slow & inefficient.
[2] A possible solution is to break up these long lines into strings based on the width of the editor (in characters). These lines could then be appended to the editor without the component having to perform any line-wrapping logic (see the sketch after these notes).
[3] A good example of this behavior is the bash terminal on Ubuntu. If you cat a file containing lines that are a few megabytes long, they are added to the terminal as lines with a fixed length based on the current terminal width. If you resize the terminal, the previously cat'd lines stay at the old width & don't re-wrap. If you cat the file again, the new lines will obey the new terminal width.
[4] Steps to fix this:
[5] Some of the relevant methods are ShellEditor.trim(), ShellPipe.writeColor() and ShellPanelView.refreshResults()
[6] Normal user keyboard input might not encounter this problem, but the user could copy & paste a 1MB line of text into the editor.
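To make note [2] concrete, here is a minimal sketch of the idea, assuming a fixed editor width in columns; the class and method names are illustrative, not the actual ShellEditor/ShellPipe code:

    import javax.swing.JTextPane;
    import javax.swing.text.BadLocationException;
    import javax.swing.text.Document;

    // Sketch of note [2], NOT the actual ShellEditor/ShellPipe code:
    // hard-wrap a long line into editor-width chunks before appending, so
    // the JTextPane never runs its wrapping logic over a multi-MB line.
    // The column width is an assumption; it should come from the editor.
    final class HardWrapAppender {
        private final JTextPane pane;
        private final int columns;

        HardWrapAppender(JTextPane pane, int columns) {
            this.pane = pane;
            this.columns = columns;
        }

        void append(String line) throws BadLocationException {
            Document doc = pane.getDocument();
            for (int start = 0; start < line.length(); start += columns) {
                int end = Math.min(start + columns, line.length());
                doc.insertString(doc.getLength(), line.substring(start, end) + "\n", null);
            }
        }
    }

Like the bash behavior in note [3], lines wrapped this way keep their width if the editor is later resized, because the breaks are hard newlines in the document.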
VT100 (SSH) has a special component specifically to work around these kinds of problems. Matt's idea of wrapping lines manually is probably what we should do short-term.
A few notes on Matt's comment (based on my perspective from developing the SSH Terminal):
>> [3] A good example of this behavior is the bash terminal on Ubuntu. If you cat a file containing lines that are a few megabytes long, they are added to the terminal as lines with a fixed length based on the current terminal width. If you resize the terminal, the previously cat'd lines stay at the old width & don't re-wrap. If you cat the file again, the new lines will obey the new terminal width.
Yes, commands under these terminals make use of the SIGWINCH signal (delivered when the window is resized) to learn the current window size and hard-wrap output based on it. Interactive commands such as aptitude, mc, top, etc. use SIGWINCH to redraw their content to fit the new window size.
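For illustration only, a JVM process can observe the same signal via the JDK-internal sun.misc.Signal API (non-portable; real terminal programs re-query the size natively via ioctl(TIOCGWINSZ) rather than printing a message):

    import sun.misc.Signal;

    public class WinchDemo {
        public static void main(String[] args) throws InterruptedException {
            // Install a handler for SIGWINCH, delivered when the controlling
            // terminal window is resized.
            Signal.handle(new Signal("WINCH"), sig -> {
                // A real program would re-read the terminal size here and
                // hard-wrap subsequent output to the new width.
                System.out.println("terminal resized; re-query width");
            });
            Thread.sleep(60_000); // stay alive so resizes can be observed
        }
    }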
>> [2] A possible solution is to break up these long lines into strings based on the width of the editor (in characters). These lines could then be appended to the editor without the component having to perform any line-wrapping logic.
Hard-wrapping the output of FluidShell commands such as \cat etc. is desirable; attention should be paid to multi-byte characters such as Asian ones, because these usually occupy more than one cell of width.
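A rough sketch of what cell-aware wrapping has to track; treating a handful of CJK blocks as double-width is a simplifying assumption (a complete solution would consult the Unicode East Asian Width data):

    // Assumption: only a few common CJK blocks are double-width. Real
    // terminals use the full Unicode East Asian Width property.
    static boolean isDoubleWidth(int codePoint) {
        Character.UnicodeBlock block = Character.UnicodeBlock.of(codePoint);
        return block == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS
            || block == Character.UnicodeBlock.HIRAGANA
            || block == Character.UnicodeBlock.KATAKANA
            || block == Character.UnicodeBlock.HANGUL_SYLLABLES;
    }

    // Count display cells instead of chars, so a wrap loop can break when
    // the cell count (not the character count) reaches the terminal width.
    static int cellWidth(String s) {
        int cells = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            cells += isDoubleWidth(cp) ? 2 : 1;
            i += Character.charCount(cp);
        }
        return cells;
    }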
Keeping only the last N lines in memory (e.g. the last 1000 rows; this should be an adjustable property) is also welcome and is used by most terminals. If one wants to see/edit the entire content of that big text file, the \open command can be used to open it in a new editor. As far as I can see, that editor pane doesn't perform any wrapping, so the problem shouldn't appear there.
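A minimal sketch of such trimming against a Swing Document, in the spirit of ShellEditor.trim() (the method name and the way the limit is supplied are assumptions):

    import javax.swing.text.BadLocationException;
    import javax.swing.text.Document;
    import javax.swing.text.Element;

    // Keep only the last maxLines lines of the document; older content is
    // removed from the top. maxLines would be the adjustable property.
    static void trimToLastLines(Document doc, int maxLines) throws BadLocationException {
        Element root = doc.getDefaultRootElement();
        int excess = root.getElementCount() - maxLines;
        if (excess > 0) {
            // Cut everything before the start of the first line we keep.
            int cut = root.getElement(excess).getStartOffset();
            doc.remove(0, cut);
        }
    }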
>> [6] Normal user keyboard input might not encounter this problem, but the user could copy & paste a 1 MB line of text into the editor.
I guess the same hard-wrapping of content can be applied inside the Paste action, too.
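One hedged way to do that in Swing is a DocumentFilter, which sees pasted text before it reaches the document (the column width is again an assumption):

    import javax.swing.text.AttributeSet;
    import javax.swing.text.BadLocationException;
    import javax.swing.text.DocumentFilter;

    // Sketch: hard-wrap any inserted text (including pastes) at a fixed
    // column width before it enters the document, so note [6]'s 1 MB
    // pasted line never reaches the JTextPane in one piece.
    final class HardWrapFilter extends DocumentFilter {
        private final int columns;

        HardWrapFilter(int columns) { this.columns = columns; }

        @Override
        public void insertString(FilterBypass fb, int offset, String text,
                                 AttributeSet attrs) throws BadLocationException {
            super.insertString(fb, offset, wrap(text), attrs);
        }

        @Override
        public void replace(FilterBypass fb, int offset, int length, String text,
                            AttributeSet attrs) throws BadLocationException {
            super.replace(fb, offset, length, wrap(text), attrs);
        }

        private String wrap(String text) {
            if (text == null || text.length() <= columns) return text;
            StringBuilder out = new StringBuilder(text.length() + text.length() / columns);
            int cells = 0;
            for (int i = 0; i < text.length(); i++) {
                char c = text.charAt(i);
                if (c == '\n') {              // existing breaks reset the count
                    out.append(c);
                    cells = 0;
                    continue;
                }
                if (cells == columns) {       // insert a hard break at the width
                    out.append('\n');
                    cells = 0;
                }
                out.append(c);
                cells++;
            }
            return out.toString();
        }
    }

It can be installed with ((javax.swing.text.AbstractDocument) pane.getDocument()).setDocumentFilter(new HardWrapFilter(120)).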
>> If I do a tail filename.sql -n 50 | grep sql, where 'sql' matches multiple data sets, the command takes several minutes before displaying the output. The amount of data output by grep is ~5 MB of character data.
For this scenario, the total time taken is approximately equivalent to the bash shell. The big difference is that in bash the output appears almost immediately, while in FluidShell one has to wait ~20 seconds before seeing any output.
>> If I do a tail filename.sql -n 50 | grep sql > output.txt, the operation is extremely fast & the 5 MB of data is very quickly written to output.txt.
A large regression: it takes almost 20 seconds to create output.txt in the latest build. It used to take a few seconds; in bash, it takes ~1 second.
>> Large regression. It takes almost 20 seconds to create output.txt in latest build. It used to take a few seconds.
I spent some time on this performance problem yesterday. It traces back to SVN r28100, which fixed issue #7174 (\more, \tail and \exec with output redirection using binary files). r28100 changed how \tail handles I/O (from String to byte[]). The root cause is that \tail uses a static utility method to read in a line (as byte[]), and that utility method reads only one byte at a time. We need to replace it with an object that reads a chunk of bytes at a time, performs buffered I/O, and offers a read-line method that returns a byte[]. That utility method is used by some other commands as well, not just \tail. I will revisit this issue later this week.
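A minimal sketch of such an object, assuming '\n' line endings (the class and method names are illustrative, not the actual fix):

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.InputStream;

    // Buffers chunks of the underlying stream and exposes a readLine()
    // that returns raw byte[], instead of pulling one byte per read() call.
    final class BufferedByteLineReader {
        private final InputStream in;
        private final byte[] buf;
        private int pos, limit;

        BufferedByteLineReader(InputStream in, int bufferSize) {
            this.in = in;
            this.buf = new byte[bufferSize];
        }

        // Returns the next line without its trailing '\n' (CR handling
        // omitted for brevity), or null at end of stream.
        byte[] readLine() throws IOException {
            ByteArrayOutputStream line = new ByteArrayOutputStream();
            boolean sawAny = false;
            while (true) {
                if (pos >= limit) {              // refill with one bulk read
                    limit = in.read(buf);
                    pos = 0;
                    if (limit < 0) {             // end of stream
                        limit = 0;
                        return sawAny ? line.toByteArray() : null;
                    }
                }
                byte b = buf[pos++];
                sawAny = true;
                if (b == '\n') return line.toByteArray();
                line.write(b);
            }
        }
    }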
Resolved this performance problem in SVN r29256.
Below is a performance comparison among different versions of \tail on my desktop:
SVN r28099: \tail reads file contents into a String with buffered I/O
SVN r28100: \tail reads file contents into a byte[] without buffered I/O
SVN r29256: \tail reads file contents into a byte[] with buffered I/O
(1) Time to see the very first output from executing 'tail filename.sql -n 50 | grep sql'
SVN r28099: ~1 to 2 seconds
SVN r28100: ~55 seconds
SVN r29256: ~1 second
(2) Time to execute 'tail filename.sql -n 50 | grep sql > output.txt'
SVN r28099: ~4 to 5 seconds
SVN r28100: ~58 seconds
SVN r29256: ~3 to 4 seconds
This performance problem applies to \cat, \grep and \more as well; issue #7586 has been logged.