When writing a large file (50 MB in this test case) to the screen in FluidShell, there is a noticeable difference between the time the \cat command takes to finish (30 seconds, as reported by FS) and the time taken by \grep and \tail (8 seconds each).
According to the comments on #7586, most of those 30 seconds are spent in the "Swing EDT thread which is responsible for painting text on screen". However, don't \grep and \tail also use the same thread for painting?
Test scenario:
*1 A 50 MB file containing a repeating sequence of two lines: the first line is random "Lorem ipsum" text, 456 characters long; the second line is " New line here."
*2 Calling \cat 50mb > out
OR \grep "" 50mb > out
OR \tail -n +1 50mb > out
takes about 1-2 seconds.
*3 Calling \cat 50mb | cat > out
OR \grep "" 50mb | grep "" > out
OR \tail -n +1 50mb | tail -n +1 > out
takes about 8 seconds.
*4 Calling \cat 50mb
takes about 30 seconds.
*5 Calling \grep "" 50mb
OR \tail -n +1 50mb
takes about 8 seconds.
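To reproduce step *1 without the original file, a file with the same shape can be generated. The sketch below is a hypothetical helper, not part of FluidShell: the class name, the word list, the fixed random seed and the reading of "50 MB" as 50 MiB are all assumptions; the thread only specifies a 456-character "Lorem ipsum" line followed by " New line here.".

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Random;

// Hypothetical helper (not part of FluidShell) that writes a file like the one in step *1:
// a 456-character line of random "Lorem ipsum" words followed by " New line here.",
// repeated until at least the requested number of bytes has been written.
final class TestFileGenerator {
    // Assumed word list; the original only says "random Lorem ipsum text".
    private static final String[] WORDS = {
        "lorem", "ipsum", "dolor", "sit", "amet", "consectetur", "adipiscing", "elit",
        "sed", "do", "eiusmod", "tempor", "incididunt", "ut", "labore", "et", "dolore",
        "magna", "aliqua"
    };
    static final int LONG_LINE_LENGTH = 456;
    static final String SHORT_LINE = " New line here.";

    // Joins random words with spaces and cuts the result to exactly `length` characters.
    static String randomLoremLine(Random rnd, int length) {
        StringBuilder sb = new StringBuilder(length + 16);
        while (sb.length() < length) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(WORDS[rnd.nextInt(WORDS.length)]);
        }
        sb.setLength(length);
        return sb.toString();
    }

    // Writes line pairs until at least targetBytes bytes are on disk; returns the file size.
    static long generate(Path out, long targetBytes, long seed) throws IOException {
        Random rnd = new Random(seed);
        long written = 0;
        try (BufferedWriter w = Files.newBufferedWriter(out, StandardCharsets.US_ASCII)) {
            while (written < targetBytes) {
                String longLine = randomLoremLine(rnd, LONG_LINE_LENGTH);
                w.write(longLine);
                w.write('\n');
                w.write(SHORT_LINE);
                w.write('\n');
                // ASCII only, so one byte per character (plus one per '\n').
                written += longLine.length() + 1 + SHORT_LINE.length() + 1;
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        Path out = Paths.get(args.length > 0 ? args[0] : "50mb");
        long target = args.length > 1 ? Long.parseLong(args[1]) : 50L * 1024 * 1024; // assumes MiB
        System.out.println(generate(out, target, 42L) + " bytes written to " + out);
    }
}
```

Run with no arguments, it writes a file named 50mb in the current directory, matching the file name used in the commands above.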
1. Download and unzip "enwiki-20121001-pages-articles-multistream-index.txt.bz2" from http://dumps.wikimedia.org/enwiki/20121001/
2. Execute the following commands (timings are shown as comments):
:$ cat enwiki-20121001-pages-articles-multistream-index.txt2 > enwiki-index.txt
// 44 seconds in FluidShell.
// 3 seconds in Ubuntu/Bash
:$ cp enwiki-20121001-pages-articles-multistream-index.txt2 enwiki-index.txt
// 3 seconds in FluidShell.
// 3 seconds in Ubuntu/Bash
:$ cat enwiki-20121001-pages-articles-multistream-index.txt2 | grep oracle
// 1m 26 seconds in FluidShell.
// 2 seconds in Ubuntu/Bash
:$ cat enwiki-20121001-pages-articles-multistream-index.txt2 | exec grep oracle
// 23 seconds in FluidShell
From Matt:
I ran several cat executions through JProfiler, and there seem to be 3 separate time-consuming operations:
CatCommand.CatTool.readLine
ShellPipe.write
InterpXterm.processChar
Regarding the question about the difference between "cat xyz.txt" and "cat xyz.txt > out":
[1] "cat xyz.txt" reads one character at a time, while "cat xyz.txt > out" reads an entire line at a time.
[2] "cat xyz.txt > out" doesn't actually write text to the terminal text area.
I've attached a couple of images showing some statistics:
cat_call_tree.png: shows the call tree for the expensive top-level operations, where the leaf nodes represent the low-level problem methods
cat_method_stats.png: shows some statistics on the problem methods
I modified AbstractCommandContext.
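Point [1] above is the key difference. As a rough illustration of why reading one byte at a time is expensive, the sketch below counts how many calls a line read makes on the underlying stream. The class and method names are hypothetical, and a ByteArrayInputStream stands in for FluidShell's real input (ShellPipe is not modeled). For one 456-character line plus its newline, the byte-at-a-time loop makes 457 calls while a single chunked read makes 1; over a 50 MB file the per-byte approach becomes tens of millions of calls, each of which, in a real pipe implementation, may also involve locking.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Illustration only (hypothetical names): counts calls made on the underlying stream
// when one line is read byte-at-a-time versus with one chunked read.
final class ReadCallCounter {

    // Wraps a stream and counts every read() / read(byte[], int, int) call.
    static final class CountingInputStream extends FilterInputStream {
        int calls;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    // Reads one line the way the #6787 comment describes: one byte per call.
    static int callsForByteAtATime(byte[] data) throws IOException {
        CountingInputStream in = new CountingInputStream(new ByteArrayInputStream(data));
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            // each iteration consumes exactly one byte
        }
        return in.calls;
    }

    // Reads the same data with a single request for up to 8 KiB.
    static int callsForChunked(byte[] data) throws IOException {
        CountingInputStream in = new CountingInputStream(new ByteArrayInputStream(data));
        byte[] buf = new byte[8192];
        in.read(buf, 0, buf.length);
        return in.calls;
    }

    public static void main(String[] args) throws IOException {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < 456; i++) {
            line.append('x');
        }
        line.append('\n');
        byte[] data = line.toString().getBytes(StandardCharsets.US_ASCII);
        System.out.println("byte-at-a-time: " + callsForByteAtATime(data) + " calls");
        System.out.println("chunked:        " + callsForChunked(data) + " call");
    }
}
```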
From Matt:
Here are statistics for executions of cat, cp and grep on a 4 MB file with long lines.
The attached zip contains screenshots and HTML (for the longer lists); the v1 images were generated using the unmodified codebase, and the v2 images use the modified AbstractCommandContext.
Related issues:
Issue #7586 - Performance problem on commands that read in file contents in byte[]
Issue #6787 - Performance issue when I tail files that return large amount of data to screen
---[Quote 1 from #6787 - begin] 9/11/2012 comment ---
... The root cause of this performance problem is \tail uses a static utility method to read in a line (in byte[]), and, that utility method reads only one byte a time, that's the problem. We need to replace that utility method with an object that can read a chunk of bytes a time, perform buffered I/O and offer a read line method that returns a byte[]. That utility method is used by some other commands as well, not just \tail...
---[Quote 1 from #6787 - end]---
The utility method mentioned above is AbstractCommandContext.readLineOfBytes(). \cat, \grep, \tail, etc. have since been modified to use buffered I/O when reading input from files; reading from standard input is a different story for these commands (see the next quote).
---[Quote 2 from #7586 - begin] 9/14/2012 comment ---
Note that \cat, \grep and \more do not perform buffered I/O on reading input from standard input. This is because in the case of reading input from standard input, these commands need to respond right away when a new line is seen; otherwise, for example, the command below will behave differently:
prompt$ cat
text <Enter> <----- user type 'text' followed by Enter key
text <----- \cat should echo back 'text' right away
In the example above, if \cat handles standard input using buffered I/O then 'text' won't be echoed back right away.
---[Quote 2 from #7586 - end] ---
This is why \cat, \grep, etc. still use AbstractCommandContext.readLineOfBytes() when reading standard input, and why that method should continue to perform non-buffered I/O.
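Quote 1 proposes replacing the byte-at-a-time utility with an object that reads a chunk of bytes at a time and offers a read-line method returning byte[]. A minimal sketch of such an object follows, assuming a plain java.io.InputStream source. The class name ChunkedLineReader and its constructor parameters are hypothetical, not FluidShell's actual API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch of the reader proposed in #6787: it fills a buffer with one
// read(byte[], int, int) call and hands out lines as byte[]. Not FluidShell code.
final class ChunkedLineReader {
    private final InputStream in;
    private final byte[] buf;
    private int pos;   // index of the next unread byte in buf
    private int limit; // number of valid bytes in buf

    ChunkedLineReader(InputStream in, int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        this.in = in;
        this.buf = new byte[bufferSize];
    }

    // Returns the next line without its trailing '\n' (a '\r' before it is kept),
    // or null once the stream is exhausted.
    byte[] readLine() throws IOException {
        ByteArrayOutputStream line = null;
        while (true) {
            if (pos == limit) {
                // One call fetches up to buf.length bytes; it may return fewer.
                int n = in.read(buf, 0, buf.length);
                if (n == -1) {
                    return line == null ? null : line.toByteArray();
                }
                pos = 0;
                limit = n;
            }
            int start = pos;
            while (pos < limit && buf[pos] != '\n') {
                pos++;
            }
            if (line == null) {
                line = new ByteArrayOutputStream();
            }
            line.write(buf, start, pos - start);
            if (pos < limit) { // stopped on '\n': skip it and return the line
                pos++;
                return line.toByteArray();
            }
        }
    }
}
```

Whether a reader like this would delay interactive input (the concern in Quote 2) depends on how the underlying stream implements read(byte[], int, int). PipedInputStream returns as soon as some bytes are available, but the default InputStream implementation keeps calling read() until the buffer is full or the stream ends, which would hold back a typed line. How FluidShell's own standard-input stream behaves is not stated in this thread.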
---[Quote 3 from #7586 - begin] 9/14/2012 comment ---
... Command (c3) takes time to complete, if you press 'Control-C' while (c3) is still running, you are not likely able to kill it. Why? In fluid shell, 'Control-C' is a key event which is handled by Swing EDT thread. As explained in the previous paragraph, it only takes \cat command (running in a non-Swing thread) 1 second to read in file contents and dumps it to Swing EDT, at the time when 'Control-C' is pressed, \cat likely has done its job and lots of painting requests have been submitted and queued in Swing EDT queue; Swing won't be able to process that 'Control-C' key event until all of queued request are processed...
---[Quote 3 from #7586 - end] ---
To minimize the Control-C problem described above, \cat performs non-buffered I/O when reading input from a file and writing the output to the FS terminal. This is not mentioned in #7586, but it is described in the \cat source code:
// special handling on writing output to editor:
// do not use reader; otherwise, CTRL-C on file with large size won't respond immediately
I believe the explanation above answers your original question:
"As described in the comments from #7586, the biggest part of the 30 seconds in case of the \cat command is taken by the "Swing EDT thread which is responsible for painting text on screen", however don't \grep and \tail use also the same thread for painting?"
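Quote 3 attributes the unresponsive Control-C to a long backlog of requests queued on the Swing EDT. The thread's workaround is non-buffered I/O in \cat; a different, commonly used technique is to coalesce output on the worker thread so that at most one flush task is waiting on the EDT at any time. The sketch below shows that technique under stated assumptions: BatchedTerminalWriter, its scheduler and sink parameters are hypothetical names, and this is not how FluidShell is implemented.

```java
import java.util.function.Consumer;
import javax.swing.SwingUtilities;

// Hypothetical sketch, not FluidShell code: coalesce output produced on a worker
// thread so that at most one flush task is waiting on the event dispatch thread (EDT).
final class BatchedTerminalWriter {
    private final StringBuilder pending = new StringBuilder();
    private boolean flushScheduled; // guarded by "this"
    private final Consumer<Runnable> scheduler; // how flush tasks are posted, e.g. to the EDT
    private final Consumer<String> sink;        // consumes text on the EDT, e.g. appends to the terminal

    BatchedTerminalWriter(Consumer<Runnable> scheduler, Consumer<String> sink) {
        this.scheduler = scheduler;
        this.sink = sink;
    }

    // Convenience factory that posts flushes with SwingUtilities.invokeLater.
    static BatchedTerminalWriter forSwing(Consumer<String> sink) {
        return new BatchedTerminalWriter(SwingUtilities::invokeLater, sink);
    }

    // Called from the command's (non-EDT) thread; never queues more than one task.
    void write(CharSequence text) {
        boolean needFlush;
        synchronized (this) {
            pending.append(text);
            needFlush = !flushScheduled;
            flushScheduled = true;
        }
        if (needFlush) {
            scheduler.accept(this::flush);
        }
    }

    // Runs on the EDT: hands everything accumulated since the last flush to the sink.
    private void flush() {
        String chunk;
        synchronized (this) {
            chunk = pending.toString();
            pending.setLength(0);
            flushScheduled = false;
        }
        sink.accept(chunk);
    }
}
```

This bounds the number of queued EDT tasks per writer to one, so a Control-C key event should not sit behind thousands of small output tasks. The pending text itself is not bounded, however; a fuller implementation might cap it or block the writing thread when the terminal falls behind.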
I have logged 3 separate performance-related issues:
issue #7874 - Performance problem on redirecting command output to file
issue #7875 - Performance problem on command pipeline
issue #7876 - Performance problem on command reading standard input using non-buffered I/O
Issues #7874, #7875 and #7876, which were logged as a result of the discussions in this issue, are fixed.
Issue #7803 | Closed | Won't Fix | relates to #7586 (Performance problem on commands that read in file contents in byte[])