Performance problem with the following commands, which read file contents into a byte[]:
\cat, \grep, \more
See the 9/12/12 and 9/13/12 comments on issue #6787 for details.
The \tail command has already been fixed under issue #6787.
Modified \cat, \grep, and \more to use buffered I/O when reading files. SVN r29273.
Using the 50 MB text file mentioned in issue #6787 as an example, the execution time to read that file is:
before r29273: ~55 seconds
after r29273: ~1 to 2 seconds
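To illustrate the kind of change described above, here is a minimal sketch comparing a byte-at-a-time read directly from a file with the same read wrapped in a BufferedInputStream. The class name ReadComparison, its methods, and the 64 KB buffer size are hypothetical and chosen for illustration; this is not the code committed in r29273, and the timings it prints will differ from the figures quoted in this issue.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only; not the fluid shell implementation.
class ReadComparison {
    // Every read() goes straight to the underlying file, one byte per call.
    static long countUnbuffered(Path file) throws IOException {
        try (InputStream in = new FileInputStream(file.toFile())) {
            return drain(in);
        }
    }

    // read() is served from a 64 KB in-memory buffer (size is an assumption),
    // so the file itself is touched far less often.
    static long countBuffered(Path file) throws IOException {
        try (InputStream in = new BufferedInputStream(
                new FileInputStream(file.toFile()), 64 * 1024)) {
            return drain(in);
        }
    }

    // Consumes the stream one byte at a time and returns the byte count.
    private static long drain(InputStream in) throws IOException {
        long count = 0;
        while (in.read() != -1) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("sample", ".txt");
        try {
            Files.write(file, new byte[1024 * 1024]); // 1 MiB of test data
            long t0 = System.nanoTime();
            long unbuffered = countUnbuffered(file);
            long t1 = System.nanoTime();
            long buffered = countBuffered(file);
            long t2 = System.nanoTime();
            System.out.printf("unbuffered: %d bytes in %d ms%n",
                    unbuffered, (t1 - t0) / 1_000_000);
            System.out.printf("buffered:   %d bytes in %d ms%n",
                    buffered, (t2 - t1) / 1_000_000);
        } finally {
            Files.delete(file);
        }
    }
}
```

Both methods return the same byte count; only the number of trips to the underlying file differs, which is where the time goes on large inputs.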
Also modified \grep so that the dest_file generated by the command below is exactly the same as source_file:
prompt$ grep "" source_file > dest_file
Note that \cat, \grep and \more do not use buffered I/O when reading from standard input. When reading from standard input, these commands need to respond as soon as a new line is seen; otherwise, for example, the command below would behave differently:
prompt$ cat
text <Enter> <----- user types 'text' followed by the Enter key
text <----- \cat should echo back 'text' right away
In the example above, if \cat handled standard input using buffered I/O, 'text' would not be echoed back right away.
So \cat, \grep, \more, and \tail use different I/O handling mechanisms for reading input, depending on the source of the input. This produces quite different execution times, depending on how these commands are executed. See the examples below.
Again, using the 50 MB text file mentioned in issue #6787:
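The respond-on-newline behavior described above can be sketched as follows. This is an assumption about the general approach, not the actual fluid shell code: the class LineEcho and its echo method are hypothetical names. The loop copies one byte at a time and flushes as soon as a newline is seen, so an interactive user gets each line back immediately.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical illustration only; not the fluid shell implementation.
class LineEcho {
    // Copies 'in' to 'out' byte by byte, flushing at every newline so that
    // a completed line is visible right away. Returns the number of
    // newline-triggered flushes.
    static int echo(InputStream in, OutputStream out) throws IOException {
        int flushes = 0;
        int b;
        while ((b = in.read()) != -1) {
            out.write(b);
            if (b == '\n') {
                out.flush();
                flushes++;
            }
        }
        out.flush(); // push out any trailing text without a final newline
        return flushes;
    }

    public static void main(String[] args) throws IOException {
        // Type a line and press Enter; the line is echoed back immediately.
        echo(System.in, System.out);
    }
}
```

The cost of this design is one read call per byte, which is cheap for typed input but slow when a large amount of data arrives through a pipe.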
(c1) prompt$ \cat 50MB.txt > tmp.txt
(c2) prompt$ \cat 50MB.txt | cat > tmp.txt
(c3) prompt$ \cat 50MB.txt
Running the commands above on my desktop, the execution times are:
(t1) c1 takes ~1 to 2 seconds
(t2) c2 takes ~18 seconds
(t3) c3 takes ~75 seconds
From (t1), we know it takes \cat only about 1 to 2 seconds to read that 50 MB file into memory and then write it out to disk.
In (c2), two \cat commands are involved. Where are the additional ~16 to 17 seconds spent? Likely in the second \cat, which does not read its input using buffered I/O.
For (c3), why is it so slow, at ~75 seconds?
Command (c3), i.e. '\cat 50MB.txt', writes the contents of 50MB.txt to the fluid terminal. In fluid shell, text written to the fluid terminal is always passed to the Swing EDT thread, which is responsible for painting text on screen. From (t1), we know only about 1 to 2 seconds are used to read the 50 MB file (in a non-Swing thread), so the remaining ~73 to 74 seconds are likely spent in the Swing thread.
Command (c3) takes a long time to complete, and if you press 'Control-C' while (c3) is still running, you are not likely to be able to kill it. Why? In fluid shell, 'Control-C' is a key event handled by the Swing EDT thread. As explained in the previous paragraph, it takes the \cat command (running in a non-Swing thread) only about 1 to 2 seconds to read the file contents and hand them to the Swing EDT. By the time 'Control-C' is pressed, \cat has likely finished its job, and many painting requests have been submitted and queued in the Swing EDT queue. Swing won't be able to process the 'Control-C' key event until all of the queued requests have been processed.
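The queuing effect described above can be reproduced without any GUI. The sketch below uses a single-threaded ExecutorService as a stand-in for the Swing EDT, which is an assumption made so the example runs anywhere; the class EventQueueDemo and the log strings are hypothetical. A producer submits many "paint" tasks almost instantly, then a "Control-C" task, and the single worker thread handles them strictly in submission order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; a single-threaded executor stands in
// for the Swing EDT and its FIFO event queue.
class EventQueueDemo {
    // Queues 'paintRequests' paint tasks followed by one Control-C task,
    // then returns the order in which the single worker handled them.
    static List<String> run(int paintRequests) throws InterruptedException {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        ExecutorService edt = Executors.newSingleThreadExecutor();
        for (int i = 0; i < paintRequests; i++) {
            final int n = i;
            // The producer (like \cat) returns immediately after queuing.
            edt.execute(() -> log.add("paint " + n));
        }
        // The key event is queued behind every pending paint request.
        edt.execute(() -> log.add("Control-C"));
        edt.shutdown();
        edt.awaitTermination(10, TimeUnit.SECONDS);
        return log;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> log = run(1000);
        System.out.println("Control-C handled at position "
                + (log.indexOf("Control-C") + 1) + " of " + log.size());
    }
}
```

Because the worker is single-threaded and first-in, first-out, the Control-C task cannot run until every earlier paint task has finished, which matches the behavior observed with (c3).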
Verified using ADStudio 12 RC 23.
\cat, \grep and \more took under 5 seconds (about 1-2 seconds) to run on a 50 MB file when standard input was not involved (\cat 50mb > out).
They took more than 5 seconds otherwise:
* about 8-9 seconds when piping and then writing the result to a file (\cat 50mb | cat > out)
* up to 30 seconds when writing the content to the FS tab with \cat 50mb; \grep "" 50mb was a lot faster, at about 8-9 seconds. Logged as #7803 - Speed diff in writing large files to screen using \cat vs \grep & \tail.
Closed.
Issue #7586 |
Closed |
Fixed |
Resolved |
Completion |
No due date |
Fixed Build trunk/29273 |
No time estimate |
3 issue links |
relates to #7803
Issue #7803: Speed diff in writing large files to screen using \cat vs \grep & \tail |
relates to #6787
Issue #6787: Performance issue when I tail files that return large amount of data to screen |
relates to #7174
Issue #7174: more, tail and exec with output redirection using binary files |
Need to validate all of the test cases described in issue #7174 after the change is made.