#7875: Performance problem on command pipeline

funfun reported 2012-10-27T23:47:38Z · last modified 2012-11-07T00:22:22Z

Dev	funfun
QA	NielsGron

Priority	Low
Complexity	Unknown

Component	App - FluidShell
Version	12.0

This issue follows up on the following test case from issue #7803:
   :$ cat enwiki-20121001-pages-articles-multistream-index.txt2 | grep oracle
   // 1 minute 26 seconds in FluidShell.
   // 2 seconds in Ubuntu/Bash.
I replaced the test case above with
   :$ cat enwiki-20121001-pages-articles-multistream-index.txt2 | grep pattern-match-nothing
so that nothing is written to the FS terminal.

I ran the modified test case on my desktop (the file is 567,746,536 bytes) and collected CPU statistics with Java VisualVM. The results show that I/O on the pipeline consumes half of the total test time; we need to find out why.

The profiling results show that 1/3 of the overall time is spent in \grep's reads. \grep reads standard input with non-buffered I/O, which could be improved by switching to buffered I/O when \grep runs inside a pipeline.
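
FluidShell's \grep source is not shown in this issue, so the following is only a minimal sketch of the general technique, assuming a grep-like filter written against java.io. The class name BufferedGrepSketch and the method countMatches are hypothetical. Instead of calling InputStream.read() once per byte (on a PipedInputStream, each such call enters a synchronized method), it wraps the input in a BufferedReader so that large blocks are pulled from the stream and readLine() is served from memory.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical grep-like filter illustrating buffered reads from standard input.
// This is NOT FluidShell's \grep implementation.
public class BufferedGrepSketch {

    // Counts lines of `in` that contain a match for `regex`.
    // BufferedReader (default 8192-char buffer) fetches data from the
    // underlying stream in blocks instead of issuing one read() per byte.
    static long countMatches(InputStream in, String regex) throws IOException {
        Pattern pattern = Pattern.compile(regex);
        long count = 0;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (pattern.matcher(line).find()) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // In a shell this would be the command's standard input; a byte array
        // keeps the demo self-contained.
        String sample = "oracle\nmysql\nOracle Database\npostgres oracle\n";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        System.out.println(countMatches(in, "oracle")); // prints 2 (matching is case-sensitive)
    }
}
```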

Attachment (1): Java VisualVM profiling (97 KB, 2012-10-27T23:47:38Z)

Comments (2)

funfun 2012-10-28T00:08:06Z

> Profiling result shows 1/3 of the overall time is consumed by \grep on read, \grep uses non-buffered I/O to read input from standard input which can be improved with buffered I/O if \grep is executed inside a pipeline.

This is logged as issue #7876.

funfun 2012-11-04T01:15:20Z

The pipeline is slow mainly because of the way commands read from standard input (logged as issue #7876). JDK's PipedInputStream creates a 1K buffer by default; FS now uses a 4K buffer by default, which seems to help in some cases. This change is in SVN trunk/r30034.
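
As a sketch of what "FS now uses a 4K buffer" means at the JDK level, the PipedInputStream(PipedOutputStream src, int pipeSize) constructor sizes the circular buffer shared by the writing and reading stages, instead of the 1024-byte default. The class PipeBufferSketch, the constant PIPE_SIZE, and the method runPipeline are hypothetical names, not FluidShell code; the demo assumes Java 11 or later for String.repeat.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;

// Sketch: connecting two pipeline stages with an explicitly sized pipe buffer.
// PIPE_SIZE and the class/method names are hypothetical, not FluidShell's code.
public class PipeBufferSketch {

    // The JDK default for PipedInputStream is 1024 bytes; use 4K instead.
    static final int PIPE_SIZE = 4 * 1024;

    static String runPipeline(String payload) throws IOException, InterruptedException {
        PipedOutputStream out = new PipedOutputStream();
        // Sizes the circular buffer shared by writer and reader.
        PipedInputStream in = new PipedInputStream(out, PIPE_SIZE);

        // Upstream stage (the `cat` side) runs on its own thread; the JDK docs
        // warn that using both ends of a pipe from one thread may deadlock.
        Thread producer = new Thread(() -> {
            try (PipedOutputStream o = out) {
                o.write(payload.getBytes(StandardCharsets.UTF_8));
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });
        producer.start();

        // Downstream stage reads whole blocks until the writer closes its end.
        ByteArrayOutputStream received = new ByteArrayOutputStream();
        byte[] chunk = new byte[PIPE_SIZE];
        try (PipedInputStream i = in) {
            int n;
            while ((n = i.read(chunk)) != -1) {
                received.write(chunk, 0, n);
            }
        }
        producer.join();
        return received.toString(StandardCharsets.UTF_8.name());
    }

    public static void main(String[] args) throws Exception {
        String payload = "line\n".repeat(10_000); // 50,000 bytes, larger than the pipe
        String result = runPipeline(payload);
        System.out.println(result.equals(payload)); // prints true
    }
}
```

A larger pipe buffer means the writer and reader hand control back and forth less often for the same amount of data, which is one plausible reason the bigger buffer helps.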

> cat enwiki-20121001-pages-articles-multistream-index.txt | grep oracle
This command used to take ~1 minute 27 seconds to complete on my desktop; after r30034, it takes ~25 seconds.

The following test case shows how PipedInputStream's buffer size can affect a command's execution time:
> cat enwiki-20121001-pages-articles-multistream-index.txt | cat > tmp.txt
   Buffer size: 1K - execution time: ~14 seconds.
   Buffer size: 4K - execution time: ~6 seconds.
   Buffer size: 8K - execution time: ~5 seconds.
FS currently creates a 4K buffer.
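
To explore the buffer-size effect outside FluidShell, a rough microbenchmark can push a fixed amount of data through a PipedOutputStream/PipedInputStream pair once per buffer size. This is a hypothetical stand-alone sketch, not the test that produced the timings above: the class PipeSizeBenchmarkSketch, the 256 MB payload, and the 8K read/write chunk sizes are assumptions. It measures only the raw pipe transfer, without FluidShell's command overhead, so its numbers are not directly comparable to those listed.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

// Hypothetical microbenchmark: time moving a fixed number of bytes through a
// PipedOutputStream/PipedInputStream pair for several pipe buffer sizes.
// Not the test used to produce the timings in this issue.
public class PipeSizeBenchmarkSketch {

    // Returns elapsed wall-clock milliseconds for moving totalBytes through a
    // pipe whose internal buffer is pipeSize bytes.
    static long timeTransfer(long totalBytes, int pipeSize) throws Exception {
        PipedOutputStream out = new PipedOutputStream();
        PipedInputStream in = new PipedInputStream(out, pipeSize);
        byte[] block = new byte[8192]; // assumed writer chunk size (zero-filled)

        long start = System.nanoTime();
        Thread writer = new Thread(() -> {
            try (PipedOutputStream o = out) {
                long remaining = totalBytes;
                while (remaining > 0) {
                    int len = (int) Math.min(block.length, remaining);
                    o.write(block, 0, len);
                    remaining -= len;
                }
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });
        writer.start();

        // Reader drains the pipe with 8K reads until the writer closes its end.
        byte[] chunk = new byte[8192];
        long received = 0;
        try (PipedInputStream i = in) {
            int n;
            while ((n = i.read(chunk)) != -1) {
                received += n;
            }
        }
        writer.join();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        if (received != totalBytes) {
            throw new IllegalStateException("expected " + totalBytes + " bytes, got " + received);
        }
        return elapsedMs;
    }

    public static void main(String[] args) throws Exception {
        long total = 256L * 1024 * 1024; // 256 MB, an arbitrary test size
        for (int size : new int[] {1024, 4096, 8192}) {
            System.out.printf("Buffer size: %dK - execution time: %d ms%n",
                    size / 1024, timeTransfer(total, size));
        }
    }
}
```

Each size is measured once here, so JIT warm-up and thread scheduling add noise; repeating each run several times and discarding the first gives steadier numbers.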

Aqua Data Studio / nhilam

Issue links (2): Issue #9070, Issue #7876