cw - Count Words

A fast wc clone in Rust.

Synopsis

-% cw --help
cw 0.5.0
Thomas Hurst <tom@hur.st>
Count Words - word, line, character and byte count

USAGE:
    cw [FLAGS] [OPTIONS] [input]...

FLAGS:
    -c, --bytes              Count bytes
    -m, --chars              Count UTF-8 characters instead of bytes
    -h, --help               Prints help information
    -l, --lines              Count lines
    -L, --max-line-length    Count bytes (default) or characters (-m) of the longest line
    -V, --version            Prints version information
    -w, --words              Count words

OPTIONS:
        --files0-from <files0_from>    Read input from the NUL-terminated list of filenames in the given file.
        --files-from <files_from>      Read input from the newline-terminated list of filenames in the given file.
        --threads <threads>            Number of counting threads to spawn [default: 1]

ARGS:
    <input>...    Input files

-% cw Dickens_Charles_Pickwick_Papers.xml
 3449440 51715840 341152640 Dickens_Charles_Pickwick_Papers.xml

Performance

Counts of multiple files may be accelerated by use of the --threads option:

  'xargs <files cw --threads=12' ran
    2.01 ± 0.03 times faster than 'xargs <files cw --threads=4'
    7.07 ± 0.09 times faster than 'xargs <files cw'
   11.55 ± 0.15 times faster than 'xargs <files wc'
   17.31 ± 0.23 times faster than 'xargs <files gwc'

Line counts are optimized using the bytecount crate:

  'cw -l Dickens_Charles_Pickwick_Papers.xml' ran
    3.44 ± 0.04 times faster than 'wc -l Dickens_Charles_Pickwick_Papers.xml'
    4.17 ± 0.05 times faster than 'gwc -l Dickens_Charles_Pickwick_Papers.xml'

Line counts with line length are optimized using the memchr crate:

  'cw -lL Dickens_Charles_Pickwick_Papers.xml' ran
    1.73 ± 0.01 times faster than 'wc -lL Dickens_Charles_Pickwick_Papers.xml'
   15.07 ± 0.07 times faster than 'gwc -lL Dickens_Charles_Pickwick_Papers.xml'

Note without -m cw only operates on bytes, and it never cares about your locale.

  'cw Dickens_Charles_Pickwick_Papers.xml' ran
    1.45 ± 0.01 times faster than 'wc Dickens_Charles_Pickwick_Papers.xml'
    2.05 ± 0.00 times faster than 'gwc Dickens_Charles_Pickwick_Papers.xml'

-m enables UTF-8 processing, with a fast-path for just character length, again using bytecount:

  'cw -m Dickens_Charles_Pickwick_Papers.xml' ran
   30.21 ± 0.39 times faster than 'gwc -m Dickens_Charles_Pickwick_Papers.xml'
   70.36 ± 0.91 times faster than 'wc -m Dickens_Charles_Pickwick_Papers.xml'

  'cw -m test-utf-8.html' ran
   84.74 ± 1.12 times faster than 'wc -m test-utf-8.html'
  124.21 ± 1.64 times faster than 'gwc -m test-utf-8.html'

And another path for character and line length:

  'cw -mlL Dickens_Charles_Pickwick_Papers.xml' ran
    3.88 ± 0.01 times faster than 'gwc -mlL Dickens_Charles_Pickwick_Papers.xml'
    9.05 ± 0.02 times faster than 'wc -mlL Dickens_Charles_Pickwick_Papers.xml'

  'cw -mlL test-utf-8.html' ran
    9.42 ± 0.01 times faster than 'wc -mlL test-utf-8.html'
   18.95 ± 0.03 times faster than 'gwc -mlL test-utf-8.html'

And a slow path for everything else:

  'cw -mLlw Dickens_Charles_Pickwick_Papers.xml' ran
    1.35 ± 0.00 times faster than 'gwc -mLlw Dickens_Charles_Pickwick_Papers.xml'
    3.15 ± 0.00 times faster than 'wc -mLlw Dickens_Charles_Pickwick_Papers.xml'

These tests are on FreeBSD 12 on a 2.1GHz Westmere Xeon. gwc is from GNU coreutils 8.30 - note its performance here is rather pessimised in some areas by FreeBSD's rather weak memchr implementation. YMMV.

For best results build with:

cargo build --release --features runtime-dispatch-simd

This enables SIMD optimizations for line and character counting. It has no effect if you count anything else.

Future

Test suite.
Factor internals out into a library. (#1)
Improve multibyte support.
Possibly implement locale.
Replace clap/structopt with something lighter.

cw
Release 0.1.0

Release 0.1.0

0.6.0

0.5.0

0.4.0

0.3.0

0.2.0

0.1.0

Documentation

cw - Count Words

Synopsis

Performance

Future

See Also

uwc

rwc

linecount

Stats

Development practices

Releases

Maintainers

Contributors

cw Release 0.1.0

Release 0.1.0 Toggle Dropdown 0.6.0 0.5.0 0.4.0 0.3.0 0.2.0 0.1.0

Documentation

cw - Count Words

Synopsis

Performance

Future

See Also

uwc

rwc

linecount

Stats

Development practices

Releases

Maintainers

Contributors

cw
Release 0.1.0

Release 0.1.0

0.6.0

0.5.0

0.4.0

0.3.0

0.2.0

0.1.0