wrld: Avoid writing loops in shell one-liners
You may think that wrld is some abbreviated form of "world". This is
not the case. The world is lame. What isn't lame is iterating on stdin.
Probably my favorite thing to do. In the shell, the sanest way to do
this is with a while read line; do loop. Forget the world. wrld is the
future of iteration.
Raise your hand if you have ever written this loop:
find -name '*foo.bar' -type f|while read line; do
mv "$line" "$(echo "$line"|sed 's/pat/rep/')"
done
Or the related loop:
for i in *foo.bar; do
cp ... # I'm too lazy even to finish this example.
done
- Note:
- If you have ever written a loop that starts with the words for i in $(ls ..., you're doing it wrong. Do one of the above instead. (Also, the while read line; do version can fail if there are filenames with newlines, which you might have if you're iterating on filenames generated by an idiot.)
With wrld, you can write it like this:
find -name '*foo.bar' -type f | wrld mv {} '@sed "s/pat/rep/"'
You can do something similar with globs as well:
wrld mv {} '@sed "s/pat/rep/"' -f *foo.bar
This is manifestly better for one-liners in the shell.
You could also think of it as xargs -I{} or the -exec flag from find on
steroids, because it iterates on stdin, but it also allows inlining
arbitrary shell commands.
$ ls|wrld mv {} '@awk "{print $2, $1}"'
mv 'Arnold Palmer' 'Palmer Arnold'
mv 'Jane Doe' 'Doe Jane'
mv 'John Doe' 'Doe John'
mv 'John Wayne' 'Wayne John'
mv 'Lucy Lawless' 'Lawless Lucy'
mv 'Ricki Lake' 'Lake Ricki'
As you can see, inlined commands have the current line piped to their
stdin. If you want to use some poorly-designed command that doesn't read
from stdin as the filter, you can also substitute {} for the current
line. Use \{} if you need a literal '{}'. However, if you can't do it
with sed or awk, there's always perl -pe, and if you can't do it with
perl -pe, I don't want to know about it. You can also see that wrld
echoes back the commands it constructs. You can shut it up with
-q/--no-echo. You can also do a "test run" to see what the generated
commands will be without actually running them, using the -t/--test
flag.
Because POSIX stupidly allows newlines in file names, the ls example above is actually "dangerous" unless you can guarantee there are no idiot newlines in the file names. For this reason, you may instead specify a list of file names to iterate over (preferably with a glob) using the -f/--file-list flag:
$ wrld mv {} '@awk "{print $2, $1}"' -f *
mv 'Doe Jane' 'Jane Doe'
mv 'Doe John' 'John Doe'
mv 'Lake Ricki' 'Ricki Lake'
mv 'Lawless Lucy' 'Lucy Lawless'
mv 'Palmer Arnold' 'Arnold Palmer'
mv 'Wayne John' 'John Wayne'
If you're using a proper shell like fish or zsh, you can do recursive globbing and get quite a lot done this way.
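The substitution rules described above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not wrld's actual source: the function names (substitute, run_shell_filter, build_command, echo) are made up for this sketch, and the choice to shell-quote the line when {} appears inside a filter is an assumption.

```python
import shlex
import subprocess

PLACEHOLDER = "{}"
ESCAPED = "\\{}"  # the README says \{} yields a literal '{}'


def substitute(text, line):
    """Replace each unescaped {} with `line`, and \\{} with a literal {}."""
    marker = "\0"  # NUL cannot occur in a file name, so it is a safe stand-in
    text = text.replace(ESCAPED, marker)
    text = text.replace(PLACEHOLDER, line)
    return text.replace(marker, PLACEHOLDER)


def run_shell_filter(command, line):
    """Pipe `line` to a shell command's stdin and return its stdout."""
    done = subprocess.run(command, shell=True, input=line + "\n",
                          capture_output=True, text=True, check=True)
    return done.stdout.rstrip("\n")


def build_command(args, line, run_filter=run_shell_filter):
    """Expand one argument list for one line of input."""
    out = []
    for arg in args:
        if arg.startswith("@"):
            # Inlined command: the current line goes to its stdin. Quoting
            # the line for {} inside the filter is an assumption.
            out.append(run_filter(substitute(arg[1:], shlex.quote(line)), line))
        else:
            out.append(substitute(arg, line))
    return out


def echo(command):
    """Render a command roughly the way the examples above echo it."""
    return " ".join(shlex.quote(part) for part in command)
```

Looping over sys.stdin (or over the -f names), calling build_command on each line, printing echo(...) and then running the result with subprocess.run would give the general shape of the tool. A test run would simply skip the last step.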
One day, in the far distant future, wrld may support splitting stdin on the null byte for compatibility with find -print0. It is a little-known fact that any task which a computer is capable of performing may be performed with the find command, so compatibility is key.
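For the curious, splitting on the null byte is not much code. The sketch below shows one way it could look in Python. wrld does not do this today, and split_null is a hypothetical helper name.

```python
def split_null(data: bytes) -> list[str]:
    """Split NUL-delimited bytes (as from `find -print0`) into file names."""
    names = data.split(b"\0")
    if names and names[-1] == b"":
        names.pop()  # -print0 terminates every name, so drop the empty tail
    # surrogateescape lets names that are not valid UTF-8 survive a round trip
    return [name.decode("utf-8", "surrogateescape") for name in names]


# Typical use would be: split_null(sys.stdin.buffer.read())
```

Because the split happens on bytes before any line handling, a newline inside a file name is just another character.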
flags
wrld is stupid about flags with the command it wraps. However, courtesy
of argparse, you can tell wrld that no further flags are coming with
the -- argument.
$ find -type d | wrld -- mv -v {} /some/dir
...
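The -- behavior comes straight from argparse, and is easy to see in isolation. The parser below is a hypothetical stand-in, not wrld's real one. Only the flag names -q/--no-echo and -t/--test are taken from this README.

```python
import argparse

# Hypothetical stand-in parser; wrld's real argument setup may differ.
parser = argparse.ArgumentParser(prog="wrld-sketch")
parser.add_argument("-q", "--no-echo", action="store_true")
parser.add_argument("-t", "--test", action="store_true")
parser.add_argument("command", nargs="+")

# After "--", argparse treats everything as positional, so -v reaches mv.
args = parser.parse_args(["-t", "--", "mv", "-v", "{}", "/some/dir"])
print(args.test, args.command)  # True ['mv', '-v', '{}', '/some/dir']
```

Without the --, argparse would try to interpret -v as one of the parser's own flags and bail out with a usage error.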
optimize
- Note:
- I/O bound tasks will not benefit much from these optimizations.
As you may note, wrld is capable of spawning a lot of processes. If it's
some quick thing, who cares? If you're iterating over a million files,
it might be bad. wrld offers some internal goodies to speed things
along, but they are written in Python, so don't expect any miracles!
(Kind of kidding. A few lines of Python is way faster than spawning a
new process, but it would be much slower than piping a million lines
straight through sed or whatever optimized C utility.)
These builtins are for certain common file operations: they have names like "move", "copy", "hlink" and "slink".
- move: moves files recursively. It's like mv without any options.
- copy: copies files recursively. It's like cp -R.
- hlink: creates hard links. Hard links basically give the same chunk of data more than one name on the filesystem. It's called a "hard" link because of the physiological response many people experience when they realize how powerful this idea can be.
- slink: creates soft links. These are about like shortcuts on the great and glorious Windows operating system. They are called "soft" links because of what happens to you when you realize the original file has moved and all your links are broken. You never have this problem with "hard" links, but you can't use them across different partitions/devices or on directories, so, eh.
- srlink: expands relative paths to absolute paths when soft linking. Like ln -sr.
- remove: removes stuff. Recursively. Take care.
- makedir: makes directories. Works like mkdir -p.
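To give a feel for why in-process builtins beat spawning mv or cp a million times, here is a rough sketch of how such builtins could map onto the Python standard library. The BUILTINS table and run_builtin are hypothetical names, the mapping is a guess rather than wrld's source, and srlink is left out because its exact semantics aren't spelled out here.

```python
import os
import shutil

# Hypothetical mapping from builtin names to standard-library calls.
# It illustrates the idea and is not taken from wrld's source.
BUILTINS = {
    "move": shutil.move,
    "copy": lambda src, dst: (shutil.copytree(src, dst) if os.path.isdir(src)
                              else shutil.copy2(src, dst)),
    "hlink": os.link,
    "slink": os.symlink,
    "remove": lambda path: (shutil.rmtree(path)
                            if os.path.isdir(path) and not os.path.islink(path)
                            else os.remove(path)),
    "makedir": lambda path: os.makedirs(path, exist_ok=True),
}


def run_builtin(name, *paths):
    """Run one builtin in-process instead of spawning mv, cp, ln and friends."""
    return BUILTINS[name](*paths)
```

Each call is an ordinary function call inside the running interpreter, so the per-file cost is a system call or two rather than a fork and exec.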
Other builtins may be added as they occur to me or users ask for them.
mv, cp and ln are commands I frequently find myself needing in these
kinds of loops.
Another way to optimize is by using | as a prefix to your filters,
rather than @; i.e. wrld move {} '|awk "{print $2, $1}"' -f *.
This opens a single process of awk, filters stdin through that, and
then zips the results together with the main loop. This will create
problems if the filter produces no output for certain lines of input
(like grep would, though I don't know why you'd use grep in a context
like this...), or if you have filenames with newlines, like a freak.
So, it will work in most cases. One day I may implement this properly
with asynchronous piping, so this won't be a problem.
Note that, until this becomes an asynchronous pipe, this is a speed enhancement, but piping in this way consumes additional memory, which may make it infeasible for very large tasks in a low-memory environment.
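The single-process approach can be sketched roughly as follows. This is an illustration under assumptions, not wrld's code: filter_once is a hypothetical name, and a small Python one-liner stands in for awk so the example does not depend on awk being installed.

```python
import subprocess
import sys


def filter_once(lines, argv):
    """Send every line through ONE filter process, then pair input with output."""
    done = subprocess.run(argv, input="".join(l + "\n" for l in lines),
                          capture_output=True, text=True, check=True)
    results = done.stdout.splitlines()
    if len(results) != len(lines):
        # A filter that drops or adds lines (grep, say) would misalign the zip.
        raise ValueError("filter output does not line up with its input")
    return list(zip(lines, results))


# Stand-in for awk '{print $2, $1}'.
SWAP = [sys.executable, "-c",
        "import sys\nfor l in sys.stdin: print(*reversed(l.split()))"]
pairs = filter_once(["Jane Doe", "John Wayne"], SWAP)
print(pairs)  # [('Jane Doe', 'Doe Jane'), ('John Wayne', 'Wayne John')]
```

Note that all of the filter's output sits in memory before the zip happens, which is where the extra memory use comes from, and that a line count mismatch is the only symptom this sketch can detect.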
There are also two builtin filters. @py allows you to use arbitrary
Python expressions as a filter. The current line or filename is
available in the execution context as i.
$ wrld move {} '@py i.upper()' -f *
move 'Arnold Palmer' 'ARNOLD PALMER'
move 'Jane Doe' 'JANE DOE'
move 'John Doe' 'JOHN DOE'
move 'John Wayne' 'JOHN WAYNE'
move 'Lucy Lawless' 'LUCY LAWLESS'
move 'Ricki Lake' 'RICKI LAKE'
@py uses a little namespace magic that will import any module you
happen to use in your expression on demand. Note that only expressions
and not statements are supported. @py combined with -f should also do
the right thing with newlines in file names.
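One way to get this kind of import-on-demand namespace is a dict subclass with a __missing__ hook, handed to eval as the local namespace. The sketch below is a guess at the general technique, not a copy of what wrld does. AutoImport and py_filter are hypothetical names.

```python
import builtins
import importlib


class AutoImport(dict):
    """Namespace that imports a module the first time its name is looked up."""

    def __missing__(self, name):
        if hasattr(builtins, name):
            raise KeyError(name)  # let eval fall through to the real builtin
        try:
            module = importlib.import_module(name)
        except ImportError:
            raise KeyError(name) from None
        self[name] = module  # cache it so later lookups skip the import
        return module


def py_filter(expression, line):
    """Evaluate `expression` with the current line bound to i."""
    namespace = AutoImport(i=line)
    # Passing the mapping as locals makes name lookups go through __missing__.
    return str(eval(expression, {}, namespace))
```

With this in place, py_filter("os.path.basename(i)", "/tmp/a.txt") works without an explicit import of os, and because eval only accepts expressions, statements are rejected with a SyntaxError.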
The other builtin filter is s. The syntax looks a bit like sed, but
it's Python regex, so refer to the relevant docs if you're not already
familiar with it. It's based on Perl, like the regex in most popular
programming languages (and unlike sed), but it has a few of its own
quirks.
$ wrld move {} 's/[aeiou]/λ/g' -f *
move 'Arnold Palmer' 'Arnλld Pλlmλr'
move 'Jane Doe' 'Jλnλ Dλλ'
move 'John Doe' 'Jλhn Dλλ'
move 'John Wayne' 'Jλhn Wλynλ'
move 'Lucy Lawless' 'Lλcy Lλwlλss'
move 'Ricki Lake' 'Rλckλ Lλkλ'
It accepts any flags that can be used in a Python regex in the context
of (?[flags]), so, aiLmsux. In addition, the g flag is supported, to
make it more similar to sed and Perl. While / is used as the delimiter
by convention, any non-alphanumeric character may be used.
If the replacement is prefixed with \e, a Python expression can be
used, where m is the match object (re.Match) for each match, so that
offers some interesting possibilities.
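The pieces of the s filter described above (delimiter, inline flags, g, and the \e prefix) fit together roughly like this. It is a simplified sketch and not wrld's parser: sed_like is a hypothetical name, the split on the delimiter is naive and does not handle an escaped delimiter, and flags are simply passed through as an inline (?...) group.

```python
import re


def sed_like(expression, line):
    """Apply an s/pattern/replacement/flags expression with Python's re module."""
    delimiter = expression[1]
    # Naive split: this sketch does not handle an escaped delimiter.
    _, pattern, replacement, flags = expression.split(delimiter)
    count = 0 if "g" in flags else 1  # without g, replace only the first match
    inline = flags.replace("g", "")
    if inline:
        pattern = "(?" + inline + ")" + pattern
    if replacement.startswith("\\e"):
        code = replacement[2:]
        # m is the match object for each match, as described above.
        return re.sub(pattern, lambda m: str(eval(code, {}, {"m": m})),
                      line, count=count)
    return re.sub(pattern, replacement, line, count=count)


print(sed_like("s/[aeiou]/λ/g", "Ricki Lake"))  # Rλckλ Lλkλ
print(sed_like(r"s/\w+/\em.group().capitalize()/g", "john doe"))  # John Doe
```

Since the replacement goes to re.sub, backreferences use Python's \1 or \g<name> syntax rather than sed's &.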
I can neither confirm nor deny that there may be another filter in my
mind for doing awk-like things based on Python's str.format method.