cmdfor

Run cmd for every line of input


License
MIT
Install
pip install cmdfor==0.1.5

Documentation

cmdfor

Run cmd for every line of input

Features

A shell utility (and package) which runs a command for every line of input.

It can spawn an arbitrary number of concurrent threads and gives you control over where each command's output is kept.

In my daily work, I have to run all manner of commands over huge batches of items. These tasks are usually not CPU-bound, so it makes sense to multithread them.

Thus, I find myself writing bash commands such as the following, which takes an input file of items, splits it into equal(ish) parts, and then spawns a worker for each part, all the while keeping granular logs and return codes:

lines=`wc -l domains.txt | awk '{print $1}'`; threads=10; split=$(((lines/threads)+1)); mkdir -p in out; split -d -l ${split} domains.txt in/part. ; ls in/ | while read -r f; do cat in/${f} | while read -r d; do host -t a "${d}" > out/${d} 2>&1; echo -e "${d}\t$?"; done > log.${f} & echo ${!}; done > pids
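For readability, here is roughly the same pipeline spread over several lines with comments. This is only a sketch of what the one-liner above does, not part of cmdfor itself:

lines=$(wc -l < domains.txt)                     # total number of items
threads=10                                       # number of workers to spawn
per_part=$(( (lines / threads) + 1 ))            # lines per input chunk
mkdir -p in out
split -d -l "${per_part}" domains.txt in/part.   # chunk the input file
ls in/ | while read -r f; do
    # one background worker per chunk: run the command for every line,
    # keep per-item output in out/ and a per-chunk log of return codes
    while read -r d; do
        host -t a "${d}" > "out/${d}" 2>&1
        printf '%s\t%s\n' "${d}" "$?"
    done < "in/${f}" > "log.${f}" &
    echo ${!}                                    # record the worker's PID
done > pids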

That gets pretty tiring to type all the time. Why not use xargs -P, you say? That works perfectly well for cases where I don't need to build very complicated commands and don't need to log every return code. Maybe I could do all of that with xargs, but I wanted to make this anyway as a learning experience.
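For comparison, the xargs equivalent of the lookup part might look something like the line below (assuming GNU xargs); it fans the work out across processes, but the per-item output and return codes are not captured anywhere:

cat domains.txt | xargs -P 10 -I {} host -t a {}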

How-To

The program can take input from STDIN or from a file passed with the -i option.

All arguments that aren't options are considered the subcommand to run. All wildcards {} are replaced with the corresponding positional field from the input data.

To delete a list of files, basically the same behaviour as xargs:

cat files.txt | cmdfor rm {}
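The same task, reading the input from a file via -i instead of STDIN (a sketch based on the -i option described above; the -- separator follows the pattern of the examples below):

cmdfor -i files.txt -- rm {}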

To run the fictional command imaplogin for every line of a csv that contains <email>,<password> fields, logging each individual command's output to a file in the directory ./out:

cat email_users.csv | cmdfor -d, -o ./out -- imaplogin -u {} -p {}
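Here the first {} is replaced by the first field and the second {} by the second field, so a (hypothetical) input line such as the one below turns into imaplogin -u alice@example.com -p hunter2:

alice@example.com,hunter2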

To look up the IP addresses of a huge number of hostnames, using 10 concurrent threads, and storing each individual command's stdout and stderr in separate files in the directory ./results, with each file named after the hostname being queried:

cat hostnames.txt | cmdfor -t 10 -Eo ./results -l 1 -- host -t a {}

To-Do

1. Come up with a real test case. Since this is a shell utility and really only deals with shell subcommands, I don't know what will work and what won't on travis.ci (can I run a shell command there?)
2. By default, it suppresses all output from subprocesses and writes a message to STDOUT for each process spawn and reap. This output is too verbose for the default behaviour, so it should be toggled with -v. The default should be quieter and simpler, perhaps just the return codes of each task.
3. Refactor some things to be a little less messy. The function signatures are huge, and messages are generated in odd places. I think it would be better to pass a context object.