This is a tool I put together to manipulate the very large CSV data files I have to deal with at work. The files have 111 fields and often contain almost 500k records.
The data contains several subsets, so grabbing just what I need by field and time makes life easier.
It is also possible to retrieve a range or list of records using -r (line number) instead of -t (time range). Using -all will output all records, subject to the other parameters set.
You can use the tool with stdin and stdout to pipe from one tool to another, or you can specify an input and/or output file.
Command switches are as follows:
Tool Usage:
  -all
        provide all records to output
  -blanks int
        Ignore records if this column is blank (default -1)
  -c string
        Which columns to export, eg 1-5 or 1,3-10 etc
  -comment string
        Specifiy the delimiter to use (default "#")
  -delimiter string
        Specifiy the delimiter to use (default ",")
  -header
        include header row
  -help
        help for guidance on usage
  -i string
        Input CSV file
  -loose
        Use strict rules for length of a record
  -o string
        Output CSV file
  -r string
        Span index of records to export, eg 1-5 or 1,3-10 etc
  -specific int
        Limit search to a specific column x, default all (slow) (default -1)
  -t string
        Span of time records, eg 10:00:00-16:00:00
For example, the following reads the input file and writes to the specified output file, including the header row and ignoring record lengths (-loose). It exports columns 0, 32 to 85 and 96 to 110, matches the time only against column 0, and skips records where column 32 is blank. Only records that fall within the time span are output.
./csvtool -i 502_00409D8C3071_20160524.csv -t "24/05/2016 06:00:00.000 +1000-24/05/2016 18:59:59.999 +1000" -loose -o subsecondDataTraction.csv -header -specific 0 -c 0,32-85,96-110 -blanks 32