s3-file-concat

Minimal library to concatenate S3 files of different data-types.


Keywords
AWS, S3, append, concat, concatenate, utility, Amazon Web Services, CSV, DOC, text, plain, JSON, string, object, merge, upload, Files, File, file-concatenation, javascript, nodejs, npm-module
License
MIT
Install
npm install s3-file-concat@1.0.7

Documentation

S3 File Concat

Minimal javascript library to concatenate/merge/append AWS S3 files of different data-types. 📦🔗📤

license npm version travis build size code coverage downloads total

Why?

If you are already using S3, you would know that the files are immutable in S3. Meaning you cannot edit or modify the contents of a file. Instead, you can only create a new file. Out of its many use-cases, S3 is prevalently used for dumping small to large data files/logs. Let's assume that you want to append all the data files inside a folder into one file, you cannot do it with an S3 API. That's where this minimal library comes into the picture.

Files ?? What kind of files can you append/concatenate?

This can append all the text-based files that support utf-8 format and also JSON files.

  • .doc/.docx
  • .csv/.xls/.xslx
  • .txt/plain
  • .json

How does it handle different data-types?

The library makes use of JSON.parse API, which would parse all object based data-types and throws for strings. The data returned from the API is sanitized, and the output is either concatenated or merged.

  • Files containing large lists of strings/number/utf8 char codes are
    concatenated.
  • Files containing large arrays (of strings and objects) are
    concatenated into one file consisting of the concatenated array.
  • Files containing objects are the only exception here. Pure objects are
    merged instead (the last file overrides all the other keys).

Important Note: Merging two files of different data-types will result in an error. Eg: you cannot merge a text file and a JSON consisting only an array/object. That's assumed to be a wrong kind of concatenation/merge.

Requirements

  • Node version (8.9.1) and above.
  • AWS-SDK.

Installation

Use the package manager npm to install dependencies.

npm  install  s3-file-concat --save  

Usage

const  S3FileConcat  =  require('s3-file-concat');  
  
// this libray uses factory function composition against using classes for better encapsulation.  
// so its free from using `new` and `this` :)  
  
const  {  concatFiles  }  =  S3FileConcat({  
  "accessKeyId":  "<your AWS access key>",  
  "secretAccessKey":  "<your secret key>",  
  "region":  "<your region>",  
  "bucket":  "<your bucket>"  
});  
  
/** concatFiles method accepts two arguments and returns a promise.  
1) Keys of the S3 files we want to concatenate/append/merge.  
2) Options (optional object):  
i) targetFileKey: "<mybucket/myfolder/myConcatendatedFile>"
ii) deleteSourceFiles: '<true or false>' // defaulted to false.  
*/

// Appending arrays files
let keys = [
  'testdata/arrays/array1.json',
  'testdata/arrays/array2.json',
  'testdata/arrays/array3.json'
];
// this creates a new file on s3 with appended content.
let targetFileKey = 'testdata/arrays/array_append.json';

// Appending CSV files
keys = [
  'testdata/csv_files/csv1.csv',
  'testdata/csv_files/csv2.csv',
  'testdata/csv_files/csv3.csv'
];
targetFileKey = 'testdata/csv_files/csv_append.csv';

// Appending doc files
keys = [
  'testdata/doc_files/doc1.doc',
  'testdata/doc_files/doc2.doc',
  'testdata/doc_files/doc3.doc'
];
targetFileKey = 'testdata/doc_files/doc_append.doc';

// Merging objects files
keys = [
  'testdata/objects/object1.json',
  'testdata/objects/object2.json',
  'testdata/objects/object3.json'
];
targetFileKey = 'testdata/objects/object_append.json';

// Appending plain files
keys = [
  'testdata/plain/plain1',
  'testdata/plain/plain2',
  'testdata/plain/plain3'
];
targetFileKey = 'testdata/plain/plain_append';

// Appending txt files
keys = [
  'testdata/txt_files/file1.txt',
  'testdata/txt_files/file2.txt',
  'testdata/txt_files/file3.txt',
  'testdata/txt_files/file4.txt'
];
targetFileKey = 'testdata/txt_files/file_append.txt';
  
// Note: when targetFileKey is empty, then we assume that 
// the user wants to use the appended/merged data and 
// return him appended/merged data in the promise resolve.  
  
s3FileConcat.concatFiles(keys,  {  
  targetFileKey  
})
// when no options are passed output consists of the appended
// file contents, sometimes a  huge dataset waiting to be used.  
.then(output  =>  console.log('All done ',  output)) 
.catch(error  =>  console.log('error occured oops ',  error));  

P.S: If the readability is not great, please check the example folder.

Example Usage

If the usage above is not clear, fear not. I have made a detailed usage scenario with different file types. Here you will find how you can pass options pass and the s3 file Keys in detail for all the file most common file types including arrays, objects and text files.

Testing

  • Jest
  • aws-sdk-mock

Unit tests are implemented using Jest, test-cases for concatenating/merging all types of files are covered. Since AWS S3 requires a network operation, they are mocked to buffer our local test-data. You can safely assume that the test data files are present on your target AWS S3 Bucket.

The AWS test data is present here, at s3BucketTestFiles.

Roadmap

  • 100% code coverage.
  • Support legacy versions.

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

License

MIT © Rajesh Babu