agg

Aggregate CSV files


Keywords
aggregate, csv-files, python-library
License
Apache-2.0
Install
pip install agg==0.3.1

Documentation

Agg

Supported Python Versions Last commit pypi version

A Python library to aggregate files and data. This release supports merging two or more csv files.

Documentation

merge_csv(files_to_merge: tuple,
          output_file: Union[str, pathlib.Path],
          first_line_is_header: Optional[bool] = None) -> dict:

The method merge_csv merges multiple CSV files in the order they are specified. It will overwrite any existing file with the same name.

Parameters:

  • files_to_merge: A tuple containing paths to a files in the order they are to be merged.
  • output_file: The path to the result file. The folder must already exist. An existing file with the same name will be overwritten.
  • first_line_is_header: if True agg will remove the first line of all csv files except for the first. If not set agg will guess if the first line is a header or not.

Its return value is a dictionary containing:

  • a SHA256 hash of the result file,
  • the name of the result file,
  • its absolute path,
  • a boolean indicating whether the first line is a header or not,
  • its size in bytes,
  • its number of lines (including the header),
  • a list of the files merged (absolute path).

Example

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import agg

# tuples are ordered:
my_files = ('file_01.csv', 'file_02.csv')

# Merge the CSV-files - in the order specified by the tuple - into a new file
# called "merged_file". Meanwhile copy the header / first line only once from
# first file.
merged_file = agg.merge_csv(my_files, 'merged_file', True)
# The return value is a dictionary!


print(merged_file)

# {'sha256hash': 'fff30942d3d042c5128062d1a29b2c50494c3d1d033749a58268d2e687fc98c6',
#  'file_name': 'merged_file',
#  'file_path': '/home/exampleuser/merged_file',
#  'first_line_is_header': True,
#  'file_size_bytes': 76,
#  'line_count': 8,
#  'merged_files': ['/home/exampleuser/file_01.csv',
#                  '/home/exampleuser/file_02.csv']
# }

print(merged_file['file_path'])
# '/home/exampleuser/merged_file'