lf-backup is a tool for backing up large files to object storage , e.g. swift.


Keywords
openstack, swift, cloud, storage
License
Other
Install
pip install lf-backup==0.2

Documentation

LF Backup

LF Backup stands for large file backup.

Installation

RHEL / CentOS:

sudo yum -y install epel-release
sudo yum -y install python34 python34-setuptools python34-devel gcc postgresql-devel
sudo easy_install-3.4 pip

Debian/Ubuntu:

sudo apt-get install -y python3-dev libpq-dev

and then install lf-backup

sudo pip3 install --upgrade pip
sudo pip3 install lf-backup

Configuration

create a new swift container called "large-file-backup"

add the export statements for variables starting with ST_ and the postgres authentication to config file .lf-backuprc and set the permissions to 600. Optionally export PGSQL to override the built-in SQL query.

> nano ~/.lf-backuprc
> chmod 600 ~/.lf-backuprc
> cat ~/.lf-backuprc
export ST_AUTH=https://swiftcluster.domain.org/auth/v1.0
export ST_USER=swift_account
export ST_KEY=RshBXXXXXXXXXXXXXXXXXXXXX
export PGHOST=pgdb.domain.org
export PGPORT=32048
export PGDATABASE=storcrawldb
export PGUSER=xxxxxxxx
export PGPASSWORD= 

create a cron job /etc/cron.d/ running as root starting ca 7pm:

> cat /etc/cron.d/lf-backup
## enabled on hostname xxx on 11-01-2016
55 18 * * * root /usr/local/bin/lf-backup --prefix /fh/fast \
           --container large-file-backup-fast --sql >> /var/tmp/lf-backup-fast 2>&1

Examples

lf-backup -C frobozz -c filelist.csv

Read list of files from 1st column of 'filename.csv' and backup to Swift container 'frobozz' using environment for authentication.

lf-backup -C grue -s

Query the database specified in the environment for the files and backup to Swift container 'grue' using environment for authentication.

lf-backup -C flathead -r 7 --prefix /fh/fast/restore42

Restore all objects in Swift container 'flathead' newer than 7 days back to current environment. The optional --prefix parameter specifies a destination path where objects will be restored.

Test & Dev

For modifications and change testing install a new system and install from local git folder

> git clone https://github.com/FredHutch/lf-backup
> rm -rf /usr/local/lib/python3.5/dist-packages/*; rm -rf /usr/local/bin/*
> pip3 install -e ./lf-backup

make changes in lf-backup and run again:

> rm -rf /usr/local/lib/python3.5/dist-packages/*; rm -rf /usr/local/bin/*
> pip3 install -e ./lf-backup

The script had the following original feature requests:

  • take a file list from CSV file or SQL DB and backup each file to object storage (e.g. swift)

  • restore files from object storage newer than specified number of days (>1)

  • if the file has an atime within the last x days (configurable) take an md5sum of that file and store the md5sum in an attribute / meta data called md5sum (not yet implemented)

  • check if the file is already in object store and do not upload if the file size and mtime is identical

  • notify a list of email-addresses after finishing. attach list of files that were uploaded; create one file list per file owner (username)

  • log every file that was uploaded to syslog, detailed logging of success and failure to enable storage team to monitor success / failure via splunk

  • bash script lf-backup is a wrapper for python script lfbackup.py, lf-backup sources and sets env vars with credentials and lfbackup.py only reads environments vars

  • main script lfbackup.py only uses swift functions in lflib.py.

  • segment size should be 1GB, segment container name should be .segments-containername, object type is slo, not dlo

  • backup with full path but replace prefix, for example a file /fh/fast/lastname_f/project/file.bam would be copied to container/bucket bam-backup in account Swift__ADM_IT_backup. The target path would be /bam-bucket/lastname_f/project/file.bam a --prefix=/fh/fast removes the fs root path from the destination