filehole
Python library to find missing files in a scheduled delivery.
Simple and quick solution to help finding missing files in a scheduled delivery, particularly when dealing with large amount of files or a long history of file delivery.
Dependencies
- Python 3.8.9 or higher
- Numpy 1.23.0
- Holidays 0.14.2
Install
The latest stable version can always be installed or updated via pip:
$ pip install filehole
Usage
filehole(
path_to_files: str,
file_system: Globable,
date_pattern: str,
date_format: str,
country: str,
subdivision: str = None,
start_date: str = f"{date.today().year}-01-01",
end_date: str = date.today().strftime("%Y-%m-%d"),
week_schedule: str = "1111100",
frequency: str = "D",
repetition: int = 1,
position: int = 1,
)
Parameters:
-
path_to_files
: Wild card enabled string to search for files -
file_system
: Modules that have aglob
function such asglob
in a local environment oradls
in a cloud environment. -
date_pattern
: Regular expression reflecting the pattern in which the date is written in files or directories. -
date_format
: Standard date format of the date written in files or directories. -
country
: Country name or abbreviation for the selection of the holidays calendar. For the exhaustive list of available holidays calendars, please refer to the documentation of theholidays
python library (https://pypi.org/project/holidays/). -
subdivision
: Province, state, ... for the selection of the holidays calendar. The available option can be found in the documentation of theholidays
python library (https://pypi.org/project/holidays/). -
start_date
: Start of the search period. Format:'%Y-%m-%d'
. Default is set to the first day of the current year. -
end_date
: End of the search period. Format:'%Y-%m-%d'
. Default is set to the current date. -
week_schedule
: String of 7 digits of 0 and 1. 1 represents a working day and 0 a non-working day. Week starts on Monday. By default, the working week is set from Monday to Friday included ->'1111100'
. -
frequency
: Takes'D'
for daily delivery,'W'
for weekly delivery and'M'
for monthly delivery. -
repetition
: Default value:1
. Used only for weekly and monthly file delivery. e.g.:repetition=1
-> every week/month,repetition=2
-> every two weeks/months... -
position
: Takes1
for first business day of the month or-1
for last business day of the month.
Description:
Retrieve list of files from a given location.
Extract dates from filenames.
Create a calendar of holidays.
Create a set of expected dates and compare them to the extracted dates.
Return a set of missing dates.
Example:
- Daily file delivery for the month of July according to the french holiday calendar, assuming that files from the 12 and 13 of July are missing:
> filehole(
path_to_files="my_file_path/*.txt",
file_system=glob,
date_pattern=r"[0-9]{8}",
date_format="%Y%m%d",
country="FR",
start_date="2022-07-01",
end_date="2022-07-31",
week_schedule="1111100",
frequency="D",
)
> {datetime.date(2022, 7, 12), datetime.date(2022, 7, 13)}
Limitations
- All files are expected to be at the same level.
- The files or the directory containing the files should contain a date in their name.
- Current version works only with daily, weekly and monthly file delivery.
License
This project is under the MIT License.