keycollator
Compares text in a file to a reference/glossary/key-items/dictionary file.
https://pypi.org/project/keycollator/0.0.3/
Structure
.
│
├── assets
│   └── images
│       └── coverage.svg
│
├── docs
│   ├── cli.md
│   └── index.md
│
├── src
│   ├── __init__.py
│   ├── cli.py
│   ├── keycollator.py
│   ├── test_keycollator.py
│   ├── extractonator.py
│   ├── requirements.txt
│   └── data
│       ├── (placeholder)
│       └── (placeholder)
│
├── tests
│   └── test_keycollator
│       ├── __init__.py
│       └── test_keycollator.py
│
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE
├── make-venv.sh
├── Makefile
├── pyproject.toml
├── README.README
├── README.rst
├── setup.cfg
└── setup.py
Features
- Extract text from file to dictionary
- Extract keys from file to dictionary
- Find matches of keys in text file
- Apply fuzzy matching
Installation
Install from PyPI using pip3
pip3 install keycollator
Documentation
Official documentation can be found here:
https://github.com/davidprush/keycollator/tree/main/docs
Supported File Formats
- TXT/CSV files (Mac/Linux/Win)
- PDF and JSON support is planned
Usage
Import keycollator into Python projects
from keycollator import ZTimer, KeyKrawler
CLI
keycollator uses command-line options to change its default parameters and behavior.
python3 src/keycollator.py --help
Usage: keycollator.py [OPTIONS] COMMAND [ARGS]...

  keycollator is an app that finds occurances of keys in a text file

Options:
  -t, --text-file PATH            Path/file name of the text to be searched
                                  for against items in the key file
  -k, --key-file PATH             Path/file name of the key file containing a
                                  dictionary, key items, glossary, or
                                  reference list used to search the text file
  -O, --output-file PATH          Path/file name of the output file that
                                  will contain the results (CSV or TXT)
  -R, --limit-results INTEGER     Limit the number of results
  -f, --fuzzy-matching INTEGER RANGE
                                  Set the level of fuzzy matching (default=99)
                                  to validate matches using
                                  approximations/edit distances, uses
                                  acceptance ratios with integer values from 0
                                  to 99, where 99 is nearly identical and 0 is
                                  not similar  [0<=x<=99]
  -U, --ubound-limit INTEGER RANGE
                                  Ignores items from the results with matches
                                  greater than the upper boundary (upper-
                                  limit); reduce eroneous matches
                                  [1<=x<=99999]
  -L, --lbound-limit INTEGER RANGE
                                  Ignores items from the results with matches
                                  less than the lower boundary (lower-limit);
                                  reduce eroneous matches  [0<=x<=99999]
  -v, --set-verbose               Turn on verbose
  -l, --set-logging               Turn on logging
  -Z, --log-file PATH             Path/file name to be used for the log file
  --help                          Show this message and exit.
Turn on verbose output
Currently only one verbosity level is provided; future versions will implement multiple levels (DEBUG, INFO, WARN, etc.).
keycollator --set-verbose
Apply fuzzy matching
Fuzzy matching accepts approximate matches based on edit distance. The level ranges from 0 (least strict, accepts nearly anything as a match) to 99 (most strict, accepts only nearly identical matches). By default the app uses level 99, and applies it only if regular matching finds no matches.
keycollator --fuzzy-matching=[0-99]
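As a rough illustration of how an acceptance level can gate a match, the sketch below scores two strings with the standard library's `difflib.SequenceMatcher` and accepts the pair when the score, scaled to 0-100, reaches the chosen level. This is a stand-in and not keycollator's own matcher, which this README does not specify; the function names `similarity` and `is_fuzzy_match` are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(key: str, item: str) -> int:
    """Return a 0-100 similarity score (assumption: difflib ratio, case-insensitive)."""
    ratio = SequenceMatcher(None, key.lower(), item.lower()).ratio()
    return int(ratio * 100)


def is_fuzzy_match(key: str, item: str, level: int = 99) -> bool:
    """Accept the pair when its score meets or exceeds the level (0-99)."""
    if not 0 <= level <= 99:
        raise ValueError("level must be between 0 and 99")
    return similarity(key, item) >= level


if __name__ == "__main__":
    # Identical strings pass even at the strictest level.
    print(is_fuzzy_match("analysis", "analysis"))
    # A near match passes only once the level is relaxed.
    print(is_fuzzy_match("analysis", "analyses", level=80))
    # Unrelated words are rejected at the same level.
    print(is_fuzzy_match("analysis", "dashboard", level=80))
```

Lowering the level admits more approximate matches, which is why the bound options are useful for trimming erroneous results.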
Set the key file
Each line of the key file is a key that is matched against the items in the text file.
keycollator --key-file="/path/to/key/file/keys.txt"
Set the text file
Each line of the text file is an item that is compared with the keys in the key file.
keycollator --text-file="/path/to/key/file/text.txt"
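A minimal sketch of the line-oriented matching described above: keys and text items are read one per line, and each key is counted once for every text item that contains it. The helper names (`read_items`, `count_key_matches`) and the plain substring rule are assumptions made for illustration; keycollator's own extraction and matching logic may differ.

```python
from collections import Counter
from pathlib import Path


def read_items(path: str) -> list[str]:
    """Read non-empty lines from a file, stripped and lowercased."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip().lower() for line in lines if line.strip()]


def count_key_matches(keys: list[str], items: list[str]) -> Counter:
    """Count, for each key, how many text items contain it as a substring."""
    counts = Counter()
    for key in keys:
        hits = sum(1 for item in items if key in item)
        if hits:
            counts[key] = hits
    return counts


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        key_path = Path(tmp) / "keys.txt"
        text_path = Path(tmp) / "text.txt"
        key_path.write_text("manage\nreport\n", encoding="utf-8")
        text_path.write_text(
            "Manage the team\nReport weekly\nManage budgets\n", encoding="utf-8"
        )
        counts = count_key_matches(
            read_items(str(key_path)), read_items(str(text_path))
        )
        # Print keys from most to least frequent.
        for key, count in counts.most_common():
            print(key, count)
```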
Specify the output file
Results are currently written as CSV; additional file formats (PDF/JSON/DOCX) are planned for future releases.
keycollator --output-file="/path/to/results/result.csv"
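For readers who want to produce a comparable results file themselves, here is a small sketch using Python's `csv` module. The function name `write_results_csv` is hypothetical, and the column layout (No., Key, Count) is an assumption borrowed from the console table in the example output; the actual file written by keycollator may be laid out differently.

```python
import csv
from pathlib import Path


def write_results_csv(results: dict[str, int], path: str) -> int:
    """Write key/count pairs to a CSV file, highest count first; return rows written."""
    ranked = sorted(results.items(), key=lambda pair: pair[1], reverse=True)
    with Path(path).open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        # Assumed header, mirroring the console table columns.
        writer.writerow(["No.", "Key", "Count"])
        for number, (key, count) in enumerate(ranked, start=1):
            writer.writerow([number, key, count])
    return len(ranked)


if __name__ == "__main__":
    rows = write_results_csv({"develop": 62, "manage": 73}, "result.csv")
    print(f"wrote {rows} rows")
```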
Limit results for console and output file
Limits the number of results shown in the console and written to the output file.
keycollator --limit-results=30
Set upper bound limit
Rejects items whose match count exceeds the given integer value; this helps reduce erroneous matches when using fuzzy matching.
keycollator --ubound-limit=[1-99999]
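The sketch below shows one way the bound and limit options could combine: keys whose counts fall outside the lower and upper bounds are dropped, the rest are sorted by count, and the list is truncated to the requested number of results. The function name `filter_results` is hypothetical, and the default bounds (0 and 99999) are assumptions taken from the ranges shown in the help text, not confirmed defaults.

```python
from typing import Optional


def filter_results(
    counts: dict[str, int],
    lbound: int = 0,
    ubound: int = 99999,
    limit: Optional[int] = None,
) -> list[tuple[str, int]]:
    """Keep keys with lbound <= count <= ubound, sorted by count (descending)."""
    if lbound > ubound:
        raise ValueError("lbound must not exceed ubound")
    kept = [(key, n) for key, n in counts.items() if lbound <= n <= ubound]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    # A limit of None means "return everything that survived the bounds".
    return kept if limit is None else kept[:limit]


if __name__ == "__main__":
    counts = {"manage": 73, "develop": 62, "report": 58, "sales": 10}
    print(filter_results(counts, lbound=11, ubound=70, limit=2))
```

Applying the bounds before the limit means the result cap is spent only on keys that pass the filter.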
Turn on logging
Turns on logging; if the user supplies no log file, one is created using the default name log.log.
keycollator --set-logging
Create a log file
Sets the name of the log file used by logging.
keycollator --log-file="/path/to/log/file/log.log"
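The fallback to a default log file can be sketched with the standard `logging` module as follows. This is an illustration only: the function name `make_logger`, the logger name, and the message format are assumptions, and keycollator's own logging code is not shown in this README.

```python
import logging
from typing import Optional

DEFAULT_LOG_FILE = "log.log"  # default file name stated above


def make_logger(log_file: Optional[str] = None) -> logging.Logger:
    """Return a logger that writes to log_file, or to log.log when none is given."""
    logger = logging.getLogger("keycollator_sketch")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    # Drop handlers from earlier calls so messages are not written twice.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()
    handler = logging.FileHandler(log_file or DEFAULT_LOG_FILE, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = make_logger()  # falls back to log.log in the current directory
    log.info("Extracted keys.txt items")
```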
Example Output
python3 src/keycollator.py --set-logging --limit-results=30
Extracted text.txt items.[[0.16]seconds]
Extracted keys.txt items.[[0.25]seconds]
Matched keys.txt items to text.txt items.[[76.45]seconds]
results.csv Complete.[[76.52]seconds]
╭─────┬───────────────┬───────╮
│ No. │ Key           │ Count │
├─────┼───────────────┼───────┤
│   1 │ manage        │    73 │
├─────┼───────────────┼───────┤
│   2 │ develop       │    62 │
├─────┼───────────────┼───────┤
│   3 │ report        │    58 │
├─────┼───────────────┼───────┤
│   4 │ support       │    46 │
├─────┼───────────────┼───────┤
│   5 │ process       │    43 │
├─────┼───────────────┼───────┤
│   6 │ analysis      │    36 │
├─────┼───────────────┼───────┤
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
├─────┼───────────────┼───────┤
│  28 │ dashboards    │    11 │
├─────┼───────────────┼───────┤
│  29 │ sales         │    10 │
├─────┼───────────────┼───────┤
│  30 │ create        │    10 │
╰─────┴───────────────┴───────╯
╭─────────────┬────────╮
│ Statistic   │  Total │
├─────────────┼────────┤
│ Keys        │    701 │
├─────────────┼────────┤
│ Text        │    695 │
├─────────────┼────────┤
│ Matches     │   1207 │
├─────────────┼────────┤
│ Comparisons │ 376855 │
├─────────────┼────────┤
│ Logs        │      0 │
├─────────────┼────────┤
│ Runtime     │  76.60 │
╰─────────────┴────────╯
Todo
[ ] Update requirements.txt
[ ] Add proper error handling
[ ] Add CHANGELOG.md
[ ] Add functions/methods to handle STOP_WORDS
[ ] Verify python3 -m nltk.downloader punkt is properly imported
[x] Separate project into multiple files
[x] Add progress indicator using halo when extracting and comparing
[x] Create a logger class (for some reason logging is broken)
[x] Fix broken KeyKrawler matching
[x] Update README.md(.rst) with correct CLI
[ ] Add a KeyKrawler method to select and create missing files
[ ] Update CODE_OF_CONDUCT.md
[ ] Update CONTRIBUTING.md
[x] Format KeyKrawler console results as a table
[ ] Create ZLog class in extractonator.py (parse out __logit method)
[ ] Clean up verbose output (conflicts with halo)
[ ] Update all comments
[ ] Migrate click functionality to cli.py
[x] Refactor all methods and functions
[ ] Test ALL CLI options
Project Resource Acknowledgements
Deployment Features
Releases
Current stage: testing
License
This project is licensed under the terms of the MIT license. See LICENSE for more details.
@misc{keycollator,
author = {David Rush},
title = {Compares text in a file to reference/glossary/key-items/dictionary file.},
year = {2022},
publisher = {Rush Solutions, LLC},
journal = {GitHub repository},
howpublished = {\url{https://github.com/davidprush/keycollator}}
}