gutenfetch

Bulk downloads from Project Gutenberg.


License
MIT
Install
pip install gutenfetch==0.0.5

Documentation

Quill NLP Tools and Datasets

Notebooks, scrapers, corpora, and utilities built and maintained by Quill.org.

Structure

.
├── LICENSE
├── README.md
├── __init__.py
├── data
├── models
├── notebooks
├── scrapers
├── tests
└── utils

Each directory holds the following:

  • data. structured and unstructured documents
  • models. generated TensorFlow models
  • notebooks. Jupyter notebooks
  • scrapers. data collection tools
  • tests. high level tests
  • utils. useful tools and scripts including document parsing
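As a quick sanity check on the layout above, the following minimal sketch reports which of the listed directories are missing from a checkout. The function name `missing_directories` and the `EXPECTED_DIRS` constant are hypothetical names chosen for this illustration; they are not part of the repository.

```python
from pathlib import Path

# Directory names taken from the structure listing above.
EXPECTED_DIRS = ("data", "models", "notebooks", "scrapers", "tests", "utils")


def missing_directories(root):
    """Return the expected directory names that do not exist under root.

    Hypothetical helper for illustration; not shipped with the project.
    """
    root = Path(root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Run from the top level of the project; an empty list means nothing is missing.
    print(missing_directories("."))
```

An empty list means every directory in the listing is present.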

Configure version control to handle .ipynb files

$ # ensure you are in the top level of the project before running these commands
$
$ source activate <YOUR CONDA ENV>
$ conda install -c conda-forge nbstripout
$ nbstripout --install
$ nbstripout --install --attributes .gitattributes

Running the above commands ensures that output generated by the notebooks is not versioned, while regular code changes are still tracked.

Note: because output is not versioned, switching branches can change a notebook's state. This is expected, so don't be alarmed.
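To make concrete what "generated output is not versioned" means, here is a minimal sketch of output stripping on a notebook in the version 4 JSON format. It illustrates the idea only and is not nbstripout's actual implementation; the function name `strip_notebook_outputs` and the sample notebook are assumptions made for this example.

```python
import json


def strip_notebook_outputs(notebook):
    """Return a copy of a notebook dict with code-cell outputs removed.

    Illustrative only: nbstripout itself handles more cases than this.
    """
    # Round-trip through JSON to get a deep copy of the JSON-compatible dict.
    stripped = json.loads(json.dumps(notebook))
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop generated output
            cell["execution_count"] = None  # drop the run counter
    return stripped


# A tiny hand-written notebook with one executed code cell and one markdown cell.
example = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {},
            "source": ["print(1 + 1)"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["2\n"]}
            ],
        },
        {"cell_type": "markdown", "metadata": {}, "source": ["# Notes"]},
    ],
}

clean = strip_notebook_outputs(example)
```

The source of each cell is kept, so code edits still show up in diffs, while the outputs and execution counts that change on every run are removed.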