Save a tululu.org category offline
Description
The program downloads books in text format, along with their covers, from tululu.org. The following information about each book is also saved to a JSON file:
- title;
- author;
- image path;
- book path;
- comments;
- genres.
After the necessary data has been downloaded, an offline version of the site is generated (you can see an example here).
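As an illustration of the JSON fields listed above, one book record might look like the sketch below. The key names (`img_src`, `book_path`, and so on), the sample values, and the `dump_books` helper are assumptions made for this example, not the project's confirmed schema.

```python
import json

# Hypothetical record: key names and values are illustrative assumptions,
# chosen to mirror the fields listed in the description.
book = {
    "title": "Example title",
    "author": "Example author",
    "img_src": "images/1.jpg",
    "book_path": "books/1. Example title.txt",
    "comments": ["First comment", "Second comment"],
    "genres": ["Science fiction"],
}


def dump_books(books):
    """Serialize book records, keeping non-ASCII characters readable."""
    return json.dumps(books, ensure_ascii=False, indent=2)


serialized = dump_books([book])
print(serialized)
```

Passing `ensure_ascii=False` keeps Cyrillic titles and comments human-readable in the resulting file, which seems a sensible choice for a Russian-language library.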
Installation
Install using poetry:
git clone https://github.com/velivir/tululu-offline
cd tululu-offline
make install
How to use
poetry run python3 tululu_offline/app.py [OPTIONS]
Options
- category_url: the URL of the tululu.org category;
- --start_page: the page to start downloading from;
- --end_page: the page to stop downloading at;
- --dest_folder: path to the directory for the parsing results (pictures, books, JSON);
- --skip_txt: do not download books;
- --skip_imgs: do not download images;
- --json_path: path to the *.json file with the results;
- --number_of_books_per_page: number of books per page.
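The options above could be declared with `argparse` roughly as follows. This is a minimal sketch, not the project's actual parser: the option names come from this README, while the types, the defaults, and the `str_to_bool` and `build_parser` helpers are assumptions.

```python
import argparse


def str_to_bool(value):
    """Interpret 'true'/'false'-style strings, as in `--skip_txt true`."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    """Build a parser with the README's options; all defaults are assumed."""
    parser = argparse.ArgumentParser(description="Download a tululu.org category")
    parser.add_argument("category_url", help="URL of the tululu.org category")
    parser.add_argument("--start_page", type=int, default=1)
    parser.add_argument("--end_page", type=int, default=None)
    parser.add_argument("--dest_folder", default=".")
    parser.add_argument("--skip_txt", type=str_to_bool, default=False)
    parser.add_argument("--skip_imgs", type=str_to_bool, default=False)
    parser.add_argument("--json_path", default=None)
    parser.add_argument("--number_of_books_per_page", type=int, default=10)
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(
    ["http://tululu.org/l55/", "--start_page", "1", "--end_page", "3", "--skip_txt", "true"]
)
print(args)
```

The boolean options are parsed from a value here only because the example run in this README passes `--skip_txt true`; a plain `store_true` flag would be an equally reasonable design.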
Example run
Run the script with the necessary parameters. For example:
poetry run python3 tululu_offline/app.py http://tululu.org/l55/ --start_page 1 --end_page 3 --skip_txt true --skip_imgs true --number_of_books_per_page 15
The first page of the library will be available at pages/index1.html.
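To show how the number of books per page relates to the generated files, here is a small sketch that splits a list of books into pages and names each page after the `pages/index1.html` pattern. The `paginate` and `page_filename` helpers are hypothetical, and numbering later pages `index2.html`, `index3.html`, and so on is an assumption extrapolated from the first page's name.

```python
def paginate(books, per_page):
    """Split a list of books into consecutive chunks of at most per_page items."""
    if per_page < 1:
        raise ValueError("per_page must be a positive integer")
    return [books[i:i + per_page] for i in range(0, len(books), per_page)]


def page_filename(page_number):
    """Return the assumed output path for a 1-based page number."""
    return f"pages/index{page_number}.html"


# 35 placeholder books at 15 per page give pages of 15, 15, and 5 books.
books = [f"book {n}" for n in range(1, 36)]
pages = paginate(books, 15)
for number, chunk in enumerate(pages, start=1):
    print(page_filename(number), len(chunk))
```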
For developers
How to install with dev dependencies
Install using poetry:
git clone https://github.com/velivir/tululu-offline
cd tululu-offline
make install_dev
Render the website
Run the file render_website.py with the following options:
- category_url: the URL of the tululu.org category;
- --dest_folder: path to the directory with the parsing results (pictures, books, JSON);
- --json_path: path to the *.json file with the results;
- --number_of_books_per_page: number of books per page.
Example:
poetry run python3 tululu_offline/render_website.py http://tululu.org/l55/ --number_of_books_per_page 10 --json_path result/books.json --dest_folder result
How to run the linter
make lint
How to run tests
make test
License
Tululu-offline is licensed under the MIT License. See LICENSE for more information.
Project goal
The code was written for educational purposes as part of dvmn.org, an online course for web developers.