scrape_files

`scrape_files` is a tool to help scrape things online to your local machine.


Keywords
python, scraping, html, markdown, image, downloader, images
License
MIT
Install
pip install scrape_files==0.1.5

Documentation

scrape_files is a tool to help scrape things online to your local machine. Currently, it supports scraping and converting htmls to well-formatted markdowns for easy reading as well as scraping and downloading images of various formats in a web page.

Scraping htmls to your local machine

The html parsing logic is similar to a browser's easyread extension's, which trims off all the unnecessary decorations from a web page, only keeping the title and the article content. The main difference is that the file is downloaded as pretty formatted markdown.

Also support scraping links under the <p> tag in the current page concurrently.

Terminal usage:

scrape html <url>     # specify a url for scraping
scrape html <url> -d  # specify a directory name for saving files in current folder
scrape html <url> -l  # specify a level: 1 by default for the current page; 2 for links in the current page

Scraping images to your local machine

Images are scraped and downloaded concurrently. Supported formats: jpg, png, gif, svg, jpeg, webp; defaults to all supported formats.

Terminal usage:

scrape image <url>     # specify a url for scraping
scrape image <url> -d  # specify a diretory name for saving files in current folder 
scrape image <url> -f  # specify image formats separated with space 

Installation

pip install scrape_files