webdow

A Python package to download the HTML source of a webpage, even when the webpage loads its content dynamically.


Keywords
python
License
GPL-3.0
Install
pip install webdow==1.2.8

Documentation

Get package:

You can download the package from GitHub or install it with pip:

pip install webdow

Who is this package for?

  1. If you want to download the HTML source of one or multiple webpages.

  2. If you want to download the HTML source of a webpage that continuously loads more content as you scroll down.
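
The second case is commonly handled by scrolling to the bottom of the page, waiting, and repeating until the page height stops growing. The block below is a minimal sketch of that idea, not necessarily how webdow implements it. The function name scroll_until_stable and the max_rounds and sleep parameters are hypothetical. The driver argument is assumed to be a Selenium WebDriver (anything with an execute_script method works).

```python
import time


def scroll_until_stable(driver, scroll_time=10, max_rounds=50, sleep=time.sleep):
    """Scroll to the bottom repeatedly until the page height stops growing.

    `driver` is assumed to be a Selenium WebDriver. `scroll_time` is the
    number of seconds to wait after each scroll. Returns the number of
    scrolls that were performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        sleep(scroll_time)  # give the page time to load more content
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so this is the real bottom
        last_height = new_height
    return rounds
```

A slow connection needs a larger scroll_time, otherwise the loop may stop before the next batch of content has arrived.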

NOTE: I have tested this package only on Linux. It may work on other operating systems.

Requirements:

cd /path/to/webdow/
sudo ./install_requirements.sh

Run the above commands to install all requirements automatically, or follow the steps below. If you get the error "sudo: ./install_requirements.sh: command not found", make install_requirements.sh executable (for example with "chmod +x install_requirements.sh") and run it again.

If you have not installed Google Chrome:
sudo apt-get install google-chrome-stable
If you have already installed Google Chrome and are not sure it is the current version, upgrade it:
sudo apt-get upgrade google-chrome-stable
All of these are mandatory (skip any that are already installed):
sudo apt-get install xvfb
sudo apt-get install python-pip
sudo -H pip install pyvirtualdisplay
sudo -H pip install selenium

# For ChromeDriver (on a 32-bit system, use "https://chromedriver.storage.googleapis.com/2.30/chromedriver_linux32.zip" instead):

wget -N https://chromedriver.storage.googleapis.com/2.30/chromedriver_linux64.zip -P ~/
unzip ~/chromedriver_linux64.zip -d ~/
rm ~/chromedriver_linux64.zip
sudo mv -f ~/chromedriver /usr/local/share/
sudo chmod +x /usr/local/share/chromedriver
sudo ln -s /usr/local/share/chromedriver /usr/local/bin/chromedriver
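
After the steps above, a quick check can confirm that everything is in place before you run the package. This is a minimal sketch that is not part of webdow. The function names are hypothetical, and the executable names (google-chrome-stable, Xvfb, chromedriver) are assumptions based on what the packages above usually install on Debian or Ubuntu.

```python
import importlib.util
import shutil


def find_missing_tools(tools=("google-chrome-stable", "Xvfb", "chromedriver")):
    """Return the executables from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def find_missing_modules(modules=("selenium", "pyvirtualdisplay")):
    """Return the top-level Python modules from `modules` that cannot be imported."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Missing executables:", find_missing_tools())
    print("Missing Python modules:", find_missing_modules())
```

Two empty lists mean that all requirements listed above were found.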

How to Use?

Importing this package into your script:
from Webdow import ExtractPage

'''
Get the source code of the webpage.
url: The URL from which to get the source code.
scroll_time: The time, in seconds, that the webpage takes to load new content when you scroll to the bottom (this depends on your internet speed). The default is 10 seconds.
'''
src = ExtractPage.gethtml("url/to/webpage", scroll_time = 5)

'''
Write the HTML contents to a file.
src: The HTML source to write.
filePath: The path of the file to write.
NOTE: The path has to include the filename with the '.html' extension.
'''
ExtractPage.write_html(src, "path/to/page.html")
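
To download several webpages, the two calls above can be wrapped in a loop. The block below is a minimal sketch. The helpers filename_for and download_all are hypothetical and not part of webdow. The fetch and write arguments are assumed to behave like ExtractPage.gethtml and ExtractPage.write_html as shown above.

```python
import os
import re


def filename_for(url):
    """Build a safe file name ending in '.html' from a URL."""
    name = re.sub(r"^[a-zA-Z]+://", "", url)        # drop the scheme
    name = re.sub(r"[^A-Za-z0-9._-]+", "_", name)   # replace unsafe characters
    name = name.strip("_") or "page"
    if not name.endswith(".html"):
        name += ".html"
    return name


def download_all(urls, out_dir, fetch, write, scroll_time=10):
    """Fetch every URL and write it under out_dir; return the written paths.

    `fetch(url, scroll_time=...)` must return the HTML source and
    `write(src, path)` must save it, as gethtml and write_html do above.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for url in urls:
        src = fetch(url, scroll_time=scroll_time)
        path = os.path.join(out_dir, filename_for(url))
        write(src, path)
        paths.append(path)
    return paths
```

With the package imported as shown above, a call could look like download_all(urls, "pages", ExtractPage.gethtml, ExtractPage.write_html, scroll_time=5). Passing the two functions as arguments keeps the helper independent of webdow, so it can be tried with stand-in functions first.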

Author:

Name: arvind

Email: arvindsinc2@hotmail.com

Terms and Conditions:

Anyone can use this anywhere, provided credit is given.