Get the package:
You can download it from GitHub or install it with pip:
pip install webdow
Who is this package for?
- If you want to download the HTML source of one or more webpages.
- If you want to download the HTML source of a webpage that loads continuously as you scroll down.
NOTE: I have tested this package on Linux only. It may work on other operating systems.
Requirements:
cd /path/to/webdow/
sudo ./install_requirements.sh
Run the commands above to install all requirements automatically, or follow the steps below. If you get the error "sudo: ./install_requirements.sh: command not found", make install_requirements.sh executable (for example, with chmod +x install_requirements.sh) and run it again.
If you have not installed Google Chrome:
sudo apt-get install google-chrome-stable
If you have already installed Google Chrome:
If you are not sure it is up to date, run this to upgrade it:
sudo apt-get install --only-upgrade google-chrome-stable
All of the following are mandatory (skip any that are already installed):
sudo apt-get install xvfb
sudo apt-get install python-pip
sudo -H pip install pyvirtualdisplay
sudo -H pip install selenium
# Install ChromeDriver (on a 32-bit system, use "https://chromedriver.storage.googleapis.com/2.30/chromedriver_linux32.zip" instead):
wget -N https://chromedriver.storage.googleapis.com/2.30/chromedriver_linux64.zip -P ~/
unzip ~/chromedriver_linux64.zip -d ~/
rm ~/chromedriver_linux64.zip
sudo mv -f ~/chromedriver /usr/local/share/
sudo chmod +x /usr/local/share/chromedriver
sudo ln -s /usr/local/share/chromedriver /usr/local/bin/chromedriver
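After running the steps above, you may want to confirm that everything is in place before using the package. The following is a minimal sketch, not part of webdow itself. check_requirements is a hypothetical helper, and the executable names google-chrome, Xvfb and chromedriver are assumptions about what the packages above put on your PATH.

```python
import importlib.util
import shutil


def check_requirements(binaries=("google-chrome", "Xvfb", "chromedriver"),
                       modules=("selenium", "pyvirtualdisplay")):
    """Return a dict mapping each requirement name to True if it was found.

    The default names are assumptions based on the install steps in this
    README; adjust them if your system uses different executable names.
    """
    status = {}
    for name in binaries:
        # shutil.which returns None when the executable is not on PATH.
        status[name] = shutil.which(name) is not None
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        status[name] = importlib.util.find_spec(name) is not None
    return status


if __name__ == "__main__":
    for name, found in check_requirements().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Anything reported as MISSING can be installed with the matching command from the list above.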
How to Use?
Importing this package into your script:
from Webdow import ExtractPage
'''
Get the HTML source of a webpage.
url: The URL of the page whose source you want.
scroll_time: The time, in seconds, allowed for the page to load new content after scrolling to the bottom (this depends on your internet speed). The default is 10 seconds.
'''
src = ExtractPage.gethtml("url/to/webpage", scroll_time = 5)
'''
Write the HTML contents to a file.
src: The HTML source to write.
filePath: The path of the file to write to.
NOTE: The path must include the filename with the '.html' extension.
'''
ExtractPage.write_html(src, filePath)
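To download several webpages, the two calls above can be wrapped in a loop. The following is a minimal sketch under a few assumptions: url_to_filename and download_pages are hypothetical helpers, not part of the package, and the fetch and write functions are passed in as arguments so that the loop itself does not need a browser.

```python
import os
import re
from urllib.parse import urlparse


def url_to_filename(url):
    """Build a filesystem-safe '.html' filename from a URL (hypothetical helper)."""
    parsed = urlparse(url)
    raw = parsed.netloc + parsed.path
    # Replace every run of characters other than letters, digits, '.' and '-'.
    safe = re.sub(r"[^A-Za-z0-9.-]+", "_", raw).strip("_") or "page"
    return safe + ".html"


def download_pages(urls, out_dir, fetch, write, scroll_time=10):
    """Fetch each URL and write it under out_dir; return the written paths.

    fetch is called as fetch(url, scroll_time=...) and write as
    write(src, path), matching the calls shown in this README.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for url in urls:
        src = fetch(url, scroll_time=scroll_time)
        path = os.path.join(out_dir, url_to_filename(url))
        write(src, path)
        paths.append(path)
    return paths
```

With the package installed, calling download_pages(urls, "pages", ExtractPage.gethtml, ExtractPage.write_html, scroll_time=5) should save one '.html' file per URL in the "pages" directory. Note that two URLs differing only in their query string would map to the same filename in this sketch.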
Author:
Name: arvind
Email: arvindsinc2@hotmail.com
Terms and Conditions:
Anyone may use this package anywhere, provided credit is given.