souperscraper

A simple web scraper base combining Beautiful Soup and Selenium


Keywords
web-scraping, scraping, easy, beautifulsoup4, beautifulsoup, bs4, selenium, selenium-webdriver, web-sc
License
Other
Install
pip install souperscraper==1.0.2

Documentation

SouperScraper

A simple web scraper base that combines BeautifulSoup and Selenium to scrape dynamic websites.

Setup

  1. Install with pip
pip install souperscraper
  2. Download the appropriate ChromeDriver for your Chrome version using getchromedriver.py (command below) or manually from the ChromeDriver website.

To find your Chrome version, go to chrome://settings/help in your browser.

getchromedriver
  3. Create a new SouperScraper object using the path to your ChromeDriver
from souperscraper import SouperScraper

scraper = SouperScraper('/path/to/your/chromedriver')
  4. Start scraping using BeautifulSoup and/or Selenium methods
scraper.goto('https://github.com/LucasFaudman')

# Use BeautifulSoup to search for and extract content
# by accessing the scraper's 'soup' attribute
# or with the 'soup_find' / 'soup_find_all' methods
repos = scraper.soup.find_all('span', class_='repo')
for repo in repos:
    repo_name = repo.text
    print(repo_name)

# Use Selenium to interact with the page, such as clicking buttons
# or filling out forms, via the scraper's
# find_element_by_* / find_elements_by_* / wait_for_* methods
repos_tab = scraper.find_element_by_css_selector("a[data-tab-item='repositories']")
repos_tab.click()

search_input = scraper.wait_for_visibility_of_element_located_by_id('your-repos-filter')
search_input.send_keys('souper-scraper')
search_input.submit()
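
As a rough sketch of the soup_find_all and wait_for_* helpers mentioned in the comments above, the snippet below waits for the filtered results to become visible and then re-queries the page without going through scraper.soup directly. It assumes soup_find_all mirrors BeautifulSoup's find_all signature; the 'user-repositories-list' id is an illustrative guess at the results container, so adjust the selectors for your target page.

# Minimal sketch, assuming soup_find_all accepts the same arguments as
# BeautifulSoup's find_all. The 'user-repositories-list' id is a hypothetical
# results container; replace it with whatever your target page uses.
scraper.wait_for_visibility_of_element_located_by_id('user-repositories-list')

for repo in scraper.soup_find_all('span', class_='repo'):
    print(repo.text.strip())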

BeautifulSoup Reference

Selenium Reference