simplescraper

A simple python web scraper


Keywords
python, scraper, webscraping
License
MIT
Install
pip install simplescraper==0.1.3

Documentation

Python Scraper

A simple web scraper made in python with love:

Install with pip:

pip install simplescraper

Usage:

Make a simple call to a web page, lets call 'www.test.com'

from simplescraper import SimpleScraper

test = SimpleScraper()
result = test.get_scraped_data('www.test.com')
print result

output:

{
    url: 'http://www.test.com',
    source: 'www.test.com',
    image: 'http://www.test.com/some/random/image.png',
    title: 'Just a test page'
}

You can also call it without using the 'www' as 'test.com' if the web page has a redirection. If the web page needs the usage of the https protocol to be accessed you can call it as: 'https://www.test.com', this is not mandatory since the scraper checks if the protocol is needed.

Get iframe:

result = test.get_scraped_data('https://www.youtube.com/watch?v=dQw4w9WgXcQ')

output:

{
    description: 'Rick Astley - Never Gonna Give You Up (Official Music Video) - Listen On Spotify: http://smarturl.it/AstleySpotify Download Rick\'s Number 1 album "50" - http...',
    title: 'Rick Astley - Never Gonna Give You Up',
    url: 'https://www.youtube.com/watch?v=dQw4w9WgXcQ',
    image: 'https://i.ytimg.com/vi/dQw4w9WgXcQ/maxresdefault.jpg',
    source: 'www.youtube.com',
    iframe: '<iframe src="https://www.youtube.com/embed/dQw4w9WgXcQ" height="720" width="1280"></iframe>'
}

This is on GitHub so let me know if I've broked it somewhere.

Stuff used to make this: