wepana: A Web Page Analyzer
__      _____ _ __   __ _ _ __   __ _
\ \ /\ / / _ \ '_ \ / _` | '_ \ / _` |
 \ V  V /  __/ |_) | (_| | | | | (_| |
  \_/\_/ \___| .__/ \__,_|_| |_|\__,_|
             |_|
Wepana is an analyzer for web page content powered by Python.
Requirements
- Python 3
Features
- Automatically load content from a URL.
- Load content from a file.
- Load content from a string value.
- Get image URLs.
- Get the target URLs of HTML links.
- Get meta information.
- Get keyword information.
Usage
Load
Import analyzer
from wepana import WebPageAnalyzer
Load from a URL.
# load at init
site = WebPageAnalyzer(url='http://github.com')
# load after init
site = WebPageAnalyzer()
site.connect('http://github.com')
Load from file.
analyzer = WebPageAnalyzer()
analyzer.read_file('/path/to/the/file.html')
Load from text.
analyzer = WebPageAnalyzer()
analyzer.read_text('text content')
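The three loading paths above (URL, file, string) can be pictured roughly as follows. This is a minimal sketch using only the standard library, not wepana's own code: `load_content` is a hypothetical helper invented for illustration, and the 10-second timeout and UTF-8 fallback are assumptions.

```python
from pathlib import Path
from urllib.request import urlopen


def load_content(url=None, path=None, text=None):
    """Hypothetical helper (not part of wepana): return HTML from one source."""
    given = [source for source in (url, path, text) if source is not None]
    if len(given) != 1:
        raise ValueError("pass exactly one of url, path, or text")
    if text is not None:
        # String input is already the content.
        return text
    if path is not None:
        # Assumes the file is UTF-8 encoded.
        return Path(path).read_text(encoding="utf-8")
    # Fetch over HTTP; fall back to UTF-8 when the server names no charset.
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


html = load_content(text="<title>Hello</title>")
print(html)
```

Requiring exactly one source keeps the sketch unambiguous about which input is used; the real analyzer may behave differently.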
Check whether the analyzer is ready.
analyzer.read()
Reset analyzer.
analyzer.reset()
Analyze
Get title.
analyzer.get_title()
Get keywords.
analyzer.get_keywords()
Get images.
analyzer.get_images()
Get links.
analyzer.get_links()
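To give a feel for what these calls return, here is a minimal sketch of title, keyword, image and link extraction built on the standard library's `html.parser`. It is not wepana's implementation: the `PageSketch` class and its attribute names are hypothetical, and it assumes keywords come from a `<meta name="keywords">` tag as a comma-separated list.

```python
from html.parser import HTMLParser


class PageSketch(HTMLParser):
    """Hypothetical stand-in for illustration; not wepana's implementation."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.keywords = []
        self.images = []
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            # Assumption: keywords are a comma-separated list in `content`.
            content = attrs.get("content") or ""
            self.keywords = [k.strip() for k in content.split(",") if k.strip()]
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Collect text only while inside <title>.
        if self._in_title:
            self.title += data


sample = """<html><head><title>Example</title>
<meta name="keywords" content="python, web, analyzer"></head>
<body><img src="/logo.png"><a href="https://example.com/docs">Docs</a></body></html>"""

page = PageSketch()
page.feed(sample)
page.close()
print(page.title, page.keywords, page.images, page.links)
```

The sketch returns URLs exactly as written in the markup; whether wepana resolves relative URLs against the page address is not stated here.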
Contributing
- Fork it.
- Create your feature branch. ($ git checkout -b feature/my-feature-branch)
- Commit your changes. ($ git commit -am 'What feature I just added.')
- Push to the branch. ($ git push origin feature/my-feature-branch)
- Create a new Pull Request.
Authors
License
The MIT License (MIT). For details, see LICENSE.