In.parse
0.1.0
Open Collaborative AI Driven Parser builder for Web Scraping, Data Extraction and Crawling,Knowledge Graph
Installing
Install and update using pip
:
pip install -U inparse
Parser Generator
Motivation
- Most painful thing of Web Data Extraction is to write parser rule. the Inparse try to generate the parser by AI according the training web pages.
- Commercial Universal Parser work good in Statistics, but failed in my case .And blackbox to user. Inparser create parser for special website ,web page category。And be correctable and improvable online by yourself.
- Open and free to create parser .Parser rule can be cached locally without remote server if you have concern.
- You will not be charged by usage. Run parser in your own CPU.
Example
from inparse import Inparse
p=Inparse('b45beddc', #parser no is generator by inparse.com parser builder.
'd50cb533f69b6a78892afbd093f95fc1') #access token can be found in your user page .
d=p.parse_url('https://qz.com/india/1413291/trulymadly-ceo-on-how-dating-apps-like-bumble-india-must-localise/')
Inparse.pretty_print(d)
Or parse in raw html
from inparse import Inparse
import requests
p=Inparse('b45beddc', #parser no is generator by inparse.com parser builder.
'd50cb533f69b6a78892afbd093f95fc1') #access token can be found in your user page .
html=requests.get('https://qz.com/india/1413291/trulymadly-ceo-on-how-dating-apps-like-bumble-india-must-localise/').text
d=p.parse(html)
Inparse.pretty_print(d)
Below is output of Article data extraction
{ 'article_body': '<div><p>Last week, American dating app <a '
'href="https://qz.com/india/1413051/priyanka-chopra-invests-in-dating-app-bumble-to-rival-tinder/">Bumble '
'It’s from smaller cities. And varied people are coming '
'from different backgrounds. So that’s really '
'encouraging.</p></div>',
'author': 'Kuwar Singh',
'publish_date': None,
'title': 'Young Indians are using dating apps for so much more than just '
'dating',
'top_image': [ 'https://cms.qz.com/wp-content/uploads/2018/10/AP_900509923043-e1538971405267.jpg?quality=75&strip=all&w=410&h=231']
}
More about Inparse
Contributing
You are welcome to port this SDK to Java, Go ,or any other programming languages.