HtmlUniParser
A universal HTML parser that can parse any kind of HTML page.
Installation
To install this package, use Composer:

```
$ composer require kosuha606/html-uni-parser
```
Usage
There are four ways to parse HTML, one for each parsing method: `parseCatalog()`, `parseSearch()`, `parseCard()` and `parseGenerator()`.
Example:
```php
// Parse a single page: each key in xpathOnCard becomes a key in $results.
$results = HtmlUniParser::create([
    'pageUrl' => 'http://example.com',
    'xpathOnCard' => [
        'h1' => '//h1',
        'description' => 'HTML//p',
    ],
])->parseCard();
```
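To make the `xpathOnCard` mapping in the example above more concrete, here is a minimal standalone sketch using PHP's built-in `DOMDocument` and `DOMXPath` classes. It is not the library's implementation: the function name `extractCard` is hypothetical, it returns only the trimmed text of the first match for each query, and it handles plain XPath such as `//p` only, without reproducing whatever the library does with a value like `'HTML//p'`.

```php
<?php
// Illustrative sketch only, NOT the library's implementation.
// Applies an xpathOnCard-style map (result key => XPath query) to one page
// using PHP's built-in DOM extension.

function extractCard(string $html, array $xpathOnCard): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them,
    // since real-world markup is often not well-formed.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $result = [];
    foreach ($xpathOnCard as $key => $query) {
        $nodes = $xpath->query($query);
        // Keep the trimmed text of the first match, or null when nothing matched.
        $result[$key] = ($nodes !== false && $nodes->length > 0)
            ? trim($nodes->item(0)->textContent)
            : null;
    }
    return $result;
}

$html = '<html><body><h1> Example Domain </h1><p>First paragraph.</p></body></html>';
print_r(extractCard($html, ['h1' => '//h1', 'description' => '//p']));
```

Each key of the input array reappears as a key of the result, which mirrors the description of `xpathOnCard` in the properties table.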
Examples
For more examples, see the `examples/` directory.
Description of configurable properties
Property | Description |
---|---|
catalogUrl | URL of the catalog page to parse with `parseCatalog()` |
searchUrl | URL used to search the target site with `parseSearch()` |
pageUrl | URL of the single page to parse with `parseCard()` |
urlGenerator | Callback that generates the links to parse with `parseGenerator()` |
encoding | Encoding of the target site |
siteBaseUrl | Base URL used to process links after parsing |
resultLimit | Maximum number of results to return |
sleepAfterRequest | Number of seconds to sleep after each request |
goIntoCard | Whether to open each card (item page) when parsing catalog links |
xpathItem | XPath query that selects the items in a list |
xpathLink | XPath query that selects the link inside each parsed item |
xpathOnCard | Array of XPath queries; each key becomes a key in the result array |
typeMech | Mechanism used to fetch pages, for example: `wget`, `curl`, `phantomjs`, `filegetcontents` |
forceOuterHtml | Force the parser to use outer HTML for XPath results |
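The table does not spell out exactly how `xpathItem`, `xpathLink`, `siteBaseUrl` and `resultLimit` interact during catalog parsing. The following standalone sketch shows one plausible reading, assuming `xpathLink` is evaluated relative to each item matched by `xpathItem` and that `siteBaseUrl` is simply prefixed to links that have no scheme. It is not the library's code, the function name `collectCatalogLinks` is hypothetical, and it stops at collecting links rather than opening each card (the `goIntoCard` step).

```php
<?php
// Illustrative sketch only, NOT the library's implementation.
// xpathItem selects list items, xpathLink is evaluated inside each item,
// links without a scheme get the base URL prepended (a simplification),
// and at most $resultLimit links are returned.

function collectCatalogLinks(
    string $html,
    string $xpathItem,
    string $xpathLink,
    string $siteBaseUrl,
    int $resultLimit
): array {
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $items = $xpath->query($xpathItem);
    if ($items === false) {
        return []; // invalid XPath expression
    }

    $links = [];
    foreach ($items as $item) {
        if (count($links) >= $resultLimit) {
            break;
        }
        // Passing $item as the context node makes './/a/@href' search this item only.
        $hrefNodes = $xpath->query($xpathLink, $item);
        if ($hrefNodes === false || $hrefNodes->length === 0) {
            continue;
        }
        $href = trim($hrefNodes->item(0)->nodeValue);
        if (!preg_match('#^https?://#i', $href)) {
            $href = rtrim($siteBaseUrl, '/') . '/' . ltrim($href, '/');
        }
        $links[] = $href;
    }
    return $links;
}

$html = '<ul>'
    . '<li class="item"><a href="/a">A</a></li>'
    . '<li class="item"><a href="http://other.test/b">B</a></li>'
    . '<li class="item"><a href="c">C</a></li>'
    . '</ul>';
print_r(collectCatalogLinks($html, '//li[@class="item"]', './/a/@href', 'http://example.com/', 2));
// prints the first two links: http://example.com/a and http://other.test/b
```

Evaluating the link query relative to each item keeps a link from one item from being attributed to another, which is why the sketch passes the item as the context node.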
Available methods
Method | Description |
---|---|
parseCatalog | Parses the links on a catalog page, then parses each of those links; returns the results as an array |
parseSearch | Takes a search query string as its argument, builds the search link from it, then behaves like `parseCatalog` |
parseCard | Parses a single page of the site |
parseGenerator | Parses the links generated by the `urlGenerator` callback |
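The methods table says `parseSearch` builds a search link from a query string before continuing like `parseCatalog`, but does not say how. As a hypothetical illustration only, assuming `searchUrl` ends with the query parameter (for example `?q=`) and the URL-encoded query is appended to it, that step could look like this; `buildSearchUrl` is not a library function.

```php
<?php
// Hypothetical helper, NOT part of the library: one plausible way to build
// a search link from a searchUrl prefix and a raw query string.

function buildSearchUrl(string $searchUrl, string $query): string
{
    // urlencode() turns spaces into '+' and escapes reserved characters such as '&'.
    return $searchUrl . urlencode($query);
}

echo buildSearchUrl('http://example.com/search?q=', 'red shoes'), PHP_EOL;
// prints http://example.com/search?q=red+shoes
```

Encoding the query matters because characters like `&` or `#` would otherwise change the meaning of the resulting URL.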
Run tests
To run the tests, use this command:

```
./vendor/bin/phpunit
```