DOM is a native HTML parsing library for PHP. It supports XPath syntax.


Keywords
php, dom, Xpath, native, ejz
License
WTFPL

Documentation

DOM


Quick start

$ mkdir myproject && cd myproject
$ curl -sS 'https://getcomposer.org/installer' | php
$ php composer.phar require ejz/dom:~1.0

Let's begin:

<?php

define('ROOT', __DIR__);
require(ROOT . '/vendor/autoload.php');

use Ejz\DOM;

// Download the page and parse it
$yahoo = file_get_contents("http://yahoo.com/");
$dom = new DOM($yahoo);
// Print the first match (index 0) as a string
echo $dom->find('//title', 0), "\n";
echo $dom->find('//title/text()', 0), "\n";

Output:

<title>Yahoo</title>
Yahoo

Whatever you select with XPath, the library returns a string or an array of strings. No objects!
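
To illustrate what "strings, not objects" means in practice, here is a minimal sketch built only on PHP's bundled DOM extension (DOMDocument and DOMXPath). It is not the library's implementation; the helper name xpathStrings is hypothetical and only shows one way XPath matches could be flattened into plain strings.

```php
<?php

// Hypothetical helper, NOT part of Ejz\DOM: evaluates an XPath query with
// PHP's built-in DOM extension and returns plain strings instead of DOMNode objects.
function xpathStrings(string $html, string $xpath): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally; real-world HTML is rarely perfectly valid.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = (new DOMXPath($doc))->query($xpath);
    if ($nodes === false) {
        throw new InvalidArgumentException("Invalid XPath expression: $xpath");
    }

    $strings = [];
    foreach ($nodes as $node) {
        // Elements are serialized as markup; text and attribute nodes yield their value.
        $strings[] = $node instanceof DOMElement
            ? $doc->saveHTML($node)
            : $node->nodeValue;
    }
    return $strings;
}

$html = '<html><head><title>Yahoo</title></head><body></body></html>';
print_r(xpathStrings($html, '//title'));
print_r(xpathStrings($html, '//title/text()'));
```

With plain DOMXPath the caller has to walk a DOMNodeList and decide how to serialize each node; returning strings directly removes that step.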

CLI

The library is also adapted for command-line interface (CLI) usage.

$ curl -sSL 'https://raw.githubusercontent.com/Ejz/DOM/master/i.sh' | sudo bash

After installation you can execute:

$ echo "<a href=''>Link</a>" | cli-dom '//a/text()' -
Link
$ echo "<a class='findme'>Find me</a>" | cli-dom '//a[class(findme)]/text()' -
Find me
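
The class(findme) predicate above is not part of standard XPath 1.0, so it is presumably a shorthand provided by the library. How the library expands it is not documented here; the sketch below only shows the common standard-XPath idiom for matching a single class token, using PHP's built-in DOMXPath. The helper name classPredicate is hypothetical.

```php
<?php

// Hypothetical helper, NOT part of Ejz\DOM: builds a standard XPath 1.0
// predicate that matches one whole token of the class attribute.
// Assumes $class contains no quote characters.
function classPredicate(string $class): string
{
    // Padding with spaces prevents "findme" from matching "findme-not".
    return sprintf(
        "contains(concat(' ', normalize-space(@class), ' '), ' %s ')",
        $class
    );
}

$doc = new DOMDocument();
$doc->loadHTML("<a class='big findme'>Find me</a><a class='findme-not'>Skip</a>");

// Select the text of every link whose class list contains the token "findme".
$query = '//a[' . classPredicate('findme') . ']/text()';
foreach ((new DOMXPath($doc))->query($query) as $node) {
    echo $node->nodeValue, "\n";
}
```

A bare contains(@class, 'findme') would also match the second link, which is why the token is padded with spaces on both sides.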

You can also use the library to prettify HTML output:

$ cli-dom -f '//head' 'https://php.net/'
<head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>
        PHP: Hypertext Preprocessor
    </title>
    <link rel="shortcut icon" href="https://php.net/favicon.ico"></link>
</head>

Examples

http://ejz.ru/91/obrabotka-html-dannykh
https://gist.github.com/Ejz/6de001e06c1b1797a7bd

CI: Codeship

Codeship Status for Ejz/DOM

CI: Travis

Travis Status for Ejz/DOM