python-html-assert

partial matching of html using a tree-based specification


License
MIT
Install
pip install python-html-assert==0.2.1

Documentation

Utility for asserting the structure and content of HTML in Python.

Important Notes:

  • Only works on Python 3
  • This is very much in alpha; expect bugs for now.

Installation

pip install git+git://github.com/robjohncox/python-html-assert.git

Philosophy

Tests that assert HTML are often fragile: minor changes to the markup break them. They are also often difficult to maintain, because they rely on non-trivial XPath-style queries to navigate through the HTML. More often than not, the changes that break such tests are ones you don't even care about from the perspective of your test assertions.

Therefore, the goal of this utility is to provide a way to assert just the parts of an HTML document that you care about, in a way that makes your tests less fragile. The basis of this is the specification (from here onwards, the spec), which describes what you want to assert. The spec is a tree structure that mimics HTML, in which you include only the pieces of the document you want to assert. It is then matched against the HTML document to verify that the items you specify appear in the HTML and that they follow the structure you specify, while all other parts of the HTML are ignored.

Example

Let's say we have the following HTML document (either static, or generated by a web framework like Django):

<html>
  <head>
    <title>My Document</title>
  </head>
  <body>
    <h1>My Document</h1>
    <div class="main-content">
      <p>This is some text.</p>
      <table id="important-table">
        <tr><td>A</td><td>One</td></tr>
        <tr><td>B</td><td>Two</td></tr>
      </table>
    </div>
  </body>
</html>

We assert the items that we care about:

def test_my_html(self):
    html_src = "..."

    spec = html(
        heading('My Document'),
        div(
            text('This is some text.'),
            elem('table', id='important-table')))
    result = html_match(spec, html_src)
    self.assertTrue(result.passed)

Now, even if the HTML changes in ways that we don't care about, our test need not break. For example, let's say that the following happens:

  • New attributes are added to some elements.
  • Elements we care about get nested inside additional div elements.
  • A change in the underlying test data adds more rows to the table.

Even though these impact the structure of the document, when we match the spec to the document it will be reported as OK, because the items that we care about still appear correctly and in the right place. However, if changes happen to the HTML document such that:

  • The content of the paragraph or the heading changes.
  • The table gets moved out of the main content div.
  • The HTML element wrapping the document goes missing.

Then the assertion will fail.
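
The pass and fail cases above can be illustrated with a small standalone sketch of partial tree matching. This is not the library's implementation or API: the names Node, Spec, TreeBuilder and matches are hypothetical and were invented for this illustration. The sketch uses only the standard library's html.parser, assumes every element is explicitly closed, and does not enforce the order of siblings.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class Node:
    tag: str
    attrs: dict
    children: list = field(default_factory=list)
    text: str = ""


@dataclass
class Spec:
    # Hypothetical spec node: every field left unset is ignored when matching.
    tag: str = None
    text: str = None
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


class TreeBuilder(HTMLParser):
    """Builds a Node tree; assumes every element is explicitly closed."""

    def __init__(self):
        super().__init__()
        self.root = Node("[document]", {})
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, dict(attrs))
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        # Whitespace between tags strips down to an empty string.
        self.stack[-1].text += data.strip()


def descendants(node):
    for child in node.children:
        yield child
        yield from descendants(child)


def node_matches(spec, node):
    if spec.tag is not None and spec.tag != node.tag:
        return False
    if spec.text is not None and spec.text != node.text:
        return False
    # Only attributes named in the spec are checked; extras are ignored.
    if any(node.attrs.get(k) != v for k, v in spec.attrs.items()):
        return False
    # Each child spec must match some descendant, at any depth.
    return all(
        any(node_matches(child, d) for d in descendants(node))
        for child in spec.children
    )


def matches(spec, html_src):
    builder = TreeBuilder()
    builder.feed(html_src)
    builder.close()
    return any(node_matches(spec, d) for d in descendants(builder.root))


spec = Spec("html", children=[
    Spec("h1", text="My Document"),
    Spec("div", children=[
        Spec("p", text="This is some text."),
        Spec("table", attrs={"id": "important-table"}),
    ]),
])

ORIGINAL = (
    '<html><body><h1>My Document</h1><div class="main-content">'
    '<p>This is some text.</p><table id="important-table">'
    '<tr><td>A</td><td>One</td></tr></table></div></body></html>'
)
# Extra attributes, an extra wrapping div, and an extra table row.
TOLERATED = (
    '<html><body><div class="wrapper"><h1 class="big">My Document</h1>'
    '<div class="main-content"><div><p>This is some text.</p>'
    '<table id="important-table" class="x"><tr><td>A</td></tr>'
    '<tr><td>C</td></tr></table></div></div></div></body></html>'
)
# The table has been moved out of the main content div.
TABLE_MOVED = (
    '<html><body><h1>My Document</h1><div class="main-content">'
    '<p>This is some text.</p></div>'
    '<table id="important-table"></table></body></html>'
)

print(matches(spec, ORIGINAL))     # True
print(matches(spec, TOLERATED))    # True
print(matches(spec, TABLE_MOVED))  # False
```

Because each child spec is matched against descendants at any depth, extra wrapper elements and unlisted attributes do not affect the result, while a changed heading or a relocated table does.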

Running the Test Suite

The test suite can be run with the following command, in an environment where the requirements have been installed.

py.test