aiounfurl

Generate site previews


Keywords: async, embed, preview
License: BSD-3-Clause
Install: pip install aiounfurl==0.2.0


This library extracts meta information from web pages and builds site previews from it. It draws on four sources of information:

  1. oEmbed
  2. Open Graph
  3. Twitter Cards
  4. HTML meta tags
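
To illustrate what three of these sources look like in a page, here is a minimal sketch that sorts meta tags into Open Graph, Twitter Cards, and plain HTML meta groups. It uses only the standard library's html.parser and is not how aiounfurl itself parses pages (the library depends on beautifulsoup4 and html5lib); the names MetaTagParser, extract_meta, and SAMPLE_HTML are hypothetical and exist only for this example.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect Open Graph, Twitter Card, and plain meta tags from HTML."""

    def __init__(self):
        super().__init__()
        self.open_graph = {}
        self.twitter_cards = {}
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != 'meta':
            return
        attrs = dict(attrs)
        content = attrs.get('content')
        if content is None:
            return
        # Open Graph tags use the "property" attribute;
        # Twitter Cards and plain meta tags use "name".
        key = attrs.get('property') or attrs.get('name')
        if not key:
            return
        if key.startswith('og:'):
            self.open_graph[key[len('og:'):]] = content
        elif key.startswith('twitter:'):
            self.twitter_cards[key[len('twitter:'):]] = content
        else:
            self.meta[key] = content


def extract_meta(html):
    """Return the three groups of meta tags found in an HTML string."""
    parser = MetaTagParser()
    parser.feed(html)
    return {
        'open_graph': parser.open_graph,
        'twitter_cards': parser.twitter_cards,
        'meta': parser.meta,
    }


# Hypothetical page used only for this illustration.
SAMPLE_HTML = """
<html><head>
<meta property="og:title" content="Example page">
<meta name="twitter:card" content="summary">
<meta name="description" content="A short description">
</head></html>
"""

preview = extract_meta(SAMPLE_HTML)
print(preview)
```

oEmbed works differently: the data comes from a separate provider endpoint rather than from tags in the page itself.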

Requirements

  • Python 3.5
  • aiohttp
  • beautifulsoup4
  • html5lib

Installation

pip install aiounfurl

Usage example

To extract all site data:

import asyncio
from pprint import pprint

import aiohttp
from aiounfurl.views import fetch_all


async def get_links_data(links, loop):
    # Share one HTTP session across all requests.
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_all(session, link, loop) for link in links]
        # With return_exceptions=True a failed link does not abort the batch;
        # its entry in results is the exception object instead of the data.
        results = await asyncio.gather(*tasks, return_exceptions=True)
    return [{'link': link, 'data': data} for link, data in zip(links, results)]


links = [
    'https://habrahabr.ru/post/314606/',
    'https://www.youtube.com/watch?v=9EftQMnuhvU',
    'https://medium.freecodecamp.com/million-requests-per-second-with-python-95c137af319'
]
loop = asyncio.get_event_loop()
result = loop.run_until_complete(get_links_data(links, loop))
loop.close()
pprint(result)
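
Because the example gathers with return_exceptions=True, a 'data' entry may hold an exception object rather than preview data. The following sketch shows one way to separate the two cases. It is an illustration only: fake_fetch is a hypothetical stand-in for a real fetch so the snippet runs without network access, collect is a hypothetical helper, and asyncio.run assumes Python 3.7 or later.

```python
import asyncio


async def fake_fetch(link):
    """Hypothetical stand-in for a network fetch; fails for one link."""
    await asyncio.sleep(0)
    if 'broken' in link:
        raise ValueError('cannot fetch ' + link)
    return {'title': 'Preview of ' + link}


async def collect(links):
    """Fetch all links and split the results into successes and failures."""
    results = await asyncio.gather(
        *(fake_fetch(link) for link in links),
        return_exceptions=True
    )
    ok, failed = [], []
    for link, result in zip(links, results):
        if isinstance(result, Exception):
            # The exception was returned, not raised, so record it.
            failed.append({'link': link, 'error': str(result)})
        else:
            ok.append({'link': link, 'data': result})
    return ok, failed


ok, failed = asyncio.run(collect([
    'https://example.com/a',
    'https://example.com/broken',
]))
print(ok)
print(failed)
```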

Server example

The full example is in the example/ directory of the repository.

Install the packages required to run the example:

pip install -r example/requirements.txt

Run python srv.py runserver, then open http://127.0.0.1:8080/ in a browser.

Running the example in Docker

A Docker image with the example is published on Docker Hub (http://hub.docker.com/), so the sample can run as a separate, independent service.

Running in the background:

docker run --name aiounfurl -p 8080:8080 -d tigorc/aiounfurl

Then open the example at http://127.0.0.1:8080/.

To use a list of oEmbed providers, first create a JSON file with the list of providers at /path_to_file/providers.json, then mount it into the container:

docker run --name aiounfurl -p 8080:8080 -e "OEMBED_PROVIDERS_FILE=/srv/app/providers.json" -v /path_to_file/providers.json:/srv/app/providers.json -d tigorc/aiounfurl
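
This README does not describe the layout of providers.json. The sketch below assumes the format used by the public provider list at oembed.com (a list of providers, each with endpoints that carry URL schemes); check the project documentation before relying on it. It writes a one-entry file to a temporary directory and shows how a page URL could be matched against the schemes. The helper find_endpoint is hypothetical and is not part of aiounfurl.

```python
import json
import os
import tempfile
from fnmatch import fnmatchcase

# One provider entry in the oembed.com list format (assumed, see above).
providers = [
    {
        'provider_name': 'YouTube',
        'provider_url': 'https://www.youtube.com/',
        'endpoints': [
            {
                'schemes': ['https://*.youtube.com/watch*'],
                'url': 'https://www.youtube.com/oembed',
            }
        ],
    }
]

# Write the file; a temporary directory stands in for /path_to_file/.
path = os.path.join(tempfile.mkdtemp(), 'providers.json')
with open(path, 'w') as f:
    json.dump(providers, f, indent=2)


def find_endpoint(url, providers):
    """Return the oEmbed endpoint whose scheme matches url, or None."""
    for provider in providers:
        for endpoint in provider['endpoints']:
            schemes = endpoint.get('schemes', [])
            if any(fnmatchcase(url, scheme) for scheme in schemes):
                return endpoint['url']
    return None


with open(path) as f:
    loaded = json.load(f)

endpoint = find_endpoint('https://www.youtube.com/watch?v=9EftQMnuhvU', loaded)
print(endpoint)
```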

Tests

Install the tox package, then run:

tox