innmind/crawler-app

Crawl the web and publish the graph to an API.


License
Other

Crawler Robot

This is an app to crawl the internet and publish resource attributes to a Library.

Installation

composer install
docker-compose up -d

Copy config/.env.dist to config/.env and adapt the URL of the AMQP server to your needs.

Usage

bin/crawler consume crawler

This launches a consumer that reads the URLs to crawl.

bin/console crawl http://the.url/to/crawl https://innmind_library.host/

This crawls http://the.url/to/crawl, extracts the resource attributes and publishes them to the library at https://innmind_library.host/. The API resource to publish to is detected automatically.