Crawlbug

Crawlbug is a web-crawler npm module designed to integrate with a Firebase Realtime Database. This was mainly a personal project, but feel free to contact me if you'd like to contribute.

Basic use:

Require crawlbug


var crawlbug = require("crawlbug");

Start a crawl by first passing your Firebase project settings to config:


crawlbug.config({
    apiKey: "myVeryLongAp1key",
    authDomain: "probablysomething.firebaseapp.com",
    databaseURL: "https://probablysomething.firebaseio.com",
    projectId: "projectId",
    storageBucket: "probablysomethingelse.appspot.com",
    messagingSenderId: "numbers"
});

This allows the crawler to write crawl data to your database. Run a database test to see if it is working:


crawlbug.databaseTest();
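
A missing or empty key in the configuration object is an easy mistake to make, so it can be worth checking the object before handing it to crawlbug.config. The sketch below is not part of crawlbug: missingConfigKeys is a hypothetical helper, and the list of required keys is only assumed from the example configuration above.

```javascript
// Hypothetical helper, not part of crawlbug: returns the names of config
// keys that are missing or empty. The key list is an assumption taken from
// the example configuration in this README.
var REQUIRED_KEYS = [
    "apiKey",
    "authDomain",
    "databaseURL",
    "projectId",
    "storageBucket",
    "messagingSenderId"
];

function missingConfigKeys(config) {
    return REQUIRED_KEYS.filter(function (key) {
        return typeof config[key] !== "string" || config[key].length === 0;
    });
}

// Example: a config object with one key left out.
var incompleteConfig = {
    apiKey: "myVeryLongAp1key",
    authDomain: "probablysomething.firebaseapp.com",
    databaseURL: "https://probablysomething.firebaseio.com",
    projectId: "projectId",
    storageBucket: "probablysomethingelse.appspot.com"
};

var missing = missingConfigKeys(incompleteConfig);
if (missing.length > 0) {
    console.log("Missing Firebase config keys: " + missing.join(", "));
}
```

If the list comes back empty, the object has the same shape as the example and can be passed to crawlbug.config as shown earlier.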

Then set the database paths for the crawl data (one for finished site data, one for sites still to visit) and start a crawl. spider takes a root URL, whether to crawl only unique base URLs, whether to crawl relative links, and maxUrlNumber:


crawlbug.pathSet("sites", "sitesToVisit");

crawlbug.spider("https://google.com", false, false);
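
To make the two boolean options more concrete, here is a minimal sketch of one possible reading of "unique base URLs" and "relative links", using Node's built-in URL class. It is an illustration only, not crawlbug's actual implementation; resolveLink and uniqueBaseUrls are hypothetical names, and treating a "base URL" as the URL's origin is an assumption.

```javascript
// Illustration only: one possible interpretation of the two boolean
// options, not crawlbug's real code.
const { URL } = require("url");

// Resolve a link found on a page against that page's URL.
// Returns null for relative links when followRelative is false,
// and for anything that is not an http(s) URL after resolution.
function resolveLink(href, pageUrl, followRelative) {
    const isAbsolute = /^https?:\/\//i.test(href);
    if (!isAbsolute && !followRelative) {
        return null;
    }
    try {
        const resolved = new URL(href, pageUrl);
        return /^https?:$/.test(resolved.protocol) ? resolved.href : null;
    } catch (err) {
        return null; // link could not be parsed
    }
}

// Keep one entry per origin (scheme + host + port), in first-seen order.
function uniqueBaseUrls(urls) {
    const seen = new Set();
    const result = [];
    urls.forEach(function (u) {
        const origin = new URL(u).origin;
        if (!seen.has(origin)) {
            seen.add(origin);
            result.push(origin);
        }
    });
    return result;
}

console.log(resolveLink("/about", "https://google.com", true));
console.log(uniqueBaseUrls(["https://google.com/a", "https://google.com/b"]));
```

Under this reading, crawling relative links means resolving paths such as /about against the page they appear on, and crawling only unique base URLs means visiting each site's origin once regardless of how many of its pages are linked.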

Have fun!

Post an issue if you have problems.

Stats

Latest release: Oct 5, 2017
First release: Jul 25, 2017
Repository size: 125 KB

Releases

5.0.2, 5.0.1, 5.0.0, 4.0.3, 4.0.2, 4.0.1, 4.0.0, 3.1.10, 3.1.9, 3.1.8
