Frege indexer library

An indexer library for the Frege project at Jagiellonian University.

How to install

This library is published on PyPI, so you can install it with pip:

pip3 install fregeindexerlib

and upgrade it with:

pip3 install --upgrade fregeindexerlib
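
After installing or upgrading, it can be useful to confirm which version is actually active in the current environment. The sketch below uses only the standard library's importlib.metadata (Python 3.8+); the helper name installed_version is illustrative and is not part of fregeindexerlib.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "fregeindexerlib") -> Optional[str]:
    """Return the installed version of the package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        # The package is not installed in this environment.
        return None


print(installed_version())
```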

How to use

Example usage is available in the file example.py.

In short, you need to subclass the abstract fregeindexerlib.indexer.Indexer class. Only the crawl_next_repository method must be implemented.

List of methods that can be implemented:

crawl_next_repository(self, prev_repository_id: Optional[str]) -> Optional[CrawlResult]

This method receives the id of the previously crawled repository as a string (or None if no repository has been crawled yet). It should return a fregeindexerlib.crawl_result.CrawlResult dataclass filled with information about the crawled repository, or None if there are no more repositories to crawl.

If something unexpected happens, this method should raise a fregeindexerlib.indexer_error.IndexerError exception.

The repository id is the id returned by the code hosting API, not one generated by the Indexer. Generating a proper id for this project is the responsibility of the library.
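
The contract described above (take the previous id, return the next result, return None when done, raise on unexpected state) can be sketched as follows. This is a minimal illustration, not the library's code: CrawlResult and IndexerError here are local stand-ins, the CrawlResult fields (id, repo_url) are assumed, and FAKE_API is hypothetical data replacing a real code hosting API. In real code the class would subclass fregeindexerlib.indexer.Indexer and use the library's own types.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CrawlResult:
    """Stand-in for fregeindexerlib.crawl_result.CrawlResult (fields are assumed)."""
    id: str
    repo_url: str


class IndexerError(Exception):
    """Stand-in for fregeindexerlib.indexer_error.IndexerError."""


# Hypothetical data standing in for a code hosting API, ordered by repository id.
FAKE_API: List[CrawlResult] = [
    CrawlResult("101", "https://example.com/alice/foo.git"),
    CrawlResult("102", "https://example.com/bob/bar.git"),
]


class ExampleIndexer:
    """Illustrative only; a real indexer would subclass the library's Indexer."""

    def crawl_next_repository(self, prev_repository_id: Optional[str]) -> Optional[CrawlResult]:
        if prev_repository_id is None:
            # Nothing crawled yet: start from the first repository.
            return FAKE_API[0] if FAKE_API else None
        ids = [repo.id for repo in FAKE_API]
        if prev_repository_id not in ids:
            # Unexpected state: signal it with the indexer error type.
            raise IndexerError(f"unknown repository id: {prev_repository_id}")
        next_index = ids.index(prev_repository_id) + 1
        # Returning None tells the caller there is nothing left to crawl.
        return FAKE_API[next_index] if next_index < len(FAKE_API) else None
```

Calling crawl_next_repository(None) yields the first repository, passing each returned id back in yields the next one, and the last id yields None.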

before_crawl(self, prev_repository_id: Optional[str])

Invoked right before crawl_next_repository. It receives the previously crawled repository id, just as crawl_next_repository does. The default implementation is empty.

after_crawl(self, crawl_result: CrawlResult)

Invoked right after crawl_next_repository. It receives the CrawlResult returned by the crawl_next_repository call. The default implementation is empty.

on_successful_process(self, crawl_result: CrawlResult)

Invoked after the crawl result has been successfully saved to the database and a message has been successfully sent to the download queue. It receives the CrawlResult returned by the crawl_next_repository call. The default implementation is empty.

on_error(self, exception: Exception)

Invoked when an exception occurs during crawling, saving to the database, or pushing a message to the queue. It receives the exception that occurred. The default implementation is empty.

You probably do not need to handle this case yourself, because the library already handles it.
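
To make the order of the hooks concrete, here is a small sketch of one crawl iteration. It is an assumption built from the descriptions above, not the library's actual loop: run_once and HookRecorder are hypothetical names, plain strings stand in for CrawlResult, and skipping the CrawlResult hooks when nothing is left to crawl is a guess.

```python
from typing import List, Optional


class HookRecorder:
    """Stand-in indexer that records which methods fire, in order."""

    def __init__(self, repositories: List[str]) -> None:
        self.repositories = repositories
        self.calls: List[str] = []

    def crawl_next_repository(self, prev_repository_id: Optional[str]) -> Optional[str]:
        self.calls.append("crawl_next_repository")
        if prev_repository_id is None:
            start = 0
        else:
            # Raises ValueError for an unknown id, which triggers on_error below.
            start = self.repositories.index(prev_repository_id) + 1
        return self.repositories[start] if start < len(self.repositories) else None

    def before_crawl(self, prev_repository_id: Optional[str]) -> None:
        self.calls.append("before_crawl")

    def after_crawl(self, crawl_result: str) -> None:
        self.calls.append("after_crawl")

    def on_successful_process(self, crawl_result: str) -> None:
        self.calls.append("on_successful_process")

    def on_error(self, exception: Exception) -> None:
        self.calls.append("on_error")


def run_once(indexer: HookRecorder, prev_repository_id: Optional[str]) -> Optional[str]:
    """Hypothetical single iteration; the library's real loop may differ."""
    try:
        indexer.before_crawl(prev_repository_id)
        result = indexer.crawl_next_repository(prev_repository_id)
        if result is not None:
            indexer.after_crawl(result)
            # The library would save to the database and publish to the queue here.
            indexer.on_successful_process(result)
        return result
    except Exception as exception:
        indexer.on_error(exception)
        return None
```

In this sketch a successful iteration records before_crawl, crawl_next_repository, after_crawl, on_successful_process, while a failing crawl records before_crawl, crawl_next_repository, on_error.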

Once you have implemented the Indexer methods in your own class, create an instance of it, passing the proper parameters to its constructor. The constructor is defined as:

__init__(self, indexer_type: IndexerType, rabbitmq_parameters: RabbitMQConnectionParameters,
                 database_parameters: DatabaseConnectionParameters, rejected_publish_delay: int)

where:

indexer_type is a fregeindexerlib.indexer_type.IndexerType enum; choose the value matching the indexer you are implementing.
rabbitmq_parameters is a fregeindexerlib.rabbitmq_connection.RabbitMQConnectionParameters dataclass.
database_parameters is a fregeindexerlib.database_connection.DatabaseConnectionParameters dataclass.
rejected_publish_delay is the number of seconds to wait between publish attempts when the queue is full.
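
The role of rejected_publish_delay can be illustrated with a simple retry loop. This is only a sketch of the idea under stated assumptions: publish_with_retry and its publish callback are hypothetical, and how the library actually detects a full queue is not described here.

```python
import time
from typing import Callable


def publish_with_retry(publish: Callable[[], bool], rejected_publish_delay: int,
                       sleep: Callable[[float], None] = time.sleep) -> int:
    """Call publish() until it is accepted; return the number of attempts."""
    attempts = 1
    while not publish():
        # The queue rejected the message (for example, it is full):
        # wait the configured number of seconds, then try again.
        sleep(rejected_publish_delay)
        attempts += 1
    return attempts


# Usage: two rejections followed by success, with sleeping replaced by a recorder.
responses = iter([False, False, True])
waits = []
attempts = publish_with_retry(lambda: next(responses), 5, sleep=waits.append)
```

A larger delay puts less pressure on a saturated queue at the cost of slower recovery once space frees up.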

Finally, invoke the run method on the instance.

fregeindexerlib
Release 0.4.0
