fregeindexerlib

Library for indexers in the Frege project


Keywords
Jagiellonian, University, Frege, Indexer
License
GPL-3.0
Install
pip install fregeindexerlib==0.4.0

Documentation

Frege indexer library

An indexer library for the Frege project at Jagiellonian University

How to install

This library is published on PyPI, so you can install it with:

pip3 install fregeindexerlib

and upgrade it with:

pip3 install --upgrade fregeindexerlib

How to use

Example usage is available in the file example.py.

In short, you need to subclass the abstract fregeindexerlib.indexer.Indexer class; the only method you must implement is crawl_next_repository.
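A minimal sketch of what implementing crawl_next_repository can look like. It does not import fregeindexerlib: StubIndexer and StubCrawlResult (including its repo_id field) are hypothetical stand-ins for the real Indexer base class and CrawlResult dataclass, whose fields are not documented here, and ListIndexer walks a hard-coded list of ids instead of querying a code hosting API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical stand-ins so the sketch runs without the library installed.
# In real code, subclass fregeindexerlib.indexer.Indexer and return a
# fregeindexerlib.crawl_result.CrawlResult instead.
@dataclass
class StubCrawlResult:
    repo_id: str  # hypothetical field: id assigned by the code hosting API


class StubIndexer(ABC):
    @abstractmethod
    def crawl_next_repository(
        self, prev_repository_id: Optional[str]
    ) -> Optional[StubCrawlResult]:
        ...


class ListIndexer(StubIndexer):
    """Walks a fixed list of repository ids, resuming after the previous one."""

    def __init__(self, repo_ids: List[str]):
        self.repo_ids = repo_ids

    def crawl_next_repository(
        self, prev_repository_id: Optional[str]
    ) -> Optional[StubCrawlResult]:
        if prev_repository_id is None:
            next_index = 0  # nothing crawled yet: start from the beginning
        else:
            next_index = self.repo_ids.index(prev_repository_id) + 1
        if next_index >= len(self.repo_ids):
            return None  # no more repositories to crawl
        return StubCrawlResult(repo_id=self.repo_ids[next_index])
```

Resuming from prev_repository_id, rather than from state kept inside the object, mirrors the documented contract: the caller tells the indexer where the previous crawl stopped.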

List of methods that can be implemented:

  • crawl_next_repository(self, prev_repository_id: Optional[str]) -> Optional[CrawlResult]

    This method receives the id of the previously crawled repository (or None if no repository has been crawled yet) and should return a fregeindexerlib.crawl_result.CrawlResult dataclass filled with information about the crawled repository, or None if there are no more repositories to crawl.

    If something unexpected happens, this method should raise a fregeindexerlib.indexer_error.IndexerError exception.

    The repository id is the id returned by the code hosting API, not the one generated by the Indexer (generating the proper id for this project is the library's responsibility).

  • before_crawl(self, prev_repository_id: Optional[str])

    Invoked right before crawl_next_repository. Receives the id of the previously indexed repository (the same argument crawl_next_repository gets). The default implementation does nothing.

  • after_crawl(self, crawl_result: CrawlResult)

    Invoked right after crawl_next_repository. Receives the CrawlResult returned by that crawl_next_repository call. The default implementation does nothing.

  • on_successful_process(self, crawl_result: CrawlResult)

    Invoked after the crawl result has been successfully saved into the database and a message has been successfully sent to the download queue. Receives the CrawlResult returned by the crawl_next_repository call. The default implementation does nothing.

  • on_error(self, exception: Exception)

    Invoked when an exception occurs while crawling, saving into the database, or pushing a message to the queue. Receives the exception that occurred. The default implementation does nothing.

    Usually there is no need to handle this situation, because the library itself handles it.
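The order in which the hooks above fire can be sketched as a simple loop. This is an illustration inferred from the method descriptions, not the library's actual run implementation: HookedIndexer, run_sketch, LoggingIndexer, and the save and publish callables are hypothetical, results are plain repository-id strings instead of CrawlResult objects, and the loop simply stops on the first error, whereas the real library handles errors itself.

```python
from typing import Callable, List, Optional, Tuple


class HookedIndexer:
    """Hypothetical stand-in for the Indexer base class: optional hooks are no-ops."""

    def crawl_next_repository(self, prev_repository_id: Optional[str]) -> Optional[str]:
        raise NotImplementedError

    def before_crawl(self, prev_repository_id: Optional[str]) -> None:
        pass

    def after_crawl(self, crawl_result: str) -> None:
        pass

    def on_successful_process(self, crawl_result: str) -> None:
        pass

    def on_error(self, exception: Exception) -> None:
        pass


def run_sketch(indexer: HookedIndexer,
               save: Callable[[str], None],
               publish: Callable[[str], None]) -> None:
    """Hypothetical driver loop; results are plain repository-id strings here."""
    prev: Optional[str] = None
    while True:
        try:
            indexer.before_crawl(prev)
            result = indexer.crawl_next_repository(prev)
            if result is None:
                return  # no more repositories to crawl
            indexer.after_crawl(result)
            save(result)      # stands in for saving into the database
            publish(result)   # stands in for sending to the download queue
            indexer.on_successful_process(result)
            prev = result
        except Exception as exc:
            indexer.on_error(exc)
            return  # simplification: the real library's recovery is not shown


class LoggingIndexer(HookedIndexer):
    """Records each hook call so the order can be inspected."""

    def __init__(self, repo_ids: List[str]):
        self.repo_ids = repo_ids
        self.events: List[Tuple[str, Optional[str]]] = []

    def crawl_next_repository(self, prev_repository_id: Optional[str]) -> Optional[str]:
        i = 0 if prev_repository_id is None else self.repo_ids.index(prev_repository_id) + 1
        return self.repo_ids[i] if i < len(self.repo_ids) else None

    def before_crawl(self, prev_repository_id: Optional[str]) -> None:
        self.events.append(("before_crawl", prev_repository_id))

    def after_crawl(self, crawl_result: str) -> None:
        self.events.append(("after_crawl", crawl_result))

    def on_successful_process(self, crawl_result: str) -> None:
        self.events.append(("on_successful_process", crawl_result))

    def on_error(self, exception: Exception) -> None:
        self.events.append(("on_error", type(exception).__name__))
```

For example, running run_sketch on LoggingIndexer(["repo-1"]) with no-op save and publish records before_crawl(None), after_crawl("repo-1"), on_successful_process("repo-1"), and finally before_crawl("repo-1") before the loop ends because crawl_next_repository returns None.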

Once you have implemented the method(s) of Indexer in your own class, create an instance of that class, passing the proper parameters to its constructor, which is defined as:

__init__(self, indexer_type: IndexerType, rabbitmq_parameters: RabbitMQConnectionParameters,
         database_parameters: DatabaseConnectionParameters, rejected_publish_delay: int)

where:

  • indexer_type is a fregeindexerlib.indexer_type.IndexerType enum value: choose the one that matches the indexer you implement.
  • rabbitmq_parameters is a fregeindexerlib.rabbitmq_connection.RabbitMQConnectionParameters dataclass.
  • database_parameters is a fregeindexerlib.database_connection.DatabaseConnectionParameters dataclass.
  • rejected_publish_delay is the number of seconds to wait between publish attempts when the queue is full.
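A delay like rejected_publish_delay is typically used in a retry loop: keep re-publishing a message the broker rejected because the queue is full, sleeping between attempts. The sketch below assumes that pattern; publish_with_retry and the try_publish callable (returning False on rejection) are hypothetical, and how the library actually detects a full queue is not described here.

```python
import time
from typing import Callable


def publish_with_retry(
    try_publish: Callable[[], bool],
    rejected_publish_delay: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call try_publish until it succeeds; return the number of rejected attempts.

    try_publish is a hypothetical callable that returns False when the broker
    rejects the message because the queue is full.
    """
    rejected = 0
    while not try_publish():
        rejected += 1
        sleep(rejected_publish_delay)  # wait before the next try
    return rejected
```

Injecting sleep keeps the loop testable: passing a function that only records the requested delays lets it run instantly, while the default uses time.sleep.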

Finally, call the run method on that instance.