Frege indexer library
An indexer library for the Frege project at Jagiellonian University.
How to install
This library is published on PyPI, so you can install it with pip:
pip3 install fregeindexerlib
and upgrade by:
pip3 install --upgrade fregeindexerlib
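To confirm that the installation worked, one option is to check whether Python can locate the package. This is a minimal sketch using only the standard library; the helper name is_installed is made up for this example, and the import name fregeindexerlib is taken from the module paths used later in this README.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if Python can locate a top-level package with this name."""
    return importlib.util.find_spec(package_name) is not None


# Reports whether the library is importable in the current environment.
print("fregeindexerlib installed:", is_installed("fregeindexerlib"))
```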
How to use
Example usage is available in file example.py.
In short, you need to subclass the abstract fregeindexerlib.indexer.Indexer class. Only the crawl_next_repository method is required.
List of methods that can be implemented:
-
crawl_next_repository(self, prev_repository_id: Optional[str]) -> Optional[CrawlResult]
This method receives the id of the previously crawled repository as a string (or None if no repository has been crawled yet). It should return a fregeindexerlib.crawl_result.CrawlResult dataclass filled with information about the crawled repository, or None if there are no more repositories to crawl. If something unexpected happens, the method should raise a fregeindexerlib.indexer_error.IndexerError exception. The repository id is the id returned by the code hosting API, not one generated by the Indexer: generating the proper id for this project is the library's responsibility.
-
before_crawl(self, prev_repository_id: Optional[str])
Invoked right before crawl_next_repository. Receives the previously indexed repository id (the same value that crawl_next_repository gets). The default implementation is empty.
-
after_crawl(self, crawl_result: CrawlResult)
Invoked right after crawl_next_repository. Receives the CrawlResult returned by the crawl_next_repository call. The default implementation is empty.
-
on_successful_process(self, crawl_result: CrawlResult)
Invoked after the crawl result has been successfully saved to the database and a message has been successfully sent to the download queue. Receives the CrawlResult returned by the crawl_next_repository call. The default implementation is empty.
-
on_error(self, exception: Exception)
Invoked when an exception occurs during crawling, saving to the database, or pushing a message to the queue. Receives the exception that occurred. The default implementation is empty.
You probably do not need to handle this case yourself, because the library handles it.
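The order in which the hooks above fire can be pictured with a small stand-in. The sketch below does not use or reproduce the library's code: SketchIndexer, ListIndexer and the repo_id field are hypothetical names, the database and queue steps are only a comment, and stopping on the first error is an assumption of this sketch rather than documented behaviour.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CrawlResult:
    """Stand-in; `repo_id` is a hypothetical field, not the real dataclass layout."""
    repo_id: str


class SketchIndexer:
    """Stand-in base class exposing only the hooks documented above."""

    def crawl_next_repository(self, prev_repository_id: Optional[str]) -> Optional[CrawlResult]:
        raise NotImplementedError

    def before_crawl(self, prev_repository_id: Optional[str]) -> None:
        pass

    def after_crawl(self, crawl_result: CrawlResult) -> None:
        pass

    def on_successful_process(self, crawl_result: CrawlResult) -> None:
        pass

    def on_error(self, exception: Exception) -> None:
        pass

    def run(self) -> None:
        prev_id: Optional[str] = None
        while True:
            try:
                self.before_crawl(prev_id)
                result = self.crawl_next_repository(prev_id)
                if result is None:
                    return  # nothing left to crawl
                self.after_crawl(result)
                # The real library would save the result to the database and
                # publish a message to the download queue at this point.
                self.on_successful_process(result)
                prev_id = result.repo_id
            except Exception as exc:
                self.on_error(exc)
                return  # assumption: this sketch stops on the first error


class ListIndexer(SketchIndexer):
    """Toy indexer that 'crawls' a fixed list of repository ids."""

    def __init__(self, repo_ids: List[str]) -> None:
        self.repo_ids = repo_ids
        self.events: List[Tuple[str, Optional[str]]] = []

    def crawl_next_repository(self, prev_repository_id):
        if prev_repository_id is None:
            position = 0
        else:
            position = self.repo_ids.index(prev_repository_id) + 1
        if position >= len(self.repo_ids):
            return None
        return CrawlResult(repo_id=self.repo_ids[position])

    def before_crawl(self, prev_repository_id):
        self.events.append(("before_crawl", prev_repository_id))

    def on_successful_process(self, crawl_result):
        self.events.append(("on_successful_process", crawl_result.repo_id))


indexer = ListIndexer(["repo-1", "repo-2"])
indexer.run()
print(indexer.events)
```

In this toy version the previous id is enough to find the next repository, which mirrors why crawl_next_repository receives prev_repository_id: the indexer itself does not need to keep crawl state between calls.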
Once you have implemented the Indexer method(s) in your own class, create an instance of that class, passing the proper parameters to its constructor. The constructor is defined as:
__init__(self, indexer_type: IndexerType, rabbitmq_parameters: RabbitMQConnectionParameters,
database_parameters: DatabaseConnectionParameters, rejected_publish_delay: int)
where:
-
indexer_type is a fregeindexerlib.indexer_type.IndexerType enum value: choose the one that matches the indexer you are implementing.
-
rabbitmq_parameters is a fregeindexerlib.rabbitmq_connection.RabbitMQConnectionParameters dataclass.
-
database_parameters is a fregeindexerlib.database_connection.DatabaseConnectionParameters dataclass.
-
rejected_publish_delay is the number of seconds to wait between publish attempts when the queue is full.
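The role of rejected_publish_delay can be pictured as a fixed-delay retry loop. The sketch below is not the library's implementation: publish_with_retry and try_publish are hypothetical names, and the injectable sleep argument exists only so the example runs without actually waiting.

```python
import time
from typing import Callable, List


def publish_with_retry(
    try_publish: Callable[[], bool],
    rejected_publish_delay: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call try_publish until it succeeds, pausing between attempts.

    Returns the total number of attempts that were made.
    """
    attempts = 1
    while not try_publish():
        # The queue rejected the message (for example because it is full):
        # wait the configured number of seconds, then try again.
        sleep(rejected_publish_delay)
        attempts += 1
    return attempts


# Simulate a queue that rejects the first two publishes and accepts the third.
responses = iter([False, False, True])
waits: List[float] = []
attempts = publish_with_retry(
    lambda: next(responses),
    rejected_publish_delay=5,
    sleep=waits.append,  # record the delays instead of sleeping
)
print(attempts, waits)
```

A larger delay puts less pressure on a full queue but makes the indexer slower to resume once space frees up.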
Finally, call the run method on that instance.