Kemeny-Young method for rank aggregation of incomplete rankings with ties


License
GPL-2.0
Install
pip install corankco==7.2.0

Documentation

Python 3.8 PyPI version

corankco

corankco (COnsensus RANKing COmputation) is a Python package dedicated to the aggregation of incomplete rankings with ties. Users can choose the Kemeny-Young method (exact algorithm and several heuristics available), the Copeland method, and in some cases, the Borda method. The formalism to handle ties and/or incompleteness is detailed in the preprint: P. Andrieu, S. Cohen-Boulakia, M. Couceiro, A. Denise, A. Pierrot. A Unifying Rank Aggregation Model to Suitably and Efficiently Aggregate Any Kind of Rankings. Available at SSRN: https://ssrn.com/abstract=4353494. As of our latest release, the API for corankco is now stable and finalized.

Installation

Before installing corankco, make sure your system meets the following requirements:

  • Python >= 3.8
  • For the usage of the exact algorithm on big datasets (with many elements to rank), the installation of IBM ILOG CPLEX Optimization Studio is recommended. A free academic version can be downloaded from the IBM website. After downloading, follow the instructions to install CPLEX. README.md To install corankco from PyPI, you can use pip:
pip3 install corankco

Documentation

Visit our official documentation. For examples of use, you can jump directly to our [Usage section below]{#usage}.

Contact/Support

For any queries or support related to corankco, feel free to reach us at pierre.andrieu@lilo.org.

Contributing

We welcome contributions to corankco. If you'd like to contribute, feel free to fork the repository and submit your changes via a pull request.

License

corankco is licensed under the GPL-2.0 License. You can read more about it in the LICENSE file.

Updates

New in 7.2.0

  • Several algorithms have been sped up.
  • BioConsert heuristic no longer requires the C extension. As a consequence, corankco is now available for any platform. Note that the computation time of BioConsert has been reduced due to major code improvements.
  • Minor bug fix in BioConsert heuristic.
  • Several tests added

Example usage

from typing import List
import corankco as crc

# create a ranking from a list of sets
ranking1: crc.Ranking = crc.Ranking([{1}, {2, 3}])

# or from a string
ranking2: crc.Ranking = crc.Ranking.from_string("[{3, 1}, {4}]")

# also in this format
ranking3: crc.Ranking = crc.Ranking.from_string("[[1], [5], [3], [2]]")

# now, create a Dataset object. A Dataset is a list of rankings
dataset: crc.Dataset = crc.Dataset([ranking1, ranking2, ranking3])

# or, from raw rankings that is a list of list of sets of either ints,or strs
dataset2: crc.Dataset = crc.Dataset.from_raw_list([[{2, 1}, {4}], [{3, 1, 2}, {4}, {5}], [{1}, {2}, {3}, {4}]])

# or, create a Dataset object from a file where your rankings are stored
# format file: each line is a list of either set, or list of int / str.
dataset3: crc.Dataset = crc.Dataset.from_file(path="./dataset_examples/dataset_example")

# print information about the dataset
print(dataset.description())

# get all datasets in a folder
list_datasets: List[crc.Dataset] = crc.Dataset.get_datasets_from_folder(path_folder="./dataset_examples")
for dataset_folder in list_datasets:
    print(dataset_folder.description())

# choose your scoring scheme
sc: crc.ScoringScheme = crc.ScoringScheme([[0., 1., 1., 0., 1., 1.], [1., 1., 0., 1., 1., 0.]])

print("scoring scheme : " + str(sc))
# scoring scheme description
print(sc.description())

print("\n### Consensus computation ###\n")

# list of rank aggregation algorithms to use among  BioConsert, ParCons, ExactAlgorithm, KwikSortRandom,
# RepeatChoice, PickAPerm, MedRank, BordaCount, BioCo, CopelandMethod

algorithms_to_execute = [crc.ExactAlgorithm(optimize=False),
                         crc.KwikSortRandom(),
                         crc.BioConsert(starting_algorithms=None),
                         crc.BioConsert(starting_algorithms=[crc.CopelandMethod()]),
                         crc.ParCons(bound_for_exact=90, auxiliary_algorithm=crc.BioConsert()),
                         crc.CopelandMethod(),
                         crc.BioCo(),
                         crc.BordaCount(),
                         ]

for alg in algorithms_to_execute:
    print(alg.get_full_name())
    consensus = alg.compute_consensus_rankings(dataset=dataset,
                                               scoring_scheme=sc,
                                               return_at_most_one_ranking=False)
    # to get the consensus rankings : consensus.consensus_rankings
    # description() will display supplementary information
    print(consensus.description())
    # if you want the consensus ranking only : print(consensus)

    # get the Kemeny score associated with the consensus:
    print(consensus.kemeny_score)

# compute a Kemeny score between a ranking and a list of rankings (dataset object):
ranking_test: crc.Ranking = crc.Ranking([{1, 2}, {4}, {3}])
dataset_test: crc.Dataset = crc.Dataset.from_raw_list([[{1}, {2}, {3}, {4}], [{1, 4}, {3}]])
scoring_scheme: crc.ScoringScheme = crc.ScoringScheme([[0., 1., 1., 0., 1., 0.], [1., 1., 0., 1., 1., 0.]])
kemeny_obj: crc.KemenyComputingFactory = crc.KemenyComputingFactory(scoring_scheme)
score: float = kemeny_obj.get_kemeny_score(ranking=ranking_test, dataset=dataset_test)
print("\nscore = ", score)

# Partitioning

# consistent with at least one optimal consensus
one_opt: crc.OrderedPartition = crc.OrderedPartition.parcons_partition(dataset_test, scoring_scheme)
print(one_opt)

# consistent with all the optimal consensus
all_opt: crc.OrderedPartition = crc.OrderedPartition.parfront_partition(dataset_test, scoring_scheme)
print(all_opt)

Jupiter Notebook for code execution and more examples

More detailed examples and use cases, please refer to our Jupyter Notebook.