SpaGoG
An implementation of "SpaGoG: Graph of Graphs to classify tabular data with large fractions of missing data". SpaGoG (Sparse data classification using Graph of Graphs) is a model for classifying tabular data with high missing rates. It represents the tabular data as a graph of graphs and applies multiple graph-classification techniques to classify the samples from different points of view. This implementation is written in Python 3.8 using PyTorch.
How to use?
Installation
SpaGoG source code is available as a PyPI package (see https://pypi.org/project/SpaGoG/):
pip install spagog
Usage Example
Given a training set (both train_X and train_y) and a test set (at least test_X) of type pandas.DataFrame, SpaGoG can be executed, for example, using the following command:
from spagog.gog_model import gog_model
test_y = gog_model(train_X=train_X, train_Y=train_Y, test_X=test_X, model="gc", verbosity=1, to_numpy=False, evaluate_metrics=False)
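To try the call above without a real data set, the inputs can be mocked up with a few lines of pandas. The following is only a sketch: the helper name make_toy_data, the column names, and the choice of NaN to mark missing entries and of a single-column DataFrame for the labels are assumptions made for illustration, not something this README specifies.

```python
import numpy as np
import pandas as pd


def make_toy_data(n_samples=100, n_features=5, missing_rate=0.4, seed=0):
    """Build a toy tabular data set with randomly missing entries (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    values = rng.normal(size=(n_samples, n_features))
    # Binary label computed from the fully observed values, before masking.
    labels = (values.sum(axis=1) > 0).astype(int)
    # Replace a fraction of the entries with NaN to mimic missing data (assumption).
    mask = rng.random(size=values.shape) < missing_rate
    values[mask] = np.nan
    X = pd.DataFrame(values, columns=[f"f{i}" for i in range(n_features)])
    y = pd.DataFrame({"label": labels})
    return X, y


X, y = make_toy_data()
# Hold out the last 20 rows as the test set.
train_X, test_X = X.iloc[:80], X.iloc[80:]
train_y, test_y_true = y.iloc[:80], y.iloc[80:]
```

The resulting train_X, train_y and test_X frames can then be passed to gog_model in the same way as in the command above.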
Argument List
Here are all the arguments accepted by spagog.gog_model.gog_model:
- model: str
The SpaGoG model to run. Options: ["gc", "gnc", "gc+nc"].
- train_X: pandas.DataFrame
The features of the training set.
- train_y: pandas.DataFrame
The labels of the training set.
- test_X: pandas.DataFrame
The features of the test set.
- test_y: pandas.DataFrame
The labels of the test set. If set to None, evaluate_metrics should be set to False. Default: None.
- val_X: pandas.DataFrame
The features of the validation set. If set to None, the validation set is derived from the training set with an 80:20 ratio. Default: None.
- val_y: pandas.DataFrame
The labels of the validation set. Default: None.
- evaluate_metrics: bool
Whether to evaluate and return the accuracy score on the data sets. If set to True, the test_y argument must not be None. Default: True.
- dataset_name: str
The name of the data set to run, for a cleaner output text. Default: "".
- feature_selection: int
Number of significant features to run the data on. The feature selection process is executed only if 1 <= feature_selection <= num_features. Default: 100.
- edges: pandas.DataFrame
Edge list between the different samples (train, val and test), if there are any. If set to None, the edges are calculated as a K-Nearest-Neighbors graph. Default: None.
- probs: bool
Whether to return soft labels for the test set predictions. Default: False.
- to_numpy: bool
Whether to return the test set predictions as a numpy.array. If set to False, the predictions are returned as a torch.Tensor. Default: False.
- verbosity: int
Verbosity level of the running process. Set 0 for no output, 1 for evaluation metrics and timing report, and 2 to track the full learning process. Options: [0, 1, 2]. Default: 0.
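The val_X description above says that a missing validation set is derived from the training set with an 80:20 ratio. To pass an explicit validation set instead, a comparable split can be made by hand. This is a minimal sketch, assuming a plain random split; the helper name split_train_val is hypothetical, and whether SpaGoG itself shuffles or stratifies when it derives the split is not stated here.

```python
import numpy as np
import pandas as pd


def split_train_val(train_X, train_y, val_fraction=0.2, seed=0):
    """Randomly hold out a fraction of the training rows for validation (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(train_X))
    n_val = int(round(val_fraction * len(train_X)))
    val_idx, train_idx = order[:n_val], order[n_val:]
    # Features and labels are indexed with the same positions, so rows stay aligned.
    return (train_X.iloc[train_idx], train_y.iloc[train_idx],
            train_X.iloc[val_idx], train_y.iloc[val_idx])


features = pd.DataFrame({"a": np.arange(10.0), "b": np.arange(10.0) * 2})
labels = pd.DataFrame({"label": [0, 1] * 5})
tr_X, tr_y, va_X, va_y = split_train_val(features, labels)
```

The four returned frames correspond to the train_X, train_y, val_X and val_y arguments listed above.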
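The edges description above mentions a K-Nearest-Neighbors graph between samples as the fallback when no edge list is given. The sketch below shows what building such an edge list can look like in general. It is not SpaGoG's internal procedure: the mean imputation before measuring distances, the Euclidean metric, the helper name knn_edge_list, and the "source"/"target" column names are all assumptions made for illustration, and the column layout expected by the edges argument is not documented in this README.

```python
import numpy as np
import pandas as pd


def knn_edge_list(X, k=3):
    """Return a directed k-nearest-neighbour edge list over the rows of X (hypothetical helper)."""
    # Assumption: mean-impute missing values only so that a distance can be computed.
    filled = X.fillna(X.mean()).to_numpy(dtype=float)
    # Pairwise squared Euclidean distances, shape (n, n).
    diff = filled[:, None, :] - filled[None, :, :]
    dist = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(dist, np.inf)  # a sample is never its own neighbour
    # Positions of the k closest rows for every row.
    neighbours = np.argsort(dist, axis=1)[:, :k]
    sources = np.repeat(np.arange(len(X)), k)
    return pd.DataFrame({"source": sources, "target": neighbours.ravel()})


samples = pd.DataFrame({"f0": [0.0, 0.1, 5.0, 5.2, np.nan],
                        "f1": [0.0, 0.2, 5.0, np.nan, 0.1]})
edges = knn_edge_list(samples, k=2)
```

Each row of the result links a sample (by row position) to one of its k nearest neighbours, so every sample contributes exactly k outgoing edges.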