mipred

Prediction using Multiple Imputation


License
GPL-3.0

Documentation

mipred

The goal of mipred is to calibrate a prediction rule using generalized linear models or Cox regression, with multiple imputation to account for missing values in the predictors, as described by Mertens, Banzato and de Wreede (2018) (https://arxiv.org/abs/1810.05099). Imputations are generated with the R package mice, without using the outcomes of the observations for which predictions are generated.

Two options are provided to generate predictions. The first is prediction-averaging: a model is fitted on each imputed dataset within a set of multiple imputations, and the resulting predictions are averaged. The second applies the model pooled with Rubin's rules. In both implementations, unobserved values in the predictor data of the new observations are imputed automatically.

The package contains two workhorse functions. The first, mipred, generates predictions of the outcome for new observations (whose outcomes will usually not be available when the prediction rule is calibrated). The second, mipred.cv, generates cross-validated predictions on existing data for which outcomes have already been observed, which allows users to assess the predictive potential of the calibrated models.

The present version of the package is a preliminary development version and has so far been checked thoroughly only for logistic regression with binary outcomes. The included vignette documents application of the functions to binary outcome data. Although we have not checked this extensively, the package should also work for continuous and count outcomes. We are working to extend the functionality to censored survival outcomes.
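
The difference between prediction-averaging and the Rubin's rules pooled model can be sketched in a few lines of R. This is an illustration under stated assumptions, not the internal code of mipred: the toy data, the variable names, the small number of imputations and the choice to average on the probability scale are all made up for the example, and the mice package is assumed to be installed.

```r
# Illustrative sketch only; this is not how mipred is implemented internally.
library(mice)

set.seed(1)

# Hypothetical toy data: binary outcome, two predictors with missing values
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
outcome <- rbinom(n, 1, plogis(0.5 * x1 - 0.5 * x2))
x1[sample(n, 30)] <- NA
x2[sample(n, 30)] <- NA
dat <- data.frame(outcome, x1, x2)

# Treat the last 50 rows as "new" observations and hide their outcomes,
# so that the imputations cannot use them
is_new <- seq_len(n) > 150
dat$outcome[is_new] <- NA

nimp <- 5
imp <- mice(dat, m = nimp, printFlag = FALSE)

pred_by_imp <- matrix(NA_real_, nrow = sum(is_new), ncol = nimp)
coef_by_imp <- matrix(NA_real_, nrow = 3, ncol = nimp)
X_new <- vector("list", nimp)

for (i in seq_len(nimp)) {
  completed <- mice::complete(imp, i)
  # One logistic model per imputed dataset, fitted on the calibration rows
  fit <- glm(outcome ~ x1 + x2, family = binomial,
             data = completed[!is_new, ])
  pred_by_imp[, i] <- predict(fit, newdata = completed[is_new, ],
                              type = "response")
  coef_by_imp[, i] <- coef(fit)
  # Design matrix (intercept, x1, x2) of the imputed new observations
  X_new[[i]] <- cbind(1, completed$x1[is_new], completed$x2[is_new])
}

# Option 1: average the predictions of the single-imputation models
pred_averaging <- rowMeans(pred_by_imp)

# Option 2: pool the coefficients (the Rubin's rules point estimate is their
# mean), apply the pooled model to each imputed version of the new data,
# then average over imputations (an assumption of this sketch)
pooled_beta <- rowMeans(coef_by_imp)
pred_rubin <- rowMeans(sapply(X_new, function(X) plogis(drop(X %*% pooled_beta))))

head(cbind(pred_averaging, pred_rubin))
```

In this sketch the two sets of predictions use the same imputations and differ only in where the averaging happens: over fitted models in the first option, over coefficients in the second.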

Installation

You can install the current version into R from GitHub using devtools:

library(devtools)
devtools::install_github("BartJAMertens/mipred")

If devtools is not yet installed, install it first with install.packages("devtools"). See the chapter “Git and GitHub” in the book “R Packages” (online version) by Hadley Wickham.

Main functions

There are currently two key functions:

mipred() # prediction calibration with multiple imputation for missing predictors
mipred.cv() # cross-validation for prediction calibration with multiple imputation for missing predictors

The first function calibrates predictions for new observations and accounts for missing values in the predictor data (of either the calibration or new validation sample) through multiple imputation. The second function implements cross-validation of the same approach.

Example

Let dataset be a data.frame consisting of a vector of binary outcomes outcome and two predictors x1 and x2. Likewise, let newdataset be a data.frame with new observations for which the same predictors x1 and x2 are observed and for which we want to predict outcome, using a model fitted to the old data in dataset.

We can generate predictions using the command

preds <- mipred(outcome ~ x1 + x2, family=binomial, data=dataset, newdata=newdataset, nimp=100)

This fits a logistic regression model and uses 100 imputations.
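
To try the call on something concrete, toy versions of dataset and newdataset can be simulated in base R. The helper make_toy_data, the sample sizes and the proportion of missing values below are hypothetical and chosen only for illustration; they are not part of the package.

```r
set.seed(42)

# Hypothetical helper that simulates data with the structure described above:
# a binary outcome and two predictors x1 and x2 containing missing values
make_toy_data <- function(n, prop_missing = 0.2) {
  x1 <- rnorm(n)
  x2 <- rnorm(n)
  outcome <- rbinom(n, size = 1, prob = plogis(x1 - 0.5 * x2))
  # Set a random subset of each predictor to missing
  x1[sample(n, round(prop_missing * n))] <- NA
  x2[sample(n, round(prop_missing * n))] <- NA
  data.frame(outcome = outcome, x1 = x1, x2 = x2)
}

# Calibration data with observed outcomes
dataset <- make_toy_data(200)

# New observations carry the predictors only; their outcome is not yet known
newdataset <- make_toy_data(50)[, c("x1", "x2")]

summary(dataset)
summary(newdataset)
```

Both data frames contain missing predictor values, which is the situation the package is designed for.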

To generate cross-validated predictions within dataset using the same model, we can use

preds.cv <- mipred.cv(outcome ~ x1 + x2, family=binomial, data=dataset, nimp=100, folds=10)

This will generate cross-validated predictions from the same model, using 100 imputations for each predicted observation and 10 folds.
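
As a rough picture of what the folds mean, the base R sketch below assigns each observation to one of 10 folds and loops over them. It is a generic illustration of k-fold partitioning, not the partitioning code used inside mipred.cv; the number of rows n and the variable names are hypothetical.

```r
set.seed(7)

n <- 95       # hypothetical number of rows in dataset
folds <- 10

# Randomly assign each observation to one of the folds, as evenly as possible
fold_id <- sample(rep(seq_len(folds), length.out = n))

# Each fold is held out once and predicted from a rule calibrated on the
# remaining rows, so every observation is predicted exactly once
for (k in seq_len(folds)) {
  held_out_rows    <- which(fold_id == k)
  calibration_rows <- which(fold_id != k)
  # ... impute, fit on calibration_rows, predict held_out_rows ...
}

table(fold_id)
```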

Please refer to the example included with the package. The package also includes a vignette which documents use for binary outcome data.