rfPermute

Estimate Permutation p-Values for Random Forest Importance Metrics


Licenses
CNRI-Python-GPL-Compatible/CNRI-Python-GPL-Compatible

Documentation

CRAN version CRAN last day downloads CRAN last week downloads CRAN last month downloads CRAN total downloads
R-CMD-check AppVeyor Build Status

rfPermute DOI

Description

rfPermute estimates the significance of importance metrics for a Random Forest model by permuting the response variable. It will produce null distributions of importance metrics for each predictor variable and p-values of observed importances. The package also includes several summary and visualization functions for randomForest and rfPermute results. See rfPermuteTutorial() in the package for a guide on running, summarizing, and diagnosing rfPermute and randomForest models.

Contact

Installation

To install the stable version from CRAN:

install.packages('rfPermute')

To install the latest version from GitHub:

# make sure you have devtools installed
if (!require('devtools')) install.packages('devtools')

# install from GitHub
devtools::install_github('EricArcher/rfPermute')

Current Functions

Variable importance p-value estimation, summary, and visualization

  • rfPermute Estimate Permutation p-values for Random Forest Importance Metrics
  • importance Extract rfPermute Importance Scores and p-values
  • plotNull Plot Random Forest Importance Null Distributions
  • plotImpPreds Distribution of Important Variables

Random Forest model summary

  • summary Summarize rfPermute and randomForest models
  • confusionMatrix Confusion Matrix
  • casePredictions Return predictions and votes for training cases
  • pctCorrect Percent Correctly Classified

Random Forest model visualization and diagnostics

  • plotInbag Distribution of sample inbag rates
  • plotPredictedProbs Distribution of prediction assignment probabilities
  • plotProximity Plot Random Forest Proximity Scores
  • plotTrace Trace of cumulative error rates in forest
  • plotVotes Vote Distribution

Miscellaneous functions

  • combineRP Combine rfPermute models
  • balancedSampsize Balanced Sample Size
  • cleanRFdata Clean Random Forest Input Data

Changelog

version 2.5.1 (devel)

  • added pct.correct argument to plotTrace(). Default is now to have y-axis as 1 - OOB error rate.

version 2.5 (CRAN)

NOTE: v2.5 is a large redevelopment of the package. The structure of rfPermute model objects has changed make them incompatible with previous versions. Also, the name and functionality of several functions has changed to make them more consistent with one another. A tutorial (under construction) is available within the package as rfPermuteTutorial().

version 2.2 (on CRAN)

  • moved value of OOB expected error rate to end of output vector in exptdErrRate
  • changed default of threshold argument in classConfInt and confusionMatrix to NULL
  • added new grouping and labelling options to proximityPlot()
  • added binomial test for priors in exptdErrRate and confusionMatrix

version 2.1.81

  • Fixed bug in pctCorrect
  • Added casePredictions
  • Updated parallel code

version 2.1.7

  • Fixed bug in parallel processing code.

version 2.1.6

  • Added plotConfMat, plotOOBtimes, plotRFtrace, and plotInbag, and plotImpVarDist visualizations.
  • Changed confusionMatrix so it will work when randomForest model doesn't have a $confusion element, like when model is result of combine-ing multiple models.
  • Improved efficiency and stability of parallel processing code. Changed default value of num.cores to NULL.

version 2.1.5

  • Added type argument to plotVotes to choose between area and bar charts.
  • Changed plot.rfPermute to plotNull to avoid clashes and maintain functionality of randomForest::plot.randomForest.
  • Changed name of proximity.plot to proximityPlot, exptd.err.rate to exptdErrRate, and clean.rf.data to cleanRFdata to make camelCase naming scheme more consistent in package.
  • Changed plotNull from base graphics to ggplot2.
  • Added symb.metab data set.

version 2.1.1

  • Added n argument to impHeatmap.
  • Added functions: classConfInt, confusionMatrix, plotVotes, pctCorrect.

version 2.0.1

  • Fixed bug in plot.rfPermute that was reporting the p-value incorrectly at the top of the figure.
  • Fixed multi-threading in rfPermute so it works on Windows too.
  • Added impHeatmap function.
  • Switched proximity.plot to use ggplot2 graphics.

version 2.0

  • Fixed bug with calculation of p-values not respecting importance measure scaling (division by standard deviations). New format of output of rfPemute has separate $null.dist and $pval elements, each with results for unscaled and scaled importance mesures. See ?rfPermute for more information.
  • rp.importance and plot.rfPermute now take a scale argument to specify whether or not importance values should be scaled by standard deviations.
  • If nrep = 0 for rfPermute, a randomForest object is returned.

version 1.9.3

  • Fixed import declarations to avoid grid name clashes.
  • Fixed logic error in clean.rf.data where fixed predictors were not removed.
  • Fixed error in use of main argument in plot.rp.importance.

version 1.9.2

  • Added this NEWS.md
  • Added README.md
  • Added num.cores argument to rfPermute to take advantage of multi-threading

version 1.9.1

  • Added internal keyword to calc.imp.pval to keep it from indexing
  • Updated imports to match new CRAN policies