multiDA

High Dimensional Discriminant Analysis using Multiple Hypothesis Testing

Overview

multiDA is a Discriminant Analysis (DA) algorithm suited to high dimensional datasets, with feature selection performed through multiple hypothesis testing. The algorithm has minimal tuning parameters, is easy to use, and offers improved speed compared to existing DA classifiers.

Publication to appear in JCGS. A preprint is available on arXiv.

This package is part of a suite of discriminant analysis packages we have authored for large-scale/complex datasets. See also our package genDA, a statistical ML method for Multi-distributional Discriminant Analysis using Generalised Linear Latent Variable Modelling.

Installation

# Install the development version from GitHub:
# install.packages("devtools")
devtools::install_github("sarahromanes/multiDA")

Usage

The following example trains the multiDA classifier using the SRBCT dataset, and finds the resubstitution error rate.

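library(multiDA)

# Class labels (y) and features (X) from the SRBCT dataset
y   <- SRBCT$y
X   <- SRBCT$X
# Train the classifier, then predict on the training data itself
res  <- multiDA(X, y, penalty="EBIC", equal.var=TRUE, set.options="exhaustive")
vals <- predict(res, newdata=X)$y.pred          # y.pred returns class labels
rser <- sum(vals!=y)/length(y)                  # resubstitution error rate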
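
The resubstitution error rate is the proportion of training samples that the fitted classifier mislabels. As a minimal base R sketch of that calculation, the toy vectors `truth` and `pred` below are hypothetical stand-ins for `y` and the predicted labels. They are not output from multiDA. A confusion matrix is included because it shows which classes are being confused, which the single error figure does not.

```r
# Hypothetical toy labels standing in for the true and predicted classes.
truth <- c("EWS", "EWS", "BL", "NB", "RMS", "RMS")
pred  <- c("EWS", "BL",  "BL", "NB", "RMS", "NB")

# Proportion of misclassified samples; same as sum(pred != truth) / length(truth).
error_rate <- mean(pred != truth)

# Use a shared set of levels so the confusion matrix is square
# even if some class is never predicted.
classes <- sort(unique(truth))
confusion <- table(
  truth     = factor(truth, levels = classes),
  predicted = factor(pred,  levels = classes)
)

print(error_rate)
print(confusion)
```

Because the same data are used for training and prediction, this figure tends to be optimistic. Evaluating on held-out samples usually gives a more realistic estimate.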

A case study and overview of the statistical processes behind multiDA can be found here.

Authors

Sarah Romanes - @sarah_romanes
John Ormerod - @john_t_ormerod

License

This project is licensed under the GPL-2 license. See the LICENSE.md file for details.

Acknowledgements

I am grateful to everyone who has provided thoughtful and helpful comments to support me building my first package - especially members of the Sydney University Statistical Bioinformatics group and also the NUMBATS group at Monash University. You guys rock!
