class: split-40 hide-slide-number with-thick-border border-white background-image: url("bkg/bg1.png") background-size: cover .column[.content.vmiddle.center[ ]] .column.shade_main[.content.vmiddle[ <br> # .large[multiDA and genDA] ## Discriminant Analysis Methods for Large Scale ## and Complex Datasets <br> ### Sarah Romanes
<i class="fab fa-twitter faa-float animated " style=" color:white;"></i>sarah_romanes
<br> ### useR!2019 ###
<i class="fas fa-link faa-vertical animated " style=" color:white;"></i> bit.ly/SR-useR-2019
]] --- class: split-two white .column.bg-main2[.content[ <br> # Acknowledgements <br> <br> ### - This work is in collaboration with my supervisor, A/Prof John Ormerod. <br> ### - Big thanks to my research group at the University of Sydney, as well as NUMBATS from Monash, for input into package design. <br> ### - Slides are made in `rmarkdown` using `xaringan` (Yihui Xie) and the *ninja* theme (Emi Tanaka). ]] .column[.content.vmiddle.center[ <img src="images/group.jpg", width="70%"> ]] --- class: split-70 hide-slide-number background-image: url("bkg/bg2.png") background-size: cover .column.slide-in-left[ .sliderbox.vmiddle.shade_main.center[ .font5[Discriminant Analysis]]] .column[ ] --- class: split-two white .column.bg-main2[.content[ <br> # **What is Discriminant Analysis?** <br> ### - Discriminant Analysis (Fisher, 1936) is a machine learning technique that seeks to find a linear combination of features that separates classes of objects. <br> ### - It *strictly* assumes the conditional distribution of the data, given class grouping, is .orange[multivariate normal]. <br>
### - Available through the `MASS` package in R, with functions `lda` (common covariance) and `qda` (class-specific covariances). ]] .column[.content.vmiddle.center[ <img src="index_files/figure-html/unnamed-chunk-1-1.png" width="504" /> ]] --- class: split-two white .column.bg-main2[.content[ <br> # **Issues with DA** <br> ## **Does not work in high dimensions** <br> ## DA does not work when p > n, as the required covariance matrix is singular and cannot be inverted. <br> ## .orange[**Solution?** *multiDA*] ]] .column.bg-main5[.content[ <br> <br> <br> # .black[**Does not work for non-Gaussian response**] <br> ## .black[Cannot be used for count, skewed, binary, or mixed response data, etc.] <br> ## .orange[**Solution?** *genDA*] ]] --- class: split-70 hide-slide-number background-image: url("bkg/bg2.png") background-size: cover .column.slide-in-left[ .sliderbox.vmiddle.shade_main.center[ .font5[multiDA]]] .column[ ] --- class: split-two white .column.bg-main2[.content[ <br> # **SRBCT data** <br> ### - The SRBCT dataset (Khan et al., 2001) concerns classifying 4 classes of childhood tumours that share similar visual features during routine histology. ### - The data contain 83 microarray samples with 1586 features. <br> ## .orange[**Q:** How can we use DA to find important features that discriminate between the classes, and use them to predict cancer type?] ]] .column[.content.vmiddle.center[ <img src="images/SRBCT-nature.jpg", width="70%"> .purple[Source:] [Nature](https://www.nature.com/articles/modpathol2016119) ]] --- class: middle center bg-main2 <img src="images/pipeline.png", width="80%"> --- class: split-two white .column.bg-main2[.content[ .split-two[ .row[.content[ <br> # **What defines a discriminative feature?** <br> ## Suppose we have 3 classes to model. If we assume the features are independent, within each feature we can group them as: ]] .row[.content.vmiddle.center[ ## .orange[**One group**] (NOT a discriminative feature) ]] ] ]] .column[.content.vmiddle.center[ <img src="index_files/figure-html/unnamed-chunk-2-1.png" width="504" /> ]] --- class: split-two white .column.bg-main2[.content[ .split-two[ .row[.content[ <br> # **What defines a discriminative feature?** <br> ## Suppose we have 3 classes to model. If we assume the features are independent, within each feature we can group them as: ]] .row[.content.vmiddle.center[ ## .orange[**Two groups**] (Groups 2 and 3, against 1) ]] ] ]] .column[.content.vmiddle.center[ <img src="index_files/figure-html/unnamed-chunk-3-1.png" width="504" /> ]] --- class: split-two white .column.bg-main2[.content[ .split-two[ .row[.content[ <br> # **What defines a discriminative feature?** <br> ## Suppose we have 3 classes to model. If we assume the features are independent, within each feature we can group them as: ]] .row[.content.vmiddle.center[ ## .orange[**Two groups**] (Groups 1 and 3, against 2) ]] ] ]] .column[.content.vmiddle.center[ <img src="index_files/figure-html/unnamed-chunk-4-1.png" width="504" /> ]] --- class: split-two white .column.bg-main2[.content[ .split-two[ .row[.content[ <br> # **What defines a discriminative feature?** <br> ## Suppose we have 3 classes to model. If we assume the features are independent, within each feature we can group them as: ]] .row[.content.vmiddle.center[ ## .orange[**Two groups**] (Groups 1 and 2, against 3) ]] ] ]] .column[.content.vmiddle.center[ <img src="index_files/figure-html/unnamed-chunk-5-1.png" width="504" /> ]] --- class: split-two white .column.bg-main2[.content[ .split-two[ .row[.content[ <br> # **What defines a discriminative feature?** <br> ## Suppose we have 3 classes to model. If we assume the features are independent, within each feature we can group them as:
]] .row[.content.vmiddle.center[ ## .orange[**Three groups**] (All groups are different) ]] ] ]] .column[.content.vmiddle.center[ <img src="index_files/figure-html/unnamed-chunk-6-1.png" width="504" /> ]] --- class: middle center white # .black[A Penalised LRT is used to **estimate** the best fit for each feature] <img src="images/LRT.png", width="100%"> --- class: split-two white .column.bg-main2[.content[ <br> # `multiDA` - syntax <br>
```r
*res <- multiDA(y = y,
                X = X,
                penalty = "EBIC",
                equal.var = TRUE,
                set.options = "exhaustive")
```
]] .column.bg-main5[.content.vmiddle.center[ ### `y` - a vector of factor class values (for training) ]] --- class: split-two white .column.bg-main2[.content[ <br> # `multiDA` - syntax <br>
```r
res <- multiDA(y = y,
*              X = X,
               penalty = "EBIC",
               equal.var = TRUE,
               set.options = "exhaustive")
```
]] .column.bg-main5[.content.vmiddle.center[ ### `X` - matrix containing the training data. The rows are the sample observations, and the columns are the features. ]] --- class: split-two white .column.bg-main2[.content[ <br> # `multiDA` - syntax <br>
```r
res <- multiDA(y = y,
               X = X,
*              penalty = "EBIC",
               equal.var = TRUE,
               set.options = "exhaustive")
```
]] .column.bg-main5[.content.vmiddle.center[ ### `penalty` - default `EBIC`, which penalises based on the number of features and the degrees of freedom of the groupings. If `penalty = "BIC"` is specified, the penalty reverts to the BIC. ]] --- class: split-two white .column.bg-main2[.content[ <br> # `multiDA` - syntax <br>
```r
res <- multiDA(y = y,
               X = X,
               penalty = "EBIC",
*              equal.var = TRUE,
               set.options = "exhaustive")
```
]] .column.bg-main5[.content.vmiddle.center[ ### `equal.var` - indicates whether group-specific variances should be equal or allowed to vary. ]] --- class: split-two white .column.bg-main2[.content[ <br> # `multiDA` - syntax <br>
```r
res <- multiDA(y = y,
               X = X,
               penalty = "EBIC",
               equal.var = TRUE,
*              set.options = "exhaustive")
```
]] .column.bg-main5[.content.vmiddle.center[ ### `set.options` - determines which groupings we want to consider. For example, for 15 classes there are 1,382,958,545 ways the classes can be grouped together. However, if we specify `set.options = "onevsrest"`, we only consider groupings of one class vs the rest, resulting in 16 possible combinations. ]] --- class: split-two white .column.bg-main2[.content[ <br> # `multiDA` - syntax <br>
```r
res <- multiDA(y = y,
               X = X,
               penalty = "EBIC",
               equal.var = TRUE,
               set.options = "exhaustive")
```
<br>
```r
*obj <- predict(res, newdata = newdata)
```
]] .column.bg-main5[.content.vmiddle.center[ ### Finally, a generic S3 `predict` method is used for prediction. `obj$y.pred` returns class labels, and `obj$probabilities` returns probabilities of class membership for each class.
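#### For example, a minimal end-to-end sketch (illustrative only; `y`, `X`, and a test matrix `newdata` with the same columns as `X` are assumed to exist):
```r
# Illustrative sketch only: y, X, and newdata are assumed to be available.
library(multiDA)

# Fit the model on the training data
res <- multiDA(y = y, X = X, penalty = "EBIC",
               equal.var = TRUE, set.options = "exhaustive")

# Predict on new observations
obj <- predict(res, newdata = newdata)
head(obj$y.pred)         # predicted class labels
head(obj$probabilities)  # class membership probabilities
```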
]] --- class: split-two white .column.bg-main2[.content[ <br> # Application to `SRBCT` data <br>
```r
res <- multiDA(y = SRBCT$y, X = SRBCT$X,
               penalty = "EBIC",
               equal.var = TRUE,
               set.options = "exhaustive")
```
<br> ### We first take a look at a summary of the fitted model using `print()`:
```r
*print(res)
```
]] .column[.content.vmiddle[
```r
Sample Size:
[1] 63
Number of Features:
[1] 1586
Classes:
[1] 4
Equal Variance Assumption:
[1] TRUE
Number of Significant Features:
[1] 215
Summary of Significant Features:
   rank feature.ID gamma.hat partition
1     1      V1172 0.9997741         8
2     2      V1232 0.9997576         4
3     3      V1233 0.9997563         3
4     4      V1324 0.9997506         3
5     5       V706 0.9997441         5
6     6       V434 0.9997437         3
7     7       V527 0.9997424         3
8     8      V1189 0.9997368         3
9     9      V1166 0.9997347         7
10   10       V148 0.9997342         4
```
]] --- class: split-two white .column.bg-main2[.content[ <br> # Application to `SRBCT` data <br>
```r
res <- multiDA(y = SRBCT$y, X = SRBCT$X,
               penalty = "EBIC",
               equal.var = TRUE,
               set.options = "exhaustive")
```
<br> ### We can then examine the class groupings using the `plot()` method for `multiDA`:
```r
*plot(res, ranks = 1)
```
]] .column[.content.vmiddle.center[ <img src="index_files/figure-html/unnamed-chunk-19-1.png" width="504" /> ]] --- class: middle center white # .black[100 Trial, 5 Fold CV] <img src="index_files/figure-html/unnamed-chunk-20-1.png" width="864" /> --- class: split-70 hide-slide-number background-image: url("bkg/bg2.png") background-size: cover .column.slide-in-left[ .sliderbox.vmiddle.shade_main.center[ .font5[genDA]]] .column[ ] --- class: split-two white .column.bg-main2[.content[ <br> # **Urban Cover data** <br> ### - The study area is an urban area in Deerfield Beach, FL, USA, captured with 30 cm resolution colour infrared aerial orthoimagery. <br> ### - Contains 9 different types of land cover. <br> ### - The data consist of .orange[n = 168] image segments to be classified, with .orange[m = 147] features associated with each image segment (such as Area, Brightness, etc.), measured at different resolutions. ]] .column[.content.vmiddle.center[ <img src="images/urban.jpg", width="80%"> ##### .pink[Source: Johnson, 2013]. ]] --- class: split-two white .column.bg-main2[.content[ <br> # **Urban Cover data** <br> <br> <br> # The data are *highly correlated*, with a noticeable difference in correlation structure between classes. ]] .column[.content.vmiddle.center[ <img src="images/covariance.png", width="75%"> ]] --- class: split-two white .column.bg-main2[.content[ <br> # **Urban Cover data** <br> # EDA shows that the data are .orange[not normal], with a mix of positively skewed and count data. <br> ## .orange[**Q:** How can we build a DA model to discriminate between segment types, for data of mixed response types?] ]] .column[.content.vmiddle[
```r
# A tibble: 168 x 148
   class    BrdIndx  Area Round Bright Compact ShpIndx
   <fct>      <dbl> <int> <dbl>  <dbl>   <dbl>   <dbl>
 1 car         1.27    91  0.97   231.    1.39    1.47
 2 concrete    2.36   241  1.56   216.    2.46    2.51
 3 concrete    2.12   266  1.47   232.    2.07    2.21
 4 concrete    2.42   399  1.28   230.    2.49    2.73
 5 concrete    2.15   944  1.73   193.    2.28    4.1
 6 tree        3.11   169  1.47   172.    2.49    3.35
 7 car         1.2     44  0.79   209.    1.14    1.36
 8 car         1       88  0.22   235.    1.11    1.12
 9 building    1.59  1737  0.67   220.    1.3     1.64
10 tree        2.37   153  1.3    120.    2.85    2.59
```
]] --- class: split-33 .column.bg-main2[.content[ <br> # **GLLVMs** <br> ## We can fit a DA model to this data as follows: by first deciding on a family of distributions for each column,
]] .column[.content.vmiddle.center[ <img src="images/GLLVM-1.png", width="80%"> ]] --- class: split-33 .column.bg-main2[.content[ <br> # **GLLVMs** <br> ## we use a .orange[GLLVM] to model the distribution for each column, taking into account class information and capturing the correlation between features. ]] .column[.content.vmiddle.center[ <img src="images/GLLVM-2.png", width="80%"> ]] --- class: split-33 .column.bg-main2[.content[ <br> # **GLLVMs** <br> <br> # `\(\Large{\tau_i}\)` and `\(\Large{\beta_{0j}}\)` represent row and column intercepts. ]] .column[.content.vmiddle.center[ <img src="images/GLLVM-3.png", width="80%"> ]] --- class: split-33 .column.bg-main2[.content[ <br> # **GLLVMs** <br> <br> # Class information is captured in `\(\Large{\boldsymbol{x_i}}\)`, with corresponding coefficients `\(\Large{\boldsymbol{\beta_j}}\)`. ]] .column[.content.vmiddle.center[ <img src="images/GLLVM-4.png", width="80%"> ]] --- class: split-33 .column.bg-main2[.content[ <br> # **GLLVMs** <br> # Correlation structure is captured in `\(\Large{\boldsymbol{u_i}}\)` (latent), with corresponding coefficients `\(\Large{\boldsymbol{\lambda_j}}\)`. ]] .column[.content.vmiddle.center[ <img src="images/GLLVM-5.png", width="80%"> ]] --- class: split-33 .column.bg-main2[.content[ <br> # **GLLVMs** <br> <br> ## We can model .orange[differing] correlation structures by fitting multiple GLLVMs for different classes. ]] .column[.content.vmiddle.center[ <img src="images/GLLVM-6.png", width="80%"> ]] --- class: split-33 .column.bg-main2[.content[ <br> # **GLLVMs** <br> <br> ## We can model .orange[differing] correlation structures by fitting multiple GLLVMs for different classes. ]] .column[.content.vmiddle.center[ <img src="images/GLLVM-7.png", width="80%"> ]] --- class: split-33 .column.bg-main2[.content[ <br> # **GLLVMs** <br> <br> ## We can model .orange[differing] correlation structures by fitting multiple GLLVMs for different classes. ]] .column[.content.vmiddle.center[ <img src="images/GLLVM-8.png", width="80%"> ]] --- class: split-two white .column.bg-main2[.content[ <br> # **Model Estimation** <br> ### - We formulate a Bayesian GLLVM approach to train our classifier. <br> ### - Marginalising over the latent variables proves to be intractable, so a Variational Approximation is used. <br> ### - In the optimisation process, we utilise the `TMB` package, which allows us to perform .orange[Automatic Differentiation], implemented in `C++`. This keeps our fitting process fast. #### Read more about the mathematics [here](https://sarahromanes.github.io/talks/ACEMS/ACEMS_SarahRomanes.pdf). ]] .column[.content.vmiddle.center[ <img src="images/AutomaticDifferentiationNutshell.png", width="85%"> ]] --- class: split-two white .column.bg-main2[.content[ <br> # `genDA` - syntax <br>
```r
*res <- genDA(Y = Y,
              class = class,
              num.lv = 2,
              family = family,
              common.covariance = TRUE,
              row.eff = FALSE,
              standard.errors = FALSE)
```
]] .column.bg-main5[.content.vmiddle.center[ ### `Y` - an (n x m) `matrix` or `data.frame` of responses. ]] --- class: split-two white .column.bg-main2[.content[ <br> # `genDA` - syntax <br>
```r
res <- genDA(Y = Y,
*            class = class,
             num.lv = 2,
             family = family,
             common.covariance = TRUE,
             row.eff = FALSE,
             standard.errors = FALSE)
```
]] .column.bg-main5[.content.vmiddle.center[ ### `class` - a factor vector of class information.
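#### For example, a minimal sketch (illustrative only; `Y`, `class`, and `newdata` are assumed to exist, and the column families below are hypothetical):
```r
# Illustrative sketch only: Y is an (n x m) matrix/data.frame of responses,
# class is a factor of class labels, and the families chosen are hypothetical.
library(genDA)

family <- rep("gaussian", ncol(Y))             # one family per column
family[c(2, 3)] <- c("log-normal", "poisson")  # e.g. skewed and count columns

res <- genDA(Y = Y, class = class, num.lv = 2, family = family,
             common.covariance = TRUE, row.eff = FALSE,
             standard.errors = FALSE)

pred <- predict(res, newdata = newdata)        # predict class membership
```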
]] --- class: split-two white .column.bg-main2[.content[ <br> # `genDA` - syntax <br>
```r
res <- genDA(Y = Y,
             class = class,
*            num.lv = 2,
             family = family,
             common.covariance = TRUE,
             row.eff = FALSE,
             standard.errors = FALSE)
```
]] .column.bg-main5[.content.vmiddle.center[ ### `num.lv` - the number of latent variables in the GLLVM model. A non-negative integer, less than the number of response variables (m). Defaults to 2. ]] --- class: split-two white .column.bg-main2[.content[ <br> # `genDA` - syntax <br>
```r
res <- genDA(Y = Y,
             class = class,
             num.lv = 2,
*            family = family,
             common.covariance = TRUE,
             row.eff = FALSE,
             standard.errors = FALSE)
```
]] .column.bg-main5[.content.vmiddle.center[ ### `family` - a character vector describing the distribution of each column. Columns can be of different family types. Family options are `"poisson"` (with log link), `"ZIP"` (Zero Inflated Poisson), `"negative-binomial"` (with log link), `"binomial"` (with logit link), `"gaussian"`, and `"log-normal"`. ]] --- class: split-two white .column.bg-main2[.content[ <br> # `genDA` - syntax <br>
```r
res <- genDA(Y = Y,
             class = class,
             num.lv = 2,
             family = family,
*            common.covariance = TRUE,
             row.eff = FALSE,
             standard.errors = FALSE)
```
]] .column.bg-main5[.content.vmiddle.center[ ### `common.covariance` - default `TRUE`. Specifies whether a common covariance structure is assumed across classes; if `FALSE`, a separate covariance structure is fitted for each class. ]] --- class: split-two white .column.bg-main2[.content[ <br> # `genDA` - syntax <br>
```r
res <- genDA(Y = Y,
             class = class,
             num.lv = 2,
             family = family,
             common.covariance = TRUE,
*            row.eff = FALSE,
             standard.errors = FALSE)
```
]] .column.bg-main5[.content.vmiddle.center[ ### `row.eff` - default `FALSE`. Specifies whether row effects should be included in the model. ]] --- class: split-two white .column.bg-main2[.content[ <br> # `genDA` - syntax <br>
```r
res <- genDA(Y = Y,
             class = class,
             num.lv = 2,
             family = family,
             common.covariance = TRUE,
             row.eff = FALSE,
*            standard.errors = FALSE)
```
]] .column.bg-main5[.content.vmiddle.center[ ### `standard.errors` - default `FALSE`. Indicates whether standard errors for training parameter estimates are calculated. ]] --- class: split-two white .column.bg-main2[.content[ <br> # `genDA` - syntax <br>
```r
res <- genDA(Y = Y,
             class = class,
             num.lv = 2,
             family = family,
             common.covariance = TRUE,
             row.eff = FALSE,
             standard.errors = FALSE)
```
<br>
```r
*predict(res, newdata = newdata)
```
]] .column.bg-main5[.content.vmiddle.center[ ### And finally, just as in `multiDA`, a generic S3 `predict` method is used for prediction. ]] --- class: middle center white # .black[100 Trial, 5 Fold CV] <img src="index_files/figure-html/unnamed-chunk-31-1.png" width="864" /> --- class: split-three white .column.bg-main3[.content.center[ <br> <br> <br> <img src="images/profile.png", width="65%">
### <i class="fab fa-twitter faa-float animated "></i>: sarah_romanes
### <i class="fas fa-link faa-vertical animated "></i>: sarahromanes.github.io ]] .column.bg-main2[.content.center[ <br> # genDA <br> <img src="images/genDA_logo.png", width="50%"> ## .white[
<i class="fab fa-github faa-float animated "></i>
] ### [sarahromanes/genDA](https://github.com/sarahromanes/genDA) ]] .column.bg-main5[.content.center[ <br> # .black[multiDA] <br> <img src="images/multiDA_logo.png", width="50%"> ## .black[
<i class="fab fa-github faa-float animated "></i>
] ### [sarahromanes/multiDA](https://github.com/sarahromanes/multiDA) ]]