Fitting models

Mikis Stasinopoulos
Bob Rigby
Fernanda De Bastiani
Gillian Heller
Niki Umlauf

Introduction

questions

  • do my data support a GAMLSS model?

  • do I need GAMLSS to answer my questions?

  • size of the data

  • questions of interest

this talk

  • the basic GAMLSS algorithm
  • different statistical approaches for fitting a GAMLSS model
  • different machine learning techniques for fitting a GAMLSS model.

Algorithms

GAMLSS

\[\begin{split} y_i & \stackrel{\small{ind}}{\sim }& {D}( \theta_{1i}, \ldots, \theta_{ki}) \nonumber \\ g(\theta_{1i}) &=& b_{10} + s_1({x}_{1i}) + \ldots + s_p({x}_{pi}) \nonumber\\ \ldots &=& \ldots \nonumber\\ g({\theta}_{ki}) &=& b_{k0} + s_1({x}_{1i}) + \ldots + s_p({x}_{pi}) \end{split} \qquad(1)\]

GAMLSS + ML

\[\begin{split} y_i & \stackrel{\small{ind}}{\sim }& {D}( \theta_{1i}, \ldots, \theta_{ki}) \nonumber \\ g({\theta}_{1i}) &=& {ML}_1({x}_{1i},{x}_{2i}, \ldots, {x}_{pi}) \nonumber \\ \ldots &=& \ldots \nonumber\\ g({\theta}_{ki}) &=& {ML}_k({x}_{1i},{x}_{2i}, \ldots, {x}_{pi}) \end{split} \qquad(2)\]
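In gamlss2 this structure is written as one formula per distribution parameter, separated by |. A minimal sketch of the syntax, assuming a placeholder data frame d with response y and predictors x1, x2, and using the neural-network term n() that appears later in this talk:

library(gamlss2)
# one formula per distribution parameter, separated by '|':
# here mu and sigma each get their own ML term
f <- y ~ n(~ x1 + x2) | n(~ x1 + x2)
m <- gamlss2(f, family = NO, data = d)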

Parameters for Estimation

  • distribution parameters

  • coefficient parameters

      - **coefficients** (the beta's in Figure 1)

      - **random effects** (the gamma's)
  • hyper-parameters (the lambda's)

flowchart

flowchart LR
  A[GAMLSS] --> L[Parametric]
  A --> N[Smooth]
  L --> B(Likelihood)
  L --> D(Bayesian)
  L --> E(Boosting)
  N --> C(Penalised Likelihood)
  N --> D(Bayesian)
  N --> E(Boosting)
  B --> G[beta's]
  C --> F[beta's, gamma's, & lambda's]
  E --> F
  D --> F
  D --> G
  E --> G
Figure 1: Different ways to estimate the parameters in a GAMLSS model.

The Basic Algorithm for GAMLSS

  • For a specified distribution family, data set, and machine learning (ML) model:

Initialize: set starting values for all distribution parameters \((\hat{\boldsymbol{\mu}}^{(0)}, \hat{\boldsymbol{\sigma}}^{(0)}, \hat{\boldsymbol{\nu}}^{(0)}, \hat{\boldsymbol{\tau}}^{(0)} )\).

For \(i = 1, 2, \ldots\):

  • fix \(\hat{\boldsymbol{\sigma}}^{(i-1)}\), \(\hat{\boldsymbol{\nu}}^{(i-1)}\), and \(\hat{\boldsymbol{\tau}}^{(i-1)}\) and fit an ML model to the iterative response \(\textbf{y}_{\mu}\) using iterative weights \(\textbf{w}_{\mu}\) to obtain the new \(\hat{\boldsymbol{\mu}}^{(i)}\)

  • fix \(\hat{\boldsymbol{\mu}}^{(i)}\), \(\hat{\boldsymbol{\nu}}^{(i-1)}\), and \(\hat{\boldsymbol{\tau}}^{(i-1)}\) and fit an ML model to the iterative response \(\textbf{y}_{\sigma}\) using iterative weights \(\textbf{w}_{\sigma}\) to obtain the new \(\hat{\boldsymbol{\sigma}}^{(i)}\)

  • fix \(\hat{\boldsymbol{\mu}}^{(i)}\), \(\hat{\boldsymbol{\sigma}}^{(i)}\), and \(\hat{\boldsymbol{\tau}}^{(i-1)}\) and fit an ML model to the iterative response \(\textbf{y}_{\nu}\) using iterative weights \(\textbf{w}_{\nu}\) to obtain the new \(\hat{\boldsymbol{\nu}}^{(i)}\)

  • fix \(\hat{\boldsymbol{\mu}}^{(i)}\), \(\hat{\boldsymbol{\sigma}}^{(i)}\), and \(\hat{\boldsymbol{\nu}}^{(i)}\) and fit an ML model to the iterative response \(\textbf{y}_{\tau}\) using iterative weights \(\textbf{w}_{\tau}\) to obtain the new \(\hat{\boldsymbol{\tau}}^{(i)}\)

  • if the global deviance, \(-2(\textit{log-likelihood})\), has not converged, repeat; otherwise exit
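A minimal R sketch of this loop (illustration only; fit_ml(), working_response(), working_weights() and the family$ helpers are hypothetical placeholders, not the gamlss/gamlss2 internals):

rs_fit <- function(y, X, family, max_iter = 100, tol = 1e-6) {
  theta <- family$initialize(y)          # starting values for mu, sigma, nu, tau
  dev_old <- Inf
  for (i in seq_len(max_iter)) {
    for (p in names(theta)) {            # cycle mu -> sigma -> nu -> tau
      z <- working_response(y, theta, p, family)  # iterative response y_p
      w <- working_weights(y, theta, p, family)   # iterative weights w_p
      theta[[p]] <- fit_ml(z, X, weights = w)     # any ML fitter can be plugged in
    }
    dev <- -2 * family$logLik(y, theta)  # global deviance
    if (abs(dev_old - dev) < tol) break  # converged: exit
    dev_old <- dev
  }
  theta
}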

Important

  • assumed distribution (the log-likelihood and its first and second derivatives are needed)

  • hierarchy of the parameters (i.e. a good location model is required before modelling the scale)

  • orthogonality of the parameters (leads to better convergence)

Checking orthogonality

  • specify a distribution family and select values for all of its parameters

  • generate a sample of N observations (default \(N=10000\))

  • fit the distribution to the generated sample

  • display the correlation coefficients of the estimated predictors, \(\eta_{\theta}\).

Checking orthogonality (con.)

source("~/Dropbox/GAMLSS-development/ggplot/family_cor.R")
library(gamlss)
library(ggplot2)
family_cor("BCTo", mu=1, sigma=0.11, nu=1, tau=5, no.sim=10000)

Figure 2: Parameter orthogonality for the BCTo distribution

Machine Learning Models

properties

  • are the standard errors available?

  • do the x’s need standardization?

  • about the algorithm

     - stability
     - speed 
     - convergence    
  • nonlinear terms

  • interactions

  • dataset type

  • is the selection of x’s automatic?

  • interpretation

Linear Models

mlinear <- gamlss2(rent~area+poly(yearc,2)+location+bath+kitchen+
              cheating|area+yearc+location+bath+kitchen+cheating, 
              family=BCTo, trace=FALSE, data=da)
summary(mlinear)

Linear Models (con.)

Call:
gamlss2(formula = rent ~ area + poly(yearc, 2) + location + bath + 
    kitchen + cheating | area + yearc + location + bath + kitchen + 
    cheating, data = da, family = BCTo, ... = pairlist(trace = FALSE))
---
Family: BCTo 
Link function: mu = log, sigma = log, nu = identity, tau = log
*--------
Parameter: mu 
---
Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)     5.0132864  0.0275445 182.007  < 2e-16 ***
area            0.0106293  0.0002329  45.640  < 2e-16 ***
location2       0.0878750  0.0104118   8.440  < 2e-16 ***
location3       0.1983303  0.0383017   5.178 2.39e-07 ***
bath1           0.0415152  0.0206509   2.010   0.0445 *  
kitchen1        0.1129557  0.0236130   4.784 1.80e-06 ***
cheating1       0.3304758  0.0240475  13.743  < 2e-16 ***
poly(yearc, 2)1 5.0471663  0.3301520  15.287  < 2e-16 ***
poly(yearc, 2)2 3.3575299  0.2751069  12.204  < 2e-16 ***
*--------
Parameter: sigma 
---
Coefficients:
              Estimate Std. Error   t value Pr(>|t|)    
(Intercept)  1.047e+01  6.245e-02   167.650  < 2e-16 ***
area         1.218e-03  6.099e-04     1.996   0.0460 *  
yearc       -5.972e-03  8.078e-07 -7392.764  < 2e-16 ***
location2    5.971e-02  2.911e-02     2.051   0.0403 *  
location3    2.176e-01  9.224e-02     2.359   0.0184 *  
bath1        5.069e-03  5.966e-02     0.085   0.9323    
kitchen1     3.750e-02  7.106e-02     0.528   0.5977    
cheating1   -2.431e-01  4.766e-02    -5.100  3.6e-07 ***
*--------
Parameter: nu 
---
Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.65342    0.05454   11.98   <2e-16 ***
*--------
Parameter: tau 
---
Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.18974    0.01758   181.4   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
*--------
n = 3082 df =  19 res.df =  3063
Deviance = 38254.617 Null Dev. Red. = 5.95%
AIC = 38292.617 elapsed =  0.32sec

Additive Models

madditive <- gamlss2(rent~s(area)+s(yearc)+location+bath+kitchen+
                cheating|s(area)+s(yearc)+location+bath+kitchen+cheating, 
              family=BCTo, data=da, trace=F)
GAIC(mlinear, madditive)
               AIC      df
madditive 38196.95 32.2364
mlinear   38292.62 19.0000

Additive Models (con.)

summary(madditive)
Call:
gamlss2(formula = rent ~ s(area) + s(yearc) + location + bath + 
    kitchen + cheating | s(area) + s(yearc) + location + bath + 
    kitchen + cheating, data = da, family = BCTo, ... = pairlist(trace = F))
---
Family: BCTo 
Link function: mu = log, sigma = log, nu = identity, tau = log
*--------
Parameter: mu 
---
Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  5.71524    0.02082 274.541  < 2e-16 ***
location2    0.08808    0.01102   7.994 1.83e-15 ***
location3    0.20998    0.04965   4.229 2.41e-05 ***
bath1        0.05792    0.04280   1.353  0.17605    
kitchen1     0.10859    0.03642   2.982  0.00289 ** 
cheating1    0.34594    0.02021  17.114  < 2e-16 ***
---
Smooth terms:
    s(area) s(yearc)
edf  6.6271   6.4275
*--------
Parameter: sigma 
---
Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -1.15569    0.03695 -31.277  < 2e-16 ***
location2    0.05293    0.02264   2.338   0.0195 *  
location3    0.21218    0.08616   2.463   0.0138 *  
bath1        0.03440    0.08318   0.414   0.6792    
kitchen1     0.01861    0.07591   0.245   0.8063    
cheating1   -0.23431    0.03754  -6.242 4.91e-10 ***
---
Smooth terms:
    s(area) s(yearc)
edf  1.1077   4.0742
*--------
Parameter: nu 
---
Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.69275    0.04548   15.23   <2e-16 ***
*--------
Parameter: tau 
---
Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.26960    0.07549   43.31   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
*--------
n = 3082 df =  32.24 res.df =  3049.76
Deviance = 38132.48 Null Dev. Red. = 6.25%
AIC = 38196.9528 elapsed =  0.47sec

Additive Models (con.)

plot(madditive)

Regression Trees

source("~/Dropbox/GAMLSS-development/nnet/reg.tree.R")
mregtree <- gamlss2(rent~tree(~area+yearc+location+bath+kitchen+cheating)|
                       tree(~area+yearc+location+bath+kitchen+cheating),
                  family=BCTo, data=da, trace=FALSE) 

GAIC(mlinear, madditive, mregtree)
               AIC      df
madditive 38196.95 32.2364
mlinear   38292.62 19.0000
mregtree  38754.64 30.0000

Regression Trees (con.)

pp <- specials(mregtree, model = "mu", elements = "model")
plot(pp)
text(pp)

Regression Trees (con.)

pp <- specials(mregtree, model = "sigma", elements = "model")
plot(pp)
text(pp)

Neural Networks (fit)

f <- rent ~ n(~area+yearc+location+bath+kitchen, size=10) |
       n(~area+yearc+location+bath+kitchen, size=3)
mneural <- gamlss2(f, family=BCTo, data=da)
GAMLSS-RS iteration  1: Global Deviance = 38208.8675 eps = 0.289873     
GAMLSS-RS iteration  2: Global Deviance = 38178.0402 eps = 0.000806     
GAMLSS-RS iteration  3: Global Deviance = 38177.034 eps = 0.000026     
GAMLSS-RS iteration  4: Global Deviance = 38176.766 eps = 0.000007     
gamlss2::GAIC(mlinear, madditive, mneural, mregtree, k=2)
               AIC       df
madditive 38196.95  32.2364
mlinear   38292.62  19.0000
mneural   38396.77 110.0000
mregtree  38754.64  30.0000

Neural Networks (plot)

source("~/Dropbox/GAMLSS-development/nnet/plot_NN.R")
plot(specials(mneural, model = "mu", elements = "model"))

Neural Networks (plot)

source("~/Dropbox/GAMLSS-development/nnet/plot_NN.R")
plot(specials(mneural, model = "sigma", elements = "model"))

Random Forest

f <- rent ~ cf(~area+yearc+location+bath+kitchen) |
       cf(~area+yearc+location+bath+kitchen)
mcf <- gamlss2(f, family=BCTo, data=da)
GAMLSS-RS iteration  1: Global Deviance = 38465.948 eps = 0.285095     
GAMLSS-RS iteration  2: Global Deviance = 38295.835 eps = 0.004422     
GAMLSS-RS iteration  3: Global Deviance = 38289.9702 eps = 0.000153     
GAMLSS-RS iteration  4: Global Deviance = 38288.2421 eps = 0.000045     
GAMLSS-RS iteration  5: Global Deviance = 38287.9638 eps = 0.000007     
gamlss2::GAIC(mlinear, madditive, mneural, mregtree, mcf, k=2)
               AIC       df
madditive 38196.95  32.2364
mlinear   38292.62  19.0000
mneural   38396.77 110.0000
mcf       38695.96 204.0000
mregtree  38754.64  30.0000

LASSO-RIDGE

  • there is a function ri() in gamlss, using P-splines-type penalized fitting

  • there is also the package gamlss.lasso, which connects gamlss with the package glmnet

  • but LASSO is not implemented in gamlss2 yet
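A hedged sketch of a ridge-type fit with ri() on the Munich rent data da used above (the Lp argument selecting the penalty norm, Lp = 2 for ridge and Lp = 1 for lasso-type, should be checked against ?ri):

library(gamlss)
# penalties are scale-dependent, so standardize the design matrix first
X <- scale(model.matrix(~ area + yearc + bath + kitchen + cheating, data = da)[, -1])
mri <- gamlss(rent ~ ri(X, Lp = 2), sigma.formula = ~ ri(X, Lp = 2),
              family = BCTo, data = da, trace = FALSE)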

Principal Component Regression

  • the functions pc() and pcr() in the gamlss.foreach package for gamlss

  • PCR is not implemented in gamlss2 yet
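A hedged sketch with pc() (the x.vars argument naming the predictors is an assumption; verify the interface with ?pc):

library(gamlss)
library(gamlss.foreach)
# principal-component term for mu; x.vars (assumed argument name) lists the
# predictors whose principal components enter the predictor
mpc <- gamlss(rent ~ pc(x.vars = c("area", "yearc")), family = BCTo, data = da)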

summary

Table 1: Summary of properties of the machine learning algorithms
| ML model | coef. s.e. | stand. of x’s | algo. stab., speed, conv. | non-linear terms | interactions | data type | auto selection | interpretation |
|---|---|---|---|---|---|---|---|---|
| linear | yes | no | yes, fast, v. good | poly | declare | \(n>r\) | no | v. easy |
| additive | no | no | yes, slow, good | smooth | declare | \(n>r\) | no | easy |
| RT | no | no | no, slow, bad | trees | auto | \(n>r\)?? | yes | easy |

summary (con.)

Table 2: Summary of properties of the machine learning algorithms
| ML model | coef. s.e. | stand. of x’s | algo. stab., speed, conv. | non-linear terms | interactions | data type | auto selection | interpretation |
|---|---|---|---|---|---|---|---|---|
| NN | no | 0 to 1 | no, ok, ok | auto | auto | both? | yes | v. hard |
| RF | no | no | no, slow, slow | yes | no | both | auto | hard |
| LASSO | no | yes | yes, fast, good | poly | declare | both | auto | easy |

summary (con.)

Table 3: Summary of properties of the machine learning algorithms
| ML model | coef. s.e. | stand. of x’s | algo. stab., speed, conv. | non-linear terms | interactions | data type | auto selection | interpretation |
|---|---|---|---|---|---|---|---|---|
| Boost | no | no | yes, fast, good | smooth, trees | declare | \(n \ll r\) | yes | easy |
| MCMC | yes | no | good, ok, ok | smooth | declare | \(n>r\) | no | easy |
| PCR | yes | yes | yes, fast, good | poly | declare | both | auto | hard |

end


The Books