Selection of terms

Mikis Stasinopoulos

Bob Rigby

Fernanda De Bastiani

Gillian Heller

Niki Umlauf

Introduction

step-wise selection
boosting
modelling interactions

selection

flowchart TB
  A[Models] --> B(automatic \n selection) 
  A --> C(declared\n selection)
  B --> G[automatic \n interaction]
  B --> H[set up \n interaction]
  G --> D(NN, RT)
  H --> E(LASSO, \n Ridge, \n Elastic Net, \n PCR )
  C --> S(LM, AM )
  S --> M[step-wise]
  S --> N[boost]

Figure 1: Different methods of selecting features.

Stepwise Selection (procedures)

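forward: start from a small model and add terms one at a time,
backward: start from a large model and drop terms one at a time,
stepwise: at each step a term may be either added or dropped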
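The three procedures can be tried out with base R's `step()`. The sketch below is only an illustration: it uses ordinary linear models on the built-in `mtcars` data as a stand-in for a GAMLSS fit, and the covariates chosen are arbitrary.

```r
# Illustration only: lm() on mtcars stands in for a GAMLSS model.
null_fit <- lm(mpg ~ 1, data = mtcars)                       # smallest model
full_fit <- lm(mpg ~ wt + hp + disp + qsec, data = mtcars)   # largest model
scope    <- list(lower = ~ 1, upper = ~ wt + hp + disp + qsec)

# forward: start small, terms can only be added
fwd  <- step(null_fit, scope = scope, direction = "forward",  trace = 0)
# backward: start large, terms can only be dropped
bwd  <- step(full_fit, scope = scope, direction = "backward", trace = 0)
# stepwise ("both"): at each step a term may be added or dropped
both <- step(null_fit, scope = scope, direction = "both",     trace = 0)

formula(fwd)
formula(bwd)
formula(both)
```

The three searches need not end at the same model, which is one reason the choice of direction matters.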

Stepwise Selection (cont.)

Three models are involved:

current model \(C\)
lower model \(L\)
- could be the null model
upper model \(U\)
- could be the saturated model
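A minimal sketch of how the three models interact, again using `lm()` on `mtcars` purely as a stand-in (the variables are arbitrary choices, not part of the rent example). `add1()` lists the moves from \(C\) towards \(U\), `drop1()` the moves from \(C\) towards \(L\), and `step()` chains such moves while staying between \(L\) and \(U\).

```r
current <- lm(mpg ~ wt, data = mtcars)   # current model C
lower   <- ~ 1                           # lower model L (here the null model)
upper   <- ~ wt + hp + disp + qsec       # upper model U

# Candidate moves towards U: terms in U that are not yet in C
up_moves <- add1(current, scope = upper)
# Candidate moves towards L: terms of C that may be dropped
down_moves <- drop1(current)

up_moves
down_moves

# The full search repeats such single moves, never leaving [L, U]
final <- step(current, scope = list(lower = lower, upper = upper),
              direction = "both", trace = 0)
formula(final)
```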

Stepwise Selection in GAMLSS: strategy A

steps	Lower	Direction	Current	Direction	Upper	Creates	Given
1 (\(\mu\))	\(L_{\mu}\)	\(\leftarrow\)	\(C_{\mu}\)	\(\rightarrow\)	\(U_{\mu}\)	\(F_{\mu}^{(1)}\)	\(C_{\sigma}, C_{\nu}, C_{\tau}\)
2 (\(\sigma\))	\(L_{\sigma}\)	\(\leftarrow\)	\(C_{\sigma}\)	\(\rightarrow\)	\(U_{\sigma}\)	\(F_{\sigma}^{(2)}\)	\(F_{\mu}^{(1)}, C_{\nu}, C_{\tau}\)
3 (\(\nu\))	\(L_{\nu}\)	\(\leftarrow\)	\(C_{\nu}\)	\(\rightarrow\)	\(U_{\nu}\)	\(F_{\nu}^{(3)}\)	\(F_{\mu}^{(1)},F_{\sigma}^{(2)}, C_{\tau}\)
4 (\(\tau\))	\(L_{\tau}\)	\(\leftarrow\)	\(C_{\tau}\)	\(\rightarrow\)	\(U_{\tau}\)	\(F_{\tau}^{(4)}\)	\(F_{\mu}^{(1)}, F_{\sigma}^{(2)}, F_{\nu}^{(3)}\)
5 (\(\nu\))	\(L_{\nu}\)	\(\leftarrow\)	\(F_{\nu}^{(3)}\)	\(\rightarrow\)	\(U_{\nu}\)	\(F_{\nu}^{(5)}\)	\(F_{\mu}^{(1)}, F_{\sigma}^{(2)}, F_{\tau}^{(4)}\)
6 (\(\sigma\))	\(L_{\sigma}\)	\(\leftarrow\)	\(F_{\sigma}^{(2)}\)	\(\rightarrow\)	\(U_{\sigma}\)	\(F_{\sigma}^{(6)}\)	\(F_{\mu}^{(1)}, F_{\nu}^{(5)}, F_{\tau}^{(4)}\)
7 (\(\mu\))	\(L_{\mu}\)	\(\leftarrow\)	\(F_{\mu}^{(1)}\)	\(\rightarrow\)	\(U_{\mu}\)	\(F_{\mu}^{(7)}\)	\(F_{\sigma}^{(6)}, F_{\nu}^{(5)}, F_{\tau}^{(4)}\)
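The bookkeeping in the table above can be reproduced with a few lines of R. This is only a sketch of the visiting order and of which fits are held fixed at each step; no model is fitted, and the labels (`"C"` for the initial current model, a step number for a model selected at that step) are made up for the illustration.

```r
params <- c("mu", "sigma", "nu", "tau")

# Forward pass over all parameters, then back again without repeating tau
visit_order <- c(params, rev(params)[-1])

# Latest available model for each parameter:
# "C" = initial current model, otherwise the step at which it was selected
latest <- setNames(rep("C", length(params)), params)

for (step_no in seq_along(visit_order)) {
  p     <- visit_order[step_no]
  given <- latest[names(latest) != p]
  cat(sprintf("step %d (%s): start from %s, given %s\n",
              step_no, p, latest[[p]],
              paste(names(given), given, sep = ":", collapse = ", ")))
  latest[[p]] <- as.character(step_no)   # this step creates F_p^(step_no)
}
```

Each parameter other than \(\tau\) is visited twice, so a term chosen early can be revised once the other parameters have been modelled.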

Strategy A

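library(gamlss2)
# da: data frame holding the rent data (not defined in this section)
# One linear predictor per parameter (mu | sigma | nu | tau)
f1 <- rent ~ area + yearc + location + bath + kitchen + cheating |
             area + yearc + location + bath + kitchen + cheating |
             area + yearc + location + bath + kitchen + cheating |
             area + yearc + location + bath + kitchen + cheating
# Starting (current) model
m1 <- gamlss2(f1, family = BCTo, data = da, trace = TRUE,
              n.cyc = 20, c.crit = 0.01)

# Strategy A search between the null model and cubic polynomials plus all
# two-way interactions; k = log(3032) is a BIC-type penalty.
# NOTE: call reproduced as on the slide; in the gamlss package the
# strategy A search is stepGAICAll.A().
mfA <- gamlss2(m1,
         scope = list(lower = ~1,
                      upper = ~poly(area, 3) + poly(yearc, 3) +
                        (area + yearc + location + bath + kitchen + cheating)^2),
         trace = TRUE, parallel = "snow", ncpus = 10, k = log(3032),
         direction = rep("both", 7))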
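The argument `k` in the call above is the penalty per degree of freedom in a generalised Akaike criterion, deviance plus `k` times the degrees of freedom: `k = 2` gives the AIC and `k = log(n)` the BIC. The value `3032` is presumably the number of observations in `da`; that is an assumption, since the data are not shown here. The sketch below checks the identity on a small `lm()` fit, with `gaic()` being a helper written for this illustration only.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)   # stand-in model
n   <- nobs(fit)

# Hypothetical helper: deviance (-2 log-likelihood) + k * degrees of freedom
gaic <- function(model, k) {
  ll <- logLik(model)
  as.numeric(-2 * ll) + k * attr(ll, "df")
}

gaic(fit, k = 2)        # same value as AIC(fit)
gaic(fit, k = log(n))   # same value as BIC(fit)
log(3032)               # the penalty used on the slide, about 8
```

A penalty of about 8 per parameter is much harsher than the AIC's 2, so the search favours smaller models.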

Linear model

\[
\begin{split}
\texttt{msLinear:} \qquad &\texttt{rent} \sim \text{BCTo}(\mu, \sigma, \nu, \tau ) \\
&\mu \sim \texttt{poly(area,3)}+ \texttt{poly(yearc,3)} \\
& \qquad +\texttt{location}+ \texttt{bath}+\texttt{cheating} \\
\log\,&\sigma \sim \texttt{yearc}+\texttt{kitchen}+\texttt{yearc*kitchen} \\
& \qquad +\texttt{poly(yearc,3)} \\
& \nu \sim \texttt{yearc} + \texttt{kitchen} \\
\log\,&\tau \sim \texttt{yearc} + \texttt{cheating}.
\end{split}
\]
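For readers unfamiliar with `poly()`: a term such as `poly(area,3)` contributes three columns to the design matrix, an orthogonal polynomial basis of degree 3. The sketch below uses made-up values in place of `area`.

```r
x <- seq(20, 160, length.out = 50)   # hypothetical values standing in for area
P <- poly(x, 3)

dim(P)                    # 50 rows, 3 columns (linear, quadratic, cubic)
round(crossprod(P), 10)   # identity matrix: the columns are orthonormal
round(colSums(P), 10)     # zeros: each column is orthogonal to the intercept
```

Orthogonality makes the three coefficients numerically stable, but they no longer correspond to the raw powers of the covariate.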

Additive smooth model

\[
\begin{split}
\texttt{msAdditive:} \qquad &\texttt{rent} \sim \text{BCTo}(\mu, \sigma, \nu, \tau ) \\
&\mu \sim \texttt{pb(area)}+ \texttt{pb(yearc)} \\
& \qquad +\texttt{location}+ \texttt{bath}+\texttt{cheating} \\
\log\,&\sigma \sim \texttt{yearc}+\texttt{kitchen}+\texttt{yearc*kitchen} \\
& \qquad +\texttt{pb(yearc)} \\
& \nu \sim \texttt{yearc} + \texttt{kitchen} \\
\log\,&\tau \sim \texttt{yearc} + \texttt{cheating}.
\end{split}
\]

Boosting

library(gamboostLSS)
# One formula per distribution parameter: P-spline base-learners (bbs) for
# area and yearc, plus all covariates as candidate terms
mfboost <- gamboostLSS(
  list(
    mu    = rent ~ bbs(area) + bbs(yearc) +
      (area + yearc + location + kitchen + bath + cheating),
    sigma = rent ~ bbs(area) + bbs(yearc) +
      (area + yearc + location + kitchen + bath + cheating),
    nu    = rent ~ bbs(area) + bbs(yearc) +
      (area + yearc + location + kitchen + bath + cheating),
    tau   = rent ~ bbs(area) + bbs(yearc) +
      (area + yearc + location + kitchen + bath + cheating)),
  data = da, families = as.families("BCTo"),
  control = boost_control(mstop = 1000, center = TRUE),
  method = "noncyclic")
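Why boosting selects terms can be seen in a much simpler setting. The sketch below is component-wise least-squares boosting for a single mean, written in base R on simulated data; it is not what `gamboostLSS` runs internally, which boosts all distribution parameters using the gradient of the log-likelihood. All names and settings here are invented for the illustration.

```r
set.seed(1)
n <- 200
X <- cbind(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
y <- 2 * X[, "x1"] - 1 * X[, "x2"] + rnorm(n, sd = 0.5)   # x3 is pure noise

mstop <- 100   # number of boosting iterations
nu    <- 0.1   # step length (shrinkage)
coefs <- setNames(numeric(ncol(X)), colnames(X))
res   <- y - mean(y)   # residuals after fitting the offset (mean of y)

for (m in seq_len(mstop)) {
  # least-squares slope of the current residuals on each covariate separately
  slopes <- colSums(X * res) / colSums(X^2)
  # residual sum of squares each single-covariate fit would leave behind
  rss    <- colSums((res - sweep(X, 2, slopes, `*`))^2)
  best   <- which.min(rss)                 # only the best covariate is updated
  coefs[best] <- coefs[best] + nu * slopes[best]
  res <- res - nu * slopes[best] * X[, best]
}

round(coefs, 3)
```

Because only one covariate is updated per iteration and each update is small, covariates that never help stay at or near zero, which is how the number of iterations acts as the tuning parameter for selection.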

Boosting (continued)

cvr <- cvrisk(mfboost)

Starting cross-validation...

mstop(cvr)

[1] 989

mstop(mfboost) <- mstop(cvr)
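The idea behind choosing `mstop` by cross-validation can be sketched with a single hold-out set: track the out-of-sample risk along the boosting path and stop where it is smallest. The code below is a simplified stand-in on simulated data (one split, squared-error risk, linear base-learners), not a reproduction of what `cvrisk()` does.

```r
set.seed(2)
n <- 300
X <- cbind(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
y <- 2 * X[, "x1"] - X[, "x2"] + rnorm(n)
train <- 1:200
valid <- 201:300

max_iter <- 200
nu       <- 0.1
coefs    <- numeric(ncol(X))
offset   <- mean(y[train])
res      <- y[train] - offset
risk     <- numeric(max_iter)   # validation mean squared error per iteration
Xt       <- X[train, ]

for (m in seq_len(max_iter)) {
  slopes <- colSums(Xt * res) / colSums(Xt^2)
  # the covariate giving the largest drop in residual sum of squares
  best <- which.max(slopes^2 * colSums(Xt^2))
  coefs[best] <- coefs[best] + nu * slopes[best]
  res <- res - nu * slopes[best] * Xt[, best]
  risk[m] <- mean((y[valid] - offset - X[valid, ] %*% coefs)^2)
}

best_mstop <- which.min(risk)   # iteration with the smallest hold-out risk
best_mstop
```

Resampling schemes average such risk curves over several splits, which gives a less noisy choice than a single hold-out set.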

Model

\[
\begin{split}
\texttt{mfboost:} \qquad &\texttt{rent} \sim \text{BCTo}(\mu, \sigma, \nu, \tau ) \\
&\mu \sim s(\texttt{area})+ s(\texttt{yearc}) +\texttt{location} \\
& \qquad +\texttt{bath}+\texttt{kitchen}+\texttt{cheating}\\
\log\,&\sigma \sim s(\texttt{area})+s(\texttt{yearc})+\texttt{location}\\
& \qquad +\texttt{bath}+ \texttt{cheating} \\
& \nu \sim s(\texttt{area})+ s(\texttt{yearc}) +\texttt{location} \\
& \qquad +\texttt{kitchen}+ \texttt{cheating} \\
\log\,&\tau \sim s(\texttt{yearc}).
\end{split}
\]

Neural Network

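set.seed(213)   # fixed seed so the fit is reproducible
# One neural-network term per parameter (mu | sigma | nu | tau)
msneural <- gamlss2(
  rent ~ n(~area + yearc + location + bath + kitchen + cheating, size = 10) |
         n(~area + yearc + location + bath + kitchen + cheating, size = 3) |
         n(~area + yearc + location + bath + kitchen + cheating, size = 3) |
         n(~area + yearc + location + bath + kitchen + cheating, size = 3),
  family = BCTo, data = da)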

GAMLSS-RS iteration  1: Global Deviance = 37885.661 eps = 0.295880     
GAMLSS-RS iteration  2: Global Deviance = 37819.3966 eps = 0.001749     
GAMLSS-RS iteration  3: Global Deviance = 37808.0326 eps = 0.000300     
GAMLSS-RS iteration  4: Global Deviance = 37802.7383 eps = 0.000140     
GAMLSS-RS iteration  5: Global Deviance = 37800.4138 eps = 0.000061     
GAMLSS-RS iteration  6: Global Deviance = 37799.374 eps = 0.000027     
GAMLSS-RS iteration  7: Global Deviance = 37799.2628 eps = 0.000002
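The `eps` column in the trace above appears to be the relative change in global deviance between successive iterations. This is inferred from the printed numbers, not from the package source, and the match is only up to rounding of the printed values. A quick check in R:

```r
# Global deviances copied from the trace above (iterations 1 to 7)
dev <- c(37885.661, 37819.3966, 37808.0326, 37802.7383,
         37800.4138, 37799.374, 37799.2628)

# Relative decrease from one iteration to the next (iterations 2 to 7)
rel_change <- abs(diff(dev)) / abs(head(dev, -1))
signif(rel_change, 3)
```

The deviance decreases at every iteration and the relative change shrinks quickly, which is the pattern expected from a converging fit.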

Model

\[
\begin{split}
\texttt{msneural:} \qquad &\texttt{rent} \sim \text{BCTo}(\mu, \sigma, \nu, \tau ) \\
& \mu \sim NN_{\mu}(\mathbf{X}) \\
\log\,&\sigma \sim NN_{\sigma}(\mathbf{X}) \\
& \nu \sim NN_{\nu}(\mathbf{X}) \\
\log\,&\tau \sim NN_{\tau}(\mathbf{X})
\end{split}
\]
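To make the building block \(NN(\mathbf{X})\) concrete, the sketch below fits a single-hidden-layer network with the `nnet` package on simulated data. It assumes that `size` plays the same role as in `nnet::nnet()`, namely the number of hidden units; whether the `n()` term is built on `nnet` is an assumption here, and the data and settings are invented.

```r
library(nnet)   # recommended package, shipped with most R installations

set.seed(213)
x <- matrix(runif(200, -2, 2), ncol = 1, dimnames = list(NULL, "x"))
y <- sin(2 * x[, 1]) + rnorm(200, sd = 0.2)

# One hidden layer with 3 units; linout = TRUE gives a linear output unit,
# which is what a regression-type predictor needs
fit  <- nnet(x, y, size = 3, linout = TRUE, trace = FALSE, maxit = 500)
yhat <- predict(fit, x)

length(fit$wts)   # (1 input + bias) * 3 hidden + (3 hidden + bias) * 1 output
```

Interactions between covariates do not have to be declared in such a term: the hidden units combine all inputs, which is why neural networks sit under automatic interaction in Figure 1.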

practical 4

end
