Selection of terms

Mikis Stasinopoulos
Bob Rigby
Fernanda De Bastiani
Gillian Heller
Niki Umlauf

Introduction

  • step-wise selection

  • boosting

  • modelling interactions

selection

flowchart TB
  A[Models] --> B(automatic \n selection) 
  A --> C(declared\n selection)
  B --> G[automatic \n interaction]
  B --> H[set up \n interaction]
  G --> D(NN, RT)
  H --> E(LASSO, \n Ridge, \n Elastic Net, \n PCR )
  S --> M[step-wise]
  S --> N[boost]
  C --> S(LM, AM )
Figure 1: Different methods of selecting features.

Stepwise Selection (procedures)

  • forward
  • backward
  • stepwise (both directions)
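The three procedures can be tried out with base R's `step()` function. The sketch below is only an illustration: it uses the built-in `mtcars` data and ordinary linear models rather than the rent data and GAMLSS, and the choice of candidate covariates is arbitrary.

```r
# Illustration only: base R step() on the built-in mtcars data.
null_model <- lm(mpg ~ 1, data = mtcars)                      # smallest model
full_model <- lm(mpg ~ wt + hp + disp + qsec, data = mtcars)  # largest model
scope <- list(lower = ~ 1, upper = ~ wt + hp + disp + qsec)

# forward: start small, only add terms
fwd <- step(null_model, scope = scope, direction = "forward", trace = 0)

# backward: start large, only drop terms
bwd <- step(full_model, direction = "backward", trace = 0)

# stepwise: at each step a term may be added or dropped
both <- step(null_model, scope = scope, direction = "both", trace = 0)

formula(fwd)
formula(bwd)
formula(both)
```

Each call returns the fitted model at which no single addition or deletion improves the AIC any further.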

Stepwise Selection (cont.)

Three models

  • current model \(C\)

  • lower model \(L\)

    • could be the null model
  • upper model \(U\)

    • could be the saturated model
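A minimal sketch of how the three models interact, again using base R and the built-in `mtcars` data as a stand-in (the covariates chosen here are arbitrary): from the current model \(C\), `add1()` lists the single-term additions that move towards \(U\), and `drop1()` lists the single-term deletions that move towards \(L\).

```r
# Illustration only: current, lower and upper models for a linear model.
current <- lm(mpg ~ wt, data = mtcars)   # current model C
upper   <- ~ wt + hp + qsec              # upper model U (largest allowed)
# lower model L is the null model mpg ~ 1

towards_upper <- add1(current, scope = upper)  # candidate additions: hp, qsec
towards_lower <- drop1(current)                # candidate deletions: wt

towards_upper
towards_lower
```

A stepwise search repeatedly picks the best row from these two tables until the `<none>` row is the best option.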

Stepwise Selection in GAMLSS: strategy A

| Step | Lower | Direction | Current | Direction | Upper | Creates | Given |
|---|---|---|---|---|---|---|---|
| 1 (\(\mu\)) | \(L_{\mu}\) | \(\leftarrow\) | \(C_{\mu}\) | \(\rightarrow\) | \(U_{\mu}\) | \(F_{\mu}^{(1)}\) | \(C_{\sigma}, C_{\nu}, C_{\tau}\) |
| 2 (\(\sigma\)) | \(L_{\sigma}\) | \(\leftarrow\) | \(C_{\sigma}\) | \(\rightarrow\) | \(U_{\sigma}\) | \(F_{\sigma}^{(2)}\) | \(F_{\mu}^{(1)}, C_{\nu}, C_{\tau}\) |
| 3 (\(\nu\)) | \(L_{\nu}\) | \(\leftarrow\) | \(C_{\nu}\) | \(\rightarrow\) | \(U_{\nu}\) | \(F_{\nu}^{(3)}\) | \(F_{\mu}^{(1)}, F_{\sigma}^{(2)}, C_{\tau}\) |
| 4 (\(\tau\)) | \(L_{\tau}\) | \(\leftarrow\) | \(C_{\tau}\) | \(\rightarrow\) | \(U_{\tau}\) | \(F_{\tau}^{(4)}\) | \(F_{\mu}^{(1)}, F_{\sigma}^{(2)}, F_{\nu}^{(3)}\) |
| 5 (\(\nu\)) | \(L_{\nu}\) | \(\leftarrow\) | \(F_{\nu}^{(3)}\) | \(\rightarrow\) | \(U_{\nu}\) | \(F_{\nu}^{(5)}\) | \(F_{\mu}^{(1)}, F_{\sigma}^{(2)}, F_{\tau}^{(4)}\) |
| 6 (\(\sigma\)) | \(L_{\sigma}\) | \(\leftarrow\) | \(F_{\sigma}^{(2)}\) | \(\rightarrow\) | \(U_{\sigma}\) | \(F_{\sigma}^{(6)}\) | \(F_{\mu}^{(1)}, F_{\nu}^{(5)}, F_{\tau}^{(4)}\) |
| 7 (\(\mu\)) | \(L_{\mu}\) | \(\leftarrow\) | \(F_{\mu}^{(1)}\) | \(\rightarrow\) | \(U_{\mu}\) | \(F_{\mu}^{(7)}\) | \(F_{\sigma}^{(6)}, F_{\nu}^{(5)}, F_{\tau}^{(4)}\) |
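The bookkeeping of strategy A can be sketched in a few lines of base R. This is not the selection routine itself: no model is fitted, and the labels `C_...` and `F_...^(s)` are just strings that mirror the notation of the table. The sketch only reproduces the order in which the parameters are visited (forward through \(\mu, \sigma, \nu, \tau\), then back) and which formulas are held fixed at each step.

```r
params <- c("mu", "sigma", "nu", "tau")
visit  <- c(params, rev(params)[-1])   # mu, sigma, nu, tau, nu, sigma, mu

# Label of the formula currently held for each parameter (initially C)
held <- setNames(paste0("C_", params), params)

steps <- character(length(visit))
for (s in seq_along(visit)) {
  p     <- visit[s]
  given <- paste(held[setdiff(params, p)], collapse = ", ")  # other parameters stay fixed
  start <- held[[p]]
  held[p] <- sprintf("F_%s^(%d)", p, s)                      # result of the search at step s
  steps[s] <- sprintf("step %d (%s): start %s, create %s, given %s",
                      s, p, start, held[[p]], given)
}
writeLines(steps)
```

After the seventh step the selected formulas are \(F_{\mu}^{(7)}, F_{\sigma}^{(6)}, F_{\nu}^{(5)}, F_{\tau}^{(4)}\), as in the last row of the table.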

Strategy A

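library(gamlss2)

# Starting model: one linear formula per distribution parameter,
# separated by "|" in the order mu | sigma | nu | tau.
f1 <- rent ~ (area + yearc + location + bath + kitchen + cheating) |
              area + yearc + location + bath + kitchen + cheating |
              area + yearc + location + bath + kitchen + cheating |
              area + yearc + location + bath + kitchen + cheating
m1 <- gamlss2(f1, family = BCTo, data = da, trace = TRUE,
              n.cyc = 20, c.crit = 0.01)

# Stepwise search between the intercept-only lower model and an upper model
# with cubic polynomials and all two-way interactions; k is the penalty
# per degree of freedom, and one direction is given for each of the 7 steps.
mfA <- gamlss2(m1,
               scope = list(lower = ~ 1,
                            upper = ~ poly(area, 3) + poly(yearc, 3) +
                              (area + yearc + location + bath +
                                 kitchen + cheating)^2),
               trace = TRUE, parallel = "snow", ncpus = 10,
               k = log(3032), direction = rep("both", 7))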
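The argument `k` is the penalty used by the selection criterion. As a reminder, and assuming that 3032 is the number of observations in `da` (the slides do not say so explicitly), the generalised Akaike information criterion is

\[ \text{GAIC}(k) = -2\,\hat{\ell} + k \cdot \text{df}, \]

where \(\hat{\ell}\) is the fitted log-likelihood and \(\text{df}\) the effective degrees of freedom of the model. Setting \(k = 2\) gives the AIC, while \(k = \log n\) gives the BIC (SBC), so \(k = \log(3032) \approx 8.02\) penalises each additional degree of freedom about four times more heavily than the AIC and tends to select smaller models.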

Linear model

\[ \begin{split} \texttt{msLinear:} \qquad &\texttt{rent} \sim \text{BCTo}(\mu, \sigma, \nu, \tau ) \\ &\mu \sim \texttt{poly(area,3)}+ \texttt{poly(yearc,3)} \\ & \qquad +\texttt{location}+ \texttt{bath}+\texttt{cheating}\\ \log\,&\sigma \sim \texttt{yearc}+\texttt{kitchen}+\texttt{yearc*kitchen}\\ & \qquad +\texttt{poly(yearc,3)} \\ & \nu \sim \texttt{yearc} + \texttt{kitchen} \\ \log\,&\tau \sim \texttt{yearc} + \texttt{cheating}. \\ \end{split} \]
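To see what a term such as `poly(area,3)` contributes to a predictor, the base R sketch below builds the basis for a made-up covariate standing in for `area` (the values are hypothetical, not the rent data). Each `poly(., 3)` term adds three columns, and hence three coefficients, to the design matrix.

```r
# Hypothetical covariate standing in for `area`; the values are made up.
x <- seq(20, 160, length.out = 50)

P <- poly(x, 3)        # orthogonal polynomial basis: linear, quadratic, cubic
dim(P)                 # 50 rows, 3 columns

ortho <- crossprod(P)  # columns are orthonormal, so this is close to the identity
round(ortho, 10)
```

Because the columns are orthogonal, the linear, quadratic and cubic contributions can be assessed separately, which is convenient inside a stepwise search.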

Additive smooth model

\[ \begin{split} \texttt{msAdditive:} \qquad &\texttt{rent} \sim \text{BCTo}(\mu, \sigma, \nu, \tau ) \\ &\mu \sim \texttt{pb(area)}+ \texttt{pb(yearc)} \\ & \qquad +\texttt{location}+ \texttt{bath}+\texttt{cheating}\\ \log\,&\sigma \sim \texttt{yearc}+\texttt{kitchen}+\texttt{yearc*kitchen}\\ & \qquad +\texttt{pb(yearc)} \\ & \nu \sim \texttt{yearc} + \texttt{kitchen} \\ \log\,&\tau \sim \texttt{yearc} + \texttt{cheating}. \\ \end{split} \]

Boosting

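library(gamboostLSS)

# The same candidate terms are offered to every distribution parameter:
# P-spline base-learners for area and yearc, plus the six covariates
# entered without an explicit base-learner.
mfboost <- gamboostLSS(
  list(
    mu    = rent ~ bbs(area) + bbs(yearc) +
      (area + yearc + location + kitchen + bath + cheating),
    sigma = rent ~ bbs(area) + bbs(yearc) +
      (area + yearc + location + kitchen + bath + cheating),
    nu    = rent ~ bbs(area) + bbs(yearc) +
      (area + yearc + location + kitchen + bath + cheating),
    tau   = rent ~ bbs(area) + bbs(yearc) +
      (area + yearc + location + kitchen + bath + cheating)
  ),
  data     = da,
  families = as.families("BCTo"),
  control  = boost_control(mstop = 1000, center = TRUE),
  method   = "noncyclic"
)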
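Why boosting acts as a term-selection method can be seen in a much simpler setting. The base R sketch below is not what `gamboostLSS` does internally: it is plain componentwise boosting with squared-error loss for a single mean, on simulated data, with linear base-learners only. All names (`X`, `y`, `nu`, `mstop`, `beta`) are illustrative. At each iteration only the best-fitting base-learner is updated, so covariates that are never chosen keep a zero coefficient.

```r
set.seed(1)
n <- 200
X <- scale(matrix(rnorm(n * 5), n, 5))             # five standardised candidates
colnames(X) <- paste0("x", 1:5)
y <- 2 * X[, 1] - 1 * X[, 3] + rnorm(n, sd = 0.5)  # only x1 and x3 matter

nu    <- 0.1                                  # step length (shrinkage)
mstop <- 100                                  # number of boosting iterations
beta  <- setNames(numeric(5), colnames(X))
fit   <- rep(mean(y), n)                      # start from the intercept-only fit

for (m in seq_len(mstop)) {
  u   <- y - fit                                # negative gradient of squared-error loss
  b   <- drop(crossprod(X, u)) / colSums(X^2)   # slope of each single-covariate fit
  rss <- colSums((u - sweep(X, 2, b, "*"))^2)   # residual sum of squares per base-learner
  j   <- which.min(rss)                         # best-fitting base-learner
  beta[j] <- beta[j] + nu * b[j]                # update only that coefficient
  fit <- fit + nu * b[j] * X[, j]
}

selected <- names(beta)[beta != 0]
round(beta, 3)
selected
```

The number of iterations `mstop` controls how many terms get a chance to enter, which is why it is treated as the main tuning parameter.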

Boosting (continued)

cvr <- cvrisk(mfboost)
Starting cross-validation...
mstop(cvr)
[1] 989
mstop(mfboost) <- mstop(cvr)
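The idea behind choosing `mstop` by resampling can be sketched in base R. This is an illustration under simplifying assumptions, not the `cvrisk` implementation: it uses simulated data, squared-error loss, linear base-learners and five-fold cross-validation, whereas the resampling scheme used by `cvrisk` may differ. The helper `boost_path()` is hypothetical and defined here only for the example.

```r
set.seed(2)
n <- 150
X <- matrix(rnorm(n * 4), n, 4)
y <- 1.5 * X[, 1] + rnorm(n)

# Componentwise boosting; returns the coefficient vector after every iteration
boost_path <- function(X, y, mstop = 200, nu = 0.1) {
  beta <- numeric(ncol(X))
  path <- matrix(0, mstop, ncol(X))
  fit  <- rep(mean(y), length(y))
  for (m in seq_len(mstop)) {
    u <- y - fit
    b <- drop(crossprod(X, u)) / colSums(X^2)
    j <- which.max(b^2 * colSums(X^2))      # largest reduction in residual sum of squares
    beta[j] <- beta[j] + nu * b[j]
    fit <- fit + nu * b[j] * X[, j]
    path[m, ] <- beta
  }
  list(intercept = mean(y), path = path)
}

folds  <- sample(rep(1:5, length.out = n))
cv_err <- matrix(NA_real_, 5, 200)
for (k in 1:5) {
  test <- folds == k
  bp   <- boost_path(X[!test, ], y[!test])
  pred <- bp$intercept + X[test, ] %*% t(bp$path)  # one column per iteration
  cv_err[k, ] <- colMeans((y[test] - pred)^2)      # out-of-fold risk along the path
}

risk     <- colMeans(cv_err)
mstop_cv <- which.min(risk)   # iteration with the smallest estimated risk
mstop_cv
```

Stopping at the iteration with the smallest out-of-sample risk keeps the terms that entered early and leaves out those that would only enter later.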

Model

\[ \begin{split} \texttt{mfboost:} \qquad &\texttt{rent} \sim \text{BCTo}(\mu, \sigma, \nu, \tau ) \\ &\mu \sim s(\texttt{area})+ s(\texttt{yearc}) +\texttt{location} \\ & \qquad +\texttt{bath}+\texttt{kitchen}+\texttt{cheating}\\ \log\,&\sigma \sim s(\texttt{area})+s(\texttt{yearc})+\texttt{location}\\ & \qquad +\texttt{bath}+ \texttt{cheating} \\ & \nu \sim s(\texttt{area})+ s(\texttt{yearc}) +\texttt{location} \\ & \qquad +\texttt{kitchen}+ \texttt{cheating} \\ \log\,&\tau \sim s(\texttt{yearc}). \\ \end{split} \]

Neural Network

set.seed(213)
msneural <- gamlss2(rent~n(~area+yearc+location+bath+kitchen+
                         cheating, size=10)|
      n(~area+yearc+location+bath+kitchen+cheating, size=3)| 
      n(~area+yearc+location+bath+kitchen+cheating, size=3)| 
      n(~area+yearc+location+bath+kitchen+cheating, size=3),
              family=BCTo, data=da)
GAMLSS-RS iteration  1: Global Deviance = 37885.661 eps = 0.295880     
GAMLSS-RS iteration  2: Global Deviance = 37819.3966 eps = 0.001749     
GAMLSS-RS iteration  3: Global Deviance = 37808.0326 eps = 0.000300     
GAMLSS-RS iteration  4: Global Deviance = 37802.7383 eps = 0.000140     
GAMLSS-RS iteration  5: Global Deviance = 37800.4138 eps = 0.000061     
GAMLSS-RS iteration  6: Global Deviance = 37799.374 eps = 0.000027     
GAMLSS-RS iteration  7: Global Deviance = 37799.2628 eps = 0.000002     

Model

\[ \begin{split} \texttt{msneural:} \qquad &\texttt{rent} \sim \text{BCTo}(\mu, \sigma, \nu, \tau ) \\ & \mu \sim NN_{\mu}(\textbf{X}) \\ \log\,&\sigma \sim NN_{\sigma}(\textbf{X}) \\ & \nu \sim NN_{\nu}(\textbf{X}) \\ \log\,&\tau \sim NN_{\tau}(\textbf{X}) \\ \end{split} \]
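For orientation, and assuming that each `n()` term is a feed-forward network with a single hidden layer of `size` units and a logistic activation (an assumption about the term, not something stated on these slides), each predictor has the form

\[ NN_{\theta}(\textbf{x}) = \beta_{0} + \sum_{h=1}^{H} \beta_{h}\, g\!\left(\alpha_{h0} + \boldsymbol{\alpha}_{h}^{\top}\textbf{x}\right), \qquad g(z) = \frac{1}{1+e^{-z}}, \]

with \(H = 10\) hidden units for \(\mu\) and \(H = 3\) for \(\sigma\), \(\nu\) and \(\tau\) in the call above. Every hidden unit combines all the covariates, so interactions are picked up automatically rather than declared in the formula.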

practical 4

end
