The R software

Mikis Stasinopoulos
Bob Rigby
Gillian Heller
Fernanda De Bastiani
Niki Umlauf

Introduction

  • residuals in GAMLSS

  • an example; the rent99 data

  • R packages

Residuals

Introduction

  • GAMLSS uses as residuals the

    • normalised quantile residuals \(\equiv\) z-scores

PIT and z-score residuals

Let \(y_i\) and \(F(y_i, \hat{\theta}_i)\) be the \(i\)th observation and its fitted cdf respectively. Then the Probability Integral Transformed (PIT) residuals are

\[u_i = F(y_i, \hat{\theta}_i) \] and the z-score residuals are

\[z_i = \Phi^{-1}(u_i) = \Phi^{-1}\left(F(y_i, \hat{\theta}_i)\right) \]

where \(\Phi^{-1}\) is the inverse cdf of the standard normal distribution.
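As a minimal sketch of these two definitions, the base R code below computes PIT and z-score residuals for a gamma model without any GAMLSS package. The data are simulated, and the parameters are estimated by a simple method of moments. That is an assumption made only to keep the example self-contained; it is not how gamlss2() estimates them. The mapping shape = 1/sigma^2, scale = mu * sigma^2 follows the GA(mu, sigma) parameterisation of gamlss.dist.

```r
set.seed(123)

# Simulated positive response (illustrative data, not the rent data)
y <- rgamma(500, shape = 4, scale = 250)

# Method-of-moments estimates in the GA(mu, sigma) parameterisation:
# E(y) = mu and Var(y) = sigma^2 * mu^2
mu_hat    <- mean(y)
sigma_hat <- sd(y) / mean(y)

# Fitted cdf F(y_i, theta_hat): gamma with shape 1/sigma^2 and scale mu*sigma^2
u <- pgamma(y, shape = 1 / sigma_hat^2, scale = mu_hat * sigma_hat^2)  # PIT residuals
z <- qnorm(u)                                                          # z-scores

head(round(cbind(y, u, z), 3))
```

Each z-score is the standard normal quantile of the corresponding PIT value, so pnorm(z) recovers u.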

properties

If the distribution of \(y_i\) is specified correctly then the PIT residuals are uniformly distributed,

i.e. \[u_i \sim U(0,1)\]

and the z-scores are normally distributed,

i.e. \[z_i \sim NO(0,1)\]
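A small simulation can illustrate these properties. The sketch below assumes a gamma response and, for simplicity, uses the true parameters rather than fitted ones. It compares the residuals under the correct gamma model with those under a deliberately wrong exponential model. The distributions and sample size are illustrative choices only.

```r
set.seed(1)
n <- 2000
y <- rgamma(n, shape = 2, rate = 0.5)   # true distribution: gamma

# PIT residuals under the correct model and under a wrong (exponential) model
u_ok  <- pgamma(y, shape = 2, rate = 0.5)
u_bad <- pexp(y, rate = 1 / mean(y))

# Corresponding z-scores
z_ok  <- qnorm(u_ok)
z_bad <- qnorm(u_bad)

# Kolmogorov-Smirnov distance of the PIT residuals from U(0, 1)
ks_ok  <- unname(ks.test(u_ok,  "punif")$statistic)
ks_bad <- unname(ks.test(u_bad, "punif")$statistic)

round(c(mean_ok = mean(z_ok), sd_ok = sd(z_ok),
        ks_ok = ks_ok, ks_bad = ks_bad), 3)
```

Under the correct model the z-scores should have mean close to 0 and standard deviation close to 1, and the PIT residuals should lie much closer to the uniform distribution than under the misspecified model.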

PIT

z-scores

diagnostic plots

  • residual plots against other variables

    • index
    • x-variable
    • parameters
    • quantiles
  • qqplots

  • worm plots

  • density plots

  • bucket plots

  • skewness plots
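To make the link between two of these plots concrete, the sketch below shows the basic idea of a worm plot as a detrended QQ-plot, using base R only. The vector z is simulated and stands in for the z-scores of a fitted model. This is an illustration of the idea under that assumption, not the implementation used by any GAMLSS plotting function.

```r
set.seed(42)
z <- rnorm(300)                       # stand-in for the z-scores of a fitted model

n           <- length(z)
theoretical <- qnorm(ppoints(n))      # expected standard normal quantiles
deviation   <- sort(z) - theoretical  # detrending: observed minus expected

# For an adequate model the points scatter around the horizontal line at zero
plot(theoretical, deviation,
     xlab = "Unit normal quantile", ylab = "Deviation")
abline(h = 0, lty = 2)
```

A QQ-plot would show sort(z) against theoretical; subtracting the expected quantiles removes the 45 degree trend, so departures from normality are easier to see.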

Example: rent

Data

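Table 1: The Table of Data
obs number  \(y\)  \(x_1\)  \(x_2\)  \(x_3\)  \(x_{r-1}\)  \(x_r\)
\(1\)  \(y_1\)  \(x_{1,1}\)  \(x_{1,2}\)  \(x_{1,3}\)  \(x_{1,r-1}\)  \(x_{1,r}\)
\(2\)  \(y_2\)  \(x_{2,1}\)  \(x_{2,2}\)  \(x_{2,3}\)  \(x_{2,r-1}\)  \(x_{2,r}\)
\(3\)  \(y_3\)  \(x_{3,1}\)  \(x_{3,2}\)  \(x_{3,3}\)  \(x_{3,r-1}\)  \(x_{3,r}\)
\(n-1\)  \(y_{n-1}\)  \(x_{n-1,1}\)  \(x_{n-1,2}\)  \(x_{n-1,3}\)  \(x_{n-1,r-1}\)  \(x_{n-1,r}\)
\(n\)  \(y_n\)  \(x_{n,1}\)  \(x_{n,2}\)  \(x_{n,3}\)  \(x_{n,r-1}\)  \(x_{n,r}\)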

The rent 1999 Munich data

Table 2: Variables in Munich rent data
R Fl A B H L loc
693.3 50 1972 0 0 0 2
422.0 54 1972 0 0 0 2
736.6 70 1972 0 0 0 2
732.2 50 1972 0 0 0 2
1295.1 55 1893 0 0 0 2
1195.9 59 1893 0 0 0 2

Fitting

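library(gamlss2)
data("rent", package = "gamlss.data")  # the rent data ship with gamlss.data
# formula for mu | formula for sigma; GA is the gamma family
r1 <- gamlss2(R ~ pb(Fl) + pb(A) + H + loc | pb(Fl) + pb(A) + H + loc,
              family = GA, data = rent)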
GAMLSS-RS iteration  1: Global Deviance = 27566.2472 eps = 0.090862     
GAMLSS-RS iteration  2: Global Deviance = 27564.2738 eps = 0.000071     
GAMLSS-RS iteration  3: Global Deviance = 27564.1927 eps = 0.000002     
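For comparison, the sketch below writes the same model structure with the original gamlss() function, where the predictor for \(\sigma\) is given through the sigma.formula argument rather than after a "|" in the formula. It assumes the gamlss package is installed and skips the fit otherwise. The two functions need not return identical fitted values.

```r
# Only run the fit when the original gamlss package is installed
if (requireNamespace("gamlss", quietly = TRUE)) {
  library(gamlss)                        # also attaches gamlss.dist and gamlss.data
  data("rent", package = "gamlss.data")
  g1 <- gamlss(R ~ pb(Fl) + pb(A) + H + loc,
               sigma.formula = ~ pb(Fl) + pb(A) + H + loc,
               family = GA, data = rent)
  z <- resid(g1)                         # normalised quantile residuals (z-scores)
  print(summary(z))
} else {
  message("Package 'gamlss' is not installed; skipping the fit.")
}
```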

residual plots against index

library(gamlss.ggplots)
resid_index(r1)

against continuous x-variables

resid_xvar(r1, A)

against factor x-variables

resid_xvar(r1,loc)

QQ-plots

resid_qqplot(r1)

worm plots

resid_wp(r1)

density plots

resid_density(r1)

bucket plots

moment_bucket(r1)

symmetry plots

resid_symmetry(r1)

ecdf plot

resid_ecdf(r1)

detrended ecdf plot

resid_dtop(r1)

all in one plots

resid_plots(r1)

all in one plots (standard)

plot(r1,which="resid")

R-packages

Older Packages

  • gamlss: the original (needs gamlss.dist and gamlss.data)

  • gamlss.dist: defining the gamlss.family distributions

  • gamlss.data: for extra data sets

  • gamlss.add: connect with mgcv, nnet and trees

  • gamlss.tr: for truncating gamlss.family distributions

  • gamlss.cens: for censored response variables

  • gamlss.demo: for demonstrating GAMLSS concepts

  • gamlss.mx: for fitting finite mixtures

New Packages

  • gamboostLSS: for GAMLSS boosting

  • bamlss: the Bayesian GAMLSS

  • gamlss2\(^*\): the new version of GAMLSS

  • gamlss.ggplots: using ggplot2 within GAMLSS

  • gamlss.foreach: for parallel computing

  • gamlss.prepdata: preparation of data before fitting

  • gamlss.lasso: for LASSO, ridge and elastic net regression

  • gamlss.shiny\(^*\): similar to gamlss.demo

  • topmodels: distributional regression help (not necessarily gamlss)

why gamlss2

  • gamlss() is slow for very large data

  • predict() in gamlss is not easy to use

  • the current implementation can cope with only four parameters: \(\mu\), \(\sigma\), \(\nu\) and \(\tau\)

  • to connect different statistical estimation approaches

    • penalised likelihood
    • Bayesian
    • boosting
  • to implement extra algorithms, e.g. stepwise, robust

  • to implement machine learning methodology

getting the libraries
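One way to check that the packages used in these slides are available is sketched below. The package list is taken from the code in the example, plus gamlss.data for the rent data. Whether every package can be installed from CRAN with install.packages() is an assumption, so that line is left commented out.

```r
# Packages used in the rent example
needed <- c("gamlss2", "gamlss.ggplots", "gamlss.data")

# TRUE for each package that can be loaded, FALSE otherwise
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
not_installed <- needed[!is_installed]

if (length(not_installed) > 0) {
  message("Not installed: ", paste(not_installed, collapse = ", "))
  # install.packages(not_installed)  # assumes the packages are on CRAN
} else {
  message("All packages are installed.")
}
```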

Practical 1

end

The Books