
Morning
Why GAMLSS
Available software
Practical 1
Afternoon
Distributions
Continuous distributions
Practical 2
Morning
Discrete Distributions
Mixed distributions
Practical 3
Afternoon
Model Fitting
Model Selection
Practical 4
Morning
Centile estimation
Diagnostics and ggplots
Practical 5
Afternoon
Model Comparison
Model Interpretation
Discussion
Statistical modelling
Distributional Regression
“all models are wrong but some are useful”.
– George Box
Models should be parsimonious
Models should be fit for purpose and able to answer the question at hand
Statistical models have a stochastic component
All models are based on assumptions.
Assumptions are made to simplify things
Explicit assumptions
Implicit assumptions
It is easier to check the explicit assumptions than the implicit ones

\[ y_i = b_0 + b_1 x_{1i} + b_2 x_{2i} + \ldots + b_p x_{pi} + \epsilon_i \qquad(1)\]
\[ \begin{split} y_i &\stackrel{\small{ind}}{\sim} {N}(\mu_i, \sigma) \\ \mu_i &= b_0 + b_1 x_{1i} + b_2 x_{2i} + \ldots + b_p x_{pi} \end{split} \qquad(2)\]
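To make the step from (1) to (2) concrete, here is a minimal sketch in plain Python (not the R software used in this course), assuming a single covariate \(x_{1i}\). Under the explicit Normal assumption of (2), the maximum likelihood estimates of \(b_0\) and \(b_1\) coincide with ordinary least squares, and the ML estimate of \(\sigma\) is \(\sqrt{\text{RSS}/n}\). The function name `fit_normal_linear` is hypothetical.

```python
import math

def fit_normal_linear(x, y):
    """ML fit of y_i ~ N(b0 + b1 * x_i, sigma) with one covariate.

    Under the Normal assumption the ML estimates of b0 and b1 are the
    ordinary least-squares estimates; the ML estimate of sigma is sqrt(RSS / n).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))  # residual sum of squares
    sigma = math.sqrt(rss / n)  # ML (divide by n, not n - 2)
    return b0, b1, sigma
```

For example, `fit_normal_linear([0, 1, 2, 3], [1, 3, 2, 4])` returns approximately \(b_0 = 1.3\), \(b_1 = 0.8\) and \(\hat\sigma = \sqrt{0.45} \approx 0.67\). Note that \(\sigma\) is a single number: the whole distribution is assumed to shift with \(x\) without changing its spread or shape.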
BMI data
BMI fitted model
\[ \begin{split} y_i &\stackrel{\small{ind}}{\sim} {N}(\mu_i, \sigma) \\ \mu_i &= b_0 + s_1(x_{1i}) + s_2(x_{2i}) + \ldots + s_p(x_{pi}) \end{split} \qquad(3)\]
\[\begin{split} y_i &\stackrel{\small{ind}}{\sim} {N}(\mu_i, \sigma) \\ \mu_i &= ML(x_{1i}, x_{2i}, \ldots, x_{pi}) \end{split} \qquad(4)\]
\[\begin{split} y_i &\stackrel{\small{ind}}{\sim} {E}(\mu_i, \phi) \\ g(\mu_i) &= b_0 + b_1 x_{1i} + b_2 x_{2i} + \ldots + b_p x_{pi} \end{split} \qquad(5)\]
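\({E}(\mu_i, \phi)\): a distribution from the exponential family, with mean \(\mu_i\) and dispersion parameter \(\phi\)
\(g(\mu_i)\): the link function, relating the mean \(\mu_i\) to the linear predictor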
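A GLM such as (5) is usually fitted by iteratively reweighted least squares (Fisher scoring). The sketch below assumes one specific case: a Gamma response, log link \(g(\mu_i) = \log \mu_i\), and a single covariate. It is plain Python and `fit_gamma_log_glm` is a hypothetical name, not a library function. For this family and link all working weights equal 1, so each iteration is an ordinary least-squares fit of the working response \(z_i = \eta_i + (y_i - \mu_i)/\mu_i\) on \(x_i\). The dispersion \(\phi\) does not affect the estimates of \(b_0, b_1\), so it is not estimated here.

```python
import math

def fit_gamma_log_glm(x, y, max_iter=100, tol=1e-10):
    """IRLS (Fisher scoring) for y_i ~ Gamma with log(mu_i) = b0 + b1 * x_i.

    With a log link and Gamma variance V(mu) = mu^2 the working weights are
    all 1, so each step is an unweighted least-squares regression of the
    working response z_i = eta_i + (y_i - mu_i) / mu_i on x_i. Requires y_i > 0.
    """
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    eta = [math.log(yi) for yi in y]  # starting linear predictor
    b0 = b1 = 0.0
    for _ in range(max_iter):
        mu = [math.exp(e) for e in eta]                           # inverse link
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]    # working response
        z_bar = sum(z) / n
        new_b1 = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z)) / sxx
        new_b0 = z_bar - new_b1 * x_bar
        converged = abs(new_b0 - b0) < tol and abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        eta = [b0 + b1 * xi for xi in x]
        if converged:
            break
    return b0, b1
```

At convergence the estimates satisfy the score equations \(\sum_i (y_i - \mu_i)/\mu_i = 0\) and \(\sum_i x_i (y_i - \mu_i)/\mu_i = 0\). Only \(\mu_i\) varies with \(x\): \(\phi\), and with it the shape of the Gamma distribution, is the same for every observation.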




The mean of the response is fitted well by the linear model, but the distribution is not
The distribution (implicitly Normal) is not adequate
Even the explicit Gamma distribution of the GLM is not adequate
Therefore, if we are interested in a statistic other than the mean (for example a centile), we need something extra.
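A small illustration of why getting the mean right is not enough: two distributions with the same mean can have very different centiles. The sketch assumes a Normal with \(\sigma = 3\) and a log-normal with log-scale standard deviation 0.3, both with mean 20; these numbers are made up and are not taken from the BMI data. It uses only Python's standard `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

# Two made-up response distributions that share the same mean (20).
MEAN = 20.0
normal = NormalDist(mu=MEAN, sigma=3.0)

# Log-normal: log(Y) ~ N(m, s) and E(Y) = exp(m + s**2 / 2), so choose m to match the mean.
s = 0.3
m = math.log(MEAN) - s ** 2 / 2
lognormal_mean = math.exp(m + s ** 2 / 2)

def lognormal_centile(p):
    """p-th quantile of the log-normal defined by m and s."""
    return math.exp(m + s * NormalDist().inv_cdf(p))

for p in (0.03, 0.50, 0.97):
    print(f"p = {p:.2f}: normal {normal.inv_cdf(p):6.2f}, log-normal {lognormal_centile(p):6.2f}")
```

The two means agree, yet the log-normal's 97th centile lies several units above the Normal's and its 3rd centile lies below, so a model that only gets the mean right can misplace the tails badly.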
\[ X \stackrel{\textit{M}(\boldsymbol{\theta})}{\longrightarrow} D\left(Y|\boldsymbol{\theta}(\textbf{X})\right) \]
All parameters \(\boldsymbol{\theta}\) could be functions of the explanatory variables, \(\boldsymbol{\theta}(\textbf{X})\).
\(D\left(Y|\boldsymbol{\theta}(\textbf{X})\right)\) can be any \(k\)-parameter distribution
\[\begin{split} y_i &\stackrel{\small{ind}}{\sim} {D}(\theta_{1i}, \ldots, \theta_{ki}) \\ g_1(\theta_{1i}) &= b_{10} + s_{11}(x_{1i}) + \ldots + s_{1p}(x_{pi}) \\ & \;\;\vdots \\ g_k(\theta_{ki}) &= b_{k0} + s_{k1}(x_{1i}) + \ldots + s_{kp}(x_{pi}) \end{split} \qquad(6)\]
\[\begin{split} y_i &\stackrel{\small{ind}}{\sim} {D}(\theta_{1i}, \ldots, \theta_{ki}) \\ g_1(\theta_{1i}) &= {ML}_1(x_{1i}, x_{2i}, \ldots, x_{pi}) \\ & \;\;\vdots \\ g_k(\theta_{ki}) &= {ML}_k(x_{1i}, x_{2i}, \ldots, x_{pi}) \end{split} \qquad(7)\]
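A model of form (6) is fitted by maximising a (penalised) likelihood in which every parameter of \(D\) has its own predictor. As a minimal sketch, assuming \(D\) is Normal (\(k = 2\)), linear terms in place of the smoothers \(s(\cdot)\), an identity link for \(\mu\) and a log link for \(\sigma\), the function below only evaluates the log-likelihood; it does not fit the model and is not the gamlss fitting algorithm. The name `gamlss_normal_loglik` and the data are hypothetical.

```python
import math

def gamlss_normal_loglik(beta_mu, beta_sigma, x, y):
    """Log-likelihood of y_i ~ N(mu_i, sigma_i) with
        mu_i         = beta_mu[0]    + beta_mu[1]    * x_i   (identity link)
        log(sigma_i) = beta_sigma[0] + beta_sigma[1] * x_i   (log link)
    """
    total = 0.0
    for xi, yi in zip(x, y):
        mu = beta_mu[0] + beta_mu[1] * xi
        sigma = math.exp(beta_sigma[0] + beta_sigma[1] * xi)  # log link keeps sigma > 0
        total += -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((yi - mu) / sigma) ** 2
    return total

# Made-up data whose spread grows with x.
x = [0, 0, 1, 1]
y = [-1, 1, -3, 3]
constant = gamlss_normal_loglik((0, 0), (0.5 * math.log(5), 0), x, y)  # sigma = sqrt(5) everywhere
varying = gamlss_normal_loglik((0, 0), (0, math.log(3)), x, y)         # sigma = 1 at x = 0, 3 at x = 1
print(constant, varying)
```

Setting `beta_sigma[1] = 0` recovers the constant-\(\sigma\) model (2). Here \(\sigma = \sqrt{5}\) is the best constant value for these data, yet letting \(\log\sigma\) depend on \(x\) gives a higher log-likelihood (about \(-7.87\) versus \(-8.90\)).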




Figure 1: Centile-plot of the fitted m6 model
BMI data



Distributional assumptions are often needed for the response to be fitted properly
In the BMI example above we needed to model all the parameters of the distribution as functions of the explanatory variable age.
These parameters were the location parameter \(\mu\), the scale parameter \(\sigma\), the skewness parameter \(\nu\), and the kurtosis parameter \(\tau\).
Machine learning methods are useful (especially for modelling interactions between variables), but they are not suitable when the interest does not lie in the mean.
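Centile curves such as those in Figure 1 come from inverting the fitted distribution at each age, using the age-specific parameter values. As a sketch, assume a three-parameter Box-Cox normal distribution (the LMS method; BCCG in GAMLSS), which has \(\mu\), \(\sigma\) and \(\nu\) but no kurtosis parameter \(\tau\); which distribution the m6 model actually uses is not stated here. Ignoring the truncation correction of the exact BCCG definition (negligible when \(\sigma|\nu|\) is small), the \(p\)-th centile is \(\mu(1 + \nu\sigma z_p)^{1/\nu}\) for \(\nu \neq 0\) and \(\mu e^{\sigma z_p}\) for \(\nu = 0\), where \(z_p\) is the standard Normal quantile. The function `lms_centile` and the parameter values are hypothetical.

```python
import math
from statistics import NormalDist

def lms_centile(p, mu, sigma, nu):
    """p-th centile of a Box-Cox normal (LMS) distribution, truncation ignored.

    mu: median, sigma: approximate coefficient of variation,
    nu: Box-Cox power controlling skewness.
    """
    z = NormalDist().inv_cdf(p)  # standard Normal quantile z_p
    if nu == 0:
        return mu * math.exp(sigma * z)
    base = 1 + nu * sigma * z
    if base <= 0:
        raise ValueError("centile outside the support for these parameters")
    return mu * base ** (1 / nu)

# Hypothetical parameter values at a single age (not estimates from the BMI data).
mu, sigma, nu = 18.0, 0.10, -1.0
for p in (0.03, 0.50, 0.97):
    print(f"p = {p:.2f}: centile {lms_centile(p, mu, sigma, nu):.2f}")
```

With \(\nu < 0\) the upper centile lies further from the median than the lower one (right skew). Repeating the calculation with \(\mu\), \(\sigma\) and \(\nu\) evaluated at each age traces out the centile curves.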
The Books

www.gamlss.com