Morning
Why GAMLSS
Available software
Practical 1
Afternoon
Distributions
Continuous distributions
Practical 2
Morning
Discrete Distributions
Mixed distributions
Practical 3
Afternoon
Model Fitting
Model Selection
Practical 4
Morning
Centile estimation
Diagnostics and ggplots
Practical 5
Afternoon
Model Comparison
Model Interpretation
Discussion
Statistical modelling
Distributional Regression
“all models are wrong but some are useful”.
– George Box
Models should be parsimonious
Models should be fit for purpose and able to answer the question at hand
Statistical models have a stochastic component
All models are based on assumptions.
Assumptions are made to simplify things
Explicit assumptions
Implicit assumptions
It is easier to check the explicit assumptions than the implicit ones
\[ y_i = b_0 + b_1 x_{1i} + b_2 x_{2i} + \ldots + b_p x_{pi} + \epsilon_i \qquad(1)\]
\[ \begin{split} y_i & \stackrel{\small{ind}}{\sim} {N}(\mu_i, \sigma) \\ \mu_i &= b_0 + b_1 x_{1i} + b_2 x_{2i} + \ldots + b_p x_{pi} \end{split} \qquad(2)\]
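A minimal Python sketch of model (2) with a single explanatory variable (p = 1). For the normal linear model the maximum likelihood estimates of \(b_0, b_1\) are the least-squares estimates, and the ML estimate of \(\sigma\) is the root mean squared residual (dividing by \(n\), not \(n - 2\)). The function name `fit_normal_lm` and the data are hypothetical, for illustration only.

```python
import math


def fit_normal_lm(x, y):
    """Fit y_i ~ N(b0 + b1 * x_i, sigma) by maximum likelihood (p = 1).

    The ML estimates of b0 and b1 coincide with ordinary least squares;
    the ML estimate of sigma divides the residual sum of squares by n.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma = math.sqrt(rss / n)  # one sigma shared by every observation
    return b0, b1, sigma
```

For the points `x = [0, 1, 2, 3]`, `y = [1, 3, 5, 7]`, which lie exactly on a line, this returns `(1.0, 2.0, 0.0)`. Note that a single \(\sigma\) is shared by all observations; this is the assumption that distributional regression relaxes.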
Figure: BMI data
Figure: BMI fitted model
\[ \begin{split} y_i & \stackrel{\small{ind}}{\sim} {N}(\mu_i, \sigma) \\ \mu_i &= b_0 + s_1(x_{1i}) + s_2(x_{2i}) + \ldots + s_p(x_{pi}) \end{split} \qquad(3)\]
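A minimal Python sketch of one smooth term \(s_1(x)\) in model (3), assuming an unpenalised piecewise-linear regression spline with user-chosen knots \(\kappa_k\): \(b_0 + s_1(x) = b_0 + b_1 x + \sum_k c_k (x - \kappa_k)_+\), fitted by least squares through the normal equations. This is a simplification chosen for illustration; smoothers used in practice (e.g. P-splines) add a roughness penalty. All names here are hypothetical.

```python
def hinge_row(x, knots):
    """Basis row [1, x, (x - k_1)+, ..., (x - k_K)+] for b_0 + s_1(x)."""
    return [1.0, x] + [max(0.0, x - k) for k in knots]


def solve_linear(a, b):
    """Solve the square system a c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        coef[r] = (m[r][n] - sum(m[r][c] * coef[c] for c in range(r + 1, n))) / m[r][r]
    return coef


def fit_hinge_spline(x, y, knots):
    """Least-squares coefficients [b_0, b_1, c_1, ..., c_K] via the normal equations."""
    rows = [hinge_row(xi, knots) for xi in x]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear(xtx, xty)


def predict_hinge(coef, x, knots):
    """Fitted value b_0 + s_1(x) at a single x."""
    return sum(c * b for c, b in zip(coef, hinge_row(x, knots)))
```

With one knot at 5 and responses computed from \(1 + 0.5x + 2(x - 5)_+\) at \(x = 0, 1, \ldots, 10\), the fitted coefficients recover \([1, 0.5, 2]\) up to rounding error. Only the mean changes with \(x\); \(\sigma\) is still a single constant.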
\[\begin{split} y_i & \stackrel{\small{ind}}{\sim} {N}(\mu_i, \sigma) \\ \mu_i &= {ML}(x_{1i}, x_{2i}, \ldots, x_{pi}) \end{split} \qquad(4)\]
\[\begin{split} y_i & \stackrel{\small{ind}}{\sim} {E}(\mu_i, \phi) \\ g(\mu_i) &= b_0 + b_1 x_{1i} + b_2 x_{2i} + \ldots + b_p x_{pi} \end{split} \qquad(5)\]
\({E}(\mu_i, \phi)\): a distribution from the exponential family
\(g(\mu_i)\): the link function
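A minimal Python sketch of model (5) for a Gamma response with a log link and one explanatory variable, fitted by iteratively reweighted least squares (IRLS). For the Gamma family \(V(\mu) = \mu^2\), and with \(g(\mu) = \log\mu\) we have \(g'(\mu) = 1/\mu\), so the IRLS weights \(1/[V(\mu)\,g'(\mu)^2]\) are all equal and each step is an ordinary least-squares fit of the working response. The dispersion \(\phi\) does not enter the coefficient estimates and is not estimated here; the function name is hypothetical.

```python
import math


def fit_gamma_glm_log(x, y, n_iter=25):
    """Fit log(mu_i) = b0 + b1 * x_i for a Gamma response by IRLS.

    With V(mu) = mu**2 and g'(mu) = 1 / mu the IRLS weights are constant,
    so each iteration is an ordinary least-squares fit of the working response.
    A fixed number of iterations is used (no convergence check) for brevity.
    """
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b0, b1 = math.log(sum(y) / n), 0.0  # start from the intercept-only fit
    for _ in range(n_iter):
        eta = [b0 + b1 * xi for xi in x]  # linear predictor
        mu = [math.exp(e) for e in eta]  # inverse of the log link
        # working response z = eta + (y - mu) * g'(mu) = eta + (y - mu) / mu
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        z_bar = sum(z) / n
        b1 = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z)) / sxx
        b0 = z_bar - b1 * x_bar
    return b0, b1
```

Only \(\mu\) depends on \(x\) here: one \(\phi\), and hence one Gamma shape, is shared by all observations.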
The mean of the response is fitted well by the linear model, but the distribution is not:
the (implicit) Normal distribution is not adequate;
even the explicit Gamma distribution of the GLM is not adequate.
Therefore, if we are interested in a statistic other than the mean, we need something extra.
\[ X \stackrel{\textit{M}(\boldsymbol{\theta})}{\longrightarrow} D\left(Y|\boldsymbol{\theta}(\textbf{X})\right) \]
All parameters \(\boldsymbol{\theta}\) could be functions of the explanatory variables, \(\boldsymbol{\theta}(\textbf{X})\).
\(D\left(Y|\boldsymbol{\theta}(\textbf{X})\right)\) can be any \(k\)-parameter distribution
\[\begin{split} y_i & \stackrel{\small{ind}}{\sim} {D}(\theta_{1i}, \ldots, \theta_{ki}) \\ g_1(\theta_{1i}) &= b_{10} + s_{11}(x_{1i}) + \ldots + s_{1p}(x_{pi}) \\ \ldots &= \ldots \\ g_k(\theta_{ki}) &= b_{k0} + s_{k1}(x_{1i}) + \ldots + s_{kp}(x_{pi}) \end{split} \qquad(6)\]
\[\begin{split} y_i & \stackrel{\small{ind}}{\sim} {D}(\theta_{1i}, \ldots, \theta_{ki}) \\ g_1(\theta_{1i}) &= {ML}_1(x_{1i}, x_{2i}, \ldots, x_{pi}) \\ \ldots &= \ldots \\ g_k(\theta_{ki}) &= {ML}_k(x_{1i}, x_{2i}, \ldots, x_{pi}) \end{split} \qquad(7)\]
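A minimal Python sketch of model (6) for the two-parameter normal distribution, \(D = N(\mu, \sigma)\), with linear terms in place of the smooth functions: \(\mu_i = b_{10} + b_{11} x_i\) (identity link) and \(\log\sigma_i = b_{20} + b_{21} x_i\) (log link). The two linear predictors are updated in turn; this alternating scheme is only loosely in the spirit of the RS algorithm of the gamlss R package, not its implementation. Function names and data are hypothetical.

```python
import math


def wls_line(x, z, w):
    """Weighted least-squares fit of z on x; returns (intercept, slope)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    z_bar = sum(wi * zi for wi, zi in zip(w, z)) / sw
    sxz = sum(wi * (xi - x_bar) * (zi - z_bar) for wi, xi, zi in zip(w, x, z))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxz / sxx
    return z_bar - slope * x_bar, slope


def fit_normal_mu_sigma(x, y, n_cycles=100):
    """Fit y_i ~ N(mu_i, sigma_i), mu_i = b10 + b11 x_i, log(sigma_i) = b20 + b21 x_i.

    A fixed number of cycles is used (no convergence check) for brevity.
    """
    n = len(x)
    b10, b11 = wls_line(x, y, [1.0] * n)  # start mu from ordinary least squares
    b20, b21 = 0.0, 0.0  # start with sigma_i = 1
    for _ in range(n_cycles):
        eta2 = [b20 + b21 * xi for xi in x]  # linear predictor for log(sigma)
        sigma = [math.exp(e) for e in eta2]
        # mu step: given sigma, the ML estimate is WLS with weights 1 / sigma_i^2
        b10, b11 = wls_line(x, y, [1.0 / s ** 2 for s in sigma])
        mu = [b10 + b11 * xi for xi in x]
        # sigma step: one Fisher-scoring update of eta2 = log(sigma).
        # Score: (y - mu)^2 / sigma^2 - 1; expected information: 2 (constant),
        # so the working response is eta2 + score / 2 with equal weights.
        z2 = [e + ((yi - m) ** 2 / s ** 2 - 1.0) / 2.0
              for e, yi, m, s in zip(eta2, y, mu, sigma)]
        b20, b21 = wls_line(x, z2, [1.0] * n)
    return (b10, b11), (b20, b21)
```

When the spread around the fitted line grows with \(x\), \(b_{21}\) comes out positive; a model with a constant \(\sigma\), such as (2), cannot represent this.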
Figure: BMI data
Distributional assumptions are often needed for the response to be fitted properly
In the BMI example above, we needed to model all the parameters of the distribution as functions of the explanatory variable age.
Those parameters were the location parameter \(\mu\), the scale parameter \(\sigma\), the skewness parameter \(\nu\), and the kurtosis parameter \(\tau\).
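To see why parameters beyond the location matter for centiles, here is a minimal Python sketch of the centile \(y_p\) of a Box-Cox Cole-Green type distribution with parameters \(\mu\), \(\sigma\), \(\nu\). It assumes \(Z = [(y/\mu)^\nu - 1]/(\nu\sigma)\) is standard normal, giving \(y_p = \mu(1 + \sigma\nu z_p)^{1/\nu}\) (or \(\mu e^{\sigma z_p}\) when \(\nu = 0\)), and it ignores the truncation of \(Z\) used in the exact distribution. The kurtosis parameter \(\tau\) (as in, e.g., the BCPE distribution) is left out. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def box_cox_centile(p, mu, sigma, nu):
    """Centile y_p assuming Z = ((y / mu)**nu - 1) / (nu * sigma) ~ N(0, 1).

    Truncation of Z is ignored, which is a reasonable approximation only
    when |sigma * nu| is small.
    """
    z_p = NormalDist().inv_cdf(p)  # standard normal quantile
    if nu == 0:
        return mu * math.exp(sigma * z_p)  # limiting log-normal case
    base = 1.0 + sigma * nu * z_p
    if base <= 0:
        raise ValueError("centile undefined without truncation: 1 + sigma*nu*z_p <= 0")
    return mu * base ** (1.0 / nu)
```

The median (\(p = 0.5\)) equals \(\mu\) for every \(\nu\), but with \(\mu = 20\) and \(\sigma = 0.1\) the 97th centile is about 23.8 for \(\nu = 1\) and about 24.6 for \(\nu = -1\), so a model for the location alone would miss the difference.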
Machine learning methods are useful (especially for modelling interactions between variables), but they are not suitable if the interest lies in something other than the mean.
The Books
www.gamlss.com