# Bayesian robust regression


We define a Student-t likelihood for the response variable, y, and suitably vague priors on all the model parameters: normal for α and β, half-normal for σ, and gamma for ν. The gamma prior places the majority of its mass on values \(\nu < 50\), the region in which the t distribution differs appreciably from the normal. In Stan, the likelihood is simply

```stan
y ~ student_t(nu, mu, sigma);
```

The scale-mixture-of-normals parameterization of the Student t distribution is also useful for computational reasons. The fitting function plots the inferred linear regression and reports some handy posterior statistics on the parameters alpha (intercept), beta (slope) and y_pred (predicted values). The long C++ compiler output that appears the first time the model is compiled can be safely ignored; if Stan warns about low BFMI, examine the pairs() plot to diagnose sampling problems (see http://mc-stan.org/misc/warnings.html#bfmi-low).
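The scale-mixture claim can be checked numerically: if a precision \(\lambda_i \sim \mathsf{Gamma}(\nu/2, \text{rate} = \nu/2)\) and \(y_i \mid \lambda_i \sim \dnorm(\mu, \sigma^2/\lambda_i)\), the marginal distribution of \(y_i\) is Student-t with ν degrees of freedom, location μ and scale σ. A stdlib-only Python sketch of this (my own illustration, not code from the original post):

```python
import math
import random

def t_via_scale_mixture(nu, mu, sigma, n, seed=0):
    """Draw n Student-t(nu, mu, sigma) variates via the
    normal scale-mixture representation."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        # Precision lambda ~ Gamma(shape=nu/2, rate=nu/2);
        # random.gammavariate takes shape and *scale* = 1/rate.
        lam = rng.gammavariate(nu / 2, 2 / nu)
        draws.append(rng.gauss(mu, sigma / math.sqrt(lam)))
    return draws

nu, sigma = 10.0, 1.0
ys = t_via_scale_mixture(nu, 0.0, sigma, 200_000)
mean = sum(ys) / len(ys)
var = sum(y * y for y in ys) / len(ys) - mean ** 2
# The marginal variance should be close to nu / (nu - 2) * sigma^2 = 1.25.
```

This is exactly the representation Stan exploits when the t likelihood is written as a normal with a latent gamma-distributed precision per observation.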
This probability distribution has a parameter ν, known as the degrees of freedom, which dictates how close to normality the distribution is: large values of ν (roughly ν > 30) result in a distribution very similar to the normal, whereas small values of ν produce a distribution with heavier tails (that is, a larger spread around the mean) than the normal. Consider first the linear regression model with normal errors,

\[ y_i \sim \dnorm\left( X \beta, \sigma^2 \right). \]

The normal distribution has narrow tails, with approximately 99.8% of the probability within three standard deviations (Gelman and Hill 2007, 125; Liu 2005), which makes the estimates sensitive to outliers. Replacing the normal likelihood with a Student t gives

\[ y_i \sim \dt\left(\nu, \mu_i, \sigma \right). \]

We can reparameterize the model to make \(\sigma\) and \(\nu\) less correlated in the posterior by rescaling the scale parameter by the degrees of freedom,

\[ y_i \sim \dt\left(\nu, \mu_i, \sigma \sqrt{\frac{\nu - 2}{\nu}} \right), \]

so that \(\sigma\) is the standard deviation of the errors whatever the value of \(\nu\). When plotting the results of a linear regression graphically, the explanatory variable is normally plotted on the x-axis and the response variable on the y-axis. A related but distinct approach, traditional Bayesian quantile regression, relies on the asymmetric Laplace distribution (ALD), mainly because of its satisfactory empirical and theoretical performance; the ALD has medium tails, however, and is itself not fully robust to gross outliers. As an exercise, estimate some examples with known outliers and compare the results to those obtained with a normal likelihood.
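The "heavier tails" claim can be made concrete by comparing the probability of falling more than three units from the center under a standard normal versus a t distribution with small ν. A rough stdlib-only Python check (my own illustration, integrating the t density numerically rather than calling a stats library):

```python
import math

def t_pdf(x, nu):
    """Density of the standard Student-t distribution with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + x * x / nu) ** (-(nu + 1) / 2)

def t_upper_tail(x0, nu, hi=200.0, steps=200_000):
    """P(X > x0) by the trapezoidal rule; the tail beyond hi is negligible here."""
    h = (hi - x0) / steps
    total = 0.5 * (t_pdf(x0, nu) + t_pdf(hi, nu))
    for i in range(1, steps):
        total += t_pdf(x0 + i * h, nu)
    return total * h

normal_two_sided = 2 * (1 - 0.5 * (1 + math.erf(3 / math.sqrt(2))))  # P(|Z| > 3)
t3_two_sided = 2 * t_upper_tail(3.0, 3.0)                            # P(|T_3| > 3)
# normal_two_sided is about 0.0027 (the "99.7% within 3 sd" rule);
# t3_two_sided is about 0.058, roughly twenty times larger.
```

Under the t distribution with ν = 3, an observation three scale units from the mean is not especially surprising, which is precisely why such observations exert far less pull on the fit.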
Although linear regression models are fundamental tools in statistical science, the estimation results can be sensitive to outliers. In the standard model, the response variable follows a normal distribution with mean equal to the regression line and some standard deviation σ; the normal error captures random noise around the line, but its narrow tails mean that a few extreme observations can pull the fit strongly. Bayesian robust regression instead uses distributions with wider tails than the normal, so that outliers have less of an effect on the log-posterior. Suppose \(X \sim \dt(\nu, \mu, \sigma)\); then

\[ \Var(X) = \frac{\nu}{\nu - 2} \sigma^2. \]

Heavy tails can be introduced elsewhere too. A linear regression with Laplace errors is analogous to a median regression, and the "robit" model replaces the normal latent errors of a probit regression with Student-t errors:

\[ \pi_i = \int_{-\infty}^{\eta_i} \mathsf{StudentT}\left(x \mid \nu, 0, (\nu - 2)/\nu \right) dx. \]

Historically, robust models were mostly developed on a case-by-case basis; examples include robust linear regression, robust mixture models, and bursty topic models. After sampling, we can take a look at the MCMC traces and the posterior distributions for alpha and beta (the intercept and slope of the regression line), sigma and nu (the spread and degrees of freedom of the t-distribution). The credible and prediction intervals reflect the posterior distributions of mu_cred and y_pred, respectively, summarized as highest posterior density (HPD) intervals, and the inference turns out to be robust with respect to the prior specification.
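The Laplace/median connection is easy to verify in a toy intercept-only problem: maximizing a Laplace likelihood minimizes the sum of absolute residuals, whose minimizer is the sample median, while the normal likelihood minimizes squared residuals, giving the mean. A small stdlib-only Python illustration (mine, not from the post):

```python
# One extreme outlier (100) among otherwise small values.
data = [1.0, 2.0, 3.0, 4.0, 100.0]

def sum_abs(m):   # negative Laplace log-likelihood, up to constants
    return sum(abs(y - m) for y in data)

def sum_sq(m):    # negative normal log-likelihood, up to constants
    return sum((y - m) ** 2 for y in data)

grid = [i / 100 for i in range(0, 10_001)]  # candidate locations 0.00 .. 100.00
l1_opt = min(grid, key=sum_abs)   # the median, 3.0: unmoved by the outlier
l2_opt = min(grid, key=sum_sq)    # the mean, 22.0: dragged by the outlier
```

The single outlier shifts the normal-likelihood estimate from around 2.5 to 22, while the Laplace-likelihood estimate stays at 3; heavy-tailed (t) errors behave similarly, in a smoother way.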
Thus, we need a model that is able to recognise the linear relationship present in the data, while accounting for the outliers as infrequent, atypical observations. The most commonly used Bayesian model for robust regression is a linear regression with independent Student-t errors (Geweke 1993; A. Gelman, Carlin, et al. 2013, Ch. 17):

\[ y_i \sim \dt\left(\nu, \mu_i, \sigma \right). \]

In Stan, the Student t can also be implemented as a scale mixture of normal distributions, with a gamma mixing distribution on the precisions. Let's pitch this Bayesian model against the standard linear model fitting provided in R (the lm function) on some simulated data.
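Before turning to the Bayesian fit, it is worth quantifying how badly least squares reacts to a few planted outliers. The sketch below (stdlib-only Python, my own simulation rather than the post's R code) generates y = 2 + 0.5x plus noise, corrupts three high-x points, and compares the closed-form OLS slope on clean and corrupted data:

```python
import random

random.seed(42)

def ols_slope(xs, ys):
    """Closed-form OLS slope: cov(x, y) / var(x)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sxy / sxx

xs = list(range(30))
ys = [2.0 + 0.5 * x + random.gauss(0.0, 0.5) for x in xs]
clean_slope = ols_slope(xs, ys)          # close to the true slope 0.5

ys_noisy = ys[:]
for i in (27, 28, 29):                   # plant three gross outliers at high x
    ys_noisy[i] = -20.0
noisy_slope = ols_slope(xs, ys_noisy)    # dragged far below 0.5
```

Three bad points out of thirty are enough to flip the sign of the least-squares slope; a Student-t likelihood largely ignores them, whereas the normal likelihood behind lm must bend the line to accommodate them.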
From a mathematical standpoint, the relationship between the variables is formalised as a straight line plus random error, and in ordinary linear regression this error is assumed to be normally distributed. Robust regression refers to regression methods that are less sensitive to outliers. Linear regression with normal errors is a tried and tested approach, but gross outliers can still have a considerable impact on the model, motivating research into even more robust approaches. Although robust estimators have also been proposed in frequentist frameworks, statistical inference for them is not necessarily straightforward; see A. Gelman, Carlin, et al. (2013, Sec. 14.7) for Bayesian models with unequal variances and correlations, and Stan Development Team (2016) for implementation details.

The full Stan code for the model can be found in the file robust_regression.stan. The model has to be compiled the first time it is run, and some unimportant warning messages may show up during compilation; how long this takes will depend on your machine. The arguments iter, warmup, chains and seed are passed to the Stan function and can be used to customise the sampling, while xlab and ylab specify the axis labels for the plot. Importantly, one can use this model without any extra prior knowledge about the dataset, and if no predictions are needed the x.pred argument can simply be omitted.

Let's first run the standard lm function on the clean data and look at the fit, then run our Bayesian model on the same data, drawing the inferred regression line from the posterior mean estimates of alpha and beta. The model fits the normally distributed data just as well as the standard linear regression, and the two lines essentially coincide. Quite publication-ready.

These data are somewhat too clean for my taste, so let's sneak some extreme outliers in and see how the model behaves when faced with noisy, non-normal data. The posterior mean estimates of alpha, beta and sigma haven't changed that much, and the inferred line stays close to the one estimated from the clean data: because the t error distribution has heavy tails, it does not consider the outliers as unusual. The posterior of nu now concentrates on small values, confirming that heavy tails are needed to accommodate the outliers (the degrees of freedom of the t-distribution is sometimes called the kurtosis parameter). The line estimated by the conventional linear model (lm), in contrast, is dragged towards the outliers.

The intervals have a direct probabilistic interpretation. In the body fat example, for a given Abdominal circumference our probability that the mean bodyfat percentage lies within the shaded credible band is 0.95, and the probability that a new response falls within the prediction interval given by the dotted lines is likewise 0.95. From the posterior samples we can also compute mean, median and quantile summaries of the predictions. How would you estimate the conditional mean \(E(y \mid x)\), or the full predictive distribution \(p(y \mid x)\), for a new value of the predictor? With the posterior draws in hand, both are straightforward.
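As a non-Bayesian cross-check, the scale-mixture representation also yields a simple EM-style iteratively reweighted least squares for Student-t errors, in which each point's weight \((\nu + 1)/(\nu + r_i^2/\sigma^2)\) shrinks as its residual grows. The following is a hypothetical stdlib-only Python sketch of that idea with ν fixed at 4, not the Stan model from the post:

```python
import random

random.seed(42)

# Simulated line y = 2 + 0.5x with three gross outliers planted at high x.
xs = list(range(30))
ys = [2.0 + 0.5 * x + random.gauss(0.0, 0.5) for x in xs]
for i in (27, 28, 29):
    ys[i] = -20.0

def weighted_ls(xs, ys, ws):
    """Weighted least-squares intercept and slope."""
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    b = sxy / sxx
    return ybar - b * xbar, b

nu = 4.0                                       # fixed, small => heavy tails
a, b = weighted_ls(xs, ys, [1.0] * len(xs))    # start from plain OLS
sigma2 = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys)) / len(xs)
for _ in range(200):
    # E-step: expected precision multiplier for each observation.
    ws = [(nu + 1) / (nu + (y - a - b * x) ** 2 / sigma2)
          for x, y in zip(xs, ys)]
    # M-step: weighted regression and scale update.
    a, b = weighted_ls(xs, ys, ws)
    sigma2 = sum(w * (y - a - b * x) ** 2
                 for w, x, y in zip(ws, xs, ys)) / len(xs)
# b should now be close to the true slope 0.5 despite the outliers.
```

The three outliers end up with weights near zero, so the fit effectively reduces to least squares on the clean points, mirroring what the posterior of the Bayesian t-regression does by driving nu low.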