www.delorie.com/gnu/docs/mcsim/mcsim_42.html   search  
 
Buy GNU books!


MCSim User' Manual

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

6.5 Specifying a statistical model

Statistical models are defined in the simulation specification file, rather than in the model definition file. It is necessary to define a statistical model (with parameter dependencies, prior distributions and likelihood) if you want to use MCMC sampling. MCMC sampling will then give you in output a sample of parameters drawn from their joint posterior distribution. Take for example the simple linear regression model:

y_i = N(Mu_i, Sigma^2), Mu_i = Alpha + Beta * (x_i - x_bar).

where the observed (x,y) pairs are (1,1), (2,3), (3,3), (4,3) and (5,5). The quantities Alpha and Beta are given N(0,10000) priors, and 1/Sigma^2 is given a Gamma(1e-2,1e-2) prior. x_bar is the average of the above values for x. We want the posterior distributions of Alpha, Beta, and Sigma^2.

The first thing to do is to define a model to compute y as a function of x. Here is such a model (quite similar to the one distributed with MCSim source code (see section B.1 `linear.model'):

 
# ---------------------------------------------
# Model definition file for a linear model
# ---------------------------------------------
Outputs = {y};

# Model parameters
Alpha = 0;
Beta = 0;
Sigma2 = 1;
x_bar = 0;

CalcOutputs { y = Alpha + Beta * (t - x_bar); }
# ---------------------------------------------

The parameters' initialization values are arbitrary, and could be anything reasonable. They will be changed or sampled through the input file. Note that Sigma2is not used in the model equations, but still needs to be defined here in order to be part of the statistical model. On the other hand, Mu is not defined, since we do not really need it.Finally x is replaced by the time, t, for convenience. An alternative would be to define an input `x' and use it instead of t.

We now need to write an input file specifying the distribution of y (i.e., the likelihood), and the prior distributions of the various parameters. Here is what such a file could look like:

 
# ---------------------------------------------------------------
# Simulation input file for a linear regression
# ---------------------------------------------------------------
SimType (MCMC);
MCMC ("linear.MCMC.out", "", "", 50000, 0, 5, 40000, 63453.1961);
Level {
  Distrib(Alpha,  Normal_v, 0, 10000);
  Distrib(Beta,   Normal_v, 0, 10000);
  Distrib(Sigma2, InvGamma, 0.01, 0.01);
  Distrib(y, Normal_v, Prediction(y), Sigma2);
  Experiment {
    x_bar = 3.0;
    PrintStep (y, 1, 5, 1);
    Data  (y, 1, 3, 3, 3, 5);
  }
} # end Level
End
# ---------------------------------------------------------------

The file begins the obvious SimType() (see section 6.3.1 SimType() specification) and MCMC() (see section 6.3.6 MCMC() specification) keywords. The keyword Level comes next. Level is used to specify the dependence between model parameters in a hierarchy. There should be at least one Level in every MCMC input file, even for a non-hierarchical model like the one above (actually, "non-hierarchical" models can be thought of has having only one level of hierarchy). See below for further discussion of the Level keyword. You can also look at the MCMC input files provided as examples with MCSim source code. The Distrib() statements define the parameter priors. Normal_v specifications are used since we use variances instead of standard deviations. The inverse-Gamma distribution is used for the variance component, since the precision is supposed to be Gamma-distributed. The likelihood is the distribution of the data, given the model: it is also specified by a Distrib() statement, valid for every y data point. Again, note that the Muvariable is not used. Instead the Prediction(y) specification is used to signify the linear model output. These distributions are in effect for every sub-level or every Experiment included in the current level.

The "simulations" to perform, and the corresponding data values, are specified by the Experiment section. Only one Experiment is needed here, but several could be specified. In this section, the value of x_baris provided. The different values of x (time in our model) can be specified via a PrintStep() specification (see section 6.4.4 PrintStep() specification), since they are equally spaced. More generally, a Print() specification could have been used (see section 6.4.3 Print() specification). The data values are given in a Data() statement.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

  webmaster     delorie software   privacy  
  Copyright 2003   by The Free Software Foundation     Updated Jun 2003