www.delorie.com/gnu/docs/mcsim/mcsim_34.html   search  
 
Buy GNU books!


MCSim User' Manual

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

6.3.6 MCMC() specification

Markov chain Monte Carlo (MCMC) simulations, used in a Bayesian context, allow the user to specify a statistical model (eventually hierarchical) and sample parameters from their joint posterior distribution, given a prior distribution for each parameter, a set of data to simulate, and corresponding likelihoods. Sampling from the posterior is not immediate: it requires the simulation chain, which start by sampling purely from the prior, to reach equilibrium. Checking that equilibrium is obtained is best achieved, in our opinion, by running multiple independent chains. Hence these computations are very intensive. For a discussion of Markov chain Monte Carlo and convergence issues you should consult the appropriate statistical literature (for example, Bernardo and Smith, 1994; Gelman, 1992; Gelman et al., in press; Gelman et al., 1995; Gelman and Rubin, 1992; Smith, 1991; Smith and Roberts, 1993) (see section Bibliographic References). Technically, MCSim uses Metropolis-Hasting sampling and you do not need to worry about issues of conjugacy or log-concavity of your prior or posterior distributions. Like simple Monte Carlo simulations, MCMC simulations require the use of two specifications, MCMC() and Distrib() and of one special section definition: Level. The syntax for the MCMC() specification is:

 
MCMC("<OutputFilename>", "", "", <nRuns>, 
     <simTypeFlag>, <printFrequency>, <itersToPrint>, <RandomSeed>);

The output filename is a string field and must be enclosed in quotes. If a null-string "" is given, the default name `MCMC.default.out' will be used.

If a restart file name (enclosed in quotes) is given, the first simulations will be read from that file (which must be a text file). This allows you to continue a chain where you left it, since an MCMC output file can be used as a restart file with no change. Note that the first line of the file (which typically contains column headers) is skipped. Also, the number of lines in the file must be less than or equal to nRuns. The first column of the file should be integers, and the following columns (tab- or space-separated) should give the various parameters, in the same order as specified in the list of Distrib() specifications in the input file. The third field is reserved for future use and should just be a pair of empty quotes.

The integer nRuns gives the total number of runs to be performed, including the runs eventually read in the restart file. The next field, simTypeFlag should be either 0, 1, or 2. It should be set at zero to start a chain of MCMC simulations. In that case, parameters are updated by Metropolis steps, one at a time. If the value of simTypeFlag is set to 1 or 2, a restart file must also be specified. In the case of 1, the output file will contain codes for the level sequence, experiment numbers, printing times, data values and the corresponding model predictions, computed using the last parameter vector of the restart file. This is useful to quickly check the model fit to the data. If simTypeFlag is equal to 2, the entire restart file is used to compute the parameters' covariance matrix. All parameters are then updated at once using a multivariate normal kernel as proposal distribution of the Metropolis steps. This results in large improvement in speed. However, we recommend that this option be used only when convergence is approximately obtained (therefore, you should run MCMC simulations with simTypeFlag set to 0 first, up to approximate convergence, and then restart the chain with the flag at 2).

The integer printFrequency should be set to 1 if you want an output at each iteration, to 2 if you want an output at every other iteration etc. itersToPrint is the number of final iterations for which output is required (e.g., 1000 will request output for the last 1000 iterations; to print all iterations just set this parameter to the value of nRuns). Note that if no restart file is used, the first iteration is always printed, regardless of the value of itersToPrint. Finally, the seed of the pseudo-random number generator can be any positive real number. Seeds between 1.0 and 2147483646.0 are used as is, others are rescaled silently within those bounds.

To use the MCMC specification, you must define a statistical model precising each parameter's prior distribution, or conditional distribution (in the case of a hierarchical model), and the data likelihood (i.e., the distribution of observation errors). These distributions must be enclosed in a Level section and are specified with Distrib() statements (see section 6.5 Specifying a statistical model). In the context of MCMC sampling, MCSim provides an extension of the Distrib() specification. First, the first two shape parameters of distributions may depend on other model parameters. For example:

 
Distrib(A, Normal, 0, 1);
Distrib(B, Normal, A, C);

The data distribution is given by a similar statement, which uses the specification Prediction() to differentiate data from their predicted counterparts. The Prediction() specification can be used for the first two shape parameters only (therefore, not for ranges, except in the case of uniform or loguniform distributions). If Prediction() is used for the first shape parameter, the variable enclosed in parentheses must be the same as the variable whose distribution is described. There should be one and only one distribution specified for a given type of data in the whole input file (i.e., you cannot redefine a likelihood; this limitation will hopefully be removed in a future release). Note that only states and outputs can use Prediction() specifications (but you can always define an output to be equal to a parameter or an input in your model file). For example:

 
Distrib(y, TruncNormal, Prediction(y), Prediction(z), -10, 10);

To recapitulate, the extended Distrib() syntax, for use with MCMC simulations is therefore:

 
Distrib(<identifier>, <iType>, [<shape parms>]);

where the first two shape parameters can be Prediction(<identifier>), or any model parameter or numerals, and the last two shape parameters numerals only (this limitation will also be removed in a future release).

If a statement like:

 
Distrib(Var, <iType>, Prediction(<Var>), Prediction(<Other_Var>), ...);

is used, the two variables Var and Other_Var must have identical output times. It is then useful to group them in the same Print() statement.

The other tool MCSim brings you to build a complete statistical model is the Level keyword. The use of this keyword, is described below (see section 6.5 Specifying a statistical model).

Finally, the format of the output file of MCMC simulations is discussed in a later section (see section 6.6 Analyzing results).


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

  webmaster     delorie software   privacy  
  Copyright 2003   by The Free Software Foundation     Updated Jun 2003