www.delorie.com/gnu/docs/mcsim/mcsim_34.html | search |
Buy GNU books! | |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
MCMC()
specification
Markov chain Monte Carlo (MCMC) simulations, used in a Bayesian context,
allow the user to specify a statistical model (eventually hierarchical)
and sample parameters from their joint posterior distribution, given a
prior distribution for each parameter, a set of data to simulate, and
corresponding likelihoods. Sampling from the posterior is not immediate:
it requires the simulation chain, which start by sampling purely from
the prior, to reach equilibrium. Checking that equilibrium is obtained
is best achieved, in our opinion, by running multiple independent
chains. Hence these computations are very intensive. For a discussion of
Markov chain Monte Carlo and convergence issues you should consult the
appropriate statistical literature (for example, Bernardo and Smith,
1994; Gelman, 1992; Gelman et al., in press; Gelman et al., 1995; Gelman
and Rubin, 1992; Smith, 1991; Smith and Roberts, 1993)
(see section Bibliographic References). Technically, MCSim uses
Metropolis-Hasting sampling and you do not need to worry about issues of
conjugacy or log-concavity of your prior or posterior distributions.
Like simple Monte Carlo simulations, MCMC simulations require the use of
two specifications, MCMC() and Distrib()
and of one special
section definition: Level. The syntax for the MCMC()
specification is:
MCMC("<OutputFilename>", " |
The output filename is a string field and must be enclosed in quotes. If a null-string "" is given, the default name `MCMC.default.out' will be used.
If a restart file name (enclosed in quotes) is given, the first
simulations will be read from that file (which must be a text file).
This allows you to continue a chain where you left it, since an MCMC
output file can be used as a restart file with no change. Note that the
first line of the file (which typically contains column headers) is
skipped. Also, the number of lines in the file must be less than or
equal to nRuns. The first column of the file should be integers,
and the following columns (tab- or space-separated) should give the
various parameters, in the same order as specified in the list of
Distrib()
specifications in the input file. The third field is
reserved for future use and should just be a pair of empty
quotes.
The integer nRuns gives the total number of runs to be performed, including the runs eventually read in the restart file. The next field, simTypeFlag should be either 0, 1, or 2. It should be set at zero to start a chain of MCMC simulations. In that case, parameters are updated by Metropolis steps, one at a time. If the value of simTypeFlag is set to 1 or 2, a restart file must also be specified. In the case of 1, the output file will contain codes for the level sequence, experiment numbers, printing times, data values and the corresponding model predictions, computed using the last parameter vector of the restart file. This is useful to quickly check the model fit to the data. If simTypeFlag is equal to 2, the entire restart file is used to compute the parameters' covariance matrix. All parameters are then updated at once using a multivariate normal kernel as proposal distribution of the Metropolis steps. This results in large improvement in speed. However, we recommend that this option be used only when convergence is approximately obtained (therefore, you should run MCMC simulations with simTypeFlag set to 0 first, up to approximate convergence, and then restart the chain with the flag at 2).
The integer printFrequency should be set to 1 if you want an output at each iteration, to 2 if you want an output at every other iteration etc. itersToPrint is the number of final iterations for which output is required (e.g., 1000 will request output for the last 1000 iterations; to print all iterations just set this parameter to the value of nRuns). Note that if no restart file is used, the first iteration is always printed, regardless of the value of itersToPrint. Finally, the seed of the pseudo-random number generator can be any positive real number. Seeds between 1.0 and 2147483646.0 are used as is, others are rescaled silently within those bounds.
To use the MCMC specification, you must define a statistical model
precising each parameter's prior distribution, or conditional
distribution (in the case of a hierarchical model), and the data
likelihood (i.e., the distribution of observation errors). These
distributions must be enclosed in a Level
section and are
specified with Distrib()
statements (see section 6.5 Specifying a statistical model). In the context of MCMC sampling, MCSim provides an
extension of the Distrib()
specification. First, the first two
shape parameters of distributions may depend on other model parameters.
For example:
Distrib(A, Normal, 0, 1); Distrib(B, Normal, A, C); |
The data distribution is given by a similar statement, which uses the
specification Prediction() to differentiate data from their
predicted counterparts. The Prediction()
specification can be
used for the first two shape parameters only (therefore, not for ranges,
except in the case of uniform or loguniform distributions). If
Prediction()
is used for the first shape parameter, the variable
enclosed in parentheses must be the same as the variable whose
distribution is described. There should be one and only one distribution
specified for a given type of data in the whole input file (i.e.,
you cannot redefine a likelihood; this limitation will hopefully be
removed in a future release). Note that only states and outputs can use
Prediction()
specifications (but you can always define an output
to be equal to a parameter or an input in your model file). For
example:
Distrib(y, TruncNormal, Prediction(y), Prediction(z), -10, 10); |
To recapitulate, the extended Distrib()
syntax, for use with MCMC
simulations is therefore:
Distrib(<identifier>, <iType>, [<shape parms>]); |
where the first two shape parameters can be Prediction(<identifier>), or any model parameter or numerals, and the last two shape parameters numerals only (this limitation will also be removed in a future release).
If a statement like:
Distrib(Var, <iType>, Prediction(<Var>), Prediction(<Other_Var>), ...); |
is used, the two variables Var and Other_Var must have
identical output times. It is then useful to group them in the same
Print()
statement.
The other tool MCSim brings you to build a complete statistical model is
the Level
keyword. The use of this keyword, is described below
(see section 6.5 Specifying a statistical model).
Finally, the format of the output file of MCMC simulations is discussed in a later section (see section 6.6 Analyzing results).
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
webmaster | delorie software privacy |
Copyright © 2003 by The Free Software Foundation | Updated Jun 2003 |