FACILITIES PROVIDED BY THIS SOFTWARE
This software is meant to support research and education regarding:
* Flexible Bayesian models for regression and classification
based on neural networks and Gaussian processes, and for
probability density estimation using mixtures and Dirichlet
diffusion trees. Neural network training using early stopping
is also supported.
* Markov chain Monte Carlo methods, and their applications to
Bayesian modeling, including implementations of Metropolis,
hybrid Monte Carlo, slice sampling, and tempering methods.
These facilities might be useful for actual problems, but you should
note that many features that might be needed for real problems have
not been implemented, that the programs have not been tested to the
extent that would be desirable for important applications.
The complete source code (in C) is provided, allowing researchers to
modify the program to test new ideas. It is not necessary to know C
to use the programs (assuming you manage to install them correctly).
This software is designed for use on a Unix or Linux system, using
commands issued to the command interpreter (shell). No particular
window system or other GUI is required, but a plotting program will be
very useful. I use the xgraph plot program, written by David
Harrison, which allows plots to be produced by just piping data from
one of the commands; it can be obtained from my web page.
Markov chain Monte Carlo facilities.
All the Bayesian models are implemented using Markov chains to sample
from the posterior distribution. For the elaborate models based on
neural networks, Gaussian processes, mixtures, and Dirichlet diffusion
trees, this is done by combining general-purpose Markov chain sampling
procedures with special modules written in C. Other models could be
implemented in the same way, but this is a fairly major project.
To allow people to play around with the various Markov chain methods
more easily, a facility is provided for defining distributions (on
R^n) by giving a simple formula for the probability density. Many
Markov chain sampling methods, such as the Metropolis algorithm,
hybrid Monte Carlo, slice sampling, simulated tempering, and annealed
importance sampling may then be used to sample from this distribution.
Bayesian posterior distributions can be defined by giving a formula
for the prior density and for the likelihood based on each of the
cases (which are assumed to be independent).
A long review paper of mine on "Probabilistic Inference Using Markov
Chain Monte Carlo Methods" can be obtained from my web page. This
review discusses methods based on Hamiltonian dynamics, including the
"hybrid Monte Carlo" method. These methods are also discussed in my
book on "Bayesian Learning for Neural Networks". My web page also has
papers on slice sampling ("Markov chain Monte Carlo methods based on
`slicing' the density function" and "Slice sampling"), Annealed
Importance Sampling, and circularly-coupled Markov chain sampling, all
of which are implemented in this software.
Neural network and Gaussian process models.
The neural network models are described in my thesis, "Bayesian
Learning for Neural Networks", which has now been published by
Springer-Verlag (ISBN 0-387-94724-8). The neural network models
implemented are essentially as described in the Appendix of this book.
The Gaussian process models are in many ways analogous to the network
models. The Gaussian process models implemented in this software, and
computational methods that used, are described in my technical report
entitled "Monte Carlo implementation of Gaussian process models for
Bayesian regression and classification", available from my web page,
and in my Valencia conference paper on "Regression and classification
using Gaussian process priors", in Bayesian Statistics 6. The
Gaussian process models for regression are similar to those evaluated
by Carl Rasmussen in his thesis, "Evaluation of Gaussian Processes and
other Methods for Non-Linear Regression", available from his web page,
at the URL http://www.cs.utoronto.ca/~carl/; he also talks about
neural network models. To understand how to use the software
implementing these models, it is essential for you to have read at
least one of these references.
The neural network software supports Bayesian learning for regression
problems, classification problems, and survival analysis, using models
based on networks with any number of hidden layers, with a wide
variety of prior distributions for network parameters and
hyperparameters. The Gaussian process software supports regression
and classification models that are similar to neural network models
with an infinite number of hidden units, using Gaussian priors.
The advantages of Bayesian learning for both types of model include
the automatic determination of "regularization" hyperparameters,
without the need for a validation set, the avoidance of overfitting
when using large networks, and the quantification of uncertainty in
predictions. The software implements the Automatic Relevance
Determination (ARD) approach to handling inputs that may turn out to
be irrelevant (developed with David MacKay).
For problems and networks of moderate size (eg, 200 training cases, 10
inputs, 20 hidden units), fully training a neural network model (to
the point where one can be reasonably sure that the correct Bayesian
answer has been found) typically takes up to several hours on a modern
personal computer. However, quite good results, competitive with
other methods, are often obtained after training for just a few
minutes. The time required to train the Gaussian process models
depends a lot on the number of training cases. For 100 cases, these
models may take only a few minutes to train (again, to the point where
one can be reasonably sure that convergence to the correct answer has
occurred). For 1000 cases, however, training might well take a day.
The software also implements neural network training using early
stopping, as described in my paper on "Assessing relevance
determination methods using DELVE", in Neural Networks and Machine
Learning, C. M. Bishop, editor, Springer-Verlag, 1998. A similar
early stopping method is also described in Carl Rasmussen's thesis
(see above).
Bayesian mixture models and Dirichlet diffusion trees.
The software implements Bayesian mixture models for multivariate real
or binary data, with both finite and countably infinite numbers of
components. The countably infinite mixture models are equivalent to
Dirichlet process mixture models. The sampling methods that I have
implemented for these models are described in my technical report on
"Markov chain sampling methods for Dirichlet process mixture models",
which can be obtained from web page; see also my technical report on
"Bayesian mixture modeling by Monte Carlo simulation".
The software also implements models based on Dirichlet diffusion
trees, described in my technical report on "Defining priors for
distributions using Dirichlet diffusion trees", and in my Valencia
conference paper on "Density modeling and clustering using Dirichlet
diffusion trees", in the Bayesian Statistics 7. Dirichlet diffusion
trees can be seen both as a way of modeling distributions and as a
method for hierarchical clustering.
Software components.
The software consists of a number of programs and modules. Each major
component has its own directory, as follows:
util Modules and programs of general utility.
mc Modules and programs that support sampling using Markov
chain Monte Carlo methods, using modules from util.
dist Programs for doing Markov chain sampling on a distribution
given by a simple formula, or by giving a Bayesian prior
and likelihood, using the modules from util and mc.
net Modules and programs that implement Bayesian inference
for models based on multilayer perceptron neural networks,
using the modules from util and mc. Also implements simple
gradient descent training, possibly with early stopping.
gp Modules and programs that implement Bayesian inference
for models based on Gaussian processes, using the modules
from util and mc.
mix Modules and programs that implement Bayesian inference
for finite and infinite mixture models, using modules
from util and mc.
dft Modules and programs that implement density modeling and
clustering methods based on Dirichlet diffusion trees.
In addition, the 'bvg' directory contains modules and programs for
sampling from a bivariate Gaussian distribution, as a simple
demonstration of how the Markov chain Monte Carlo facilities can be
used from a special module written in C. Other than by providing this
example, and the detailed documentation on various commands, I have
not attempted to document how you might go about using the Markov
chain Monte Carlo modules for another application written in C.
The following directories contain examples of how these programs can
be used, many of which are discussed in the documentation:
ex-netgp Examples of Bayesian regression and classification
models based on neural networks and Gaussian processes.
ex-mixdft Examples of Bayesian mixture models and Dirichlet diffusion
tree models. Includes command files for the test in my
paper on "Markov chain sampling methods for Dirichlet process
mixture models" and for one of the tests in "Density modeling
and clustering using Dirichlet diffusion trees".
ex-dist Examples of Markov chain sampling on distributions
specified by simple formulas.
ex-bayes Examples of Markov chain sampling for Bayesian models
specified using formulas for the prior and likelihood.
ex-surv Examples of neural network survival models.
ex-gdes Examples of neural network learning using gradient
descent and early stopping.
ex-ais Contains command and data files used for the tests in
my paper on "Annealed importance sampling".
You should note, however, that these examples do not constitute
"recipes" that can be used unchanged for new problems. They are
intended to help you understand the models, priors, and computational
methods, so that you can devise an appropriate way of handling
whatever problem you are interested in.
The 'bin' directory contains links to all the programs. The 'doc'
directory contains all the documentation.
Portability of the software.
The software is written in ANSI C, and is meant to be run in a UNIX
environment. It is known to work on various Linux systems running on
a Pentium processor, on a Solaris system running on a SPARC processor,
and on an SGI machine running IRIX Release 6.5. It also runs OK on
Compaq Alpha machines, provided that the -ieee and -std options are
given to the Compaq C compiler (this may no longer be necessary - I
haven't checked recently). As far as I know, the software does not
depend on any peculiarities of these environments (except perhaps for
the use of the drand48 pseudo-random number generator, and the lgamma
function, and the Unix facilities used to start programs to evaluate
"external" functions in the 'dist' module), but you may nevertheless
have problems getting it to work in substantially different
environments, and I can offer little or no assistance in this regard.
There is no dependence on any particular graphics package or graphical
user interface. The 'xxx-plt' programs are designed to allow their
output to be piped directly into the 'xgraph' plotting program, but
other plotting programs can be used instead, or the numbers can be
examined directly. The 'xxx-tbl' programs output the same information
in a different format, which is useful when plotting or analysing data
with S-Plus or R, since this format is convenient for the S-Plus or R
read.table command.