Software



I have reorganized my programs so that they can now all be downloaded in a single jar file. If you download and add this jar file to your CLASSPATH, you should be able to run all the programs.

Note that I use the LINKAGE data format for most input files, and for some output files. Go here for more information on this format.

Two new programs, McLink and McLinkLD, have recently been added (25/08/06). McLink implements an MCMC scheme for linkage analysis, with an option to assume a specified model for linkage disequilibrium between the markers. McLinkLD samples from the joint distribution of LD models and linkage lod scores given the observed pedigree and genotypes. These programs are provided for users to experiment with, and results from them should be interpreted with caution. It is not clear under what conditions, if any, the MCMC methods implemented by these programs mix well and give reliable results.


General pedigree analysis utilities
  • CheckFormat

    a program that checks the format of LINKAGE parameter and pedigree input files.

  • CheckErrors (previously called GMCheck)

    a program that uses graphical modelling or Bayesian network methods to calculate the posterior probability of genotype or phenotype errors in pedigrees.

  • DownCodeAlleles

    a program that removes alleles unobserved in genotype data from the specified model for the locus.

  • GeneCountAlleles

    a program that implements gene counting, or the EM algorithm, to obtain maximum likelihood estimates for allele frequencies from genotypes of related individuals.

  • SelectLoci

    a program for selecting subsets of loci from LINKAGE input files.

  • SelectKindreds

    a program for selecting subsets of kindreds from LINKAGE input files. You can probably do the same thing with a grep command.

  • TrimPed

    a program to remove individuals from a pedigree if they have insufficient observed data.
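
To illustrate the gene counting (EM) idea mentioned under GeneCountAlleles, here is a minimal sketch. It is not the GeneCountAlleles implementation: it assumes unrelated individuals and the textbook ABO blood group example, where phenotypes A and B hide whether the second allele is O. The class name, method name and phenotype counts are all hypothetical, chosen only for illustration.

```java
public class GeneCountingSketch {

    /**
     * Gene counting for ABO allele frequencies from phenotype counts of
     * unrelated individuals (an assumption of this sketch).
     * Returns the estimates in the order {pA, pB, pO}.
     */
    public static double[] estimate(int nA, int nB, int nAB, int nO, int iterations) {
        double n = nA + nB + nAB + nO;
        double pA = 1.0 / 3, pB = 1.0 / 3, pO = 1.0 / 3;

        for (int it = 0; it < iterations; it++) {
            // E step: split each ambiguous phenotype into expected genotype counts.
            double aa = nA == 0 ? 0.0 : nA * pA * pA / (pA * pA + 2 * pA * pO);
            double ao = nA - aa;
            double bb = nB == 0 ? 0.0 : nB * pB * pB / (pB * pB + 2 * pB * pO);
            double bo = nB - bb;

            // M step: count the alleles carried by the expected genotypes.
            pA = (2 * aa + ao + nAB) / (2 * n);
            pB = (2 * bb + bo + nAB) / (2 * n);
            pO = (ao + bo + 2 * nO) / (2 * n);
        }
        return new double[] { pA, pB, pO };
    }

    public static void main(String[] args) {
        // Hypothetical phenotype counts, for illustration only.
        double[] p = estimate(40, 10, 5, 45, 100);
        System.out.println("pA=" + p[0] + " pB=" + p[1] + " pO=" + p[2]);
    }
}
```

With related individuals the expected allele counts in the E step have to be computed over the pedigree rather than person by person, which is the harder problem the real program addresses.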


Linkage analysis programs
  • TwoPointLods

    a program for calculating simple two point lod scores on a grid of values for the recombination parameter.

  • MaxTwoPointLods

    a program for finding the maximum lod score. Note that the search includes values of the recombination fraction between 0.5 and 1.

  • McLink

    a program for calculating multilocus linkage statistics in extended pedigrees using Markov chain Monte Carlo integration. There is an option to run assuming linkage disequilibrium between the markers, which can be specified as a model output by HapGraph. As this is a Markov chain Monte Carlo implementation with unknown mixing properties, it may not give reliable results in all cases. This program is provided primarily for those who want to experiment with MCMC pedigree analysis.

  • McLinkLD

    a program that combines McLink and HapGraph. It iteratively updates the inheritance states in a pedigree and the graphical model for linkage disequilibrium, giving, in effect, linkage statistics that are model-averaged over estimated linkage disequilibrium models. This is very computationally intensive. If you can estimate a linkage disequilibrium model using HapGraph and input it to McLink, that is probably a more tractable solution. As this is a Markov chain Monte Carlo implementation with unknown mixing properties, it may not give reliable results in all cases. This program is provided primarily for those who want to experiment with MCMC pedigree analysis.
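
As a rough illustration of what a two point lod score on a grid looks like, the sketch below handles only the simplest case: a count of recombinants among phase-known, fully informative meioses. This is an assumption made for brevity; the programs above work on whole pedigrees from LINKAGE files. The class and method names are hypothetical. The search in maxLodTheta runs over the whole interval from 0 to 1, mirroring the note under MaxTwoPointLods about values between 0.5 and 1.

```java
public class TwoPointLodSketch {

    /** k * log10(p), treating 0 * log10(0) as 0. */
    private static double logTerm(int k, double p) {
        return k == 0 ? 0.0 : k * Math.log10(p);
    }

    /**
     * Lod score at recombination fraction theta for a given number of
     * recombinants among phase-known informative meioses:
     * log10 of L(theta) / L(0.5).
     */
    public static double lod(int recombinants, int meioses, double theta) {
        int nonRecombinants = meioses - recombinants;
        double logLikelihood = logTerm(recombinants, theta)
                             + logTerm(nonRecombinants, 1.0 - theta);
        double logNull = meioses * Math.log10(0.5);
        return logLikelihood - logNull;
    }

    /** Grid value of theta in [0, 1] with the highest lod score. */
    public static double maxLodTheta(int recombinants, int meioses, int steps) {
        double best = Double.NEGATIVE_INFINITY;
        double bestTheta = 0.0;
        for (int i = 0; i <= steps; i++) {
            double theta = i / (double) steps;
            double value = lod(recombinants, meioses, theta);
            if (value > best) {
                best = value;
                bestTheta = theta;
            }
        }
        return bestTheta;
    }

    public static void main(String[] args) {
        // Hypothetical data: 2 recombinants in 10 meioses.
        for (int i = 0; i <= 10; i++) {
            double theta = i / 10.0;
            System.out.println(theta + "\t" + lod(2, 10, theta));
        }
        System.out.println("grid maximum at theta = " + maxLodTheta(2, 10, 100));
    }
}
```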


Haplotyping programs
  • HapGraph

    a program for fitting a graphical model for linkage disequilibrium to haplotype data, and a general graphical model fitting program. HapGraph now estimates graphical models from genotype data. It also estimates haplotype frequencies and reconstructs phase.

  • HaploFreqs

    a program for listing the haplotypes and their frequencies according to a graphical model for linkage disequilibrium.

  • GCHap

    a program for calculating maximum likelihood estimates of haplotype frequencies from a sample of genotyped individuals. This uses a staged gene counting, or EM, method starting with a small number of loci and adding one at each stage.

  • ApproxGCHap

    a program for calculating rough maximum likelihood estimates of haplotype frequencies from a sample of genotyped individuals. It is the same as GCHap except that to save time and space, haplotypes with low frequency are eliminated at each stage.
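
The gene counting approach to haplotype frequencies can be sketched for the smallest interesting case: two biallelic loci in unrelated individuals, where only double heterozygotes have ambiguous phase. This is a simplified illustration under those assumptions, not the staged method used by GCHap, and all names and counts below are hypothetical.

```java
public class HaplotypeEmSketch {

    /**
     * counts[i][j] is the number of individuals with i copies of allele 1
     * at the first locus and j copies at the second (i, j in 0..2).
     * Returns frequencies of haplotypes 00, 01, 10, 11 in that order.
     * Assumes at least one individual.
     */
    public static double[] estimate(int[][] counts, int iterations) {
        double[] fixed = new double[4]; // haplotype counts with known phase
        int total = 0;
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 3; j++) {
                total += counts[i][j];
                if (i == 1 && j == 1) continue; // phase ambiguous, handled in the E step
                int first = (i >= 1 ? 2 : 0) + (j >= 1 ? 1 : 0);
                int second = (i == 2 ? 2 : 0) + (j == 2 ? 1 : 0);
                fixed[first] += counts[i][j];
                fixed[second] += counts[i][j];
            }
        }
        double ambiguous = counts[1][1];
        double[] f = { 0.25, 0.25, 0.25, 0.25 };

        for (int it = 0; it < iterations; it++) {
            // E step: probability that a double heterozygote is 00/11 rather than 01/10.
            double cis = f[0] * f[3];
            double trans = f[1] * f[2];
            double pCis = (cis + trans > 0) ? cis / (cis + trans) : 0.5;

            // M step: add expected counts to the known ones and renormalise.
            double[] c = new double[4];
            c[0] = fixed[0] + ambiguous * pCis;
            c[3] = fixed[3] + ambiguous * pCis;
            c[1] = fixed[1] + ambiguous * (1 - pCis);
            c[2] = fixed[2] + ambiguous * (1 - pCis);
            for (int k = 0; k < 4; k++) {
                f[k] = c[k] / (2.0 * total);
            }
        }
        return f;
    }

    public static void main(String[] args) {
        // Hypothetical genotype table, for illustration only.
        int[][] counts = { { 5, 0, 0 }, { 0, 10, 0 }, { 0, 0, 5 } };
        double[] f = estimate(counts, 50);
        System.out.println("00=" + f[0] + " 01=" + f[1] + " 10=" + f[2] + " 11=" + f[3]);
    }
}
```

With more loci the number of possible haplotypes grows quickly, which is presumably why GCHap adds loci in stages and ApproxGCHap discards low-frequency haplotypes along the way.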


Viewing programs
  • ViewGraph

    a general program for viewing and editing graphs.

  • ViewPed

    a program for viewing pedigrees when the input is in the form of a standard triplet file.

  • ViewLinkPed

    a program for viewing pedigrees when the input is in the form of a LINKAGE pedigree file.






The top-level programs, which can all be run by typing something like

    % java ClassName input1 input2 ...
are described fully in their class descriptions in the "Unnamed" package of the Javadocs web pages.

The new programs are written in Java version 1.5, so you need an appropriate Java virtual machine to run them.

Note that several of the programs are computationally demanding and may take considerable time to run. If a program fails with an error indicating insufficient memory (for example java.lang.OutOfMemoryError), increase the heap size with the -Xms and -Xmx options to java.
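
If you want to confirm that a heap setting has taken effect, a small stand-alone class like the following reports the maximum heap the virtual machine will use. The class name is hypothetical and it is not part of the jar file; it uses only the standard Runtime API.

```java
public class HeapCheckSketch {

    /** Maximum heap, in megabytes, that this JVM will attempt to use. */
    public static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Maximum heap available: " + maxHeapMegabytes() + " MB");
    }
}
```

Running it as, say, `java -Xmx2g HeapCheckSketch` and then without the option shows the difference between the requested limit and the default on your machine.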


Links