Software
Notes
My programs can all be downloaded in
a single jar file called
genepi.jar.
If you download and add this jar file
to your CLASSPATH, you should be able to run all the programs.
Note that I use the LINKAGE data format for most input files, and for
some output files. Go here for
more information on this format.
New additions (27/10/08) are the programs for performing shared genomic segment
analysis, or IBD mapping. These include the ability to evaluate
statistical significance using models that account for linkage disequilibrium.
These models are estimated from control data using
IntervalLD. Through various model restrictions, this program now runs
in time and storage linear in the number of markers. I've used it
to estimate LD models from sets of 60 unrelated individuals genotyped
by the HapMap project at over 200,000 loci.
There are also several programs for general data handling that I've added
recently.
Previous additions (25/08/06) were McLink and McLinkLD.
McLink implements an MCMC scheme for linkage
analysis with an option to assume a specified model for
linkage disequilibrium between
the markers. McLinkLD samples from the joint distribution of
LD models and linkage lod scores given the observed pedigree
and genotypes. These programs are provided for users to
experiment with and results from them should be interpreted with
caution. It is not clear under what conditions, if any, the MCMC
methods implemented by these programs result in good mixing
properties and reliable results.
The top-level programs, which can all be run by typing something like
% java ClassName input1 input2 ...
are described fully in their class descriptions
in the
"Unnamed"
package of the
Javadocs web pages.
All these programs are written in Java
version 1.5,
so you need an appropriate Java virtual machine to run them.
Note that several of the programs are computationally demanding and
may take considerable time to run. If they throw an error indicating
that there was insufficient memory, increase the heap size with the
-Xms (initial heap) and -Xmx (maximum heap) options to java.
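To confirm that a -Xmx setting has actually taken effect, a small class like the one below prints the heap limits the running JVM reports through the standard java.lang.Runtime methods. It is a minimal sketch, not part of genepi.jar, and the class name HeapCheck is made up for illustration.

```java
/**
 * Hypothetical helper, not part of genepi.jar: reports the heap limits
 * of the running JVM, e.g. after starting it with -Xms and -Xmx.
 */
public class HeapCheck {

    /** Converts a byte count to whole megabytes (rounding down). */
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the most the heap may grow to (governed by -Xmx).
        System.out.println("max heap:     " + toMegabytes(rt.maxMemory()) + " MB");
        // totalMemory(): heap currently reserved (starts near -Xms).
        System.out.println("current heap: " + toMegabytes(rt.totalMemory()) + " MB");
        // freeMemory(): unused part of the currently reserved heap.
        System.out.println("free in heap: " + toMegabytes(rt.freeMemory()) + " MB");
    }
}
```

Run it with the same options you intend to pass to the real program, for example
% java -Xms512m -Xmx2g HeapCheck
where 512m and 2g are only example values. The reported maximum is typically close to, and may be slightly below, the -Xmx value.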
Useful links
Programs for shared genomic segment analysis
-
SGS
finds regions of heterozygous sharing in sets of genotyped
individuals. For mapping using identity by descent methods in
pedigrees and founder populations.
-
HGS
finds regions at which sets of individuals are homozygous.
Potentially useful for identifying deletions.
-
SimSGS
a program for simulating data to match that observed using
SGS
in order to assess the significance.
Allows modelling of linkage disequilibrium using graphical
models estimated by
IntervalLD.
-
SimHGS
a program for simulating data to match that observed using
HGS
in order to assess the significance.
Allows modelling of linkage disequilibrium using graphical
models estimated by
IntervalLD.
-
MakeProbands
a program that specifies which individuals in a pedigree are to be
considered probands. It is sharing between these individuals
that is considered by
SGS
and
HGS.
-
IntervalLD
a variant of the
HapGraph
program that restricts the models allowed to a specific
subset of graphical models with conditional independence
graphs that are interval graphs. This program scales
linearly with the number of loci and can be used with over
100000 loci.
The output can be used by
SimSGS,
SimHGS,
GeneDrop,
and
GeneDrops.
-
GeneDrop
a program to simulate a single instance of genetic data
to match that seen in a pedigree. This uses multilocus
gene drop under linkage equilibrium, or under linkage disequilibrium
using models estimated by
IntervalLD.
-
GeneDrops
a program to simulate multiple instances of genetic data
to match that seen in a pedigree. This uses multilocus
gene drop under linkage equilibrium, or under linkage disequilibrium
using models estimated by
IntervalLD.
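As a rough illustration of the kind of statistic SGS and SimSGS work with, the sketch below assumes that a shared segment is a run of consecutive loci at which every individual carries at least one common allele (identity by state). It measures run length in loci rather than genetic distance and assumes complete genotype data with no missing values. It also computes an empirical p-value against simulated run lengths of the sort SimSGS is used to produce. The class and method names are hypothetical, and the simulated values in main are placeholders, not SimSGS output.

```java
/**
 * Hypothetical sketch, not the SGS implementation: longest run of
 * consecutive loci at which all individuals share at least one allele
 * identical by state, plus an empirical p-value against simulated runs.
 * Genotypes are g[individual][locus] = {allele1, allele2}; no missing data.
 */
public class SharingSketch {

    /** True if some allele appears in every individual's genotype at this locus. */
    static boolean allShare(int[][][] g, int locus) {
        // Any allele shared by everyone must be one of the first individual's alleles.
        for (int allele : g[0][locus]) {
            boolean shared = true;
            for (int[][] person : g) {
                int[] gt = person[locus];
                if (gt[0] != allele && gt[1] != allele) {
                    shared = false;
                    break;
                }
            }
            if (shared) {
                return true;
            }
        }
        return false;
    }

    /** Length, in loci, of the longest run of consecutive shared loci. */
    static int longestSharedRun(int[][][] g) {
        int best = 0;
        int current = 0;
        for (int locus = 0; locus < g[0].length; locus++) {
            current = allShare(g, locus) ? current + 1 : 0;
            best = Math.max(best, current);
        }
        return best;
    }

    /** Empirical p-value: (1 + #simulated runs >= observed) / (1 + #simulations). */
    static double empiricalPValue(int observed, int[] simulated) {
        int atLeast = 0;
        for (int s : simulated) {
            if (s >= observed) {
                atLeast++;
            }
        }
        return (1.0 + atLeast) / (1.0 + simulated.length);
    }

    public static void main(String[] args) {
        // Three individuals, five loci, alleles coded 1 and 2 (made-up data).
        int[][][] g = {
            {{1, 2}, {1, 1}, {2, 2}, {1, 2}, {1, 1}},
            {{1, 1}, {1, 2}, {2, 1}, {2, 2}, {2, 2}},
            {{2, 1}, {1, 2}, {2, 2}, {1, 2}, {1, 2}},
        };
        int observed = longestSharedRun(g);
        // Placeholder run lengths standing in for simulated replicates.
        int[] simulated = {1, 2, 4, 0, 3, 2, 1, 5, 2, 1};
        System.out.println("longest shared run = " + observed + " loci");
        System.out.println("empirical p-value = " + empiricalPValue(observed, simulated));
    }
}
```

Adding 1 to both numerator and denominator keeps the estimated p-value away from zero when no simulated run is as long as the observed one.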
General pedigree analysis utilities
-
CheckFormat
a program that checks the format of LINKAGE parameter and pedigree
input files.
-
CheckParameters
a program that checks the format of LINKAGE parameter files. Basically the
first half of
CheckFormat.
-
cMorgansToTheta
a program that converts interlocus genetic distances from centiMorgans
to recombination fractions.
-
CheckPedigree
a program that checks the format of LINKAGE pedigree files. Basically the
second half of
CheckFormat.
-
CheckTriplets
a program that checks a list of individual, father, mother triplets for
consistency with the usual pedigree rules.
-
CheckErrors
(previously called GMCheck)
a program that uses graphical modelling or Bayesian
network methods to calculate the posterior probability of genotype
or phenotype errors in pedigrees.
-
ObligateErrors
like
CheckErrors
but only reports obligate errors, not likely ones.
Needs less space to run than
CheckErrors.
-
DownCodeAlleles
a program that removes alleles unobserved in genotype data from
the specified model for the locus.
-
GeneCountAlleles
a program that implements gene counting, or the EM algorithm,
to obtain maximum likelihood estimates for allele frequencies
from genotypes of related individuals.
-
SelectLoci
a program for selecting subsets of loci from LINKAGE input files.
-
GetPolymorphisms
a program for selecting subsets of loci from LINKAGE input files
that removes loci for which only 1 allele is seen in the data.
-
HetCutOff
a program that selects subsets of loci for which the heterozygosity
score is higher than a specified threshold.
-
Heterozygosities
a program that computes and reports the heterozygosity scores for
the loci in a LINKAGE parameter file.
-
SelectKindreds
a program for selecting subsets of kindreds from LINKAGE input files.
You can probably do the same thing with a grep command.
-
TrimPed
a program to remove individuals from a pedigree if they have insufficient
observed data.
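Two of the utilities above rest on short standard formulas. The sketch below computes expected heterozygosity, 1 minus the sum of squared allele frequencies, and applies a HetCutOff-style threshold; whether Heterozygosities reports exactly this score is an assumption. It also converts centiMorgans to a recombination fraction with the Haldane map function, with the Kosambi function for comparison; which map function cMorgansToTheta uses is not stated here, so that choice is also an assumption. All names are hypothetical.

```java
/**
 * Hypothetical sketches of two formulas behind the utilities above.
 * Not the genepi implementations.
 */
public class UtilitySketches {

    /** Expected heterozygosity 1 - sum(p_i^2); frequencies assumed to sum to 1. */
    static double heterozygosity(double[] p) {
        double sumSq = 0.0;
        for (double f : p) {
            sumSq += f * f;
        }
        return 1.0 - sumSq;
    }

    /** Haldane map function: theta = (1 - exp(-2d)) / 2, d in Morgans. */
    static double haldaneTheta(double centiMorgans) {
        double morgans = centiMorgans / 100.0;
        return 0.5 * (1.0 - Math.exp(-2.0 * morgans));
    }

    /** Kosambi map function: theta = tanh(2d) / 2, d in Morgans. */
    static double kosambiTheta(double centiMorgans) {
        double morgans = centiMorgans / 100.0;
        return 0.5 * Math.tanh(2.0 * morgans);
    }

    public static void main(String[] args) {
        double[] freqs = {0.5, 0.25, 0.25};
        double h = heterozygosity(freqs);
        // HetCutOff-style filtering: keep the locus only if h is above the threshold.
        double threshold = 0.3;
        System.out.println("H = " + h + ", keep locus = " + (h > threshold));
        System.out.println("10 cM -> theta (Haldane) = " + haldaneTheta(10.0));
        System.out.println("10 cM -> theta (Kosambi) = " + kosambiTheta(10.0));
    }
}
```

Both map functions give theta close to d for small distances and approach 0.5 as the distance grows.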
Linkage analysis programs
-
OnePoints
a program for calculating the one point lod score for a locus.
That is, just the likelihood of the data at the locus given
the specified locus parameters.
-
TwoPointLods
a program for calculating simple two point lod scores on
a grid of values for the recombination parameter.
-
MaxTwoPointLods
a program for finding the maximum lod score. Note that the search
includes values of the recombination fraction between 0.5 and 1.
-
McLink
a program for calculating multi locus linkage statistics in
extended pedigrees using Markov chain Monte Carlo integration.
There is an option to run assuming linkage disequilibrium between
the markers which
can be specified as a model output from HapGraph.
As this is a Markov chain Monte Carlo implementation with
unknown mixing properties it may not give reliable results in
all cases. This program is provided primarily for those
who want to experiment with MCMC pedigree analysis.
-
McLinkLD
a program that combines McLink and HapGraph. This iteratively
updates inheritance states in a pedigree and the graphical model for linkage
disequilibrium giving, in effect, linkage statistics averaged
over the estimated linkage disequilibrium models.
This is very computationally intensive. If you can estimate
a linkage disequilibrium model using HapGraph and input it
to McLink that is probably a more tractable solution.
As this is a Markov chain Monte Carlo implementation with
unknown mixing properties it may not give reliable results in
all cases. This program is provided primarily for those
who want to experiment with MCMC pedigree analysis.
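For intuition about TwoPointLods and the note on MaxTwoPointLods, the sketch below takes the simplest textbook case: phase-known, fully informative meioses with R recombinants and NR non-recombinants, where LOD(theta) = log10[theta^R (1 - theta)^NR / 0.5^(R+NR)]. Real pedigrees need a full likelihood computation, so this is not how either program works. It evaluates the lod on a grid of recombination fractions and shows that when recombinants outnumber non-recombinants the maximum lies above 0.5, so a search confined to [0, 0.5] would miss it. Names are hypothetical.

```java
/**
 * Hypothetical sketch: two-point lod scores for phase-known, fully
 * informative meioses, evaluated on a grid of recombination fractions.
 * Not the TwoPointLods or MaxTwoPointLods implementation.
 */
public class LodSketch {

    /** x * log10(y), using the convention 0 * log10(0) = 0. */
    private static double xLog10(int x, double y) {
        return x == 0 ? 0.0 : x * Math.log10(y);
    }

    /** LOD(theta) = log10[ theta^R (1 - theta)^NR / 0.5^(R + NR) ]. */
    static double lod(int recombinants, int nonRecombinants, double theta) {
        int n = recombinants + nonRecombinants;
        return xLog10(recombinants, theta)
                + xLog10(nonRecombinants, 1.0 - theta)
                - n * Math.log10(0.5);
    }

    /** Grid value of theta in [0, maxTheta] with the largest lod score. */
    static double argMaxOnGrid(int r, int nr, double maxTheta, int steps) {
        double bestTheta = 0.0;
        double bestLod = Double.NEGATIVE_INFINITY;
        for (int i = 0; i <= steps; i++) {
            double theta = maxTheta * i / steps;
            double value = lod(r, nr, theta); // may be -Infinity at theta = 0 or 1
            if (value > bestLod) {
                bestLod = value;
                bestTheta = theta;
            }
        }
        return bestTheta;
    }

    public static void main(String[] args) {
        // Grid of recombination fractions, as a TwoPointLods-style table.
        for (double theta : new double[] {0.0, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5}) {
            System.out.printf("R=3, NR=17, theta=%.2f  lod=%.3f%n", theta, lod(3, 17, theta));
        }
        // More recombinants than non-recombinants: the maximum is above 0.5.
        System.out.println("search over [0, 1]:   " + argMaxOnGrid(14, 6, 1.0, 100));
        System.out.println("search over [0, 0.5]: " + argMaxOnGrid(14, 6, 0.5, 50));
    }
}
```

In this simple case the maximizing value is R / (R + NR), which exceeds 0.5 whenever R > NR.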
Haplotyping programs
- HapGraph
a program for fitting a graphical model for linkage disequilibrium
to haplotype data, and a general graphical model fitting program.
HapGraph now estimates graphical models from genotype data. It also
estimates haplotype frequencies and reconstructs phase.
- GCHap
a program for calculating maximum likelihood estimates of
haplotype frequencies from a sample of genotyped individuals.
This uses a staged gene counting, or EM, method starting
with a small number of loci and adding one at each stage.
- ApproxGCHap
a program for calculating rough maximum likelihood estimates of
haplotype frequencies from a sample of genotyped individuals.
It is the same as GCHap except that to save time and space,
haplotypes with low frequency are eliminated
at each stage.
-
LinkageToPhase
a program to convert LINKAGE formatted data into files suitable
for inputting into the
PHASE
programs.
-
LinkageToFastPhase
a program to convert LINKAGE formatted data into files suitable
for inputting into the
FASTPHASE
programs.
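To show the gene counting (EM) idea behind GCHap in its smallest form, the sketch below estimates haplotype frequencies for two biallelic loci from unphased genotypes of unrelated individuals. Only double heterozygotes have ambiguous phase: the E-step splits them between the AB/ab and Ab/aB phases in proportion to the current frequency products, and the M-step recounts haplotypes. This is an assumed two-locus special case, not GCHap's staged multilocus algorithm, and the class name, genotype coding and example counts are hypothetical.

```java
/**
 * Hypothetical sketch: EM (gene counting) for two-locus haplotype
 * frequencies from unphased genotypes of unrelated individuals.
 * Not GCHap's staged algorithm.
 */
public class TwoLocusEM {

    // Haplotype indices: 0 = AB, 1 = Ab, 2 = aB, 3 = ab (a, b are 1 for A, B).
    static int hap(int a, int b) {
        return (1 - a) * 2 + (1 - b);
    }

    /**
     * counts[gA][gB] = number of individuals with gA copies of allele A
     * and gB copies of allele B (each 0, 1 or 2). Returns {pAB, pAb, paB, pab}.
     */
    static double[] estimate(int[][] counts, int iterations) {
        double[] p = {0.25, 0.25, 0.25, 0.25};
        for (int it = 0; it < iterations; it++) {
            double[] expected = new double[4];
            double total = 0.0;
            for (int gA = 0; gA <= 2; gA++) {
                for (int gB = 0; gB <= 2; gB++) {
                    int n = counts[gA][gB];
                    if (n == 0) {
                        continue;
                    }
                    total += 2.0 * n;
                    if (gA == 1 && gB == 1) {
                        // E-step: split double heterozygotes between the two phases.
                        double cis = p[0] * p[3];   // AB/ab
                        double trans = p[1] * p[2]; // Ab/aB
                        double wCis = (cis + trans) > 0.0 ? cis / (cis + trans) : 0.5;
                        expected[0] += n * wCis;
                        expected[3] += n * wCis;
                        expected[1] += n * (1.0 - wCis);
                        expected[2] += n * (1.0 - wCis);
                    } else {
                        // Phase is known when at least one locus is homozygous.
                        int a1 = gA >= 1 ? 1 : 0, a2 = gA == 2 ? 1 : 0;
                        int b1 = gB >= 1 ? 1 : 0, b2 = gB == 2 ? 1 : 0;
                        expected[hap(a1, b1)] += n;
                        expected[hap(a2, b2)] += n;
                    }
                }
            }
            // M-step: frequencies are expected haplotype counts over 2N.
            for (int h = 0; h < 4; h++) {
                p[h] = expected[h] / total;
            }
        }
        return p;
    }

    public static void main(String[] args) {
        int[][] counts = new int[3][3]; // made-up genotype counts
        counts[2][2] = 30;
        counts[1][1] = 20;
        counts[0][0] = 25;
        counts[2][1] = 10;
        counts[1][0] = 8;
        counts[0][1] = 7;
        double[] p = estimate(counts, 100);
        String[] names = {"AB", "Ab", "aB", "ab"};
        for (int h = 0; h < 4; h++) {
            System.out.printf("%s %.4f%n", names[h], p[h]);
        }
    }
}
```

With more loci the number of possible phases grows quickly, which is why a staged approach that adds one locus at a time, as described for GCHap, is attractive.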
Viewing programs
-
ViewGraph
a general program for viewing and editing graphs.
-
ViewPed
a program for viewing pedigrees when the input is in
the form of a standard triplet file.
-
ViewLinkPed
a program for viewing pedigrees when the input is in
the form of a LINKAGE pedigree file.