power.HE

power.HE is designed to calculate sample size and power for the Haseman-Elston method in linkage analyses for a quantitative trait.

power_HE_v1-2.r

GEESIZE

GEESIZE version 3.1 is designed to compute the minimum sample size in studies with correlated response data based on generalized estimating equations (GEE). Such correlated response data arise, for example, in repeated measurement designs, family studies, or studies involving paired organs such as ophthalmological studies.

GEESIZE is a SAS macro that uses SAS IML and has to be called from within a SAS program. The SAS IML module therefore has to be licensed.

The program is based on the following publications:
Rochon, J. (1998) Application of GEE procedures for sample size calculations in repeated measures. Stat Med, 17, 1643-1658.

Dahmen, G., Rochon, J., König, I. R., Ziegler, A. (2004) Sample size calculations for controlled clinical trials using generalized estimating equations (GEE). Methods Inf Med, 43(5), 451-6.

The user might also be interested in:

Dahmen, G., Ziegler, A. (2004) Generalized estimating equations in controlled clinical trials: Hypotheses testing. Biom J, 46, 214-232.

Dahmen, G., Ziegler, A. (2006) Independence Estimating Equations for Controlled Clinical Trials with Small Sample Sizes. Methods Inf Med, 45, 430-4.

The documentation file explains how to use the macro.

The output comprises the minimal sample size required in each treatment group under the predefined parameter setting. A detailed definition of the output can be found in the documentation file.

GEESIZE SAS Macro
GEESIZE User Documentation
GEESIZE Examples

Copyright: Prof. Dr. Andreas Ziegler

silcLOD

silcLOD (significance levels and critical LODs) is designed to calculate nominal significance levels and critical LOD scores depending on the length of the investigated region, number of chromosomes, and the cross-over rate. The global significance level as well as the precision of the calculation have to be specified.

The program is based on the following publication:
Lander, E., Kruglyak, L. (1995) Genetic dissection of complex traits: guidelines for interpreting and reporting linkage results. Nature Genetics, 11, 241-247.

silcLOD is started by typing "silcLOD".

Within the program, you are prompted to specify the following parameters:

  • Length of genomic region in Morgan: Length of the investigated region. The default is given by 33 (length of the human genome).
  • Number of chromosomes: Number of investigated chromosomes. The default is given by 23 (number of human chromosome pairs).
  • Cross over rate: Total crossing over rate between the genotypes. For different mapping methods, the values for humans are given below (according to Lander and Kruglyak, 1995, Table 1). The default is given by 2.

    Mapping method                                  Cross over rate
    Lod score analysis                              1
    Allele sharing in sibs and half-sibs            2
    Allele sharing in grandparent-grandchildren     1
    Allele sharing in uncle-nephew                  5/2
    Allele sharing in first cousin                  8/3
    Allele sharing in first cousin, once removed    20/7
    Allele sharing in second cousin                 16/5

  • Global significance level: Desired global significance level for the investigation. The default is given by 0.05.
  • Precision: The precision sets the maximally allowed difference between the specified and the calculated global significance level. The default is given by 0.00000001.

At any stage, entering "?" gives help for specifying the parameters. The output can be saved to a file or only displayed on screen. The results give the nominal alpha for a single marker using an infinitely dense marker map, as well as the critical LOD scores for single markers using an infinitely dense marker map or maps with marker distances of 10 cM, 5 cM, 2 cM, or 1 cM.
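The relationship between the global and the nominal significance level can be sketched with the dense-map approximation from Lander and Kruglyak (1995), in which the global level is roughly (C + 2ρGz²)·α(z) for a one-sided normal threshold z, with C chromosomes, genome length G in Morgan, and cross over rate ρ. The Python sketch below is an illustration under that assumption only; the function and parameter names are hypothetical, and it is not taken from the silcLOD source, which may compute the values differently (for example for the maps with finite marker spacing).

```python
import math


def pointwise_alpha(z):
    """One-sided upper tail probability of a standard normal at threshold z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def genomewide_alpha(z, length_morgan=33.0, chromosomes=23, crossover_rate=2.0):
    """Dense-map approximation (assumed here): (C + 2*rho*G*z^2) * alpha(z)."""
    factor = chromosomes + 2.0 * crossover_rate * length_morgan * z * z
    return factor * pointwise_alpha(z)


def critical_lod(global_alpha=0.05, precision=1e-8, **region):
    """Bisect for the threshold z that yields the requested global level.

    Returns the nominal (pointwise) alpha and the critical LOD score.
    The bracket [2, 10] assumes the target level lies between the
    approximation's values at these two thresholds, where it is decreasing.
    """
    lo, hi = 2.0, 10.0
    z = 0.5 * (lo + hi)
    for _ in range(200):
        z = 0.5 * (lo + hi)
        diff = genomewide_alpha(z, **region) - global_alpha
        if abs(diff) <= precision:
            break
        if diff > 0:
            lo = z  # global level still too large: raise the threshold
        else:
            hi = z  # global level too small: lower the threshold
    # A LOD score corresponds to z^2 / (2 ln 10) for a one-sided test.
    return pointwise_alpha(z), z * z / (2.0 * math.log(10.0))


if __name__ == "__main__":
    alpha, lod = critical_lod()  # defaults mirror the program defaults listed above
    print(f"nominal alpha = {alpha:.3g}, critical LOD = {lod:.2f}")
```

With the default parameters (33 Morgan, 23 chromosomes, cross over rate 2, global level 0.05) the sketch gives a critical LOD of roughly 3.6 for an infinitely dense map.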

SilcLOD
SilcLOD - Documentation

Copyright: Andreas Ziegler
Contact: Inke.Koenig[at]imbs.uni-luebeck.de

GroupSeq

GroupSeq is designed to calculate sequential boundaries in R, with extended functionality compared with the FORTRAN program by Reboussin et al. (2000, Controlled Clinical Trials, 21: 190-207).

It is available from CRAN under http://cran.r-project.org/web/packages/GroupSeq/index.html.
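As background, group sequential boundaries are commonly derived from an alpha spending function that fixes how much type I error may be used up at each interim analysis. The Python sketch below evaluates one widely used choice, the O'Brien-Fleming-type spending function of Lan and DeMets; it is an illustration only, the function name is hypothetical, and it does not reproduce GroupSeq itself. Turning the spent alpha into actual boundaries requires recursive numerical integration, which is not shown here.

```python
from statistics import NormalDist


def obrien_fleming_spending(t, alpha=0.05):
    """Cumulative two-sided type I error spent at information fraction t.

    Uses the O'Brien-Fleming-type spending function
    alpha*(t) = 2 * (1 - Phi(z_{1 - alpha/2} / sqrt(t))), for 0 < t <= 1.
    """
    if not 0.0 < t <= 1.0:
        raise ValueError("information fraction t must lie in (0, 1]")
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1.0 - alpha / 2.0)
    return 2.0 * (1.0 - std_normal.cdf(z / t ** 0.5))


if __name__ == "__main__":
    # Four equally spaced looks; the last look spends the full alpha.
    for t in (0.25, 0.5, 0.75, 1.0):
        print(f"t = {t:.2f}: cumulative alpha spent = {obrien_fleming_spending(t):.5f}")
```

This spending function uses very little alpha at early looks and reserves almost all of it for the final analysis.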

Contact: Inke.Koenig[at]imbs.uni-luebeck.de

minsage

minsage (minimal sample size for genotypes) is designed to calculate the minimal number of genotypes required to ensure that all alleles with a specified frequency at one locus are detected with a given confidence.

The program is based on the following publication:
Gregorius, H.-G. (1980) The probability of losing an allele when diploid genotypes are sampled. Biometrics, 36, 643-652.

minsage is started by typing "minsage".

Within the program, you are prompted to specify the following parameters:

  • allele frequency: minimum allele frequency a that is to be detected
  • confidence: confidence for detecting the allele
  • uniformly distributed alleles or biallelic markers: The allele can be set to be the less frequent allele of a biallelic marker. Otherwise, if neither the number of alleles nor the genotypic frequencies are known, alleles can be set to be uniformly distributed.

The output gives the minimal sample size N of genotypes needed to detect alleles of frequency a with the specified confidence. The results are given both for the case that Hardy-Weinberg equilibrium can be assumed and for the case that it cannot.
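The difference between the two cases can be illustrated with a simplified calculation for a single allele. Under Hardy-Weinberg equilibrium, a sampled diploid genotype lacks an allele of frequency a with probability (1 - a)²; without that assumption, the worst case is that the allele occurs only in homozygotes, so a genotype lacks it with probability 1 - a. The Python sketch below rests on these assumptions and uses a hypothetical function name; it is not the formula of Gregorius (1980), which covers all alleles at a locus simultaneously, so minsage may report different numbers.

```python
import math


def min_genotypes(allele_freq, confidence, hardy_weinberg=True):
    """Smallest N such that one allele is seen with the given confidence.

    Simplified single-allele bound (an assumption of this sketch):
    solve P(allele missed in N genotypes) <= 1 - confidence.
    """
    if not (0.0 < allele_freq < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("allele_freq and confidence must lie in (0, 1)")
    if hardy_weinberg:
        miss_per_genotype = (1.0 - allele_freq) ** 2  # both gene copies lack the allele
    else:
        miss_per_genotype = 1.0 - allele_freq  # worst case: carriers are all homozygous
    return math.ceil(math.log(1.0 - confidence) / math.log(miss_per_genotype))


if __name__ == "__main__":
    for hwe in (True, False):
        n = min_genotypes(0.05, 0.95, hardy_weinberg=hwe)
        print(f"Hardy-Weinberg assumed: {hwe}, N = {n}")
```

Dropping the Hardy-Weinberg assumption roughly doubles the required sample size in this simplified setting.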

minsage
minsage - User Documentation

Copyright: Andreas Ziegler
Contact: Inke.Koenig[at]imbs.uni-luebeck.de

abi2link

abi2link is designed to create linkage files from ABI genotype and phenotype files. Please see the example directory for a detailed file description.

Usage: ./abi2link ARGUMENTS

Currently known arguments:

--map <haldane|kosambi>       locus mapping function
--ped <file>                  pedigree file
--chr <file>                  chromosome description file
--trait <file>                trait file (optional)
--estimate <all|founder>      estimate allele frequencies from all individuals or from founders only (optional, default: all)
--prefix <name>               output file prefix (optional, default: abi2link)
-v, --version                 print version information and exit
-h, --help                    print this text and exit

abi2link
abi2link - User Documentation

Copyright: Andreas Ziegler

EECI

EECI (effect estimates confidence intervals) is an Excel tool for estimating confidence intervals for a number of epidemiological effect measures.

Download EECI

The program is based on the following publication: Ziegler, A. and König, I. R. (2010): A Statistical Approach to Genetic Epidemiology: Concepts and Applications. Second edition. Wiley-VCH: Weinheim.

Microsoft Office 2007 is required for using this tool. Only the bold numbers can be modified by the user.
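As an illustration of the kind of calculation involved, the Python sketch below computes an odds ratio from a 2x2 table together with a Wald-type confidence interval on the log scale (Woolf's method). The choice of effect measure, the function name, and the example counts are assumptions made for illustration; the text above does not list which measures or interval methods EECI implements.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, level=0.95):
    """Odds ratio and Woolf (log-scale Wald) confidence interval.

    a, b: exposed and unexposed cases; c, d: exposed and unexposed controls.
    All four cell counts must be positive.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    estimate, lower, upper = odds_ratio_ci(20, 80, 10, 90)
    print(f"OR = {estimate:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```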

metaxa

Under construction.

metaxa (META-analysis with eXAct weights) is an R package for calculating meta-analyses that incorporates the variability of τ².

Further details: readme.txt

ranger

ranger is a fast implementation of Random Forests, particularly suited for high-dimensional data. Ensembles of classification, regression, and survival trees are supported. ranger is available as an R package or as a pure C++ version.

The R package is available on CRAN. To install, just use

install.packages("ranger")

See also the project page at GitHub: github.com/mnwright/ranger

Random Jungle

News - Jan 07, 2015 

Random Jungle has been superseded by the new software package ranger.

The source code of Random Jungle is available again (see below). Unfortunately we cannot offer any support for Random Jungle anymore. 

We strongly encourage you to use the new software package ranger!

Download
Source code (Build 2.1.0)
Ubuntu Linux 64 Bit (Build 2.1.0)
Ubuntu Linux 64 Bit, Sparse version (Build 2.1.0)
Ubuntu Linux 64 Bit, MPI version (Build 2.1.0)
Ubuntu Linux 64 Bit, Sparse MPI version (Build 2.1.0)
CentOS 64 Bit (Build 2.1.0)
CentOS 64 Bit, MPI version (Build 2.1.0)
Windows 64 Bit (Build 2.1.0)
 
Support and help
Manual

Schwarz, D. F., König, I. R. and Ziegler, A. (2010) On safari to Random Jungle: a fast implementation of Random Forests for high-dimensional data. Bioinformatics 26 (14): 1752-1758.
 
Frequently Asked Questions (FAQ)
I have problems with some libraries like libxml2 or boost!
Random Jungle is built with the newest stable release of all libraries. Some Linux distributions do not offer the newest packages; in that case, your system administrator may be able to help. We do not offer alternative compiled versions of Random Jungle built against different package versions.
 
I am using the Windows version and get the error: Cygwin1.dll 'not found'!
You must install cygwin1.dll. Please read more here: Cygwin1.dll 'not found' on Stack Overflow
 
Literature
Schwarz, D. F., König, I. R. and Ziegler, A. (2010) On safari to Random Jungle: a fast implementation of Random Forests for high-dimensional data. Bioinformatics 26 (14): 1752-1758.

Malley J.D., Kruppa J., Dasgupta A., Malley K.G. and Ziegler A. (2011) Probability Machines. Consistent Probability Estimation Using Nonparametric Learning Machines. Methods Inf Med 50(5).

Kruppa J, Ziegler A. and König I.R. (2012) Risk estimation and risk prediction using machine-learning methods. Hum Genet 131(10):1639-54.

Support by sekretariat[at]imbs-luebeck.de