______________________________________________________________________________________
METAINTER
–
Meta-analysis tool for multiple regression models in
genome-wide association studies allowing for interaction
______________________________________________________________________________________
Tim Becker
Dmitriy Drichel
Christine Herold
André Lacour
Vitalia Schüller
Tatsiana Vaitsiakhovich
German Center for Neurodegenerative Diseases (DZNE), Bonn
Institute for Medical Biometry, Informatics and Epidemiology, University of Bonn
Bonn
May 5, 2014
METAINTER is a standalone program written in C/C++ that performs meta-analysis of summary statistics obtained from a series of related studies. Its special feature is the ability to meta-analyze the results of multiple linear and logistic regression models, which are widely used in genome-wide association studies (GWAS).
It is assumed that a single predefined model is used in all studies to test for association of SNP tuples with a particular phenotype. SNP coding and parameter coding in the regression models have to follow the standards specified in (Cordell and Clayton, 2002), see also Section 3.1. As input, the analysis results of the individual studies have to be provided in tabulated format. METAINTER supports the output format of the genetic interaction analysis software INTERSNP (http://intersnp.meb.uni-bonn.de/) as well as any freely defined format, see Section 3.3.5.
The main meta-analysis method implemented in METAINTER is the method of the synthesis of regression slopes (MSRS), Section 2.4, suggested in (Becker and Wu, 2007). In addition to the model parameter estimates and their standard errors, MSRS requires the covariance matrix of the model parameters. This matrix is provided, for instance, by the INTERSNP tool (Herold et al., 2009). Note that for tests with just one parameter, MSRS is equivalent to the standard fixed-effects meta-analysis method. Within the MSRS framework, METAINTER can be used to test the homogeneity of the study results and to obtain common parameter estimates of multiple regression models in the joint sample.
Since the covariance matrix of the model parameters is not always available, METAINTER provides three further meta-analysis methods: Fisher’s method, Stouffer’s method with weights, and Stouffer’s method with weights and effect directions, see Chapter 2. METAINTER thereby enables meta-analysis of single-marker association tests, global haplotype tests, and tests for and under gene-gene interaction.
There are four meta-analysis methods implemented in METAINTER:
1. Fisher’s method;
2. Stouffer’s method with weights;
3. Stouffer’s method with weights and effect directions;
4. the method of synthesis of regression slopes (MSRS).
The first two methods are based on combining the p-values of the individual studies participating in the meta-analysis and can be applied to summarize the results of any association test. Stouffer’s method with weights and effect directions is a p-value combination approach that takes the consistency of effect directions across the studies into account; it can be used to meta-analyze the results of multiple regression models. The fourth method, based on multivariate generalized least squares estimation, can be applied to synthesize the results of multiple regression models. MSRS uses the model parameter estimates and their correlation, and provides the overall meta-analysis p-values together with meta-analytic estimates of the regression slopes.
Assume that Study 1, … , Study k were conducted to test a particular hypothesis H_{1} versus the null hypothesis H_{0}. Let p_{j} be the p-value, n_{j} the sample size and w_{j} a study-specific weight of Study j, j = 1,…,k.
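The test statistic of Fisher’s method (Fisher, 1932) has the form
X^{2} = -2 \sum_{j=1}^{k} \ln p_{j},
which, under H_{0} and for independent studies, follows a χ^{2} distribution with 2k degrees of freedom. The combined p-value p_{*} is the upper tail probability of X^{2}.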
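As a minimal sketch of Fisher’s method, the following Python function combines independent p-values using the usual statistic X^{2} = -2 Σ ln p_{j} with 2k degrees of freedom. The function name is hypothetical and the code is not taken from METAINTER; it uses the closed form of the χ^{2} tail probability that holds for an even number of degrees of freedom.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    Uses X2 = -2 * sum(ln p_j), which follows a chi-square distribution
    with 2k degrees of freedom under the null hypothesis.
    """
    k = len(p_values)
    if k == 0 or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    half_x2 = -sum(math.log(p) for p in p_values)  # this is X2 / 2
    # For 2k degrees of freedom the chi-square upper tail has the closed form
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, tail = 1.0, 1.0
    for i in range(1, k):
        term *= half_x2 / i
        tail += term
    return math.exp(-half_x2) * tail


if __name__ == "__main__":
    print(fisher_combined_p([0.05, 0.05]))
```

With a single study the combined p-value equals the study p-value, which is a convenient sanity check.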
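According to Stouffer’s method with weights (Stouffer et al., 1949), (Lipták, 1959), a combined p-value can be found as
p_{*} = 1 - \Phi\left( \frac{\sum_{j=1}^{k} w_{j} Z_{j}}{\sqrt{\sum_{j=1}^{k} w_{j}^{2}}} \right), \quad Z_{j} = \Phi^{-1}(1 - p_{j}),
where Φ is the standard normal distribution function.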
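The weighted Stouffer combination can be sketched in a few lines of Python. The sketch assumes the common weighted Z-test form (Zaykin, 2011), in which each p-value is converted to Z_{j} = Φ^{-1}(1 - p_{j}) and the weighted sum is divided by the square root of the sum of squared weights. The function name is hypothetical, and METAINTER’s own implementation may differ in detail.

```python
import math
from statistics import NormalDist


def stouffer_weighted_p(p_values, weights):
    """Combine p-values with the weighted Z-test (Stouffer's method with weights)."""
    if not p_values or len(p_values) != len(weights):
        raise ValueError("need one weight per p-value")
    if any(not (0.0 < p < 1.0) for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")
    std_normal = NormalDist()
    # Convert each p-value to a standard normal deviate.
    z_scores = [std_normal.inv_cdf(1.0 - p) for p in p_values]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return 1.0 - std_normal.cdf(numerator / denominator)


if __name__ == "__main__":
    # Weights chosen as square roots of (hypothetical) sample sizes.
    sizes = [3000, 2000, 1000]
    print(stouffer_weighted_p([0.01, 0.20, 0.03], [math.sqrt(n) for n in sizes]))
```

Choosing w_{j} as the square root of the sample size n_{j} gives larger studies more influence on the combined statistic.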
In the case of multiple regression models, Stouffer’s method with weights can be modified to include information on the consistency of effect directions across the studies.
Assume that the same regression model is used in all studies, and consider two of them, Study 1 and Study 2. As an example, let the model equation be
logit Y = β_{0} + β_{1}X_{1} + … + β_{P}X_{P},
(2.1)
and assume that it is tested versus logit Y = β_{0}. To compare the effect directions between two studies, we suggest the following criterion:
Two studies are said to have the same effect directions, if and only if the dot product of two vectors
If the model equation (2.1) is tested versus logit Y = β_{0} + β_{1}X_{1} + … + β_{S}X_{S}, 1 ≤ S < P, only the estimates of the predictor slopes β_{l} in Study j, with j = 1,2 and l = S + 1,…,P, have to be taken into consideration.
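The direction criterion can be illustrated with a short Python sketch. It rests on two assumptions that are not spelled out in the text: the two vectors are taken to be the raw slope estimates of the tested predictors, and "same effect directions" is taken to mean a strictly positive dot product. The function name and the argument n_null_slopes (the number S of slopes already present in the null model) are hypothetical.

```python
def same_effect_direction(slopes_a, slopes_b, n_null_slopes=0):
    """Return True if two studies point in the same direction.

    slopes_a, slopes_b: estimates of beta_1..beta_P (intercept excluded).
    n_null_slopes:      S, the number of leading slopes that also belong to
                        the null model and are therefore ignored.
    Assumption: 'same direction' means a strictly positive dot product.
    """
    if len(slopes_a) != len(slopes_b):
        raise ValueError("both studies must report the same parameters")
    tested_a = slopes_a[n_null_slopes:]
    tested_b = slopes_b[n_null_slopes:]
    if not tested_a:
        raise ValueError("no tested slopes left to compare")
    dot = sum(a * b for a, b in zip(tested_a, tested_b))
    return dot > 0


if __name__ == "__main__":
    print(same_effect_direction([0.3, -0.1], [0.2, -0.4]))   # slopes agree in sign
    print(same_effect_direction([0.3, -0.1], [-0.2, 0.4]))   # slopes disagree
```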
According to Stouffer’s method with weights and effect directions, a combined p-value can be found as
We describe the method of synthesis of regression slopes for a multiple linear regression model (Becker and Wu, 2007). The method works analogously for a multiple logistic regression model.
Assume that in k studies a multiple linear regression model
Y = β_{0} + β_{1}X_{1} + … + β_{P}X_{P}
(2.2)
was tested versus Y = β_{0} to find the effect of P predictors on the outcome variable Y. The aim of the meta-analysis is to combine the results of the k studies to obtain the overall p-value and the common estimate of the slope vector
There are several tests available:
In the general situation, where the model equation (2.2) is tested versus
Y = β_{0} + β_{1}X_{1} + … + β_{S}X_{S}
(2.3)
in each study, 1 ≤ S < P, the method can be modified by including predictor slopes β_{S+1},…,β_{P} in β, b, Σ and thus reducing the dimension of W. Then a test of model fit
To perform the meta-analysis with MSRS, METAINTER requires the input of the elements of the estimated covariance matrix \hat{Σ}_{j} from each Study j, j = 1,…,k. We underline the corresponding elements in the matrices below.
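The pooling step of MSRS can be sketched as follows. The sketch assumes the generalized least squares estimator described by Becker and Wu (2007) for the case that every study reports the same parameter vector, in which the stacked estimator reduces to b* = (Σ_j \hat{Σ}_{j}^{-1})^{-1} Σ_j \hat{Σ}_{j}^{-1} b_{j} with covariance (Σ_j \hat{Σ}_{j}^{-1})^{-1}. The function names are hypothetical, the code is not METAINTER’s implementation, and the tests built on the pooled estimate are omitted.

```python
def mat_inv(a):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("singular matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def pool_regression_slopes(estimates, covariances):
    """Precision-weighted pooling of parameter vectors across studies.

    estimates:   list of k parameter vectors b_j (same length in every study)
    covariances: list of k covariance matrices of the b_j
    Returns the pooled vector and its covariance matrix.
    """
    n = len(estimates[0])
    precision_sum = [[0.0] * n for _ in range(n)]  # sum of inverse covariances
    weighted_sum = [0.0] * n                       # sum of inverse covariance times b_j
    for b, cov in zip(estimates, covariances):
        prec = mat_inv(cov)
        for r in range(n):
            for c in range(n):
                precision_sum[r][c] += prec[r][c]
            weighted_sum[r] += sum(prec[r][c] * b[c] for c in range(n))
    pooled_cov = mat_inv(precision_sum)
    pooled = [sum(pooled_cov[r][c] * weighted_sum[c] for c in range(n))
              for r in range(n)]
    return pooled, pooled_cov


if __name__ == "__main__":
    b, cov = pool_regression_slopes([[0.2], [0.4]], [[[0.01]], [[0.04]]])
    print(b, cov)
```

With a single parameter the result is the inverse-variance weighted mean, i.e. the standard fixed-effects estimate; standard errors of the pooled slopes are the square roots of the diagonal of the pooled covariance matrix.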
We summarize the meta-analysis methods implemented in METAINTER in the table below. Note that we refer here to the regression models (2.1), (2.2) that are tested versus logit Y = β_{0} or Y = β_{0}, respectively.

| Method | Primary model | Inputs | Outputs |
|---|---|---|---|
| Fisher’s method | arbitrary | p_{j} | p_{*} |
| Stouffer’s method with weights | arbitrary | p_{j}, w_{j} | p_{*} |
| Stouffer’s method with weights and effect directions | regression | p_{j}, w_{j}, \hat{β}_{ij}, SE(\hat{β}_{ij}) | p_{*} |
| Method of synthesis of regression slopes | regression | \hat{β}_{ij}, \hat{Σ}_{j} | p_{*}, \hat{β}_{i}^{*}, SE(\hat{β}_{i}^{*}), \hat{Σ}^{*} |

p_{j} is the p-value of Study j;
w_{j} is the weight of Study j, e.g. the square root of the sample size, w_{j} = \sqrt{n_{j}};
\hat{β}_{ij} is the estimate of the slope β_{i} in Study j;
SE(\hat{β}_{ij}) is the standard error of the estimate of the slope β_{i} in Study j;
\hat{Σ}_{j} is the estimate of the covariance matrix Σ_{j} in Study j;
p_{*} is the meta-analysis p-value;
\hat{β}_{i}^{*} is the meta-analysis estimate of the slope β_{i};
SE(\hat{β}_{i}^{*}) is the standard error of \hat{β}_{i}^{*};
\hat{Σ}^{*} is the meta-analysis estimate of the covariance matrix;
i = 0,…,P, j = 1,…,k.
In this chapter we describe the prerequisites needed to run METAINTER. We start with the coding of SNPs and model parameters in the individual studies. Then we discuss some technical issues the user has to be aware of when preparing the input files. After that we present the list of options available in METAINTER and describe how to create a configuration file. We continue with the description of the METAINTER output files. At the end of the chapter, we give three examples.
METAINTER is written in C/C++ and can be operated from the command line.
Compilation: g++ metainter.cpp -o metainter -lm -O3
Run: ./metainter configurationfile.txt
Consider basic models describing the genetic effects of the genotypes AA, Aa and aa at a single locus in the genome on a given phenotype, where A is the susceptibility allele.
To model genetic effects on a quantitative trait, a standard linear regression equation for an outcome variable y is used:
y = β_{0} + βx + β_{D}x_{D},
(3.1)
where x, x_{D} are two genetic predictor variables corresponding to the additive and the dominance effects of a SNP. They are coded according to the number of copies of the susceptibility allele carried by a genotype, e.g. 2, 1, 0 for AA, Aa, aa. There are different ways to define the values of the predictor variables (coding schemes). We work with the coding scheme of (Cordell and Clayton, 2002): x = 1, 0, -1 and x_{D} = -0.5, 0.5, -0.5 for the genotypes AA, Aa, aa, respectively.
The coefficients β_{0}, β, β_{D} ∈ ℝ are real numbers. β represents the magnitude of the additive effect, defined as half of the difference between the two homozygote genotypic values. β_{D} is the magnitude of the dominance effect, defined as the difference between the heterozygote genotypic value and the intercept parameter β_{0}, whose particular biological meaning depends on the choice of the coding scheme. All three coefficients have to be estimated.
To model genetic effects on a qualitative trait, e.g. in case-control studies, a logistic regression model is used. Let p denote the probability of expressing a phenotype, with 0 < p < 1. In logistic regression, the logarithm of the odds, log(p/(1 - p)) =: logit p, is modeled as
logit p = β_{0} + βx + β_{D}x_{D},
(3.2)
with the same predictor variables as before and with the coefficients β, β_{D} ∈ ℝ representing the logarithms of the odds ratios (or genotype relative risks) that describe the association between disease and genotypes.
The linear and logistic regression models (3.1), (3.2) for a single locus can easily be extended to two-locus models. Moreover, they can be modified to model pairwise statistical interaction. An extension of (3.2) to a model allowing for pairwise interaction is given by a logistic regression equation with interaction terms:
logit p = β_{0} + β_{1}x_{1} + β_{1D}x_{1D} + β_{2}x_{2} + β_{2D}x_{2D} + γ_{12}x_{1}x_{2} + γ_{1,2D}x_{1}x_{2D} + γ_{1D,2}x_{1D}x_{2} + γ_{1D,2D}x_{1D}x_{2D}.
(3.3)
This equation models the additive effects x_{i} and the dominance effects x_{iD}, i = 1,2, at two loci as well as the interaction effects between them; x_{1} = 1, 0, -1 and x_{1D} = -0.5, 0.5, -0.5 for the genotypes AA, Aa, aa; x_{2} = 1, 0, -1 and x_{2D} = -0.5, 0.5, -0.5 for the genotypes BB, Bb, bb, respectively, where A and B are the susceptibility alleles. The coefficients β_{i}, β_{iD} ∈ ℝ represent the logarithms of the genotype relative risks at locus i, i = 1,2. The coefficients γ_{12}, γ_{1,2D}, γ_{1D,2}, γ_{1D,2D} ∈ ℝ reflect the magnitude of the interaction effects. In the same manner, a linear regression equation with interaction terms can be defined as a generalization of (3.1). For more details on linear and logistic regression models used in genetic epidemiology see (Cordell and Clayton, 2002).
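As an illustration of this coding, the following sketch maps genotypes to the eight predictors of the two-locus model (3.3). It assumes the coding x = 1, 0, -1 and x_{D} = -0.5, 0.5, -0.5 for 2, 1, 0 copies of the susceptibility allele; the function names are hypothetical.

```python
# Copies of the susceptibility allele -> predictor value (assumed coding).
ADDITIVE = {2: 1.0, 1: 0.0, 0: -1.0}
DOMINANCE = {2: -0.5, 1: 0.5, 0: -0.5}


def code_locus(copies):
    """Return (x, x_D) for a genotype with 0, 1 or 2 susceptibility alleles."""
    return ADDITIVE[copies], DOMINANCE[copies]


def two_locus_predictors(copies_1, copies_2):
    """Predictors of model (3.3) in the order
    x1, x1D, x2, x2D, x1*x2, x1*x2D, x1D*x2, x1D*x2D."""
    x1, x1d = code_locus(copies_1)
    x2, x2d = code_locus(copies_2)
    return [x1, x1d, x2, x2d, x1 * x2, x1 * x2d, x1d * x2, x1d * x2d]


if __name__ == "__main__":
    # Genotype AA at locus 1 and Bb at locus 2.
    print(two_locus_predictors(2, 1))
```

The order of the returned predictors matches the order of the coefficients in (3.3), which is also the order in which the parameters are described to METAINTER in a free format configuration.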
METAINTER currently does not allow the order in which SNPs are listed to differ across studies. For example, if a SNP pair is specified as (rs1,rs2) in study A, but is given in the order (rs2,rs1) in study B, the pair will not be meta-analyzed. Note that INTERSNP output tuples are typically ordered by genomic location, so the issue of different orders will not occur with INTERSNP files (unless the use of different genome builds caused flips in the genomic order). For other input file formats, the lines affected by inconsistent ordering have to be re-edited. Note that it is not sufficient to swap the columns with the respective SNP names. Changing the SNP order has an impact on the sign of the parameter estimates and on the entries of the covariance matrix. The sign changes depend on the parameter type (additive or dominance variation term), and readjustment can become tricky when more than two SNPs are involved.
METAINTER takes care of varying allele references across the studies. Suppose that for a C/T SNP the alleles in the input file are given in the order C/T for Study 1, but in the order T/C for Study 2. In this case, all parameter estimates of log-additive type that depend on the SNP in the underlying regression model in Study 2 will be multiplied by -1 to unify the reference. The procedure is repeated for all SNPs of the tuple under consideration with a diverging reference. In addition, all entries of the covariance matrix (if available) that depend on log-additive terms of the SNP will be multiplied by -1. Note that the diagonal elements of the covariance matrix depend twice on the same parameter. Therefore, the diagonal elements remain unchanged, which is self-evident as these elements are just the variances of the model parameters. For parameters that are entirely of dominance variation type no modification is needed. Note that the allele reference is handled automatically by INTERSNP.
METAINTER also attempts to resolve strand flips. If a SNP is given as a C/T polymorphism in Study 1, and as a G/A polymorphism in Study 2, METAINTER assumes that C↔G and T↔A. If the alleles of Study 2 occur in the order A/G instead, the SNP will additionally undergo the procedure described in the previous paragraph. A C/G polymorphism, of course, will not be flipped by METAINTER. For such polymorphisms, strand consistency across studies has to be established prior to analysis.
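The allele comparison described above can be sketched as follows. This is an illustration of the logic rather than METAINTER’s code: the function name and the return convention (a pair of flags, or None for inconsistent allele codes) are hypothetical. Because the direct comparison is tried first, a C/G or A/T polymorphism is never strand-flipped.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def compare_alleles(reference, other):
    """Compare the allele pair of a study with that of the reference study.

    Returns (strand_flipped, reference_swapped), or None if the allele
    codes cannot be reconciled. The direct comparison is tried before the
    complemented one.
    """
    ref = tuple(a.upper() for a in reference)
    oth = tuple(a.upper() for a in other)
    complemented = tuple(COMPLEMENT[a] for a in oth)
    for candidate, strand_flipped in ((oth, False), (complemented, True)):
        if candidate == ref:
            return strand_flipped, False
        if candidate == ref[::-1]:
            # Reference allele differs: log-additive terms change sign.
            return strand_flipped, True
    return None


if __name__ == "__main__":
    print(compare_alleles(("C", "T"), ("T", "C")))  # swapped reference
    print(compare_alleles(("C", "T"), ("G", "A")))  # other strand
    print(compare_alleles(("C", "T"), ("A", "G")))  # other strand and swapped
```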
Parameters and options of the meta-analysis have to be specified in a configuration file.
METAINTER uses two top level keywords:
GENERAL
This obligatory keyword indicates that all following lines of the configuration file, until the occurrence of the other top level keyword NEW_STUDY, specify either general options or options for all studies. It is obligatory to specify the keywords METHOD and OUTPUT under GENERAL.

NEW_STUDY
This obligatory keyword indicates that all following lines of the configuration file, until the next line with the keyword NEW_STUDY, refer to the same study. Studies are enumerated according to the number of occurrences of NEW_STUDY. Under each NEW_STUDY, the keyword FILE has to be specified. The first NEW_STUDY must occur after the GENERAL keyword, and the GENERAL keyword cannot reoccur after a NEW_STUDY. Keywords specified under NEW_STUDY override values that were set under GENERAL. The keywords INTERSNP and INTERSNPSINGLE and the keywords that are set by them cannot be overridden.

OUTPUT <string>
This obligatory keyword is used to specify the path and the name of the output files. The value of <string> is a name tag. All output file names begin with this tag.

METHOD <string>
This obligatory keyword is used to specify the meta-analysis method to be applied. The methods are coded as 1 = Fisher’s method, 2 = Stouffer’s method with weights, 3 = Stouffer’s method with weights and effect directions, 4 = Method of synthesis of regression slopes. Several methods can be chosen in one run.
Examples:
METHOD 1;3; // do methods 1 and 3
METHOD 1-4; // do all four methods

pFILTER <r>
This optional keyword is used to set a p-value cutoff level r. METAINTER produces two output files, one with all results, another one with those results that reached the p-value cutoff. The default value for r is 1.0 × 10^{-6}.

INTERSNP format keywords can be used when the primary analysis of all studies was performed with INTERSNP. The keywords INTERSNP and INTERSNPSINGLE are special keywords indicating that the input files from all studies were generated with INTERSNP and therefore have the same format. The keywords INTERSNP or INTERSNPSINGLE have to be specified under GENERAL. When they are used, the specification of model parameters becomes redundant. Only the INTERSNP test used in the primary analysis has to be specified.
INTERSNP <n>
This optional keyword indicates that the input files from all studies were generated with INTERSNP, and that a multi-marker test was used in the primary analysis. Here, n is the test indicator of an INTERSNP two- or three-marker test. The indicators are those used with the INTERSNP keyword TEST.

INTERSNPSINGLE <n>
This optional keyword indicates that the input files from all studies were generated with INTERSNP, and that a single-marker test was used in the primary analysis. Here, n is the test indicator of an INTERSNP single-marker test. The indicators are those used with the INTERSNP keyword SINGLEMARKER.

FILE <filename>
This obligatory keyword specifies the path and the name of the input file for the current study. It has to be used under each NEW_STUDY.

STUDYWEIGHT <r>
This optional keyword specifies the weight of a study. It is needed for methods 2 and 3. The keyword has to be specified under each NEW_STUDY.

For instance, the square root of the sample size can be chosen as a study weight (Zaykin, 2011).
When the "free" input file format is used, several additional keywords are obligatory. They can be specified either under GENERAL, in which case they refer to all studies, or under NEW_STUDY, in which case they apply to the current study only. Values specified under GENERAL can be modified for a particular study by redefining the keywords in the corresponding NEW_STUDY block. File formats are allowed to differ across studies.
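The interplay of GENERAL and NEW_STUDY blocks can be illustrated with a small reader for such a configuration. The sketch makes assumptions about the file syntax that are not confirmed by this manual: one keyword per line, keyword and value separated by whitespace, and everything after "//" treated as a comment. The function name is hypothetical, and the restriction that INTERSNP keywords cannot be overridden is not modelled.

```python
def parse_config(text):
    """Split a configuration into general options and per-study options.

    Returns (general, studies), where each study dictionary contains the
    GENERAL values overridden by the values of its own NEW_STUDY block.
    """
    general, blocks = {}, []
    current = None
    for raw in text.splitlines():
        line = raw.split("//", 1)[0].strip()   # assumed comment syntax
        if not line:
            continue
        parts = line.split(None, 1)
        keyword = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if keyword == "GENERAL":
            if blocks:
                raise ValueError("GENERAL cannot reoccur after NEW_STUDY")
            current = general
        elif keyword == "NEW_STUDY":
            if current is None:
                raise ValueError("NEW_STUDY must follow GENERAL")
            current = {}
            blocks.append(current)
        elif current is None:
            raise ValueError("option given before GENERAL: " + keyword)
        else:
            current[keyword] = value
    return general, [{**general, **block} for block in blocks]


if __name__ == "__main__":
    example = """GENERAL
METHOD 1;2;
OUTPUT test
pCOL 10
NEW_STUDY
FILE Study1.txt
NEW_STUDY
FILE Study2.txt
pCOL 11
"""
    general, studies = parse_config(example)
    print(studies[0]["pCOL"], studies[1]["pCOL"])
```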
HEADERLINES <n>
This optional keyword specifies the number of header lines in the input file of a study (of all studies, when specified under GENERAL). The default is 0.

nSNPs <n>
This keyword, obligatory for free format, specifies the number of SNPs in the analysis model. It has to be specified under GENERAL.

nPARAM <n>
This keyword, obligatory for free format, specifies the number of parameters in the primary analysis model. It has to be specified under GENERAL.

PARAMREFERENCE <string>
This keyword is obligatory for free format when methods 3 or 4 are selected. For each model parameter, PARAMREFERENCE indicates which SNP the parameter refers to. The keyword has to be specified under GENERAL.

We explain how to use the last two keywords by example. Suppose that the full genotype model defined by 2 SNPs (nSNPs 2) is used in case-control studies. In this case a logistic regression model of the form (3.3) with 8 parameters (nPARAM 8) is used, and the proper usage of PARAMREFERENCE is
PARAMREFERENCE 1;1;2;2;1+2;1+2;1+2;1+2;
For each parameter, the PARAMREFERENCE string states which SNP (indicated by its number) the parameter depends on. The first parameter belongs to x_{1}; it depends only on SNP 1 ("1;"). The second parameter belongs to x_{1D}; it also depends only on SNP 1. Then come the parameters that depend only on SNP 2 ("2;"). Parameters 5 to 8 are interaction terms. They depend on both SNPs, hence we use "1+2;" for them. Models with more SNPs can be defined analogously.
PARAMTYPE <string>
This keyword is obligatory for free format when methods 3 or 4 are selected. For each parameter, PARAMTYPE indicates the type of the parameter (A for (log-)additive or D for dominance variation). The keyword has to be specified under GENERAL.

We again explain the keyword using the example of the 2-SNP full genotype 8 df model. The proper usage of the keyword in this case is
PARAMTYPE A;D;A;D;A+A;A+D;D+A;D+D;
Parameters 1 and 3 depend on one SNP and are log-additive parameters, hence we use "A;". Parameters 2 and 4 depend on one SNP and are dominance variation parameters, hence we use "D;". Parameter 5 depends on two SNPs and corresponds to the interaction term x_{1}x_{2}, where the first and the second component are log-additive ("A+A;"). Analogously, for the x_{1}x_{2D} term we have "A+D;", for x_{1D}x_{2} we use "D+A;", and for x_{1D}x_{2D} we write "D+D;".
In a 5-SNP model including a parameter for the interaction term x_{1}x_{3}x_{5D}, we would specify PARAMREFERENCE "1+3+5;" and PARAMTYPE "A+A+D;" for this parameter.
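The two strings carry the information needed to unify the allele reference across studies. The sketch below derives, for a given set of SNPs whose reference allele is swapped, the sign with which each parameter estimate has to be multiplied, and applies the signs to the estimates and to a covariance matrix. It assumes that each swapped SNP contributes a factor -1 for every log-additive ("A") component it has in a term, and that the covariance matrix passed in covers exactly the listed parameters (an intercept, if present, would keep the sign +1). The function names are hypothetical.

```python
def parse_terms(spec):
    """'1;1;2;2;1+2;' -> [['1'], ['1'], ['2'], ['2'], ['1', '2']]"""
    return [item.strip().split("+") for item in spec.split(";") if item.strip()]


def flip_signs(param_reference, param_type, swapped_snps):
    """Sign (+1 or -1) for each parameter, given the swapped SNP numbers."""
    refs = parse_terms(param_reference)
    types = parse_terms(param_type)
    if len(refs) != len(types):
        raise ValueError("PARAMREFERENCE and PARAMTYPE differ in length")
    signs = []
    for snps, kinds in zip(refs, types):
        if len(snps) != len(kinds):
            raise ValueError("term components do not match")
        sign = 1
        for snp, kind in zip(snps, kinds):
            # Only log-additive components change sign with the reference.
            if int(snp) in swapped_snps and kind.strip() == "A":
                sign = -sign
        signs.append(sign)
    return signs


def apply_signs(betas, covariance, signs):
    """Multiply estimates by s_i and covariance entries by s_i * s_j."""
    n = len(signs)
    new_betas = [s * b for s, b in zip(signs, betas)]
    new_cov = [[signs[i] * signs[j] * covariance[i][j] for j in range(n)]
               for i in range(n)]
    return new_betas, new_cov


if __name__ == "__main__":
    ref = "1;1;2;2;1+2;1+2;1+2;1+2;"
    typ = "A;D;A;D;A+A;A+D;D+A;D+D;"
    print(flip_signs(ref, typ, {1}))
```

Since s_i * s_i = 1, the diagonal of the covariance matrix is left unchanged, in agreement with the remark on variances above.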
pCOL <n>
This keyword is obligatory for free format when methods 1, 2 or 3 are selected. The keyword indicates the column with p-values in each study, and can be specified both under GENERAL and NEW_STUDY.

SNPCOLS <string>
This keyword, obligatory for free format, indicates the columns with SNP IDs. The keyword can be specified both under GENERAL and NEW_STUDY.
Example:
SNPCOLS 3;7; // Columns 3 and 7 contain SNP IDs

CHRCOLS <string>
This optional keyword indicates the columns with the chromosomes of the SNPs. The keyword can be specified both under GENERAL and NEW_STUDY.
Example:
CHRCOLS 2;6; // Columns 2 and 6 contain the chromosomes of the SNPs (the same SNP order as in SNPCOLS is assumed)

POSCOLS <string>
This optional keyword indicates the columns with the positions (bp) of the SNPs. The keyword can be specified both under GENERAL and NEW_STUDY.
Example:
POSCOLS 4;8; // Columns 4 and 8 contain the positions in bp of the SNPs (the same SNP order as in SNPCOLS is assumed)

ALLELECOLS <string>
This keyword is obligatory for free format when methods 3 or 4 are selected, and indicates the columns with the alleles of the SNPs. The keyword can be specified both under GENERAL and NEW_STUDY. Note that the number of alleles that have to be specified is twice the number of SNPs.
Example:
ALLELECOLS 12;13;14;15; // Columns 12 to 15 contain SNP alleles.

Remark: The two alleles of the first SNP are given in columns 12 and 13, those of the second SNP in columns 14 and 15. The sign of the parameter estimates refers to the alleles in column 12 (SNP 1) and column 14 (SNP 2). In other words, it is assumed that the alleles in columns 12, 14 were coded as "1" and that the alleles in columns 13, 15 were coded as "-1" in the regression analysis. This rule coincides with that of PLINK; in SNPTEST the coding is the other way round (!).
BETACOLS <string>
This keyword is obligatory for free format when methods 3 or 4 are selected, and indicates the columns with the parameter (beta) estimates. The keyword can be specified both under GENERAL and NEW_STUDY. One column for each parameter (as defined by nPARAM) is needed.

SECOLS <string>
This keyword is obligatory for free format when methods 3 or 4 are selected, and indicates the columns with the standard errors. The keyword can be specified both under GENERAL and NEW_STUDY. One column for each parameter (as defined by nPARAM) is needed.

COVCOLS <string>
This keyword is obligatory for free format when method 4 is selected, and indicates the columns with the entries of the covariance matrix Σ. More precisely, only the columns for the entries of the upper triangle (including the diagonal) of the covariance matrix have to be indicated. The keyword can be specified both under GENERAL and NEW_STUDY. The number of required columns is (nPARAM+2)*(nPARAM+1)/2, see Section 2.4.
Example:
COVCOLS 32-76; // columns 32 to 76 contain the entries of the upper triangle of the covariance matrix (model with nPARAM=8)
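The column specifications and the expected size of the covariance block can be checked with a few lines of Python. The sketch assumes that a column list consists of semicolon-terminated entries, each either a single column number or a range written as first-last; the function names are hypothetical.

```python
def parse_columns(spec):
    """'12;13;14;15;' -> [12, 13, 14, 15];  '32-76;' -> [32, ..., 76]"""
    columns = []
    for item in spec.split(";"):
        item = item.strip()
        if not item:
            continue
        if "-" in item:
            first, last = (int(x) for x in item.split("-", 1))
            if last < first:
                raise ValueError("descending range: " + item)
            columns.extend(range(first, last + 1))
        else:
            columns.append(int(item))
    return columns


def expected_cov_columns(n_param):
    """Entries in the upper triangle (with diagonal) of the covariance
    matrix of the intercept plus n_param parameters."""
    return (n_param + 2) * (n_param + 1) // 2


if __name__ == "__main__":
    cov_cols = parse_columns("32-76;")
    print(len(cov_cols), expected_cov_columns(8))
```

For nPARAM=8 both numbers are 45, so the range 32-76 in the example above has exactly the required length.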




An overview of all METAINTER keywords described above is given in the following tables:

| Top level keywords | Status | Description |
|---|---|---|
| GENERAL | obligatory | To specify the general options |
| NEW_STUDY | obligatory | To specify the options for a current study |

| General keywords | Status | Description |
|---|---|---|
| OUTPUT <string> | obligatory | To specify the path and the name of the output files |
| METHOD <string> | obligatory | To specify the method(s) of the meta-analysis to be applied: 1=Fisher’s method; 2=Stouffer’s method with weights; 3=Stouffer’s method with weights and effect directions; 4=Method of synthesis of regression slopes |
| pFILTER <r> | optional | To specify the p-value cutoff. By default r = 1.0 × 10^{-6} |

| INTERSNP format keywords | Status | Description |
|---|---|---|
| INTERSNP <n> | optional | To indicate that input files from all studies were generated with INTERSNP and that a two- or three-marker test was used; n is the argument of the keyword TEST in INTERSNP |
| INTERSNPSINGLE <n> | optional | To indicate that input files from all studies were generated with INTERSNP and that a single-marker test was used; n is the argument of the keyword SINGLEMARKER in INTERSNP |

| Study-specific keywords | Status | Description |
|---|---|---|
| FILE <filename> | obligatory | To specify the path and the filename of the input file for a current study |
| STUDYWEIGHT <r> | optional | To specify the weight for a current study |

| Arbitrary input files keywords | Status | Description |
|---|---|---|
| HEADERLINES <n> | optional | To indicate the number of header lines in the input file of a study |
| nSNPS <n> | obligatory | To indicate the number of SNPs in the primary analysis model. Has to be specified under GENERAL |
| nPARAM <n> | obligatory | To indicate the number of parameters in the primary analysis model. Has to be specified under GENERAL |
| PARAMREFERENCE <string> | obligatory* | To indicate for each parameter which SNP it depends on. Has to be specified under GENERAL |
| PARAMTYPE <string> | obligatory* | To indicate for each parameter whether it is an additive or a dominance variation parameter. Has to be specified under GENERAL |
| pCOL <n> | obligatory** | To indicate the column with p-values. Can be specified both under GENERAL and NEW_STUDY |
| SNPCOLS <string> | obligatory | To indicate the columns with SNP names. Can be specified both under GENERAL and NEW_STUDY |
| CHRCOLS <string> | optional | To indicate the columns with SNP chromosomes. Can be specified both under GENERAL and NEW_STUDY |
| POSCOLS <string> | optional | To indicate the columns with SNP positions. Can be specified both under GENERAL and NEW_STUDY |
| ALLELECOLS <string> | obligatory* | To indicate the columns with SNP alleles. Can be specified both under GENERAL and NEW_STUDY |
| BETACOLS <string> | obligatory* | To indicate the columns with parameter (beta) estimates. Can be specified both under GENERAL and NEW_STUDY |
| SECOLS <string> | obligatory* | To indicate the columns with standard errors. Can be specified both under GENERAL and NEW_STUDY |
| COVCOLS <string> | obligatory*** | To indicate the columns with the entries of the upper triangle of the covariance matrix. Can be specified both under GENERAL and NEW_STUDY |

* obligatory for METHOD 3, 4
** obligatory for METHOD 1, 2, 3
*** obligatory for METHOD 4
We assume that "test" has been specified as the output name tag. The corresponding line in the configuration file is
OUTPUT test;
METAINTER creates the following output files:
The log file restates the selected keywords and provides some basic summary statistics. These should be self-explanatory.
The main output file is tab-separated and contains all results. The majority of the column headings should be self-explanatory. The column "minimalPlausibility" indicates whether the meta-analysis p-value is smaller than the smallest p-value observed in any of the studies. The column "consistency" shows whether the regression slopes of Study 1 (more precisely, the first non-missing study) and Study j have the same direction in the nPARAM-dimensional space, see Section 2.3. Studies with missing values are indicated by "x". The column "consistency" contains meaningful values only when METHOD 3 is selected.
This file has the same format as the main output file testResult.txt, but contains only those lines for which at least one meta-analysis method has a p-value below the cutoff pFILTER (by default, 1.0 × 10^{-6}).
This file lists the SNP tuples for which no meta-analysis was conducted. Possible reasons are:
a) The tuple was found in one study only;
b) Allele codes were inconsistent across studies;
c) The p-value was missing or had an invalid value in some studies.
The main output file can contain tuples with no valid meta-analysis. This can happen, for instance, when standard errors are missing or invalid (value < 0); in that case methods 3 and 4 cannot be conducted.
Example. Consider a project, where:
Let the output files with the results from the INTERSNP run be titled as
Study1_IS.txt, Study2_IS.txt, Study3_IS.txt
for Study 1 to 3, respectively. A configuration file to perform the meta-analysis with METAINTER in this example has to be organized as follows:

| Keyword | Parameter | Comment |
|---|---|---|
| GENERAL | | // general options valid for all studies |
| INTERSNP | 4 | // to indicate that input files from all studies were generated with INTERSNP, two-marker TEST 4 |
| METHOD | 1; 2; 3; 4; | // to specify the method(s) of meta-analysis to be applied: 1=Fisher’s method, 2=Stouffer’s method with weights, 3=Stouffer’s method with weights and effect directions, 4=Method of synthesis of regression slopes |
| pFILTER | 0.0001 | // p-value cutoff |
| OUTPUT | MA_IS | // the path and the name of the output files |
| NEW_STUDY | | // options for a current study |
| FILE | Study1_IS.txt | // the path and the name of the input file for a current study |
| STUDYWEIGHT | 55 | // weight for a current study, here 55 is approximately the square root of the sample size 3000 |
| NEW_STUDY | | |
| FILE | Study2_IS.txt | |
| STUDYWEIGHT | 45 | |
| NEW_STUDY | | |
| FILE | Study3_IS.txt | |
| STUDYWEIGHT | 32 | |

In this section two examples of configuration files for free format input files are presented. In the first example, it is assumed that the results of the primary analysis are organized in the same manner in all studies, i.e. all input files have the same structure. The number of columns, the order in which they appear, etc. have to be consistent across all studies. The second example refers to the case when the results of the primary analysis are organized differently in different studies.
Example 1. Consider a project, where:
Let the input files with the results of the primary analysis be titled as Study1_FF1.txt, Study2_FF1.txt and Study3_FF1.txt.
A configuration file to perform the meta-analysis with METAINTER in this example has to be organized as follows:

| Keyword | Parameter | Comment |
|---|---|---|
| GENERAL | | // general options valid for all studies |
| OUTPUT | MA_FF1 | // the path and the name of the output files |
| METHOD | 1;2;3;4; | // to specify the method(s) of meta-analysis to be applied: 1=Fisher’s method, 2=Stouffer’s method with weights, 3=Stouffer’s method with weights and effect directions, 4=method of synthesis of regression slopes |
| pFILTER | 0.0001 | // p-value cutoff |
| HEADERLINES | 1 | // number of header lines in the input files |
| nSNPS | 2 | // number of SNPs in the primary analysis model |
| nPARAM | 4 | // number of parameters in the primary analysis model |
| PARAMREFERENCE | 1+2;1+2;1+2;1+2; | // to indicate for each parameter which SNP it depends on |
| PARAMTYPE | A+A;A+D;D+A;D+D; | // to indicate for each parameter whether it is an additive or a dominance variation parameter |
| pCOL | 10 | // column with p-values |
| SNPCOLS | 3;7; | // columns with SNP names |
| CHRCOLS | 2;6; | // columns with SNP chromosomes |
| POSCOLS | 4;8; | // columns with SNP positions |
| ALLELECOLS | 11-14; | // columns with SNP alleles |
| BETACOLS | 15;17;19;21; | // columns with parameter (beta) estimates |
| SECOLS | 16;18;20;22; | // columns with standard errors |
| COVCOLS | 23-37; | // columns with the entries of the upper triangle of the covariance matrix |
| NEW_STUDY | | // options for a current study |
| FILE | Study1_FF1.txt | // the path and the name of the input file for a current study |
| STUDYWEIGHT | 55 | // weight for a current study, here 55 is approximately the square root of the sample size 3000 |
| NEW_STUDY | | |
| FILE | Study2_FF1.txt | |
| STUDYWEIGHT | 45 | |
| NEW_STUDY | | |
| FILE | Study3_FF1.txt | |
| STUDYWEIGHT | 32 | |

Example 2. Consider a project, where:
Let the input files with the results of the primary analysis be titled as Study1_FF2.txt, Study2_FF2.txt and Study3_FF2.txt.
A configuration file to perform the meta-analysis with METAINTER in this example has to be organized as follows:

| Keyword | Parameter | Comment |
|---|---|---|
| GENERAL | | // general options valid for all studies |
| OUTPUT | MA_FF2 | // the path and the name of the output files |
| METHOD | 1;2;3;4; | // to specify the method(s) of meta-analysis to be applied: 1=Fisher’s method, 2=Stouffer’s method with weights, 3=Stouffer’s method with weights and effect directions, 4=method of synthesis of regression slopes |
| pFILTER | 0.0001 | // p-value cutoff |
| nSNPS | 2 | // number of SNPs in the primary analysis model |
| nPARAM | 4 | // number of parameters in the primary analysis model |
| PARAMREFERENCE | 1+2;1+2;1+2;1+2; | // to indicate for each parameter which SNP it depends on |
| PARAMTYPE | A+A;A+D;D+A;D+D; | // to indicate for each parameter whether it is an additive or a dominance variation parameter |
| NEW_STUDY | | // options for a current study |
| FILE | Study1_FF2.txt | // the path and the name of the input file for a current study |
| HEADERLINES | 1 | // number of header lines in the input file |
| pCOL | 10 | // column with p-values |
| SNPCOLS | 3;7; | // columns with SNP names |
| CHRCOLS | 2;6; | // columns with SNP chromosomes |
| POSCOLS | 4;8; | // columns with SNP positions |
| ALLELECOLS | 11-14; | // columns with SNP alleles |
| BETACOLS | 15;17;19;21; | // columns with parameter (beta) estimates |
| SECOLS | 16;18;20;22; | // columns with standard errors |
| COVCOLS | 23-37; | // columns with the entries of the upper triangle of the covariance matrix |
| STUDYWEIGHT | 55 | // weight for a current study, here 55 is approximately the square root of the sample size 3000 |
| NEW_STUDY | | |
| FILE | Study2_FF2.txt | |
| HEADERLINES | 1 | |
| pCOL | 11 | |
| SNPCOLS | 1;2; | |
| CHRCOLS | 3;4; | |
| POSCOLS | 5;6; | |
| ALLELECOLS | 7-10; | |
| BETACOLS | 12-15; | |
| SECOLS | 16-19; | |
| COVCOLS | 20-34; | |
| STUDYWEIGHT | 2000 | |
| NEW_STUDY | | |
| FILE | Study3_FF2.txt | |
| HEADERLINES | 0 | |
| pCOL | 11 | |
| SNPCOLS | 3;8; | |
| CHRCOLS | 1;6; | |
| POSCOLS | 2;7; | |
| ALLELECOLS | 4;5;9;10; | |
| BETACOLS | 12;14;16;18; | |
| SECOLS | 13;15;17;19; | |
| COVCOLS | 20-34; | |
| STUDYWEIGHT | 1000 | |

Becker, B.J., Wu, M.J. (2007) The synthesis of regression slopes in meta-analysis. Stat. Sci. 22, 414–429.
Cordell, H.J., Clayton, D.G. (2002) A unified stepwise regression procedure for evaluating the relative effects of polymorphisms within a gene using case-control or family data: application to HLA in type 1 diabetes. Am. J. Hum. Genet. 70, 124–141.
Fisher, R. (1932) Statistical methods for research workers. Oliver and Boyd, Edinburgh.
Herold, C., Steffens, M. et al. (2009) INTERSNP: genome-wide interaction analysis guided by a priori information. Bioinformatics 25, 3275–3281, doi: 10.1093/bioinformatics/btp596.
Herold, C., Mattheisen, M. et al. (2012) Integrated genome-wide pathway association analysis with INTERSNP. Hum. Hered. 73, 63–72, doi: 10.1159/000336196.
Lipták, T. (1959) On the combination of independent tests. Publ. Math. Inst. Hungar. Acad. Sci. 3, 171–197.
Stouffer, S., DeVinney, L. et al. (1949) The American soldier: Adjustment during army life. Vol. 1. Princeton University Press, Princeton, US.
Zaykin, D.V. (2011) Optimally weighted Z-test is a powerful method for combining probabilities in meta-analysis. J. Evol. Biol. 24, 1836–1841, doi: 10.1111/j.1420-9101.2011.02297.x.