Correlation: Tetrachoric model
In the model underlying tetrachoric correlation it is assumed that the frequency data in a 2 × 2 table stem from dichotomizing two continuous random variables X and Y that are bivariate normally distributed with mean m = (0, 0) and covariance matrix:
\[ \Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \]
For instance, one could be interested in correlating the responses to two yes-no questionnaire items, each of which relates to an underlying normally distributed variable (e.g., product preferences).
The value ρ in the above (normalized) covariance matrix is the tetrachoric correlation coefficient. The frequency data in the table depend on the criterion used to dichotomize the marginal distributions of X and Y and the tetrachoric correlation coefficient.
From a theoretical perspective, a specific scenario is completely characterized by a 2 × 2 probability matrix:

         X = 0   X = 1
Y = 0    p11     p12     p1⋆
Y = 1    p21     p22     p2⋆
         p⋆1     p⋆2     1

where the marginal probabilities are regarded as fixed. The whole matrix is completely specified if the marginal probabilities p⋆2 = Pr(X = 1), p2⋆ = Pr(Y = 1) and the table probability p11 are given. If zx and zy denote the quantiles of the standard normal distribution corresponding to the marginal probabilities p⋆1 and p1⋆, that is, Φ(zx) = p⋆1 and Φ(zy) = p1⋆, then p11 is the CDF of the bivariate normal distribution described above, with the upper limits zx and zy:
\[ p_{11} = \int_{-\infty}^{z_y} \int_{-\infty}^{z_x} \Phi(x, y, \rho) \, dx \, dy \]
where Φ(x, y, ρ) denotes the density of the bivariate normal distribution with correlation ρ. The upper limits zx and zy are the values at which the variables X and Y are dichotomized.
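The relation between ρ, the marginal probabilities, and the cell probabilities can be sketched numerically. The following is only an illustration, not G*Power's routine: it assumes the standard identity that the bivariate normal CDF equals Φ(zx)Φ(zy) plus the integral of the bivariate density over correlations from 0 to ρ, and evaluates that integral with Simpson's rule. The function names are hypothetical.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def bvn_density(x, y, rho):
    """Density of the standard bivariate normal distribution with correlation rho."""
    q = 1.0 - rho * rho
    return exp(-(x * x - 2.0 * rho * x * y + y * y) / (2.0 * q)) / (2.0 * pi * sqrt(q))


def cell_p11(rho, p_col1, p_row1, steps=200):
    """Pr(X <= zx, Y <= zy) for p_col1 = Pr(X <= zx) and p_row1 = Pr(Y <= zy).

    Assumes |rho| < 1 and an even number of Simpson steps.
    """
    std = NormalDist()
    zx, zy = std.inv_cdf(p_col1), std.inv_cdf(p_row1)
    h = rho / steps
    total = bvn_density(zx, zy, 0.0) + bvn_density(zx, zy, rho)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * bvn_density(zx, zy, i * h)
    # Independence part plus the contribution of the correlation.
    return p_col1 * p_row1 + total * h / 3.0


def table_from_rho(rho, p_col1, p_row1):
    """Return (p11, p12, p21, p22), rows indexed by Y and columns by X."""
    p11 = cell_p11(rho, p_col1, p_row1)
    p12 = p_row1 - p11
    p21 = p_col1 - p11
    return p11, p12, p21, 1.0 - p11 - p12 - p21
```

With both thresholds at the median and ρ = 0.5, the sketch returns p11 = 1/4 + arcsin(0.5)/(2π) = 1/3, the known closed-form value for that special case.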
Observed frequency data are assumed to be random samples from this theoretical distribution. Thus, it is assumed that random vectors (xi, yi) have been drawn from the bivariate normal distribution described above and afterwards assigned to one of the four cells according to the 'column' criterion xi ≤ zx versus xi > zx and the 'row' criterion yi ≤ zy versus yi > zy. Given a frequency table

         X = 0   X = 1
Y = 0    f11     f12
Y = 1    f21     f22

the central problem in testing specific hypotheses about tetrachoric correlation coefficients is to estimate the correlation coefficient and its standard error from these frequency data.
Two approaches to solving this problem have been proposed. One approach is to estimate the exact correlation coefficient (e.g. Brown & Benedetti, 1977). The other approach is to use simple approximations ρ* of ρ that are easier to compute (e.g. Bonett & Price, 2005). G*Power provides power analyses for both approaches. (See, however, the implementation notes for a qualification of the term 'exact' used to distinguish between both approaches.)
The exact computation of the tetrachoric correlation coefficient is difficult. One reason is computational in nature (see the implementation notes below). A more fundamental problem is, however, that frequency data are discrete, which implies that the estimation of a cell probability can be no more accurate than 1/(2N). The inaccuracies in estimating the true correlation ρ are especially severe when there are cell frequencies smaller than 5. In these cases, caution is warranted when interpreting the estimated r. For a more thorough discussion of these issues see Brown and Benedetti (1977) and Bonett and Price (2005).
Testing the tetrachoric correlation coefficient
The implemented routines estimate the power of a test that the tetrachoric correlation ρ has a fixed value ρ0. That is, the null and alternative hypotheses for a two-sided test are
H0 : ρ − ρ0 = 0
H1 : ρ − ρ0 ≠ 0.
The hypotheses are identical for both the exact and the approximation mode.
In the power procedures the use of the Wald test statistic: W = (r − ρ0)/se0(r) is assumed, where se0(r) is the standard error computed at ρ = ρ0.
As will be illustrated in the example section, the outputs of G*Power may also be used to perform the statistical test.
Effect size index
The correlation coefficient assumed under H1 (H1 corr ρ) is used as effect size. The following additional inputs are needed to fully specify the effect size:
H0 corr ρ
This is the tetrachoric correlation coefficient assumed under H0. An input of the type H1 corr ρ = H0 corr ρ corresponds to "no effect" and must not be used in a priori power calculations.
Marginal prop x
This is the marginal probability that X > zx (i.e. p*2).
Marginal prop y
This is the marginal probability that Y > zy (i.e. p2*).
The correlations must be within the interval ]− 1, 1[. The probabilities must be within the interval ]0, 1[.
Effect size calculation
The effect size drawer may be used to determine H1 corr ρ in two different ways.
A first possibility is to specify, for each cell of the 2 × 2 table, the probability of this event assumed under H1. Pressing the Calculate button calculates the exact (Correlation ρ) and approximate (Approx. correlation ρ∗) tetrachoric correlation coefficient, and the marginal probabilities Marginal prob x = p12 + p22, and Marginal prob y = p21 + p22. The exact correlation coefficient is used as H1 corr ρ (see below).
Note that the four cell probabilities must sum to 1. It therefore suffices to specify three of them explicitly. If you leave one of the four cells empty, G*Power computes the fourth value as: (1 - sum of three p).
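The bookkeeping described here is simple enough to sketch in a few lines. The function below is a hypothetical helper, not part of G*Power: it fills in one missing cell as 1 minus the sum of the other three and returns the two marginal probabilities as defined above. Rejecting negative cells is an assumption of this sketch.

```python
def complete_table(p11=None, p12=None, p21=None, p22=None):
    """Fill in at most one missing cell probability and return the marginals."""
    cells = {"p11": p11, "p12": p12, "p21": p21, "p22": p22}
    missing = [name for name, value in cells.items() if value is None]
    if len(missing) > 1:
        raise ValueError("at least three cell probabilities are required")
    if missing:
        # Fourth value = 1 - sum of the three given probabilities.
        cells[missing[0]] = 1.0 - sum(v for v in cells.values() if v is not None)
    # Assumption of this sketch: negative cells and sums other than 1 are rejected.
    if min(cells.values()) < 0.0 or abs(sum(cells.values()) - 1.0) > 1e-9:
        raise ValueError("cell probabilities must be non-negative and sum to 1")
    marginal_x = cells["p12"] + cells["p22"]  # Marginal prob x
    marginal_y = cells["p21"] + cells["p22"]  # Marginal prob y
    return cells, marginal_x, marginal_y
```

For example, `complete_table(p11=0.2, p12=0.3, p21=0.1)` fills in p22 = 0.4 and gives the marginals 0.7 and 0.5.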

A second possibility is to compute a confidence interval for the tetrachoric correlation in the population from the results of a previous investigation, and to choose a value from this interval as H1 corr ρ. In this case you specify four observed frequencies, the relative position 0 < k < 1 inside the confidence interval (0, 0.5, 1 corresponding to the left, central, and right position, respectively), and the confidence level (1 − α) of the confidence interval (see below).
From these data G*Power computes the total sample size N = f11 + f12 + f21 + f22 and estimates the cell probabilities pij by: pij = (fij + 0.5)/(N + 2). These are used to compute the sample correlation coefficient r, the estimated marginal probabilities, the borders (L, R) of the (1 − α) confidence interval for the population correlation coefficient ρ, and the standard error of r. The value L + (R − L) ∗ k is used as H1 corr ρ.
The computed correlation coefficient, the confidence interval, and the standard error of r depend on whether the exact (Brown & Benedetti, 1977) or the approximate (Bonett & Price, 2005) computation method was chosen in the Options dialog in the main window. In the exact mode, the labels of the output fields are Correlation r, C.I. ρ lwr, C.I. ρ upr, and Std. error of r. In the approximate mode an asterisk ∗ is appended after r and ρ.
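The arithmetic of the frequency mode can be illustrated as follows. This is a minimal sketch with hypothetical function names; it covers only the estimation of the cell and marginal probabilities and the choice of a value inside a given interval, not the computation of r or of the confidence interval itself.

```python
def estimate_cell_probabilities(f11, f12, f21, f22):
    """Return N, the estimated cell probabilities, and the two marginals."""
    n = f11 + f12 + f21 + f22
    p11, p12, p21, p22 = ((f + 0.5) / (n + 2) for f in (f11, f12, f21, f22))
    marginal_x = p12 + p22  # estimated Pr(X > zx)
    marginal_y = p21 + p22  # estimated Pr(Y > zy)
    return n, (p11, p12, p21, p22), marginal_x, marginal_y


def h1_from_interval(lower, upper, k):
    """Value at relative position k (0 = left border, 1 = right border)."""
    if not 0.0 <= k <= 1.0:
        raise ValueError("k must lie between 0 and 1")
    return lower + (upper - lower) * k


n, cells, px, py = estimate_cell_probabilities(203, 186, 167, 374)
print(n, round(px, 7), round(py, 7))
print(h1_from_interval(0.240, 0.429, 0.0))
```

With the frequencies from the example section (203, 186, 167, 374) this gives N = 930 and the marginal probabilities 0.6019313 and 0.5815451 reported there.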

Clicking on the button Calculate and transfer to main window copies the values given in H1 corr ρ, Margin prob x, Margin prob y, and - in frequency mode - Total sample size to the corresponding input fields in the main window.
Options
You can choose between the exact approach in which the procedure proposed by Brown and Benedetti (1977) is used and the approximation suggested by Bonett and Price (2005).
Examples
To illustrate the application procedure we refer to Example 1 in Bonett and Price (2005). The Yes or No answers of 930 respondents to two questions in a personality inventory are recorded in a 2 × 2 table with the following result: f11 = 203, f12 = 186, f21 = 167, f22 = 374.
First we use the effect size dialog to compute from these data the confidence interval for the tetrachoric correlation in the population. We choose, in the effect size drawer, From C.I. calculated from observed freq. Next we insert the above values in the corresponding fields and press Calculate. Using the exact computation mode (selected in the Options dialog in the main window), we get an estimated correlation of r = 0.334, a standard error of r = 0.0482, and a 95% confidence interval of [0.240, 0.429] for the population ρ. We choose the left border of the C.I. (i.e. relative position 0, corresponding to 0.240) as the value of the tetrachoric correlation coefficient ρ under H1.
We now want to know how many subjects we need to achieve a power of 0.95 in a one-sided test of the H0 that ρ = 0 vs. the H1 that ρ = 0.24, given the same marginal probabilities and α = 0.05.
Clicking on Calculate and transfer to main window copies the computed H1 corr ρ = 0.2399846 and the marginal probabilities px = 0.6019313 and py = 0.5815451 to the corresponding input fields in the main window. The complete input and output is as follows:
Select
Type of power analysis: A priori
Input
Tail(s): One
H1 corr ρ: 0.2399846
α err prob: 0.05
Power (1-β err prob): 0.95
H0 corr ρ: 0
Marginal prob x: 0.6019313
Marginal prob y: 0.5815451
Output
Critical z: 1.644854
Total sample size: 463
Actual power: 0.950370
H1 corr ρ: 0.239985
H0 corr ρ: 0.0
Critical r lwr: 0.122484
Critical r upr: 0.122484
Std err r: 0.074465
This shows that we need at least a sample size of 463 in this case (the Actual power output field shows the power for a sample size rounded to an integer value).
The output also contains the values for ρ under H0 and H1 used in the internal computation procedure. In the exact computation mode a deviation from the input values would indicate that the internal estimation procedure did not work correctly for the input values (this should only occur for extreme values of r or marginal probabilities). In the approximate mode, the output values correspond to the r values resulting from the approximation formula.
The remaining outputs show the critical value(s) for r under H0: In the Wald test assumed here, z = (r − ρ0)/se0(r) is approximately standard normally distributed under H0. The critical values of r under H0 are given
- as a quantile z1−α/2 of the standard normal distribution (z1−α in one-sided tests), and
- in the form of critical correlation coefficients r and standard error se0(r). (In one-sided tests, the single critical value is reported twice in Critical r lwr and Critical r upr). In the example given above, the standard error of r under H0 is 0.074465, and the critical value for r is 0.122484. Thus, (r − ρ0)/se0(r) = (0.122484 − 0)/0.074465 = 1.64485 = z1−α, as expected.
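The relation between the critical z and the critical r can be checked with a few lines of code. This is a sketch under the Wald-test assumption stated above; the function name is hypothetical, and the one-sided case is assumed to be upper-tailed, as in the example.

```python
from statistics import NormalDist


def critical_r(rho0, se0, alpha, tails=1):
    """Return (critical z, lower critical r, upper critical r) for the Wald test."""
    z = NormalDist().inv_cdf(1.0 - alpha / tails)
    upper = rho0 + z * se0
    # One-sided (upper-tailed) tests have a single critical value, reported twice.
    lower = upper if tails == 1 else rho0 - z * se0
    return z, lower, upper


z, lwr, upr = critical_r(rho0=0.0, se0=0.074465, alpha=0.05, tails=1)
print(round(z, 6), round(lwr, 6), round(upr, 6))
```

With the standard error from the output above, this reproduces the critical z of 1.644854 and the critical r of 0.122484.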
Using G*Power to perform the statistical test of H0
G*Power may also be used to perform the statistical test of H0. Assume that we want to test the
H0: ρ = ρ0 = 0.4 against
H1: ρ ≠ 0.4
for α = 0.05. Assume further that we observed the following frequencies:
f11 = 120,
f12 = 45,
f21 = 56, and
f22 = 89.
To perform the test we first open the effect size drawer and select the From C.I. calculated from observed freq option. Here we compute from the observed frequencies the correlation coefficient r and the estimated marginal probabilities. In the exact mode we find
Correlation r = 0.512751,
Est. marginal prob x = 0.4326923, and
Est. marginal prob y = 0.4679487.
In the main window we then choose a Post hoc type of power analysis. Clicking on Calculate and transfer to main window in the effect size drawer copies the values for marginal x, marginal y, and the sample size 310 to the main window. We now set
Tail(s) = Two
H0 corr ρ = 0.4 and
α err prob = 0.05.
After clicking on Calculate in the main window, the output section shows the critical values for the correlation coefficient ([0.244446, 0.555554]) and the standard error under H0 (0.079366). These values show that the test is not significant for the chosen α-level, because the observed r = 0.512751 lies inside the interval [0.244446, 0.555554]. We then use the G*Power calculator to compute the associated p value. Inserting
z = (0.513-0.4)/0.0794; 1-normcdf(z,0,1)
and clicking on the Calculate button yields p = 0.077.
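The same computation can be reproduced outside the G*Power calculator. The sketch below uses a hypothetical function name and returns both the upper-tail probability, which is what the expression 1-normcdf(z,0,1) evaluates, and the two-sided value, which is twice as large and is the one that corresponds to the two-sided H1 stated above.

```python
from statistics import NormalDist


def wald_p_values(r, rho0, se0):
    """Return z, the upper-tail p value, and the two-sided p value."""
    std = NormalDist()
    z = (r - rho0) / se0
    upper_tail = 1.0 - std.cdf(z)
    two_sided = 2.0 * (1.0 - std.cdf(abs(z)))
    return z, upper_tail, two_sided


z, p_upper, p_two_sided = wald_p_values(r=0.513, rho0=0.4, se0=0.0794)
print(round(z, 3), round(p_upper, 3), round(p_two_sided, 3))
```

Either way the conclusion is the same as that drawn from the critical values: the test is not significant at α = 0.05.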
If we instead want to use the approximate mode, we would choose the Options dialog in the main window and then choose Use approximation (Bonett and Price, 2005). We may then proceed in essentially the same way as described above. In this case we find a very similar value for the correlation coefficient r∗ = 0.5093278. The critical values for r∗ given in the output section of the main window are [0.233365, 0.540709] and the standard error for r∗ is 0.078882.
Note: To compute the p value in the Use approximation (Bonett and Price, 2005) mode, we should use H0 corr ρ∗ given in the output and not H0 corr ρ specified in the input. Accordingly, in the G*Power calculator we enter
z = (0.509-0.397)/0.0788; 1-normcdf(z,0,1)
which yields p = 0.0776, a value very close to that given above for the exact mode.
Related tests
Correlation: Bivariate normal model
Correlation: Point biserial model
Implementation notes
Given ρ and the marginal probabilities px and py, the following procedures are used to calculate the value of ρ (in the exact mode) or ρ* (in the approximate mode) and to estimate the standard error of r and r*.
Exact mode
In the exact mode the algorithms proposed by Brown and Benedetti (1977) are used to calculate r and to estimate the standard error s(r). Note that the latter is not the expected standard error σ(r)! Computing σ(r) would require enumerating all possible tables Ti for the given N. If p(Ti) and ri denote the probability and the correlation coefficient of table i, then σ²(r) = ∑i (ri − ρ)² p(Ti) (see Brown and Benedetti, 1977, p. 349, for details). The number of possible tables increases rapidly with N. It is therefore in general computationally too expensive to compute this exact value. Thus, 'exact' does not mean that the exact standard error is used in the power calculations.
In the exact mode it is not necessary to estimate r in order to calculate power, because it is already given in the input. We nevertheless report the value of r calculated by the routine in the output to indicate possible limitations in the precision of the routine for |r| near 1. Thus, if the r's reported in the output section deviate markedly from those given in the input, all results should be interpreted with caution.
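To estimate s(r) the formula based on asymptotic theory proposed by Pearson in 1901 is used:
\[ s(r) = \frac{1}{N^{3/2}\,\phi(z_x, z_y, r)} \Big\{ \frac{(f_{11}+f_{22})(f_{12}+f_{21})}{4} + (f_{11}+f_{21})(f_{12}+f_{22})\,\Phi_2^2 + (f_{11}+f_{12})(f_{21}+f_{22})\,\Phi_1^2 + 2(f_{11}f_{22} - f_{12}f_{21})\,\Phi_1\Phi_2 - (f_{11}f_{12} - f_{21}f_{22})\,\Phi_2 - (f_{11}f_{21} - f_{12}f_{22})\,\Phi_1 \Big\}^{1/2} \]
or, with respect to cell probabilities,
\[ s(r) = \frac{1}{N^{1/2}\,\phi(z_x, z_y, r)} \Big\{ \frac{(p_{11}+p_{22})(p_{12}+p_{21})}{4} + (p_{11}+p_{21})(p_{12}+p_{22})\,\Phi_2^2 + (p_{11}+p_{12})(p_{21}+p_{22})\,\Phi_1^2 + 2(p_{11}p_{22} - p_{12}p_{21})\,\Phi_1\Phi_2 - (p_{11}p_{12} - p_{21}p_{22})\,\Phi_2 - (p_{11}p_{21} - p_{12}p_{22})\,\Phi_1 \Big\}^{1/2} \]
where
\[ \Phi_1 = \Phi\Big(\frac{z_x - r z_y}{(1-r^2)^{1/2}}\Big) - 0.5, \qquad \Phi_2 = \Phi\Big(\frac{z_y - r z_x}{(1-r^2)^{1/2}}\Big) - 0.5, \qquad \phi(z_x, z_y, r) = \frac{1}{2\pi(1-r^2)^{1/2}} \exp\Big[-\frac{z_x^2 - 2 r z_x z_y + z_y^2}{2(1-r^2)}\Big]. \]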
Brown and Benedetti (1977) show that this approximation is quite good if the minimal cell frequency is at least 5 (see their Tables 1 and 2).
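As an illustration of how such an asymptotic standard error can be evaluated for a given ρ, N, and pair of marginal probabilities, the sketch below implements the cell-probability form of the formula as it is commonly stated following Brown and Benedetti (1977). This form, the numerical integration used for the bivariate normal CDF, and all function names are assumptions of the sketch, not a description of G*Power's code.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

STD = NormalDist()


def bvn_density(x, y, rho):
    """Density of the standard bivariate normal distribution."""
    q = 1.0 - rho * rho
    return exp(-(x * x - 2.0 * rho * x * y + y * y) / (2.0 * q)) / (2.0 * pi * sqrt(q))


def bvn_cdf(zx, zy, rho, steps=200):
    """Pr(X <= zx, Y <= zy): independence term plus a Simpson integral over [0, rho]."""
    h = rho / steps
    total = bvn_density(zx, zy, 0.0) + bvn_density(zx, zy, rho)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * bvn_density(zx, zy, i * h)
    return STD.cdf(zx) * STD.cdf(zy) + total * h / 3.0


def asymptotic_se(rho, marginal_x, marginal_y, n):
    """Asymptotic s(r); marginal_x = Pr(X > zx), marginal_y = Pr(Y > zy)."""
    p_col1, p_row1 = 1.0 - marginal_x, 1.0 - marginal_y
    zx, zy = STD.inv_cdf(p_col1), STD.inv_cdf(p_row1)
    p11 = bvn_cdf(zx, zy, rho)
    p12, p21 = p_row1 - p11, p_col1 - p11
    p22 = 1.0 - p11 - p12 - p21
    root = sqrt(1.0 - rho * rho)
    phi1 = STD.cdf((zx - rho * zy) / root) - 0.5
    phi2 = STD.cdf((zy - rho * zx) / root) - 0.5
    inner = ((p11 + p22) * (p12 + p21) / 4.0
             + (p11 + p21) * (p12 + p22) * phi2 ** 2
             + (p11 + p12) * (p21 + p22) * phi1 ** 2
             + 2.0 * (p11 * p22 - p12 * p21) * phi1 * phi2
             - (p11 * p12 - p21 * p22) * phi2
             - (p11 * p21 - p12 * p22) * phi1)
    return sqrt(inner) / (sqrt(n) * bvn_density(zx, zy, rho))


print(round(asymptotic_se(0.0, 0.6019313, 0.5815451, 463), 6))
```

The call uses the inputs of the a priori example (ρ0 = 0, N = 463), so its result can be compared with the Std err r value of 0.074465 shown in that output.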
Approximation mode
Bonett and Price (2005) propose the following approximations.
Correlation coefficient
Their approximation ρ∗ of the tetrachoric correlation coefficient is:
ρ* = cos(π/(1 + ω^c)),
where c = (1 − |p1* − p*1|/5 − (1/2 − pm)²)/2, with p1* = p11 + p12, p*1 = p11 + p21, pm = the smallest marginal proportion, and ω = p11 p22/(p12 p21). The same formulae are used to compute an estimate r* from frequency data fij. The only difference is that estimates pij* = (fij + 0.5)/(N + 2) of the true probabilities are used.
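The approximation is easy to evaluate directly. The following sketch uses hypothetical function names and assumes that the cell probabilities are estimated as (fij + 0.5)/(N + 2), the form given in the effect size section.

```python
from math import cos, pi


def bonett_price(p11, p12, p21, p22):
    """Approximate tetrachoric correlation from four cell probabilities."""
    row1, col1 = p11 + p12, p11 + p21
    # Smallest of the four marginal proportions.
    p_min = min(row1, 1.0 - row1, col1, 1.0 - col1)
    c = (1.0 - abs(row1 - col1) / 5.0 - (0.5 - p_min) ** 2) / 2.0
    omega = (p11 * p22) / (p12 * p21)  # odds ratio
    return cos(pi / (1.0 + omega ** c))


def bonett_price_from_frequencies(f11, f12, f21, f22):
    """Estimate r* from observed frequencies, adding 0.5 to every cell."""
    n = f11 + f12 + f21 + f22
    return bonett_price(*[(f + 0.5) / (n + 2) for f in (f11, f12, f21, f22)])


print(round(bonett_price_from_frequencies(120, 45, 56, 89), 7))
```

For the frequencies 120, 45, 56, and 89 used in the example section, this gives r* ≈ 0.50933, in line with the value 0.5093278 reported there for the approximate mode.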
Confidence Interval
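The 100 · (1 − α)% confidence interval for r* is computed as follows:
CI = [cos(π/(1 + L^c*)), cos(π/(1 + U^c*))],
where
\[ L = \exp\big(\ln \omega^* + z_{\alpha/2}\, s(\ln \omega^*)\big), \qquad U = \exp\big(\ln \omega^* - z_{\alpha/2}\, s(\ln \omega^*)\big), \qquad s(\ln \omega^*) = \Big\{ \frac{1}{N} \Big( \frac{1}{p^*_{11}} + \frac{1}{p^*_{12}} + \frac{1}{p^*_{21}} + \frac{1}{p^*_{22}} \Big) \Big\}^{1/2} \]
and zα/2 is the α/2 quantile of the standard normal distribution.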
Asymptotic standard error for r*
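The standard error is given by:
\[ s(r^*) = k \Big\{ \frac{1}{N} \Big( \frac{1}{p^*_{11}} + \frac{1}{p^*_{12}} + \frac{1}{p^*_{21}} + \frac{1}{p^*_{22}} \Big) \Big\}^{1/2} \]
with
\[ k = \frac{\pi\, c^*\, w\, \sin[\pi/(1+w)]}{(1+w)^2}, \qquad w = (\omega^*)^{c^*}. \]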

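A delta-method reading of this standard error can be sketched as follows. The sketch assumes that s(r*) is the standard error of ln ω multiplied by a factor k, where k is the derivative of cos(π/(1 + ω^c)) with respect to ln ω with c held fixed. This form and the function name are assumptions of the sketch, not statements about G*Power's code.

```python
from math import pi, sin, sqrt


def approx_se(p11, p12, p21, p22, n):
    """Delta-method standard error of the Bonett-Price approximation r*."""
    row1, col1 = p11 + p12, p11 + p21
    p_min = min(row1, 1.0 - row1, col1, 1.0 - col1)
    c = (1.0 - abs(row1 - col1) / 5.0 - (0.5 - p_min) ** 2) / 2.0
    w = ((p11 * p22) / (p12 * p21)) ** c
    # k = d/d(ln omega) of cos(pi / (1 + omega**c)), with c treated as a constant.
    k = pi * c * w * sin(pi / (1.0 + w)) / (1.0 + w) ** 2
    # Standard error of the log odds ratio for a sample of size n.
    se_log_omega = sqrt((1.0 / p11 + 1.0 / p12 + 1.0 / p21 + 1.0 / p22) / n)
    return k * se_log_omega


print(approx_se(0.3, 0.2, 0.2, 0.3, 100))
```

Because the cell probabilities enter only through their reciprocals divided by n, quadrupling the sample size halves the standard error.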
Power calculation
The H0 distribution is the standard normal distribution N(0, 1). The H1 distribution is the normal distribution N(m1, s1), where
m1 = (ρ − ρ0)/sr0
s1 = sr1/sr0.
The values sr0 and sr1 denote the standard error under H0 and H1, that is, s(ρ0) and s(ρ) in the exact mode, s(ρ0*) and s(ρ*) in the approximation mode.
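Given the two standard errors, the power follows directly from these two normal distributions. The sketch below uses a hypothetical function name and takes sr0 and sr1 as inputs rather than computing them. For one-sided tests it assumes that rejection is in the direction of the effect.

```python
from statistics import NormalDist


def wald_power(rho, rho0, sr0, sr1, alpha, tails=2):
    """Power of the Wald test, with H0 ~ N(0, 1) and H1 ~ N(m1, s1)."""
    m1 = (rho - rho0) / sr0
    s1 = sr1 / sr0
    h1 = NormalDist(m1, s1)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / tails)
    if tails == 1:
        # Reject in the upper tail for positive effects, in the lower tail otherwise.
        return 1.0 - h1.cdf(z_crit) if m1 >= 0 else h1.cdf(-z_crit)
    # Two-sided: probability mass of the H1 distribution beyond either critical value.
    return 1.0 - h1.cdf(z_crit) + h1.cdf(-z_crit)


# With no effect and equal standard errors the power equals alpha.
print(wald_power(rho=0.3, rho0=0.3, sr0=0.05, sr1=0.05, alpha=0.05, tails=2))
```

An a priori analysis would repeat this calculation for increasing N, recomputing sr0 and sr1 each time, until the requested power is reached.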
Validation
The correctness of the procedure used to calculate r, s(r) and r*, s(r*) was checked by reproducing the examples in Brown and Benedetti (1977) and Bonett and Price (2005), respectively. The soundness of the power routines was checked by Monte-Carlo simulations in which we found good agreement between simulated and predicted power.
References
Bonett, D. G., & Price, R. M. (2005). Inferential methods for the tetrachoric correlation coefficient. Journal of Educational and Behavioral Statistics, 30, 213-225.
Brown, M. B., & Benedetti, J. K. (1977). On the mean and variance of the tetrachoric correlation coefficient. Psychometrika, 42, 347-355.
Last modified: 29.06.2009, 16:43

