One-Tailed versus Two-Tailed Tests

If you are interested in testing two directional parameter hypotheses against each other (e.g., H0: mu1 <= mu2; H1: mu1 > mu2), a one-tailed test is more appropriate than a two-tailed test. Limiting the region of rejection to one tail of the sampling distribution of the test statistic provides greater power with respect to an alternative hypothesis in the direction of that tail. The figure below illustrates this.
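The power gain can be sketched numerically. The following Python example is a minimal illustration that assumes a z-test (standard normal under H0, shifted by a noncentrality value delta under H1) rather than the t- or F-distributions G*Power works with. The function names and the values delta = 2 and alpha = 0.05 are hypothetical.

```python
from statistics import NormalDist

# Standard normal distribution; an assumed stand-in for the test statistic under H0.
STD_NORMAL = NormalDist()


def power_one_tailed(delta: float, alpha: float) -> float:
    """Power of an upper-tailed z-test when the statistic is N(delta, 1) under H1."""
    crit = STD_NORMAL.inv_cdf(1 - alpha)  # all of alpha in the upper tail
    return 1 - STD_NORMAL.cdf(crit - delta)


def power_two_tailed(delta: float, alpha: float) -> float:
    """Power of a two-tailed z-test when the statistic is N(delta, 1) under H1."""
    crit = STD_NORMAL.inv_cdf(1 - alpha / 2)  # alpha split across both tails
    upper = 1 - STD_NORMAL.cdf(crit - delta)  # rejections in the upper tail
    lower = STD_NORMAL.cdf(-crit - delta)     # rejections in the lower tail
    return upper + lower


one = power_one_tailed(2.0, 0.05)
two = power_two_tailed(2.0, 0.05)
print(f"one-tailed power: {one:.3f}, two-tailed power: {two:.3f}")
```

For a positive delta, the one-tailed value is the larger of the two, because the critical value sits closer to the center of the H0 distribution.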

Alpha Error Probability

Alpha is the probability of falsely accepting H1 when in fact H0 is true. The figure below illustrates alpha for an F-test with respect to an alternative hypothesis that corresponds to a so-called "noncentral" F sampling distribution defined by the noncentrality parameter lambda.

Power and the Beta Error Probability

The power of a test is defined as 1-beta, and beta is the probability of falsely accepting H0 when in fact H1 is true. The figure below illustrates beta and the power of an F-test with respect to an alternative hypothesis that corresponds to a so-called "noncentral" F sampling distribution defined by the noncentrality parameter lambda.

Effect Size

Effect size can be conceived of as a measure of the "distance" between H0 and H1.

Hence, effect size refers to the underlying population rather than a specific sample. In specifying an effect size, researchers define the degree of deviation from H0 that they consider important enough to warrant attention. In other words, effects that are smaller than the specified effect size are considered negligible. The effect size parameter should be specified prior to collecting (or analyzing) the data.

Which effect size is considered appropriate depends on

  1. the theoretical context of the research,
  2. related research results published previously, and
  3. cost-benefit considerations in applied research.

Cohen's (1969, 1977, 1988, 1992) effect size measures are well known, and his conventions of "small," "medium," and "large" effects have proved useful. For these reasons, we decided to render G*Power completely compatible with Cohen's measures and to display the effect size conventions appropriate for the type of test selected. These effect size indices and some of the computational procedures to arrive at effect size estimates are described in the context of the tests for which they have been defined. These are:

Test                      Effect size index
t-Test on Means           d
t-Test on Correlations    r
Other t-Tests             f
F-Test (ANOVA)            f
F-Test (MCR)              f^2
Other F-Tests             f^2
Chi-Square Test           w

In G*Power, effect size values can either be entered directly or they can be calculated from basic parameters characterizing H1 (e.g., means, variances, and probabilities). To use the latter option, users must click on the "Calc 'x' " button (x representing the effect size parameter of the test currently selected).

In order to prepare the appropriate G*Power input, it may sometimes be necessary to know the relation between the sample size and the effect size measure on the one hand and the noncentrality parameter of the noncentral distributions on the other hand. We have provided the relation between the sample size, the effect size measures, and the noncentrality parameters on a separate page.

Total Sample Size

In G*Power the total sample size is the number of subjects summed over all groups of the design.

In a t-test on means, the sample size may vary between groups A and B. Note, however, that in this case we want sigma to be approximately equal in both groups. Otherwise, both the t-test and the corresponding G*Power calculations may be misleading because the distributions of the test statistic under H0 and H1 will differ substantially from (central and noncentral) t-distributions.

Another problem could be unequal standard deviations in the populations underlying the two samples. In this case, Cohen (1977) recommended adjusting sigma to sigma' according to

 

$$\sigma' = \sqrt{\frac{\sigma_A^2 + \sigma_B^2}{2}}$$
 

According to Cohen (1977) the number of participants in both groups A and B must be equal for this correction to be acceptable. If the group sizes vary, then this adjustment is not appropriate.
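The adjustment and its equal-group-size condition can be sketched in Python as follows. This is a minimal illustration, not G*Power's own code; the function name `adjusted_sigma` and the example values are hypothetical.

```python
import math


def adjusted_sigma(sigma_a: float, sigma_b: float, n_a: int, n_b: int) -> float:
    """Return sigma' = sqrt((sigma_a^2 + sigma_b^2) / 2) for two groups A and B.

    The adjustment is only considered acceptable for equal group sizes,
    so unequal sizes are rejected here (a design choice of this sketch).
    """
    if n_a != n_b:
        raise ValueError("adjustment requires equal group sizes")
    return math.sqrt((sigma_a ** 2 + sigma_b ** 2) / 2)


# Hypothetical example: standard deviations 3 and 4, 20 participants per group.
sigma_prime = adjusted_sigma(3.0, 4.0, 20, 20)
print(f"sigma' = {sigma_prime:.4f}")
```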

Please note that you will only arrive at an approximation of the true power of the t-test if the assumption of equal variances is violated. However, Cohen (1977) argues that the approximation will be "adequate" for most purposes.

As a general warning, you should keep in mind that G*Power results are valid if the statistical assumptions underlying the tests are met (e.g., normal distributions and homogeneous variances within cells). Some work has been done on the robustness of these tests, that is, the deviation of actual and nominal alpha error probabilities when the distribution assumptions are not met. However, little is known about a test's power given a misspecified distribution model. Thus, G*Power results may or may not be useful approximations to the true power values in such cases.

In F-Test (ANOVA), we assume that there is an equal number of subjects in each group. If, in a post-hoc or compromise power analysis, the total sample size is not a multiple of the number of groups, then the power analysis will be based on the average group size (a noninteger value). G*Power will inform you if this is the case.

Note also that in a priori power analyses, the sample size is usually rounded to the next multiple of the number of groups or cells in your design. This implies that the actual power of your test usually is slightly larger than the power you entered as a parameter.
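Both situations reduce to simple arithmetic, sketched below in Python. The helper names are hypothetical, and the rounding rule (round up to the next multiple of the number of groups) is an assumption based on the description above, not a statement about G*Power's internal implementation.

```python
import math


def average_group_size(total_n: int, groups: int) -> float:
    """Average group size used when total_n is not a multiple of the number of groups."""
    return total_n / groups


def round_up_to_multiple(exact_n: float, groups: int) -> int:
    """Round an 'exact' sample size up to the next multiple of the number of groups."""
    return math.ceil(exact_n / groups) * groups


# Hypothetical post-hoc case: 50 subjects in 3 groups gives a noninteger average.
print(average_group_size(50, 3))

# Hypothetical a priori case: an exact size of 100.3 with 3 groups.
print(round_up_to_multiple(100.3, 3))
```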

 

The Ratio q := beta/alpha

In a compromise power analysis, the ratio q := beta/alpha specifies the relative seriousness of both types of errors (cf. Cohen, 1965, 1988, p. 5).

For instance, if alpha errors appear twice as serious as beta errors, then you can risk a beta error which is twice as large as alpha, thus q = beta/alpha = 2/1 = 2. This value is what you would then insert as the "beta/alpha ratio" in a compromise power analysis.

Alternatively, if you think you'd rather not risk committing a beta error (e.g., a beta error is considered three times as serious as an alpha error), then you would specify q = beta/alpha = 1/3, or approximately 0.3333.

These choices depend on the different valences you associate with either outcome of the test. However, we suspect that in basic psychological research at least, q = beta/alpha = 1/1 = 1 is the rational choice most often.

Given your decision as to the relative seriousness of both types of errors, the problem is to calculate an optimum critical value for the test statistic which satisfies beta/alpha = q. This optimum critical value can be regarded as a rational compromise (hence the term "Compromise power analysis") between the demands for a low alpha-risk and a large power level, given a fixed sample size.
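To make the idea concrete, the following Python sketch searches for the critical value that satisfies beta/alpha = q. It assumes an upper-tailed z-test with the statistic distributed as N(0, 1) under H0 and N(delta, 1) under H1, and it uses plain bisection; neither the distributional assumption nor the search method is claimed to be what G*Power does. All names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def error_probabilities(crit: float, delta: float) -> tuple[float, float]:
    """Return (alpha, beta) for an upper-tailed z-test with critical value crit."""
    alpha = 1 - STD_NORMAL.cdf(crit)      # reject H0 although H0 is true
    beta = STD_NORMAL.cdf(crit - delta)   # retain H0 although H1 is true
    return alpha, beta


def compromise_critical_value(delta: float, q: float, tol: float = 1e-10) -> float:
    """Find crit such that beta = q * alpha by bisection.

    beta - q * alpha increases with crit, so there is a single root.
    """
    lo, hi = -10.0, delta + 10.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        alpha, beta = error_probabilities(mid, delta)
        if beta - q * alpha < 0:
            lo = mid   # critical value still too lenient
        else:
            hi = mid   # critical value too strict
    return (lo + hi) / 2


crit = compromise_critical_value(delta=3.0, q=1.0)
alpha, beta = error_probabilities(crit, 3.0)
print(f"critical value: {crit:.4f}, alpha: {alpha:.4f}, beta: {beta:.4f}")
```

With q = 1 the two error probabilities are balanced, so under this symmetric normal model the critical value falls halfway between 0 and delta.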

 

The Noncentrality Parameter

The noncentrality parameter of the t distribution is called delta, and that of the F and Chi^2 distributions is called lambda. Both measures increase as a function of N and the effect size postulated by H1. More detailed information is given in the section "The Relation Among Sample Size, Effect Size, and Noncentrality Parameter."


The Critical Value

The critical value of the test statistic (z, t, F, and Chi^2 in the cases we look at here) defines the boundary of the rejection region of H0. Publications of power values and final decisions concerning total sample sizes or critical values should always be based on accuracy mode calculations.

 

The Relation Among Sample Size, Effect Size, and Noncentrality Parameter

It may sometimes be necessary to know the relation between the total sample size and the effect size measure on the one hand and the noncentrality parameter of the noncentral distributions on the other hand. Therefore, we present these relations here for all test procedures offered by G*Power.

t-Test on Means

In the t-test on means, the noncentrality parameter delta is

	
$$\delta = d \sqrt{\frac{n_1 n_2}{n_1 + n_2}}$$

where

$$d = \frac{|\mu_1 - \mu_2|}{\sigma}$$
		  
			

is Cohen's (1977, 1988, p. 40) effect size parameter for t tests for means, and n1 and n2 are the sample sizes in groups 1 and 2, respectively.
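The two formulas for the t-test on means translate directly into code. The Python sketch below uses hypothetical function names and example values (means 105 and 100, sigma 10, 32 subjects per group).

```python
import math


def cohens_d(mu1: float, mu2: float, sigma: float) -> float:
    """Effect size d = |mu1 - mu2| / sigma."""
    return abs(mu1 - mu2) / sigma


def delta_t_means(d: float, n1: int, n2: int) -> float:
    """Noncentrality parameter delta = d * sqrt(n1 * n2 / (n1 + n2))."""
    return d * math.sqrt(n1 * n2 / (n1 + n2))


d = cohens_d(105.0, 100.0, 10.0)
delta = delta_t_means(d, 32, 32)
print(f"d = {d}, delta = {delta}")
```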

t-Test on Correlations

In the t-test on correlations, the noncentrality parameter delta is

 
$$\delta = \sqrt{\frac{\rho^2}{1 - \rho^2} \cdot N}$$
			

where N is the total sample size (i.e., the number of pairs of values) and rho is the population correlation coefficient according to H1 (i.e., Cohen's rho, see Cohen, 1977, 1988, pp. 77-81).
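As a small sketch of this relation in Python (the function name and the values rho = 0.6 and N = 16 are hypothetical):

```python
import math


def delta_t_correlation(rho: float, total_n: int) -> float:
    """Noncentrality parameter delta = sqrt(rho^2 / (1 - rho^2) * N)."""
    if not -1 < rho < 1:
        raise ValueError("rho must lie strictly between -1 and 1")
    rho_sq = rho ** 2
    return math.sqrt(rho_sq / (1 - rho_sq) * total_n)


delta_r = delta_t_correlation(0.6, 16)
print(f"delta = {delta_r:.4f}")
```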

Other t-Tests

In the Other t-Tests option we used f as an effect size measure (cf. Cohen, 1977, 1988, Chap. 8.2). The relation between delta and f is

$$\delta = f \sqrt{N}$$
			 
			
F-Test (ANOVA), F-Test (MCR), and Other F-Tests

The standardized effect size measures f or f^2 are also used in power analyses for F-tests (F-Test (ANOVA), F-Test (MCR), and Other F-Tests). Their relation to the noncentrality parameter lambda of the noncentral F distribution is given by

          
$$\lambda = f^2 \cdot N$$

where

$$f^2 = \frac{\rho^2}{1 - \rho^2}$$
  

and rho^2 denotes the coefficient of determination in the population according to H1 (e.g., Koele, 1982, p. 514). For global ANOVA F-tests, rho^2 is just eta^2.

For special F-tests of main effects or interactions in complex ANOVA designs, rho^2 equals the partial eta^2.

Analogously, rho^2 coincides with the (partial) squared multiple correlation in multiple regression/correlation F-tests (cf. Cohen, 1988, Chap. 9.2.1).
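The relation between rho^2, f^2, and lambda can be sketched in Python as follows. The function names and the example values (rho^2 = 0.2, N = 40) are hypothetical.

```python
def f_squared(rho_sq: float) -> float:
    """Effect size f^2 = rho^2 / (1 - rho^2)."""
    if not 0 <= rho_sq < 1:
        raise ValueError("rho^2 must lie in [0, 1)")
    return rho_sq / (1 - rho_sq)


def lambda_f_test(f_sq: float, total_n: int) -> float:
    """Noncentrality parameter lambda = f^2 * N."""
    return f_sq * total_n


f_sq = f_squared(0.2)
lam_f = lambda_f_test(f_sq, 40)
print(f"f^2 = {f_sq:.4f}, lambda = {lam_f:.4f}")
```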

Chi-Square Tests

For Chi-Square tests based on m-cell contingency tables (m a natural number), Cohen (1977, 1988, Chap. 7) uses

		  
$$w := \sqrt{\sum_{i=1}^{m} \frac{(p_0(i) - p_1(i))^2}{p_0(i)}}$$
		  
		  

as an effect size measure, where p0(i) and p1(i) denote the cell probabilities for the i-th cell according to H0 and H1, respectively. Then

			
           
$$\lambda = w^2 \cdot N$$
			
			

is the noncentrality parameter of the noncentral chi-square distribution (Cohen, 1988, p. 549).
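A short Python sketch of both formulas follows. The function names and the four-cell example (uniform probabilities under H0, N = 100) are hypothetical.

```python
import math


def effect_size_w(p0: list[float], p1: list[float]) -> float:
    """Effect size w = sqrt(sum over cells of (p0(i) - p1(i))^2 / p0(i))."""
    if len(p0) != len(p1):
        raise ValueError("p0 and p1 must have the same number of cells")
    return math.sqrt(sum((a - b) ** 2 / a for a, b in zip(p0, p1)))


def lambda_chi_square(w: float, total_n: int) -> float:
    """Noncentrality parameter lambda = w^2 * N."""
    return w ** 2 * total_n


p0 = [0.25, 0.25, 0.25, 0.25]   # cell probabilities under H0
p1 = [0.10, 0.20, 0.30, 0.40]   # cell probabilities under H1
w = effect_size_w(p0, p1)
lam_chi = lambda_chi_square(w, 100)
print(f"w = {w:.4f}, lambda = {lam_chi:.4f}")
```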

 

Actual Power

When you use G*Power to perform an a priori power analysis, the program calculates the 'exact' sample size for you. Assume that this exact sample size for a t-test is 60.70. Of course, you cannot recruit 60.70 subjects. Therefore, G*Power rounds to the next reasonable integer for your t-test, which would be 62 (two groups of 31 subjects each).

However, 62 is larger than 60.70, and one way to express what this means is to say that, with 62 subjects and all other parameters being equal, your t-test has more power to detect an effect than it would have given the 'exact' number of 60.70 subjects. This 'inflated' power value is displayed as Actual power. Note that in this way G*Power guarantees that with the sample size computed for an a priori power analysis, the power of your test is always at least the power you specified.
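The effect of rounding on power can be sketched as follows. The rounding step reproduces the 60.70 to 62 example from the text. The power function is only a one-tailed normal approximation for two equal groups, introduced here as an assumption; G*Power itself works with noncentral t distributions, and the values d = 0.5 and alpha = 0.05 are hypothetical and not tied to the 60.70 figure.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def approx_power(total_n: float, d: float, alpha: float) -> float:
    """Normal approximation to one-tailed power with two equal groups.

    With n1 = n2 = N / 2, delta = d * sqrt(n1 * n2 / (n1 + n2)) = d * sqrt(N / 4).
    """
    delta = d * math.sqrt(total_n / 4)
    crit = STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STD_NORMAL.cdf(crit - delta)


exact_n = 60.70
actual_n = math.ceil(exact_n / 2) * 2   # next multiple of the 2 groups

print(actual_n)
print(approx_power(exact_n, d=0.5, alpha=0.05))   # power at the 'exact' size
print(approx_power(actual_n, d=0.5, alpha=0.05))  # slightly larger 'actual' power
```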



Please report suggestions for improvements to
Axel Buchner, Franz Faul, or Edgar Erdfelder.