If you are interested in testing two directional parameter hypotheses against each other (e.g., H0: mu1 <= mu2; H1: mu1 > mu2), a one-tailed test is more appropriate than a two-tailed test. Limiting the region of rejection to one tail of the sampling distribution of the test statistic under H0 provides greater power with respect to an alternative hypothesis in the direction of that tail. The figure below tries to illustrate this.
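To make the power advantage concrete, here is a minimal Python sketch for the simplest case: a z-test whose statistic is assumed to be standard normal under H0 and normal with mean delta (and unit variance) under H1. The function names `power_one_tailed` and `power_two_tailed` are illustrative, not part of G*Power. For any positive delta, placing all of alpha in the upper tail gives a lower critical value, and therefore higher power, than splitting alpha across both tails.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def power_one_tailed(delta: float, alpha: float) -> float:
    """Power of an upper-tailed z-test when H1 shifts the statistic's mean to delta."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)      # all of alpha in the upper tail
    return 1 - STD_NORMAL.cdf(z_crit - delta)


def power_two_tailed(delta: float, alpha: float) -> float:
    """Power of a two-tailed z-test against the same shifted alternative."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)  # alpha split over both tails
    upper = 1 - STD_NORMAL.cdf(z_crit - delta)  # rejections in the upper tail
    lower = STD_NORMAL.cdf(-z_crit - delta)     # (tiny) rejections in the lower tail
    return upper + lower


if __name__ == "__main__":
    for delta in (1.0, 2.0, 3.0):
        print(f"delta = {delta}: one-tailed {power_one_tailed(delta, 0.05):.3f}, "
              f"two-tailed {power_two_tailed(delta, 0.05):.3f}")
```

At delta = 0 both functions return alpha; for delta > 0 the one-tailed power is always the larger of the two.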


Alpha is the probability of falsely rejecting H0 (i.e., accepting H1) when in fact H0 is true. The figure below illustrates alpha for an F-test with respect to an alternative hypothesis that corresponds to a so-called "noncentral" F sampling distribution defined by the noncentrality parameter lambda.
The power of a test is defined as 1 - beta, where beta is the probability of falsely accepting H0 (i.e., failing to reject it) when in fact H1 is true. The figure below illustrates beta and the power of an F-test with respect to an alternative hypothesis that corresponds to a so-called "noncentral" F sampling distribution defined by the noncentrality parameter lambda.
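The areas shown in these figures can be computed directly. The following sketch assumes SciPy is available and uses `scipy.stats.f` (central F under H0) and `scipy.stats.ncf` (noncentral F under H1). The helper name `f_test_error_rates` and the example degrees of freedom and lambda are illustrative only. Alpha is the area of the central F distribution beyond the critical value, and power is the area of the noncentral F distribution beyond that same value.

```python
from scipy import stats


def f_test_error_rates(lam: float, df1: int, df2: int, alpha: float = 0.05):
    """Return (F_crit, alpha, beta, power) for an F-test.

    H0: the statistic follows the central F(df1, df2) distribution.
    H1: the statistic follows the noncentral F(df1, df2, lam) distribution.
    """
    f_crit = stats.f.ppf(1 - alpha, df1, df2)    # boundary of the rejection region
    alpha_check = stats.f.sf(f_crit, df1, df2)   # area of central F beyond F_crit (= alpha)
    power = stats.ncf.sf(f_crit, df1, df2, lam)  # area of noncentral F beyond F_crit
    beta = 1 - power                             # P(accept H0 | H1 true)
    return f_crit, alpha_check, beta, power


if __name__ == "__main__":
    # Illustrative values only: 3 groups, N = 60 (df1 = 2, df2 = 57), lambda = 10.
    f_crit, alpha, beta, power = f_test_error_rates(lam=10.0, df1=2, df2=57)
    print(f"F_crit = {f_crit:.3f}, alpha = {alpha:.3f}, beta = {beta:.3f}, power = {power:.3f}")
```

Increasing lambda shifts the noncentral distribution to the right, so beta shrinks and power grows while alpha stays fixed.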
Effect size can be conceived of as a measure of the "distance" between H0 and H1. Hence, effect size refers to the underlying population rather than to a specific sample. In specifying an effect size, researchers define the degree of deviation from H0 that they consider important enough to warrant attention; effects smaller than the specified effect size are considered negligible. The effect size parameter should be specified before collecting (or analyzing) the data. Which choice is considered appropriate depends on
Cohen's (1969, 1977, 1988, 1992) effect size measures are well known, and his conventions of "small," "medium," and "large" effects have proved useful. For these reasons, we decided to make G*Power fully compatible with Cohen's measures and to display the effect size conventions appropriate for the type of test selected. These effect size indices, and some of the computational procedures for arriving at effect size estimates, are described in the context of the tests for which they have been defined. These are:

t-Test on Means: d
t-Test on Correlations: r
Other t-Tests: f
F-Test (ANOVA): f
F-Test (MCR): f^2
Other F-Tests: f^2
Chi-Square Tests: w

In G*Power, effect size values can either be entered directly or be calculated from basic parameters characterizing H1 (e.g., means, variances, and probabilities). To use the latter option, click the "Calc 'x'" button (x representing the effect size parameter of the test currently selected). To prepare the appropriate G*Power input, it may sometimes be necessary to know how the sample size and the effect size measure relate to the noncentrality parameter of the noncentral distributions. We have provided these relations on a separate page.
|
In G*Power, the total sample size is the number of subjects summed over all groups of the design. In a t-test on means, the sample sizes of groups A and B may differ. Note, however, that in this case the population standard deviation sigma should be approximately equal in both groups. Otherwise, both the t-test and the corresponding G*Power calculations may be misleading, because the distributions of the test statistic under H0 and H1 will differ substantially from the (central and noncentral) t distributions. A related problem arises when the standard deviations of the populations underlying the two samples differ. In this case, Cohen (1977) recommended adjusting sigma to sigma' according to

$$\sigma' = \sqrt{\frac{\sigma_A^2 + \sigma_B^2}{2}}$$
According to Cohen (1977), groups A and B must contain equal numbers of participants for this correction to be acceptable; if the group sizes differ, the adjustment is not appropriate. Note that when the assumption of equal variances is violated, you will only arrive at an approximation of the true power of the t-test. However, Cohen (1977) argues that the approximation will be "adequate" for most purposes.

As a general warning, keep in mind that G*Power results are valid only if the statistical assumptions underlying the tests are met (e.g., normal distributions and homogeneous variances within cells). Some work has been done on the robustness of these tests, that is, on the deviation of actual from nominal alpha error probabilities when the distributional assumptions are not met. However, little is known about a test's power given a misspecified distribution model. Thus, G*Power results may or may not be useful approximations to the true power values in such cases.

In F-Test (ANOVA), we assume an equal number of subjects in each group. If, in a post-hoc or compromise power analysis, the total sample size is not a multiple of the number of groups, the power analysis will be based on the average group size (a noninteger value). G*Power will inform you if this is the case. Note also that in a priori power analyses, the sample size is usually rounded up to the next multiple of the number of groups or cells in your design. This implies that the actual power of your test is usually slightly larger than the power you entered as a parameter.
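The correction for unequal standard deviations in equal-sized groups can be sketched as follows, assuming Cohen's root-mean-square form sigma' = sqrt((sigma_A^2 + sigma_B^2) / 2). The resulting sigma' then takes the place of sigma in the effect size d = (mu_A - mu_B) / sigma. The function names and example values are hypothetical.

```python
import math


def adjusted_sigma(sigma_a: float, sigma_b: float) -> float:
    """Cohen's sigma' for unequal population SDs: the root mean square of both SDs.

    Only appropriate when groups A and B have equal sample sizes.
    """
    return math.sqrt((sigma_a ** 2 + sigma_b ** 2) / 2)


def cohens_d(mu_a: float, mu_b: float, sigma: float) -> float:
    """Effect size d = (mu_A - mu_B) / sigma for the t-test on means."""
    return (mu_a - mu_b) / sigma


if __name__ == "__main__":
    # Hypothetical population values for two equal-sized groups.
    sigma_prime = adjusted_sigma(sigma_a=3.0, sigma_b=4.0)
    print(f"sigma' = {sigma_prime:.4f}, d = {cohens_d(12.0, 10.0, sigma_prime):.4f}")
```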
|
In a compromise power analysis, the ratio q := beta/alpha specifies the relative seriousness of both types of errors (cf. Cohen, 1965, 1988, p. 5). For instance, if alpha errors appear twice as serious as beta errors, you can risk a beta error twice as large as alpha, so q = beta/alpha = 2/1 = 2. This is the value you would then enter as the "beta/alpha ratio" in a compromise power analysis. Conversely, if a beta error is considered three times as serious as an alpha error, you would specify q = beta/alpha = 1/3 ≈ 0.3333. These choices depend on the different valences you associate with either outcome of the test. However, we suspect that, at least in basic psychological research, q = beta/alpha = 1/1 = 1 is most often the rational choice. Given your decision about the relative seriousness of both types of errors, the problem is to calculate an optimum critical value for the test statistic that satisfies beta/alpha = q. This optimum critical value can be regarded as a rational compromise (hence the term "compromise power analysis") between the demands for a low alpha risk and a high power level, given a fixed sample size.
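The search for this optimum critical value can be sketched for a one-tailed z-test, assuming the statistic is N(0, 1) under H0 and N(delta, 1) under H1. This is a simplification for illustration, not G*Power's own procedure, which works with the t, F, and chi-square distributions. Because alpha falls and beta rises as the critical value c moves up, beta/alpha is monotone in c, so bisection finds the c at which beta/alpha = q. The helper names are hypothetical.

```python
import math


def upper_tail(x: float) -> float:
    """P(Z > x) for standard normal Z, via erfc to stay accurate far in the tail."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def compromise_critical_value(delta: float, q: float, tol: float = 1e-12):
    """Return (c, alpha, beta) for an upper-tailed z-test with beta/alpha = q.

    Assumes the statistic is N(0, 1) under H0 and N(delta, 1) under H1.
    """
    if q <= 0:
        raise ValueError("q = beta/alpha must be positive")

    def ratio(c: float) -> float:
        alpha = upper_tail(c)         # P(reject H0 | H0 true)
        beta = upper_tail(delta - c)  # P(accept H0 | H1 true) = P(Z < c - delta)
        return beta / alpha

    lo, hi = -10.0, 10.0 + abs(delta)  # ratio(lo) is tiny, ratio(hi) is huge
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if ratio(mid) < q:
            lo = mid                   # beta too small relative to alpha: raise c
        else:
            hi = mid
    c = (lo + hi) / 2
    return c, upper_tail(c), upper_tail(delta - c)


if __name__ == "__main__":
    for q in (1 / 3, 1.0, 2.0):
        c, alpha, beta = compromise_critical_value(delta=3.0, q=q)
        print(f"q = {q:.4f}: c = {c:.4f}, alpha = {alpha:.4f}, beta = {beta:.4f}")
```

With q = 1 the critical value lands exactly halfway between the two distribution means (c = delta / 2), where alpha and beta are equal by symmetry.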
|
The noncentrality parameter of the t distribution is called delta, and that of the F and Chi^2 distributions is called lambda. Both measures increase as a function of N and the effect size postulated by H1. More detailed information about the relation among sample size, effect size, and the noncentrality parameter is also available. |
|
The critical value of the test statistic (z, t, F, and Chi^2 in the cases we look at here) defines the boundary of the rejection region of H0. Publications of power values and final decisions concerning total sample sizes or critical values should always be based on accuracy mode calculations. |
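For reference, upper-tail critical values of the four statistics can be obtained from the quantile (`ppf`) functions in `scipy.stats`, assuming SciPy is available. The alpha level and degrees of freedom below are arbitrary examples, and `critical_values` is a hypothetical helper. For a two-tailed z- or t-test, use `1 - alpha / 2` instead of `1 - alpha`.

```python
from scipy import stats

ALPHA = 0.05  # illustrative significance level


def critical_values(alpha: float = ALPHA) -> dict:
    """Upper-tail critical values: H0 is rejected when the statistic exceeds them."""
    return {
        "z": stats.norm.ppf(1 - alpha),
        "t (df = 58)": stats.t.ppf(1 - alpha, df=58),
        "F (2, 57)": stats.f.ppf(1 - alpha, dfn=2, dfd=57),
        "Chi^2 (df = 3)": stats.chi2.ppf(1 - alpha, df=3),
    }


if __name__ == "__main__":
    for name, value in critical_values().items():
        print(f"{name}: {value:.4f}")
```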
|
It may sometimes be necessary to know how the total sample size and the effect size measure relate to the noncentrality parameter of the noncentral distributions. Therefore, we present these relations here for all test procedures offered by G*Power.

t-Test on Means

In the t-test on means, the noncentrality parameter delta is

$$\delta = d \sqrt{\frac{n_1 n_2}{n_1 + n_2}}$$

where

$$d = \frac{\mu_1 - \mu_2}{\sigma}$$

is Cohen's (1977, 1988, p. 40) effect size parameter for t-tests on means, and n1 and n2 are the sample sizes in groups 1 and 2, respectively.

t-Test on Correlations

In the t-test on correlations, the noncentrality parameter delta is a function of N and rho, where N is the total sample size (i.e., the number of pairs of values) and rho is the population correlation coefficient according to H1 (i.e., Cohen's rho; see Cohen, 1977, 1988, pp. 77-81).

Other t-Tests

In the Other t-Tests option, we use f as the effect size measure (cf. Cohen, 1977, 1988, Chap. 8.2). The relation between delta and f is

$$\delta = f \sqrt{N}$$

F-Test (ANOVA), F-Test (MCR), and Other F-Tests

The standardized effect size measures f or f^2 are also used in power analyses for F-tests (F-Test (ANOVA), F-Test (MCR), and Other F-Tests). Their relation to the noncentrality parameter lambda of the noncentral F distribution is given by

$$\lambda = f^2 N$$

where

$$f^2 = \frac{\rho^2}{1 - \rho^2}$$

and rho^2 denotes the coefficient of determination in the population according to H1 (e.g., Koele, 1982, p. 514). For global ANOVA F-tests, rho^2 is simply eta^2. For special F-tests of main effects or interactions in complex ANOVA designs, rho^2 equals the partial eta^2. Analogously, rho^2 coincides with the (partial) squared multiple correlation in multiple regression/correlation F-tests (cf. Cohen, 1988, Chap. 9.2.1).

Chi-Square Tests

For chi-square tests based on contingency tables with m cells (m a natural number), Cohen (1977, 1988, Chap. 7) uses

$$w = \sqrt{\sum_{i=1}^{m} \frac{\big(p_1(i) - p_0(i)\big)^2}{p_0(i)}}$$

as an effect size measure, where p0(i) and p1(i) denote the probabilities of the i-th cell according to H0 and H1, respectively. Then lambda = w^2 N is the noncentrality parameter of the noncentral chi-square distribution (Cohen, 1988, p. 549).
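These relations translate directly into code. The plain-Python sketch below (function names are illustrative, not G*Power's) computes delta for the t-test on means, lambda for F-tests from f^2 or rho^2, and w and lambda for chi-square tests; the correlation case is omitted. The `delta_other_t` helper assumes delta = f * sqrt(N), which is what lambda = f^2 * N implies for a one-df test because t^2 = F. It agrees with the t-test on means for two equal groups, where f = d / 2.

```python
import math
from typing import Sequence


def delta_t_means(d: float, n1: int, n2: int) -> float:
    """Noncentrality delta of the two-sample t-test: d * sqrt(n1 * n2 / (n1 + n2))."""
    return d * math.sqrt(n1 * n2 / (n1 + n2))


def delta_other_t(f: float, n_total: int) -> float:
    """delta in terms of f (assumed relation): delta = f * sqrt(N)."""
    return f * math.sqrt(n_total)


def f2_from_rho2(rho2: float) -> float:
    """f^2 = rho^2 / (1 - rho^2), with rho^2 the (partial) population eta^2 or R^2."""
    return rho2 / (1 - rho2)


def lambda_f_test(f2: float, n_total: int) -> float:
    """Noncentrality lambda of the F distribution: lambda = f^2 * N."""
    return f2 * n_total


def cohens_w(p0: Sequence[float], p1: Sequence[float]) -> float:
    """w = sqrt(sum_i (p1(i) - p0(i))^2 / p0(i)) over the m cells."""
    if len(p0) != len(p1):
        raise ValueError("p0 and p1 must have the same number of cells")
    return math.sqrt(sum((b - a) ** 2 / a for a, b in zip(p0, p1)))


def lambda_chi2(w: float, n_total: int) -> float:
    """Noncentrality lambda of the chi-square distribution: lambda = w^2 * N."""
    return w ** 2 * n_total


if __name__ == "__main__":
    print(delta_t_means(d=0.5, n1=32, n2=32))               # two groups of 32
    print(lambda_f_test(f2_from_rho2(0.2), n_total=100))    # rho^2 = .20, N = 100
    w = cohens_w(p0=[0.5, 0.5], p1=[0.6, 0.4])
    print(w, lambda_chi2(w, n_total=100))
```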
|
When you use G*Power to perform an a priori power analysis, the program calculates the 'exact' sample size for you. Assume that this exact sample size for a t-test is 60.70. Of course, you cannot recruit 60.70 subjects. Therefore, G*Power rounds up to the next integer that fits your t-test design, which would be 62 (two groups of 31 subjects each). Because 62 is larger than 60.70, your t-test, with all other parameters equal, has more power to detect an effect than it would have with the 'exact' number of 60.70 subjects. This 'inflated' power value is displayed as Actual power. In this way, G*Power guarantees that, with the sample size computed in an a priori power analysis, the power of your test is always at least the power you specified.
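The rounding logic can be sketched as a search over whole group sizes. This sketch assumes SciPy (`scipy.stats.t` and `scipy.stats.nct`), an upper-tailed two-sample t-test with equal group sizes, and the delta formula for t-tests on means. `a_priori_t_test` is a hypothetical helper, not G*Power's algorithm, which first computes the noninteger 'exact' N. Because the search steps through whole subjects per group, the first design that reaches the target power typically overshoots it slightly, and that overshoot corresponds to the Actual power described above.

```python
import math

from scipy import stats


def a_priori_t_test(d: float, alpha: float = 0.05, target_power: float = 0.80,
                    max_n: int = 100_000):
    """Smallest equal-n two-sample design (upper-tailed t-test) reaching target power.

    Returns (total N, actual power). N is always a multiple of 2 (two equal groups).
    """
    for n in range(2, max_n + 1):                   # n = subjects per group
        df = 2 * n - 2
        t_crit = stats.t.ppf(1 - alpha, df)         # critical t under H0
        delta = d * math.sqrt(n * n / (n + n))      # d * sqrt(n1*n2/(n1+n2)), n1 = n2 = n
        power = stats.nct.sf(t_crit, df, delta)     # P(t > t_crit | H1)
        if power >= target_power:
            return 2 * n, power
    raise ValueError("target power not reached within max_n subjects per group")


if __name__ == "__main__":
    n_total, actual = a_priori_t_test(d=0.5)
    print(f"total N = {n_total}, actual power = {actual:.4f}")
```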
|
|
|
|
|
|
|
|
Please report suggestions for improvements to Axel Buchner, Franz Faul, or Edgar Erdfelder. |