Means: Wilcoxon signed-rank test (one-sample case)

Note: This option will be available in the next version of G*Power.

The Wilcoxon signed-rank test is a nonparametric alternative to the one-sample t test. Its use is mainly motivated by uncertainty concerning the assumption of normality made in the t test.

The Wilcoxon signed-rank test can be used to test whether a given distribution H is symmetric about zero. The power routines implemented in G*Power refer to the important special case of a "shift model", which states that H is obtained by subtracting two symmetric distributions F and G, where G is obtained by shifting F by an amount ∆: G(x) = F(x − ∆) for all x. The relation of this shift model to the one-sample t test becomes clear if we assume that F is the fixed distribution with mean µ0 stated in the null hypothesis, and that G is the distribution of the test group with mean µ. Under these assumptions H(x) = F(x) − G(x) is symmetric about zero under H0, that is, if ∆ = µ − µ0 = 0 or, equivalently, if F(x) = G(x). In contrast, H(x) = F(x) − G(x) is asymmetric under H1, that is, if ∆ ≠ 0.

The Wilcoxon signed-rank test is based on ranks. Assume that a sample of size N is drawn from a distribution H(x). Each sample value xi is assigned a rank S between 1 and N that corresponds to the position of |xi| in an increasingly ordered list of all absolute sample values. The general idea of the test is to calculate the sum of the ranks assigned to positive sample values (x > 0) and the sum of the ranks assigned to negative sample values (x < 0), and to reject the hypothesis that H is symmetric if these two rank sums are clearly different.
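
The ranking step can be illustrated with a minimal Python sketch. The function name `signed_rank_sums` and the sample values are made up for illustration, and the sketch assumes a sample without ties and without zeros.

```python
def signed_rank_sums(sample):
    """Return (rank sum of positive values, rank sum of negative values).

    Assumes there are no ties among the absolute values and no zeros.
    """
    # Order the indices by absolute value; the 1-based position is the rank.
    order = sorted(range(len(sample)), key=lambda i: abs(sample[i]))
    v_pos = 0
    v_neg = 0
    for rank, i in enumerate(order, start=1):
        if sample[i] > 0:
            v_pos += rank
        elif sample[i] < 0:
            v_neg += rank
    return v_pos, v_neg


x = [1.2, -0.4, 2.5, -3.1, 0.9]  # illustrative data
v_pos, v_neg = signed_rank_sums(x)
print(v_pos, v_neg)  # the two sums always add up to N(N + 1)/2
```

For the five values above the absolute values are ordered as 0.4, 0.9, 1.2, 2.5, 3.1, so the positive values carry the ranks 2, 3, and 4 and the negative values the ranks 1 and 5.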

The actual procedure is as follows: The rank sum of negative values is known if that of positive values is given. It therefore suffices to consider the rank sum Vs of positive values. The positive ranks can be specified by an n-tuple (S1, . . . , Sn), where 0 ≤ n ≤ N.

There are N!/(n! · (N − n)!) possible n-tuples for a given n. Note that n can take on the values 0, 1, . . . , N. The total number of possible choices for the S's is therefore ∑_{i=0}^{N} N!/(i! · (N − i)!) = 2^N. (We here assume a continuous distribution H for which the probability of ties, that is, the occurrence of two identical |x|, is zero.) Therefore, if the null hypothesis is true, the probability of observing a particular n and a certain n-tuple is P(N+ = n; S1 = s1, . . . , Sn = sn) = 1/2^N. To calculate the probability of observing, under H0, a particular positive rank sum Vs = S1 + . . . + Sn, we just need to count the number k of all tuples with rank sum Vs and add up their probabilities. Thus P(Vs = v) = k/2^N. Repeating this for all possible Vs between the minimal value 0 (corresponding to the case n = 0) and the maximal value N(N + 1)/2 (corresponding to the n = N tuple (1, 2, . . . , N)) gives the discrete probability distribution of Vs under H0. This distribution is symmetric about N(N + 1)/4. Referring to this probability distribution we choose, in a one-sided test, a critical value c with P(Vs ≥ c) ≤ α and reject the null hypothesis if a rank sum Vs ≥ c is observed. With increasing sample size the exact distribution converges rapidly to the normal distribution with mean E(Vs) = N(N + 1)/4 and variance Var(Vs) = N(N + 1)(2N + 1)/24.
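
The counting argument can be sketched in Python as follows. This is not G*Power's code: the function names are hypothetical, and the sketch assumes that the critical value is the smallest c whose exact upper tail probability under H0 does not exceed α.

```python
def null_counts(n_obs):
    """counts[v] = number of rank subsets of {1, ..., n_obs} whose sum is v."""
    max_sum = n_obs * (n_obs + 1) // 2
    counts = [0] * (max_sum + 1)
    counts[0] = 1  # the empty tuple (n = 0) has rank sum 0
    for rank in range(1, n_obs + 1):
        # Iterate downwards so that each rank is used at most once.
        for v in range(max_sum, rank - 1, -1):
            counts[v] += counts[v - rank]
    return counts


def critical_value(n_obs, alpha):
    """Smallest c with P(Vs >= c) <= alpha under H0, plus that tail probability."""
    counts = null_counts(n_obs)
    total = 2 ** n_obs
    tail = 0
    c = len(counts)  # one above the maximal rank sum: never reject
    for v in range(len(counts) - 1, -1, -1):
        if tail + counts[v] > alpha * total:
            break
        tail += counts[v]
        c = v
    return c, tail / total


n_obs = 10
counts = null_counts(n_obs)
total = 2 ** n_obs
mean = sum(v * k for v, k in enumerate(counts)) / total
var = sum(v * v * k for v, k in enumerate(counts)) / total - mean ** 2
c, tail = critical_value(n_obs, 0.05)
print(mean, var, c, tail)
```

For N = 10 the mean and variance of the exact distribution agree with N(N + 1)/4 = 27.5 and N(N + 1)(2N + 1)/24 = 96.25, and α = 0.05 leads to c = 45 with an exact tail probability of 43/1024 ≈ 0.042.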

Power of the Wilcoxon signed-rank test

The signed-rank test as described above is distribution free in the sense that its validity does not depend on the specific form of the response distribution H. This distribution independence no longer holds, however, if one wants to estimate numerical values for the power of the test. The reason is that the effect of a certain shift ∆ on the deviation from symmetry, and therefore on the distribution of Vs, depends on the specific form of F (and G). For power calculations it is therefore necessary to specify the response distribution F. G*Power provides three predefined continuous and symmetric response functions that differ with respect to kurtosis, that is, the "peakedness" of the distribution.

Normal distribution N(µ, σ^2)
[Figure: distribution function of the normal distribution]

Laplace or Double Exponential distribution
[Figure: distribution function of the Laplace distribution]

Logistic distribution
[Figure: distribution function of the logistic distribution]

Scaled and/or shifted versions of the Laplace and Logistic densities, obtained by applying the transformation (1/a) · p((x − b)/a) with a > 0, are again probability densities and are referred to by the same name.
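
The following Python sketch illustrates the scale/shift transformation. Because the formula images are not reproduced here, it assumes the usual standard forms of the densities: p(x) = e^(−|x|)/2 for the Laplace and p(x) = e^(−x)/(1 + e^(−x))^2 for the logistic distribution. All function names are hypothetical.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mu=0.0, sigma=1.0):
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2.0 * pi))


def laplace_pdf(x):
    # Assumed standard form of the Laplace (double exponential) density.
    return 0.5 * exp(-abs(x))


def logistic_pdf(x):
    # Assumed standard form; the density is symmetric, so |x| avoids overflow.
    e = exp(-abs(x))
    return e / (1.0 + e) ** 2


def scaled_shifted(pdf, a, b):
    """Return the density x -> (1/a) * pdf((x - b) / a) for a > 0."""
    if a <= 0:
        raise ValueError("scale a must be positive")
    return lambda x: pdf((x - b) / a) / a


def integrate(f, lo, hi, steps=20000):
    """Midpoint rule, accurate enough for a sanity check."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


wide_laplace = scaled_shifted(laplace_pdf, a=2.0, b=1.0)
print(integrate(wide_laplace, -99.0, 101.0))  # close to 1: still a density
```

The numerical integral stays at 1 (up to integration error) for any a > 0 and any b, which is the sense in which the transformed functions are again probability densities.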

Approaches to the power analysis

G*Power implements two different methods to estimate the power for the Wilcoxon signed-rank test:
  1. The asymptotic relative efficiency (A.R.E.) method that defines power relative to the one-sample t test, and
  2. a normal approximation to the power proposed by Lehmann (1975, pp. 164-166).
We describe the general idea of both methods in turn. More specific information can be found in the implementation section below.

A.R.E. method

The A.R.E. method assumes the shift model described in the introduction. It relates normal approximations to the power of the one-sample t test (Lehmann, 1975, Eq. (4.44), p. 172) and the Wilcoxon test for a specified distribution H (Lehmann, 1975, Eq. (4.15), p. 160). If, for a model with fixed H and ∆, the sample size N is required to achieve a specified power for the Wilcoxon signed-rank test and a sample size N' is required in the t test to achieve the same power, then the ratio N'/N is called the efficiency of the Wilcoxon signed-rank test relative to the one-sample t test. The limiting efficiency as sample size N increases towards infinity is called the asymptotic relative efficiency (A.R.E. or Pitman efficiency) of the Wilcoxon signed-rank test relative to the t test. It is given by (Hettmansperger, 1984, p. 71)

[Equation image: A.R.E. of the Wilcoxon signed-rank test relative to the one-sample t test]

Note that the A.R.E. of the Wilcoxon signed-rank test to the one-sample t test is identical to the A.R.E. of the Wilcoxon rank-sum test to the two-sample t test (if H = F; for the meaning of F see the documentation of the Wilcoxon rank-sum test).

If H is a normal distribution, then the A.R.E. is 3/π ≈ 0.955. This shows that the efficiency of the Wilcoxon test relative to the t test is rather high even if the assumption of normality made in the t test is true. It can be shown that the minimal A.R.E. (for H with finite variance) is 0.864. For non-normal distributions the Wilcoxon test can be much more efficient than the t test. The A.R.E.s for some specific distributions are given in the implementation notes. To estimate the power of the Wilcoxon test for a given H with the A.R.E. method one basically scales the sample size with the corresponding A.R.E. value and then performs the procedure for the one-sample t test.
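
As a numerical illustration, the sketch below evaluates the standard expression for the Pitman efficiency of the Wilcoxon test relative to the t test, 12 · σ^2 · (∫ h(x)^2 dx)^2, where h is the density of H and σ^2 its variance. That this is the expression meant above is an assumption, as are the standard forms of the Laplace and logistic densities and all function names.

```python
from math import exp, pi, sqrt


def integrate(f, lo, hi, steps=40000):
    """Midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


def normal_pdf(x):
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)


def laplace_pdf(x):
    return 0.5 * exp(-abs(x))


def logistic_pdf(x):
    e = exp(-abs(x))  # symmetric density; |x| avoids overflow
    return e / (1.0 + e) ** 2


def are_wilcoxon_vs_t(pdf, half_width=60.0):
    """Assumed formula: 12 * variance * (integral of pdf squared) squared."""
    lo, hi = -half_width, half_width
    mean = integrate(lambda x: x * pdf(x), lo, hi)
    var = integrate(lambda x: (x - mean) ** 2 * pdf(x), lo, hi)
    sq_int = integrate(lambda x: pdf(x) ** 2, lo, hi)
    return 12.0 * var * sq_int ** 2


for name, pdf in [("normal", normal_pdf),
                  ("logistic", logistic_pdf),
                  ("Laplace", laplace_pdf)]:
    print(name, round(are_wilcoxon_vs_t(pdf), 4))
```

Under these assumptions the numerical values agree with 3/π for the normal, π^2/9 for the logistic, and 3/2 for the Laplace distribution. The expression is invariant under scaling and shifting of the density, so the unit-scale forms suffice.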

Lehmann method

The computation of the power requires the distribution of Vs for the non-null case, that is, for cases in which H is not symmetric about zero. The Lehmann method uses the fact that

(Vs − E(Vs))/√(Var(Vs))
tends towards the standard normal distribution as N approaches infinity for any fixed distribution H for which 0 < P(X < 0) < 1. The problem is then to compute the expectation and variance of Vs. These values depend on three "moments" p1, p2, p3, which are defined as:
- p1 = P(X > 0)
- p2 = P(X + Y > 0)
- p3 = P(X + Y > 0 and X + Z > 0)
where X, Y, and Z are independent random variables with distribution H. The expectation and variance are given as

E(Vs) = N(N − 1)p2/2 + N p1
Var(Vs) = N(N − 1)(N − 2)(p3 − p2^2) + N(N − 1)[2(p1 − p2)^2 + 3p2(1 − p2)]/2 + N p1(1 − p1).
The value p1 is easy to interpret: it is the probability of observing a positive value, which exceeds 1/2 if H is continuous and shifted by an amount ∆ > 0 to larger values. For a null shift (no treatment effect, ∆ = 0, i.e. H symmetric about zero) we get p1 = p2 = 1/2 and p3 = 1/3, and the two expressions reduce to the null values N(N + 1)/4 and N(N + 1)(2N + 1)/24.
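
A minimal Python sketch of the two moment formulas is given below. It takes p1 as the probability of a positive value, P(X > 0), and uses p3 − p2^2 in the leading variance term; with these conventions the null case p1 = p2 = 1/2, p3 = 1/3 reproduces the null mean N(N + 1)/4 and the null variance N(N + 1)(2N + 1)/24. The function name is hypothetical.

```python
def lehmann_moments(n_obs, p1, p2, p3):
    """Mean and variance of the positive rank sum Vs.

    Conventions assumed here: p1 = P(X > 0), p2 = P(X + Y > 0),
    p3 = P(X + Y > 0 and X + Z > 0) for independent X, Y, Z ~ H.
    """
    n = n_obs
    mean = n * (n - 1) * p2 / 2.0 + n * p1
    var = (n * (n - 1) * (n - 2) * (p3 - p2 ** 2)
           + n * (n - 1) * (2.0 * (p1 - p2) ** 2 + 3.0 * p2 * (1.0 - p2)) / 2.0
           + n * p1 * (1.0 - p1))
    return mean, var


# Null case: H symmetric about zero.
n_obs = 10
mean0, var0 = lehmann_moments(n_obs, 0.5, 0.5, 1.0 / 3.0)
print(mean0, n_obs * (n_obs + 1) / 4)
print(var0, n_obs * (n_obs + 1) * (2 * n_obs + 1) / 24)
```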

If c denotes the critical value of a level-α test, and Φ denotes the CDF of the standard normal distribution, then the normal approximation of the power of the (one-sided) test is given by

Π(H) ≈ 1 − Φ[(c − a − E(Vs))/√(Var(Vs))]
where a = 0.5 if a continuity correction is applied, and a = 0 otherwise. The formulae for p1, p2, and p3 for the predefined distributions are given in the implementation section below.
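
The power approximation can be sketched in Python for a normal response distribution. This is not G*Power's implementation; it rests on several assumptions: X is normal with mean d and unit variance, so that p1 = Φ(d) and p2 = Φ(√2 · d); p3 is obtained by numerical integration of φ(x − d) · Φ(x + d)^2; p1 is taken as P(X > 0) and the leading variance term as p3 − p2^2; and c is taken from the exact null distribution of Vs. All function names are hypothetical.

```python
from math import erf, exp, pi, sqrt


def norm_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def norm_pdf(z):
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)


def normal_shift_probs(d, steps=4000, half_width=10.0):
    """p1, p2, p3 for independent X, Y, Z ~ N(d, 1) (assumed parametrisation)."""
    p1 = norm_cdf(d)               # P(X > 0)
    p2 = norm_cdf(sqrt(2.0) * d)   # X + Y ~ N(2d, 2)
    # p3 = E[P(Y > -X | X)^2] = integral of pdf(x - d) * cdf(x + d)^2 dx
    h = 2.0 * half_width / steps
    p3 = 0.0
    for i in range(steps):
        x = d - half_width + (i + 0.5) * h
        p3 += norm_pdf(x - d) * norm_cdf(x + d) ** 2 * h
    return p1, p2, p3


def upper_critical_value(n_obs, alpha):
    """Smallest c with exact null probability P(Vs >= c) <= alpha."""
    max_sum = n_obs * (n_obs + 1) // 2
    counts = [1] + [0] * max_sum
    for rank in range(1, n_obs + 1):
        for v in range(max_sum, rank - 1, -1):
            counts[v] += counts[v - rank]
    tail, c = 0, max_sum + 1
    for v in range(max_sum, -1, -1):
        if tail + counts[v] > alpha * 2 ** n_obs:
            break
        tail, c = tail + counts[v], v
    return c


def lehmann_power(n_obs, d, alpha=0.05, continuity=True):
    """One-sided normal approximation 1 - Phi((c - a - E(Vs)) / sd(Vs))."""
    p1, p2, p3 = normal_shift_probs(d)
    n = n_obs
    mean = n * (n - 1) * p2 / 2.0 + n * p1
    var = (n * (n - 1) * (n - 2) * (p3 - p2 ** 2)
           + n * (n - 1) * (2.0 * (p1 - p2) ** 2 + 3.0 * p2 * (1.0 - p2)) / 2.0
           + n * p1 * (1.0 - p1))
    c = upper_critical_value(n, alpha)
    a = 0.5 if continuity else 0.0
    return 1.0 - norm_cdf((c - a - mean) / sqrt(var))


for d in (0.0, 0.2, 0.5, 0.8):
    print(d, round(lehmann_power(20, d), 3))
```

At d = 0 the approximation returns a value close to the nominal α, and it increases with d, which is a quick plausibility check on the formulas.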

Effect size index

The conventional values proposed by Cohen (1969, p. 38) for the t test are applicable. He defined the following conventional values for d:
small d = 0.2
medium d = 0.5
large d = 0.8
Pressing the button Determine on the left side of the effect size label opens the effect size dialog. You can use this dialog to calculate d from the means and the common standard deviation in the two populations.

[Figure: effect size drawer of the t test for the difference of a mean from a constant]

If the sample sizes are equal (n1 = n2) a mean σ' may be used as the common within-population σ (Cohen, 1969, p. 42):
σ' = √((σ1^2 + σ2^2)/2)
where σ1^2 and σ2^2 are the variances in populations 1 and 2, respectively. This is the formula used by G*Power when you select the n1 = n2 option in the effect size drawer.
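
The arithmetic behind the effect size dialog can be sketched as follows. The function names are hypothetical, and the definition d = (µ − µ0)/σ for the one-sample case is an assumption that follows Cohen's convention rather than something stated on this page.

```python
from math import sqrt


def common_sigma_equal_n(sigma1, sigma2):
    """sigma' = sqrt((sigma1^2 + sigma2^2) / 2), intended for n1 = n2 only."""
    return sqrt((sigma1 ** 2 + sigma2 ** 2) / 2.0)


def effect_size_d(mean, mean0, sigma):
    """Assumed one-sample definition: d = (mean - mean0) / sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (mean - mean0) / sigma


sigma_common = common_sigma_equal_n(3.0, 4.0)
print(sigma_common)                       # sqrt(12.5)
print(effect_size_d(105.0, 100.0, 10.0))  # 0.5, a "medium" effect
```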

In the case of substantially different sample sizes the n1 = n2 option should not be used because it may lead to power values that differ greatly from the true values (Cohen, 1969, p.42).

If you have unequal sample sizes and unequal variances in the populations from which the samples were or are to be drawn, then it is very reasonable to bring the samples to equal sizes.

Options

This test has no options.

Examples


Related tests



Implementation notes

The H0 distribution is the central Student t distribution t(N k − 2, 0). The H1 distribution is the noncentral Student t distribution t(N k − 2, δ), where the noncentrality parameter δ is given by: δ = d √((N1 N2 k)/(N1 + N2)). Parameter k represents the asymptotic relative efficiency relative to the corresponding t tests (Lehmann, 1975, p. 371ff) and depends in the following way on the parent distribution:

Uniform parent distribution: k = 1.0
Normal parent distribution: k = 3/π
Logistic parent distribution: k = π^2/9
Laplace parent distribution: k = 3/2

Min ARE = 0.864; this is a limiting case that gives a theoretical minimum of the power for the Wilcoxon-Mann-Whitney test.

Validation

The results were checked against the values produced by PASS (Hintze, 2006) and those produced by unifyPow (O’Brien, 1998). There was complete correspondence with the values given in O’Brien, while there were slight differences to those produced by PASS. The reason for these differences seems to be that PASS truncates the weighted sample sizes to integer values.

References

Hettmansperger, T. P. (1984). Statistical inference based on ranks. New York: Wiley.


Hintze, J. (2006). NCSS, PASS, and GESS. Kaysville, Utah: NCSS.

Lehmann, E. L. (1975). Nonparametrics. Statistical methods based on ranks. San Francisco, CA: Holden-Day.

O’Brien, R. (2002). Sample size analysis in study planning (using unifypow.sas). (available on the WWW: http://www.bio.ri.ccf.org/UnifyPow.all/UnifyPowNotes020811.pdf)


Questions about this website? Contact Axel Buchner


Last modified: 05.06.2009, 12:17