Linear bivariate regression: Two groups, difference between slopes

A linear regression is used to estimate the parameters a, b of a linear relationship Y = a + bX between the dependent variable Y and the independent variable X. X is assumed to be a set of fixed values, whereas Yi is modeled as a random variable: Yi = a + bXi + εi, where εi denotes normally distributed random errors with mean 0 and standard deviation σi. A common assumption also adopted here is that all σi's are identical, that is σi = σ. The standard deviation of the error is also called the standard deviation of the residuals.
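As a minimal sketch in plain Python (the function name fit_line is hypothetical and not part of G*Power), the following estimates a and b for one group by ordinary least squares. It also reports the residual standard deviation normalized by n, which is the convention used in the example below.

```python
import math


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x for one group.

    Returns (a, b, sd_resid). sd_resid is normalized by n (not n - 2),
    i.e. the data are treated as hypothesized true values.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with n >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("X values must not all be equal")
    b = sxy / sxx
    a = my - b * mx
    ss_resid = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, math.sqrt(ss_resid / n)


# Hypothetical data: y = 1 + 2x plus residuals with mean 0 that are uncorrelated with x.
xs = [1, 2, 3, 4, 5]
ys = [1 + 2 * x + e for x, e in zip(xs, [1, -2, 0, 2, -1])]
print(fit_line(xs, ys))
```

With these data the fit recovers a = 1 and b = 2 exactly, and the residual standard deviation is √(10/5) = √2.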

Having determined the linear relationships between X and Y in two groups: Y = a1 + b1X, Y = a2 + b2X, we may ask whether the slopes b1 and b2 are identical. The null and the two-sided alternative hypotheses are

$$H_0: b_1 - b_2 = 0$$
$$H_1: b_1 - b_2 \neq 0$$

Effect size index

The absolute value of the difference between the slopes under H1, |∆slope| = |b1 − b2|, is used as the effect size. To fully specify the effect size, the following additional inputs must be given:

Std dev residual σ

The standard deviation σ > 0 of the residuals in the combined data set (i.e. the square root of the weighted average of the residual variances in the two data sets). If σ²r1 and σ²r2 denote the variances of the residuals r1 = (a1 + b1X1) − Y1 and r2 = (a2 + b2X2) − Y2 in Groups 1 and 2, and n1, n2 denote the sample sizes in Groups 1 and 2, then

$$\sigma = \sqrt{\frac{n_1 \sigma_{r1}^2 + n_2 \sigma_{r2}^2}{n_1 + n_2}}$$
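A minimal sketch of the pooling step, assuming the residual variances are weighted by the group sizes, σ² = (n1σ²r1 + n2σ²r2)/(n1 + n2). Dividing numerator and denominator by n1 shows that only the allocation ratio N2/N1 is then needed, which matches the inputs requested by the effect size drawer. The function name pooled_residual_sd is hypothetical.

```python
import math


def pooled_residual_sd(sd_r1, sd_r2, ratio):
    """Pooled residual SD from the residual SDs of the two groups.

    Assumption: sigma^2 = (n1*sd_r1^2 + n2*sd_r2^2) / (n1 + n2).
    With ratio = N2/N1 this equals (sd_r1^2 + ratio*sd_r2^2) / (1 + ratio).
    """
    if sd_r1 <= 0 or sd_r2 <= 0 or ratio <= 0:
        raise ValueError("standard deviations and ratio must be positive")
    return math.sqrt((sd_r1 ** 2 + ratio * sd_r2 ** 2) / (1 + ratio))


# Hypothetical residual SDs with the allocation ratio of the example (44/28).
print(pooled_residual_sd(1.0, 2.0, 44 / 28))
```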


Std dev σ_x1

The standard deviation σx1 > 0 of the X values in Group 1.

Std dev σ_x2

The standard deviation σx2 > 0 of the X values in Group 2.

Important relationships between the standard deviations σxi of Xi, σyi of Yi, the slopes bi of the regression lines, and the correlation coefficient ρi between Xi and Yi are:

$$\sigma_{yi} = \frac{b_i \sigma_{xi}}{\rho_i}$$
$$\sigma_{yi} = \frac{\sigma_{ri}}{\sqrt{1 - \rho_i^2}}$$
where σri denotes the standard deviation of the residuals Yi − (bi Xi + ai).
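Solving the two relationships for ρi and σri gives ρi = bi σxi/σyi and σri = σyi √(1 − ρi²). The sketch below applies them to a single group and rejects inputs that violate −1 < ρi < 1. It only illustrates the arithmetic and is not G*Power's code; the helper name residual_sd_and_r is hypothetical.

```python
import math


def residual_sd_and_r(sd_x, sd_y, slope):
    """Return (sd_resid, rho) for one group from sd_x, sd_y and the slope b.

    rho = b * sd_x / sd_y and sd_resid = sd_y * sqrt(1 - rho^2).
    """
    if sd_x <= 0 or sd_y <= 0:
        raise ValueError("standard deviations must be positive")
    rho = slope * sd_x / sd_y
    if not -1 < rho < 1:
        raise ValueError("need -1 < slope * sd_x / sd_y < 1")
    return sd_y * math.sqrt(1 - rho ** 2), rho


# Hypothetical inputs: sd_x = 1, sd_y = 2, slope = 1 give rho = 0.5.
print(residual_sd_and_r(1.0, 2.0, 1.0))
```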

The effect size dialog may be used to determine Std dev residual σ (the standard deviation of the residuals) and |∆ slope| (the absolute value of the difference between the slopes, |b1b2|) from other values based on the equations above. Pressing the Determine button on the left side of the effect size label in the main window opens the effect size drawer.

[Figure: effect size drawer]

The input variables are located on the left side of the arrow '=>', the output variables on the right side. The input values must conform to the usual restrictions, that is, σxi > 0, σyi > 0, and −1 < ρi < 1. In addition, the relation σyi = (bi σxi)/ρi together with the restriction on ρi implies the additional restriction −1 < bi σxi/σyi < 1.

Clicking on the Calculate and transfer to main window button copies the values given in Std dev σ_x1, Std dev σ_x2, Std dev residual σ, Allocation ratio N2/N1, and |∆slope| to the corresponding input fields in the main window.

Options

This test has no options.

Examples

We replicate an example given on page 594 of Dupont and Plummer (1998) that refers to an example in Armitage, Berry, and Matthews (2002, p. 325). The data and relevant statistics are shown in the table below. Note: Contrary to Dupont and Plummer (1998), we here consider the data as hypothesized true values and normalize the variance by N, not by (N − 1).

[Table: data and relevant statistics for the two groups of workers]

The relation between age and vital capacity is investigated for two groups of men working in the cadmium industry. Group 1 includes n1 = 28 workers with less than 10 years of cadmium exposure. Group 2 includes n2 = 44 workers never exposed to cadmium.

The standard deviations of the ages in the two groups are σx1 = 9.029 and σx2 = 11.87. Regressing vital capacity on age gives the following slopes of the regression lines: b1 = −0.04653 and b2 = −0.03061. To calculate the pooled standard deviation of the residuals, we use the effect size drawer. We select the input mode sx, sy, slope => residual s, r and enter the values given above, the standard deviations σy1, σy2 of y (vital capacity) as given in the table above, and the allocation ratio n2/n1 = 44/28 ≈ 1.571428. This results in a pooled standard deviation of the residuals of σ = 0.5578413 (see the effect size drawer above).

We want to recruit enough workers to detect a true difference in slopes of |(−0.03061) − (−0.04653)| = 0.01592 with a power of 0.80, α = 0.05, and the same allocation ratio to the two groups as in the sample data.

Select

Type of power analysis: A priori

Input

Tail(s): Two
|∆slope|: 0.01592
α err prob: 0.05
Power (1-β err prob): 0.80
Allocation ratio N2/N1: 1.571428
Std dev residual σ: 0.5578413
Std dev σ_x1: 9.02914
Std dev σ_x2: 11.86779

Output

Noncentrality parameter δ: 2.811598
Critical t: 1.965697
Df: 415
Sample size group 1: 163
Sample size group 2: 256
Total sample size: 419
Actual power: 0.800980

The output shows that we need 419 workers in total, with 163 in Group 1 and 256 in Group 2. These values are close to those reported in Dupont and Plummer (1998, p. 596) for this example (166 + 261 = 427). The slight difference arises because Dupont and Plummer normalized the variances by N − 1 and used shifted central t distributions instead of noncentral t distributions.

Relation to multiple regression

The present procedure is essentially a special case of the multiple regression procedure but provides a more convenient interface. To show this, we demonstrate how the multiple regression procedure can be used to compute the example above (see also Dupont and Plummer, 1998, p. 597). First, the data of both groups are combined into a single data set of size n1 + n2 = 28 + 44 = 72. With respect to this combined data set we define the following variables (vectors of length 72):
  • y contains the measured vital capacity
  • x1 contains the age data
  • x2 codes group membership (0 = not exposed, 1 = exposed)
  • x3 contains the element-wise product of x1 and x2

The multiple regression model

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$$

reduces to y = β0 + β1 x1 for unexposed workers (x2 = 0) and y = (β0 + β2) + (β1 + β3) x1 for exposed workers (x2 = 1). In this model, β3 represents the difference in slope between the two groups, which is assumed to be zero under the null hypothesis. Thus, the above model reduces to

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$$

if the null hypothesis is true. 
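To see numerically that β3 is the difference between the group slopes, the sketch below fits the full model by ordinary least squares (normal equations solved by Gaussian elimination) on small hypothetical data. In these data the unexposed group (x2 = 0) follows y = 1 + 2x1 and the exposed group (x2 = 1) follows y = 3 − 0.5x1, each plus a residual pattern with mean 0 that is uncorrelated with x1. The helper names are hypothetical. Because the model has a separate intercept and slope per group, the fitted β3 equals the difference of the separately fitted slopes, here −0.5 − 2 = −2.5.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_interaction_model(x1, x2, y):
    """OLS estimates [b0, b1, b2, b3] of y = b0 + b1*x1 + b2*x2 + b3*x3, x3 = x1*x2.

    x2 must contain the 0/1 group codes.
    """
    rows = [[1.0, a, g, a * g] for a, g in zip(x1, x2)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    xty = [sum(r[i] * yy for r, yy in zip(rows, y)) for i in range(4)]
    return solve_linear(xtx, xty)


# Hypothetical data for the two groups (same x values in both groups).
ages = [1, 2, 3, 4, 5]
e = [1, -2, 0, 2, -1]
x1 = ages + ages
x2 = [0] * 5 + [1] * 5
y = [1 + 2 * a + r for a, r in zip(ages, e)] + [3 - 0.5 * a + r for a, r in zip(ages, e)]
b0, b1, b2, b3 = fit_interaction_model(x1, x2, y)
print(b0, b1, b2, b3)
```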

Performing a multiple regression analysis with the full model leads to β3 = −0.01592 and R²1 = 0.3243. With the reduced model assumed in the null hypothesis, one finds R²0 = 0.3115. From these values we compute the following effect size:

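$$f^2 = \frac{R_1^2 - R_0^2}{1 - R_1^2} = \frac{0.3243 - 0.3115}{1 - 0.3243} \approx 0.0189$$
With the four-decimal R² values shown, the quotient is about 0.01894; the value f² = 0.018870 used for the power analysis was presumably computed from the unrounded R² values.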

Performing an a priori power analysis with α err prob = 0.05, Power (1-β err prob) = 0.80, Numerator df = 1, and Number of predictors = 3, we get N = 418, almost the same result as in the example above (N = 419).
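The effect size arithmetic can be checked directly. The sketch also compares the noncentrality parameters of the two procedures, assuming that G*Power's multiple regression F test uses λ = f²·N. With one numerator df, F = t², so λ plays the role of δ²: f² = 0.018870 with N = 418 gives λ ≈ 7.89, close to δ² = 2.811598² ≈ 7.91 from the t test example.

```python
# Effect size from the R^2 values of the full and the reduced model.
r2_full = 0.3243     # R^2_1: model with the interaction term x3
r2_reduced = 0.3115  # R^2_0: model without x3 (H0: beta3 = 0)
f2 = (r2_full - r2_reduced) / (1 - r2_full)

# Assumption: noncentrality lambda = f2 * N for the multiple regression F test.
# With 1 numerator df, F = t^2, so lambda corresponds to delta^2 of the t test.
f2_used = 0.018870         # effect size quoted in the text
lam = f2_used * 418        # N = 418 from the multiple regression analysis
delta_sq = 2.811598 ** 2   # squared noncentrality from the t test example

print(f"f2 from four-decimal R2 values: {f2:.6f}")
print(f"lambda = {lam:.4f}, delta^2 = {delta_sq:.4f}")
```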

Related tests

Correlation: Point biserial model
Linear multiple regression: R² deviation from zero


Implementation notes

The procedure implements a slight variant of the algorithm proposed in Dupont and Plummer (1998). The only difference is that we replaced their approximation of the noncentral t distribution by a shifted central t distribution with the exact noncentral t distribution. In most cases this makes little difference.

The H0 distribution is the central t distribution with df = n1 + n2 − 4 degrees of freedom, where n1 and n2 denote the sample sizes in the two groups. The H1 distribution is the noncentral t distribution with the same degrees of freedom and noncentrality parameter δ = ∆√n2.


Statistical Test

The power is calculated for the t test for equal slopes as described in Armitage et al. (2002) in Chapter 11. The test statistic is (see their Equations 11.18, 11.19, 11.20):

$$t = \frac{b_1 - b_2}{\operatorname{SE}(b_1 - b_2)}$$

with df = n1 + n2 − 4 degrees of freedom.

For group i ∈ {1, 2}, let Sxi denote the sum of squares of X (i.e. the variance of X times ni) and Sri the residual sum of squares about the fitted regression line (i.e. the residual variance times ni). A pooled estimate of the residual variance is then s²r = (Sr1 + Sr2)/(n1 + n2 − 4). The standard error of the difference of slopes is

$$\operatorname{SE}(b_1 - b_2) = \sqrt{s_r^2 \left( \frac{1}{S_{x1}} + \frac{1}{S_{x2}} \right)}$$
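As an illustration of the test statistic (not G*Power's code; the function name slope_difference_t is hypothetical), the sketch below computes t and df = n1 + n2 − 4 from raw data. It uses the usual pooled residual variance, in which each group's residual sum of squares is computed as Syy − Sxy²/Sxx (equivalently Syy − b·Sxy).

```python
import math


def slope_difference_t(x1, y1, x2, y2):
    """t statistic and df for H0: b1 = b2, pooling the residual variances."""
    def summaries(xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        syy = sum((y - my) ** 2 for y in ys)
        b = sxy / sxx
        ss_resid = syy - b * sxy  # residual sum of squares of the fitted line
        return n, sxx, b, ss_resid

    n1, sx1, b1, sr1 = summaries(x1, y1)
    n2, sx2, b2, sr2 = summaries(x2, y2)
    df = n1 + n2 - 4
    s2r = (sr1 + sr2) / df                      # pooled residual variance
    se = math.sqrt(s2r * (1 / sx1 + 1 / sx2))   # SE of b1 - b2
    return (b1 - b2) / se, df


# Hypothetical data: slope 2 in the first group, slope 1 in the second.
xs = [1, 2, 3, 4, 5]
print(slope_difference_t(xs, [3, 2, 6, 10, 9], xs, [2, 0, 3, 6, 4]))
```

For these data b1 − b2 = 1, both residual sums of squares equal 10 and both Sx equal 10, so t = 1/√((20/6)(0.2)) = √1.5 ≈ 1.2247 with df = 6.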

Power of the test

In the procedure for equal slopes the noncentrality parameter is δ = ∆√n2, with ∆ = |∆slope|/σR and

$$\sigma_R = \sigma \sqrt{\frac{1}{m\,\sigma_{x1}^2} + \frac{1}{\sigma_{x2}^2}}$$

where m = n1/n2, σx1 and σx2 are the standard deviations of X in Groups 1 and 2, respectively, and σ is the common standard deviation of the residuals.
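The power computation can be sketched in plain Python. Here the noncentral t distribution of T = (Z + δ)/√(V/df), with V chi-square distributed, is evaluated by integrating over ln V with Simpson's rule; this integration scheme is our own choice and not necessarily how G*Power evaluates the distribution. All function names are hypothetical. With the inputs of the example above and n1 = 163, n2 = 256, it should give values close to the G*Power output (δ ≈ 2.81, critical t ≈ 1.966, df = 415, power ≈ 0.80), although δ may differ slightly in the fourth decimal.

```python
import math


def _log_chi2_nodes(df, n_intervals=2000):
    """Simpson nodes (s, weight) for E[h(S)], S = sqrt(V/df), V ~ chi2(df).

    Integrates over x = ln(V), whose density
    exp(df*x/2 - exp(x)/2) / (2**(df/2) * Gamma(df/2)) is smooth for df > 0.
    """
    x0 = math.log(df)
    lo, hi = x0 - 1.0 - 80.0 / df, x0 + 5.0  # tails beyond this are negligible
    h = (hi - lo) / n_intervals
    log_norm = (df / 2) * math.log(2.0) + math.lgamma(df / 2)
    nodes = []
    for i in range(n_intervals + 1):
        x = lo + i * h
        coef = 1 if i in (0, n_intervals) else (4 if i % 2 else 2)
        density = math.exp((df / 2) * x - math.exp(x) / 2 - log_norm)
        nodes.append((math.sqrt(math.exp(x) / df), coef * h / 3 * density))
    return nodes


def _t_upper(c, delta, nodes):
    """P(T > c) for T = (Z + delta)/S: average of P(Z > c*S - delta) over S."""
    return sum(w * 0.5 * math.erfc((c * s - delta) / math.sqrt(2)) for s, w in nodes)


def _t_lower(c, delta, nodes):
    """P(T < -c) for T = (Z + delta)/S: average of P(Z < -c*S - delta) over S."""
    return sum(w * 0.5 * math.erfc((c * s + delta) / math.sqrt(2)) for s, w in nodes)


def power_equal_slopes(n1, n2, abs_diff_slope, sd_resid, sd_x1, sd_x2, alpha=0.05):
    """Two-sided power of the t test for equal slopes: (delta, crit_t, df, power)."""
    df = n1 + n2 - 4
    if df < 1:
        raise ValueError("need n1 + n2 >= 5")
    m = n1 / n2
    sigma_r = sd_resid * math.sqrt(1 / (m * sd_x1 ** 2) + 1 / sd_x2 ** 2)
    delta = abs_diff_slope / sigma_r * math.sqrt(n2)
    nodes = _log_chi2_nodes(df)
    # Critical value of the central t distribution (delta = 0) by bisection.
    lo, hi = 0.0, 1000.0
    for _ in range(80):
        mid = (lo + hi) / 2
        if _t_upper(mid, 0.0, nodes) > alpha / 2:
            lo = mid
        else:
            hi = mid
    crit = (lo + hi) / 2
    power = _t_upper(crit, delta, nodes) + _t_lower(crit, delta, nodes)
    return delta, crit, df, power


delta, crit, df, power = power_equal_slopes(
    163, 256, 0.01592, 0.5578413, 9.02914, 11.86779)
print(f"delta = {delta:.4f}, critical t = {crit:.4f}, df = {df}, power = {power:.4f}")
```

A sample-size search would evaluate this function for increasing n1 (with n2 derived from the allocation ratio) until the power reaches the target; how G*Power rounds n2 during that search is not covered by this sketch.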

Validation

The results were checked for a range of input scenarios against the values produced by the PS program published by Dupont and Plummer (1998). Only slight deviations were found that are probably due to the use of the noncentral t distribution in G*Power instead of the shifted central t distributions that are used in PS.

References


Armitage, P., Berry, G., & Matthews, J. N. S. (2002). Statistical methods in medical research (4th ed.). Oxford: Blackwell Science.

Dupont, W. D., & Plummer, W. D., Jr. (1998). Power and sample size calculations for studies involving linear regression. Controlled Clinical Trials, 19, 589-601.

Friday, 10 February 2012



Questions about this website? Contact Axel Buchner.


Last modified: 6 December 2009, 21:01