R软件中的wilcox.test()检验
(2012-03-12 17:57:26)
标签:
杂谈 |
分类: R软件学习 |
Wilcoxon Rank Sum and Signed Rank Tests
Description
Performs one- and two-sample Wilcoxon tests on vectors of data; the latter is also known as ‘Mann-Whitney’ test.
Usage
wilcox.test(x, ...) ## Default S3 method: wilcox.test(x, y = NULL, alternative = c("two.sided", "less", "greater"), mu = 0, paired = FALSE, exact = NULL, correct = TRUE, conf.int = FALSE, conf.level = 0.95, ...) ## S3 method for class 'formula' wilcox.test(formula, data, subset, na.action, ...)
Arguments
x |
numeric vector of data values. Non-finite (e.g. infinite or missing) values will be omitted. |
y |
an optional numeric vector of data values: as with
x non-finite values will be omitted. |
alternative |
a character string specifying the alternative hypothesis, must
be one of "two.sided" (default),
"greater" or "less" . You can specify just
the initial letter. |
mu |
a number specifying an optional parameter used to form the null hypothesis. See ‘Details’. |
paired |
a logical indicating whether you want a paired test. |
exact |
a logical indicating whether an exact p-value should be computed. |
correct |
a logical indicating whether to apply continuity correction in the normal approximation for the p-value. |
conf.int |
a logical indicating whether a confidence interval should be computed. |
conf.level |
confidence level of the interval. |
formula |
a formula of the form lhs ~ rhs where
lhs is a numeric variable giving the data values and
rhs a factor with two levels giving the corresponding
groups. |
data |
an optional matrix or data frame (or similar: see
model.frame ) containing the variables in the formula
formula . By default the variables are taken from
environment(formula) . |
subset |
an optional vector specifying a subset of observations to be used. |
na.action |
a function which indicates what should happen when the data
contain NA s. Defaults to
getOption("na.action") . |
... |
further arguments to be passed to or from methods. |
Details
The formula interface is only applicable for the 2-sample tests.
If only x
is given, or if both x
and
y
are given and paired
is
TRUE
, a Wilcoxon signed rank test of the null that the
distribution of x
(in the one sample case) or of
x - y
(in the paired two sample case) is symmetric
about mu
is performed.
Otherwise, if both x
and y
are given
and paired
is FALSE
, a Wilcoxon rank sum
test (equivalent to the Mann-Whitney test: see the Note) is carried
out. In this case, the null hypothesis is that the distributions of
x
and y
differ by a location shift of
mu
and the alternative is that they differ by some
other location shift (and the one-sided alternative
"greater"
is that x
is shifted to the
right of y
).
By default (if exact
is not specified), an exact
p-value is computed if the samples contain less than 50 finite
values and there are no ties. Otherwise, a normal approximation is
used.
Optionally (if argument conf.int
is true), a
nonparametric confidence interval and an estimator for the
pseudomedian (one-sample case) or for the difference of the
location parameters x-y
is computed. (The pseudomedian
of a distribution F is the median of the distribution of
(u+v)/2, where u and v are independent, each
with distribution F. If F is symmetric, then the
pseudomedian and median coincide. See Hollander &
Wolfe (1973), page 34.) Note that in the two-sample case the
estimator for the difference in location parameters does not
estimate the difference in medians (a common misconception) but
rather the median of the difference between a sample from
x
and a sample from y
.
If exact p-values are available, an exact confidence interval is
obtained by the algorithm described in Bauer (1972), and the
Hodges-Lehmann estimator is employed. Otherwise, the returned
confidence interval and point estimate are based on normal
approximations. These are continuity-corrected for the interval but
not the estimate (as the correction depends on the
alternative
).
With small samples it may not be possible to achieve very high confidence interval coverages. If this happens a warning will be given and an interval with lower coverage will be substituted.
Value
A list with class "htest"
containing the following
components:
statistic |
the value of the test statistic with a name describing it. |
parameter |
the parameter(s) for the exact distribution of the test statistic. |
p.value |
the p-value for the test. |
null.value |
the location parameter mu . |
alternative |
a character string describing the alternative hypothesis. |
method |
the type of test applied. |
data.name |
a character string giving the names of the data. |
conf.int |
a confidence interval for the location parameter. (Only present
if argument conf.int = TRUE .) |
estimate |
an estimate of the location parameter. (Only present if
argument conf.int = TRUE .) |
Warning
This function can use large amounts of memory and stack (and
even crash R if the stack limit is exceeded) if
exact = TRUE
and one sample is large (several
thousands or more).
Note
The literature is not unanimous about the definitions of the Wilcoxon rank sum and Mann-Whitney tests. The two most common definitions correspond to the sum of the ranks of the first sample with the minimum value subtracted or not: R subtracts and S-PLUS does not, giving a value which is larger by m(m+1)/2 for a first sample of size m. (It seems Wilcoxon's original paper used the unadjusted sum of the ranks but subsequent tables subtracted the minimum.)
R's value can also be computed as the
number of all pairs (x[i], y[j])
for which
y[j]
is not greater than x[i]
, the most
common definition of the Mann-Whitney test.
References
David F. Bauer (1972), Constructing confidence sets using rank statistics. Journal of the American Statistical Association 67, 687–690.
Myles Hollander and Douglas A. Wolfe (1973), Nonparametric
Statistical Methods. New York: John Wiley &
Sons. Pages 27–33 (one-sample), 68–75 (two-sample).
Or second edition (1999).
See Also
wilcox_test
in package coin for exact, asymptotic and
Monte Carlo conditional p-values, including in the
presence of ties.
kruskal.test
for testing homogeneity in location parameters in the case of two
or more samples; t.test
for an alternative under normality assumptions [or large
samples]
Examples
require(graphics) ## One-sample test. ## Hollander & Wolfe (1973), 29f. ## Hamilton depression scale factor measurements in 9 patients with ## mixed anxiety and depression, taken at the first (x) and second ## (y) visit after initiation of a therapy (administration of a ## tranquilizer). x <- c(1.83, 0.50, 1.62, 2.48, 1.68, 1.88, 1.55, 3.06, 1.30) y <- c(0.878, 0.647, 0.598, 2.05, 1.06, 1.29, 1.06, 3.14, 1.29) wilcox.test(x, y, paired = TRUE, alternative = "greater") wilcox.test(y - x, alternative = "less") # The same. wilcox.test(y - x, alternative = "less", exact = FALSE, correct = FALSE) # H&W large sample # approximation ## Two-sample test. ## Hollander & Wolfe (1973), 69f. ## Permeability constants of the human chorioamnion (a placental ## membrane) at term (x) and between 12 to 26 weeks gestational ## age (y). The alternative of interest is greater permeability ## of the human chorioamnion for the term pregnancy. x <- c(0.80, 0.83, 1.89, 1.04, 1.45, 1.38, 1.91, 1.64, 0.73, 1.46) y <- c(1.15, 0.88, 0.90, 0.74, 1.21) wilcox.test(x, y, alternative = "g") # greater wilcox.test(x, y, alternative = "greater", exact = FALSE, correct = FALSE) # H&W large sample # approximation wilcox.test(rnorm(10), rnorm(10, 2), conf.int = TRUE) ## Formula interface. boxplot(Ozone ~ Month, data = airquality) wilcox.test(Ozone ~ Month, data = airquality, subset = Month %in% c(5, 8))