I hope that someday Octave will include more statistics functions. If you would like to help improve Octave in this area, please contact bug@octave.org.
25.1 Basic Statistical Functions
25.2 Tests
25.3 Models
25.4 Distributions
mean (x) = SUM_i x(i) / N
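With the optional argument opt, the kind of mean computed can be selected. The following options are recognized:
"a"  Compute the (ordinary) arithmetic mean. This is the default.
"g"  Compute the geometric mean.
"h"  Compute the harmonic mean.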
If the optional argument dim is supplied, work along dimension dim.
Both dim and opt are optional. If both are supplied, either may appear first.
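As a rough illustration, the three kinds of mean can be computed by hand for a vector of positive values. This is only a sketch: it takes "a", "g", and "h" to mean the arithmetic, geometric, and harmonic mean, and the variable names are made up for the example.

```octave
# Sketch: three kinds of mean for a vector of positive values.
x = [1, 2, 4];
N = numel (x);
m_a = sum (x) / N;               # arithmetic mean, SUM_i x(i) / N
m_g = exp (sum (log (x)) / N);   # geometric mean (assumed meaning of "g")
m_h = N / sum (1 ./ x);          # harmonic mean (assumed meaning of "h")
printf ("%g %g %g\n", m_a, m_g, m_h);
```

For positive data the three values are ordered m_h <= m_g <= m_a.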
            x(ceil(N/2)),             N odd
median(x) =
            (x(N/2) + x((N/2)+1))/2,  N even
std (x) = sqrt (sumsq (x - mean (x)) / (n - 1))
The argument opt determines the type of normalization to use. Valid values are 0 (normalize with N-1, the default) and 1 (normalize with N).
The third argument dim determines the dimension along which the standard deviation is calculated.
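The effect of the normalization can be seen by computing the standard deviation by hand. The sketch below assumes the two choices are division by n - 1 and division by n; the sample data are invented for the example.

```octave
# Sketch: standard deviation with the two normalizations.
x = [2, 4, 4, 4, 5, 5, 7, 9];
n = numel (x);
d = x - sum (x) / n;                        # deviations from the mean
s_unbiased = sqrt (sum (d .^ 2) / (n - 1)); # the formula given for std (x)
s_n = sqrt (sum (d .^ 2) / n);              # assumed alternative: divide by n
printf ("%g %g\n", s_unbiased, s_n);
```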
If each row of x and y is an observation and each column is a variable, the (i, j)-th entry of cov (x, y) is the covariance between the i-th variable in x and the j-th variable in y. If called with one argument, compute cov (x, x).
If each row of x and y is an observation and each column is a variable, the (i, j)-th entry of corrcoef (x, y) is the correlation between the i-th variable in x and the j-th variable in y. If called with one argument, compute corrcoef (x, x).
If x is a vector of length N, the kurtosis is
kurtosis (x) = N^(-1) std(x)^(-4) sum ((x - mean(x)).^4) - 3
If x is a matrix, return the kurtosis over the first non-singleton dimension. The optional argument dim can be given to force the kurtosis to be computed over that dimension.
If x is a vector of length N, the skewness is
skewness (x) = N^(-1) std(x)^(-3) sum ((x - mean(x)).^3)
If x is a matrix, return the skewness along the first non-singleton dimension of the matrix. If the optional dim argument is given, operate along this dimension.
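The skewness and kurtosis formulas above can be evaluated directly. In the sketch below the standard deviation is normalized by N; that is an assumption, since the formulas do not say which normalization is meant.

```octave
# Sketch: skewness and kurtosis from the formulas above.
x = [1, 2, 3, 4, 5];
N = numel (x);
d = x - sum (x) / N;             # deviations from the mean
s = sqrt (sum (d .^ 2) / N);     # assumption: std normalized by N
skew = sum (d .^ 3) / (N * s ^ 3);
kurt = sum (d .^ 4) / (N * s ^ 4) - 3;
printf ("%g %g\n", skew, kurt);
```

Symmetric data such as this give a skewness of zero.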
The argument opt determines the type of normalization to use. Valid values are 0 (normalize with N-1, the default) and 1 (normalize with N).
The third argument dim determines the dimension along which the variance is calculated.
Currently, only 1- and 2-dimensional tables are supported.
If x is a matrix, do the above along the first non-singleton dimension. If the optional argument dim is given then operate along this dimension.
If x is a vector, treat it as a column vector.
For matrices, each row is an observation and each column a variable; vectors are always observations and may be row or column vectors.
spearman (x) is equivalent to spearman (x, x).
For two data vectors x and y, Spearman's rho is the correlation of the ranks of x and y.
If x and y are drawn from independent distributions, rho has zero mean and variance 1 / (n - 1), and is asymptotically normally distributed.
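Since rho is the correlation of the ranks, it can be sketched by ranking both vectors and correlating the ranks. The example assumes there are no ties; the data are invented.

```octave
# Sketch: Spearman's rho as the correlation of ranks (no ties assumed).
x = [10, 20, 30, 40, 50];
y = [1, 3, 2, 5, 4];
n = numel (x);
[~, ix] = sort (x);  rx(ix) = 1:n;   # rx(i) is the rank of x(i)
[~, iy] = sort (y);  ry(iy) = 1:n;   # ry(i) is the rank of y(i)
dx = rx - mean (rx);
dy = ry - mean (ry);
rho = sum (dx .* dy) / sqrt (sum (dx .^ 2) * sum (dy .^ 2));
printf ("%g\n", rho);
```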
If x is a matrix, do the above along the first non-singleton dimension. If the optional argument dim is given, operate along this dimension.
If x is a matrix, do the above for each column of x.
If the optional argument dim is supplied, work along dimension dim.
If F is the CDF of the distribution dist with parameters params and G its inverse, and x a sample vector of length n, the QQ-plot graphs ordinate s(i) = i-th largest element of x versus abscissa q(i) = G((i - 0.5)/n).
If the sample comes from F except for a transformation of location and scale, the pairs will approximately follow a straight line.
The default for dist is the standard normal distribution. The optional argument params contains a list of parameters of dist. For example, for a quantile plot of the uniform distribution on [2,4] and x, use
qqplot (x, "uniform", 2, 4)
If no output arguments are given, the data are plotted directly.
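The plotted pairs can also be computed by hand. The sketch below does this for the uniform distribution on [2,4], whose inverse CDF is G(p) = 2 + 2 p. It takes s(i) to be the i-th order statistic in ascending order, which is an assumption about the intended ordering; the sample is invented.

```octave
# Sketch: QQ-plot coordinates for the uniform distribution on [2,4].
x = [3.1, 2.2, 3.9, 2.7];
n = numel (x);
s = sort (x);                 # ordinates: order statistics (ascending, assumed)
p = ((1:n) - 0.5) / n;        # plotting positions (i - 0.5)/n
q = 2 + (4 - 2) * p;          # abscissae: G(p) for the uniform on [2,4]
disp ([q; s]);
```

If x really comes from this distribution, the points (q(i), s(i)) lie close to the line s = q.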
If F is the CDF of the distribution dist with parameters params and x a sample vector of length n, the PP-plot graphs ordinate y(i) = F (i-th largest element of x) versus abscissa p(i) = (i - 0.5)/n. If the sample comes from F, the pairs will approximately follow a straight line.
The default for dist is the standard normal distribution. The optional argument params contains a list of parameters of dist. For example, for a probability plot of the uniform distribution on [2,4] and x, use
ppplot (x, "uniform", 2, 4)
If no output arguments are given, the data are plotted directly.
If x is a matrix, return the row vector containing the p-th moment of each column.
With the optional string opt, the kind of moment to be computed can be specified. If opt contains "c" or "a", central and/or absolute moments are returned. For example,
moment (x, 3, "ac")
computes the third central absolute moment of x.
If the optional argument dim is supplied, work along dimension dim.
For each component of p, return the logit log (p / (1-p)) of p.
For matrices, each row is an observation and each column a variable; vectors are always observations and may be row or column vectors.
kendall (x) is equivalent to kendall (x, x).
For two data vectors x, y of common length n, Kendall's tau is the correlation of the signs of all rank differences of x and y; i.e., if both x and y have distinct entries, then
         1
tau = -------   SUM sign (q(i) - q(j)) * sign (r(i) - r(j))
      n (n-1)   i,j
in which the q(i) and r(i) are the ranks of x and y, respectively.
If x and y are drawn from independent distributions, Kendall's tau is asymptotically normal with mean 0 and variance (2 * (2n+5)) / (9 * n * (n-1)).
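The definition of tau can be evaluated directly from the ranks. The following sketch assumes distinct entries, so that ranks are unambiguous; the data are invented.

```octave
# Sketch: Kendall's tau from the sign formula (distinct entries assumed).
x = [10, 20, 30, 40];
y = [1, 3, 2, 4];
n = numel (x);
[~, ix] = sort (x);  q(ix) = 1:n;    # ranks of x
[~, iy] = sort (y);  r(iy) = 1:n;    # ranks of y
Q = q' * ones (1, n);                # Q(i,j) = q(i)
R = r' * ones (1, n);                # R(i,j) = r(i)
# Sum over all ordered pairs (i, j) of the product of signs.
tau = sum (sum (sign (Q - Q') .* sign (R - R'))) / (n * (n - 1));
printf ("%g\n", tau);
```

Here five of the six pairs are concordant and one is discordant, so tau = 2 * (5 - 1) / 12.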
If x is a matrix, do the above along the first non-singleton dimension of x. If the optional dim argument is given, operate along this dimension.
If breaks is a scalar, the data is cut into that many equal-width intervals. If breaks is a vector of break points, the data is cut into length (breaks) - 1 groups.
The returned value is a vector of the same size as x telling which group each point in x belongs to. Groups are labelled from 1 to the number of groups; points outside the range of breaks are labelled by NaN.
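The grouping described above might be sketched as follows for a vector of break points. Which end of each interval is closed is not stated here, so the sketch assumes intervals that are open on the left and closed on the right.

```octave
# Sketch: assign group labels from a vector of break points.
x = [0.5, 1.5, 2.5, 3.5, -1];
breaks = [0, 1, 2, 3];
group = NaN (size (x));              # points outside the range stay NaN
for k = 1:(numel (breaks) - 1)
  # Assumption: interval k is (breaks(k), breaks(k+1)].
  in_k = (x > breaks(k)) & (x <= breaks(k + 1));
  group(in_k) = k;
endfor
disp (group);
```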
The (i, j)-th entry of cor (x, y) is the correlation between the i-th variable in x and the j-th variable in y.
For matrices, each row is an observation and each column a variable; vectors are always observations and may be row or column vectors.
cor (x) is equivalent to cor (x, x).
- log (- log (x))
Data may be given in a single vector y with groups specified by a corresponding vector of group labels g (e.g., numbers from 1 to k). This is the general form which does not impose any restriction on the number of data in each group or the group labels.
If y is a matrix and g is omitted, each column of y is treated as a group. This form is only appropriate for balanced ANOVA in which the numbers of samples from each group are all equal.
Under the null of constant means, the statistic f follows an F distribution with df_b and df_w degrees of freedom.
The p-value (1 minus the CDF of this distribution at f) is returned in pval.
If no output argument is given, the standard one-way ANOVA table is printed.
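For the general form with a data vector y and a label vector g, the statistic f and its degrees of freedom can be worked out by hand. This is only a sketch of the textbook computation with invented data; the names ssb and ssw are chosen for the example.

```octave
# Sketch: one-way ANOVA F statistic from data y and group labels g.
y = [1, 2, 3, 2, 3, 4, 5, 6, 7];
g = [1, 1, 1, 2, 2, 2, 3, 3, 3];
labels = unique (g);
k = numel (labels);
n = numel (y);
grand = mean (y);
ssb = 0;                             # between-group sum of squares
ssw = 0;                             # within-group sum of squares
for j = 1:k
  yj = y(g == labels(j));
  ssb += numel (yj) * (mean (yj) - grand) ^ 2;
  ssw += sum ((yj - mean (yj)) .^ 2);
endfor
df_b = k - 1;
df_w = n - k;
f = (ssb / df_b) / (ssw / df_w);
printf ("f = %g, df_b = %d, df_w = %d\n", f, df_b, df_w);
```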
Under the null of equal variances, the test statistic chisq approximately follows a chi-square distribution with df degrees of freedom.
The p-value (1 minus the CDF of this distribution at chisq) is returned in pval.
If no output argument is given, the p-value is displayed.
For large samples, the test statistic chisq approximately follows a chi-square distribution with df = length (c) degrees of freedom.
The p-value (1 minus the CDF of this distribution at chisq) is returned in pval.
If no output argument is given, the p-value is displayed.
The p-value (1 minus the CDF of this distribution at chisq) of the test is returned in pval.
If no output argument is given, the p-value is displayed.
The optional argument string alt describes the alternative hypothesis, and can be "!=" or "<>" (non-zero), ">" (greater than 0), or "<" (less than 0). The default is the two-sided case.
The optional argument string method specifies on which correlation coefficient the test should be based. If method is "pearson" (default), the (usual) Pearson's product moment correlation coefficient is used. In this case, the data should come from a bivariate normal distribution. Otherwise, the other two methods offer nonparametric alternatives. If method is "kendall", then Kendall's rank correlation tau is used. If method is "spearman", then Spearman's rank correlation rho is used. Only the first character is necessary.
The output is a structure with the following elements:
If no output argument is given, the p-value is displayed.
Under the null, the test statistic f follows an F distribution with df_num and df_den degrees of freedom.
The p-value (1 minus the CDF of this distribution at f) is returned in pval.
If not given explicitly, r = 0.
If no output argument is given, the p-value is displayed.
The null hypothesis is mean (x) == m.
Hotelling's T^2 is returned in tsq. Under the null, (n-p) T^2 / (p(n-1)) has an F distribution with p and n-p degrees of freedom, where n and p are the numbers of samples and variables, respectively.
The p-value of the test is returned in pval.
If no output argument is given, the p-value of the test is displayed.
The null hypothesis is mean (x) == mean (y).
Hotelling's two-sample T^2 is returned in tsq. Under the null,
(n_x+n_y-p-1) T^2 / (p(n_x+n_y-2))
has an F distribution with p and n_x+n_y-p-1 degrees of freedom, where n_x and n_y are the sample sizes and p is the number of variables.
The p-value of the test is returned in pval.
If no output argument is given, the p-value of the test is displayed.
The optional argument params contains a list of parameters of dist. For example, to test whether a sample x comes from a uniform distribution on [2,4], use
kolmogorov_smirnov_test (x, "uniform", 2, 4)
With the optional argument string alt, the alternative of interest can be selected. If alt is "!=" or "<>", the null is tested against the two-sided alternative F != G. In this case, the test statistic ks follows a two-sided Kolmogorov-Smirnov distribution. If alt is ">", the one-sided alternative F > G is considered. Similarly for "<", the one-sided alternative F < G is considered. In this case, the test statistic ks has a one-sided Kolmogorov-Smirnov distribution. The default is the two-sided case.
The p-value of the test is returned in pval.
If no output argument is given, the p-value is displayed.
With the optional argument string alt, the alternative of interest can be selected. If alt is "!=" or "<>", the null is tested against the two-sided alternative F != G. In this case, the test statistic ks follows a two-sided Kolmogorov-Smirnov distribution. If alt is ">", the one-sided alternative F > G is considered. Similarly for "<", the one-sided alternative F < G is considered. In this case, the test statistic ks has a one-sided Kolmogorov-Smirnov distribution. The default is the two-sided case.
The p-value of the test is returned in pval.
The third returned value, d, is the test statistic, the maximum vertical distance between the two cumulative distribution functions.
If no output argument is given, the p-value is displayed.
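The statistic d, the maximum vertical distance between the two empirical cumulative distribution functions, can be sketched by evaluating both functions at every observed point. The samples below are invented.

```octave
# Sketch: maximum vertical distance between two empirical CDFs.
x = [1, 2, 3, 4];
y = [3, 4, 5, 6];
pts = sort ([x, y]);                 # the distance can only change at these points
d = 0;
for t = pts
  Fx = sum (x <= t) / numel (x);     # empirical CDF of x at t
  Fy = sum (y <= t) / numel (y);     # empirical CDF of y at t
  d = max (d, abs (Fx - Fy));
endfor
printf ("%g\n", d);
```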
Suppose a variable is observed for k > 1 different groups, and let x1, ..., xk be the corresponding data vectors.
Under the null hypothesis that the ranks in the pooled sample are not affected by the group memberships, the test statistic k is approximately chi-square with df = k - 1 degrees of freedom.
The p-value (1 minus the CDF of this distribution at k) is returned in pval.
If no output argument is given, the p-value is displayed.
The data matrix is given by y. As usual, rows are observations and columns are variables. The vector g specifies the corresponding group labels (e.g., numbers from 1 to k).
The LR test statistic (Wilks' Lambda) and approximate p-values are computed and displayed.
Under the null, chisq approximately follows a chi-square distribution with df degrees of freedom.
The p-value (1 minus the CDF of this distribution at chisq) is returned in pval.
If no output argument is given, the p-value of the test is displayed.
With the optional argument string alt, the alternative of interest can be selected. If alt is "!=" or "<>", the null is tested against the two-sided alternative p1 != p2. If alt is ">", the one-sided alternative p1 > p2 is used. Similarly for "<", the one-sided alternative p1 < p2 is used. The default is the two-sided case.
The p-value of the test is returned in pval.
If no output argument is given, the p-value of the test is displayed.
The p-value of the test is returned in pval.
If no output argument is given, the p-value is displayed.
Under the null, the test statistic follows a binomial distribution with parameters n = sum (x != y) and p = 1/2.
With the optional argument alt, the alternative of interest can be selected. If alt is "!=" or "<>", the null hypothesis is tested against the two-sided alternative PROB (x < y) != 1/2. If alt is ">", the one-sided alternative PROB (x > y) > 1/2 ("x is stochastically greater than y") is considered. Similarly for "<", the one-sided alternative PROB (x > y) < 1/2 ("x is stochastically less than y") is considered. The default is the two-sided case.
The p-value of the test is returned in pval.
If no output argument is given, the p-value of the test is displayed.
The null hypothesis is mean (x) == m. Under the null, the test statistic t follows a Student distribution with df = length (x) - 1 degrees of freedom.
With the optional argument string alt, the alternative of interest can be selected. If alt is "!=" or "<>", the null is tested against the two-sided alternative mean (x) != m. If alt is ">", the one-sided alternative mean (x) > m is considered. Similarly for "<", the one-sided alternative mean (x) < m is considered. The default is the two-sided case.
The p-value of the test is returned in pval.
If no output argument is given, the p-value of the test is displayed.
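The one-sample statistic and its two-sided p-value can be worked out by hand. The sketch below uses the usual textbook form of the t statistic and obtains the p-value from the regularized incomplete beta function, betainc, rather than from a t CDF routine; the data are invented.

```octave
# Sketch: one-sample t statistic and two-sided p-value.
x = [1, 2, 3, 4, 5];
m = 2;                                   # hypothesized mean
n = numel (x);
t = (mean (x) - m) / (std (x) / sqrt (n));
df = n - 1;
# P(|T| > |t|) for a Student distribution with df degrees of freedom.
pval = betainc (df / (df + t ^ 2), df / 2, 1 / 2);
printf ("t = %g, df = %d, pval = %g\n", t, df, pval);
```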
With the optional argument string alt, the alternative of interest can be selected. If alt is "!=" or "<>", the null is tested against the two-sided alternative mean (x) != mean (y). If alt is ">", the one-sided alternative mean (x) > mean (y) is used. Similarly for "<", the one-sided alternative mean (x) < mean (y) is used. The default is the two-sided case.
The p-value of the test is returned in pval.
If no output argument is given, the p-value of the test is displayed.
The null hypothesis is rr * b = r in a classical normal regression model y = x * b + e. Under the null, the test statistic t follows a t distribution with df degrees of freedom.
If r is omitted, a value of 0 is assumed.
With the optional argument string alt, the alternative of interest can be selected. If alt is "!=" or "<>", the null is tested against the two-sided alternative rr * b != r. If alt is ">", the one-sided alternative rr * b > r is used. Similarly for "<", the one-sided alternative rr * b < r is used. The default is the two-sided case.
The p-value of the test is returned in pval.
If no output argument is given, the p-value of the test is displayed.
With the optional argument string alt, the alternative of interest can be selected. If alt is "!=" or "<>", the null is tested against the two-sided alternative PROB (x > y) != 1/2. If alt is ">", the one-sided alternative PROB (x > y) > 1/2 is considered. Similarly for "<", the one-sided alternative PROB (x > y) < 1/2 is considered. The default is the two-sided case.
The p-value of the test is returned in pval.
If no output argument is given, the p-value of the test is displayed.
With the optional argument string alt, the alternative of interest can be selected. If alt is "!=" or "<>", the null is tested against the two-sided alternative var (x) != var (y). If alt is ">", the one-sided alternative var (x) > var (y) is used. Similarly for "<", the one-sided alternative var (x) < var (y) is used. The default is the two-sided case.
The p-value of the test is returned in pval.
If no output argument is given, the p-value of the test is displayed.
With the optional argument string alt, the alternative of interest can be selected. If alt is "!=" or "<>", the null is tested against the two-sided alternative mean (x) != m. If alt is ">", the one-sided alternative mean (x) > m is considered. Similarly for "<", the one-sided alternative mean (x) < m is considered. The default is the two-sided case.
The p-value of the test is returned in pval.
If no output argument is given, the p-value of the test is displayed.
With the optional argument string alt, the alternative of interest can be selected. If alt is "!=" or "<>", the null is tested against the two-sided alternative PROB (x > y) != 1/2. If alt is ">", the one-sided alternative PROB (x > y) > 1/2 is considered. Similarly for "<", the one-sided alternative PROB (x > y) < 1/2 is considered. The default is the two-sided case.
The p-value of the test is returned in pval.
If no output argument is given, the p-value of the test is displayed.
The null hypothesis is mean (x) == m for a sample x from a normal distribution with unknown mean and known variance v. Under the null, the test statistic z follows a standard normal distribution.
With the optional argument string alt, the alternative of interest can be selected. If alt is "!=" or "<>", the null is tested against the two-sided alternative mean (x) != m. If alt is ">", the one-sided alternative mean (x) > m is considered. Similarly for "<", the one-sided alternative mean (x) < m is considered. The default is the two-sided case.
The p-value of the test is returned in pval.
If no output argument is given, the p-value of the test is displayed along with some information.
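With a known variance v, the statistic z and its two-sided p-value need only basic functions. The sketch below uses the usual textbook form of z and computes the normal tail probability through erfc; the data and the value of v are invented.

```octave
# Sketch: z statistic for known variance v and its two-sided p-value.
x = [4.8, 5.4, 5.1, 5.3];
m = 5;                               # hypothesized mean
v = 0.04;                            # known variance
n = numel (x);
z = (mean (x) - m) / sqrt (v / n);
pval = erfc (abs (z) / sqrt (2));    # P(|Z| > |z|) for a standard normal Z
printf ("z = %g, pval = %g\n", z, pval);
```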
With the optional argument string alt, the alternative of interest can be selected. If alt is "!=" or "<>", the null is tested against the two-sided alternative mean (x) != mean (y). If alt is ">", the one-sided alternative mean (x) > mean (y) is used. Similarly for "<", the one-sided alternative mean (x) < mean (y) is used. The default is the two-sided case.
The p-value of the test is returned in pval.
If no output argument is given, the p-value of the test is displayed along with some information.
Suppose y takes values in k ordered categories, and let gamma_i (x) be the cumulative probability that y falls in one of the first i categories given the covariate x. Then
[theta, beta] = logistic_regression (y, x)
fits the model
logit (gamma_i (x)) = theta_i - beta' * x, i = 1, ..., k-1
The number of ordinal categories, k, is taken to be the number of distinct values of round (y). If k equals 2, y is binary and the model is ordinary logistic regression. The matrix x is assumed to have full column rank.
Given y only, theta = logistic_regression (y) fits the model with baseline logit odds only.
The full form is
[theta, beta, dev, dl, d2l, gamma] = logistic_regression (y, x, print, theta, beta)
in which all output arguments and all input arguments except y are optional.
Setting print to 1 requests that summary information about the fitted model be displayed. Setting print to 2 requests information about convergence at each iteration. Other values request that no information be displayed. The input arguments theta and beta give initial estimates for theta and beta.
The returned value dev holds minus twice the log-likelihood.
The returned values dl and d2l are the vector of first and the matrix of second derivatives of the log-likelihood with respect to theta and beta.
The last returned value (gamma in the full form above) holds estimates for the conditional distribution of y given x.
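To see what the fitted model implies, the cumulative probabilities gamma_i (x) and the category probabilities can be recovered from given values of theta and beta by inverting the logit. The numbers below are invented for illustration and are not the result of any fit; gam is used as a variable name to avoid shadowing the gamma function.

```octave
# Sketch: category probabilities implied by logit (gamma_i (x)) = theta_i - beta' * x.
theta = [-1; 0.5];                   # k - 1 = 2 thresholds, so k = 3 categories
beta = 0.8;                          # one covariate (hypothetical value)
x = 1.0;
eta = theta - beta' * x;             # right-hand side of the model
gam = 1 ./ (1 + exp (-eta));         # inverse logit: cumulative probabilities
p = diff ([0; gam; 1]);              # probability of each of the k categories
disp (p');
```

Because theta is increasing, the cumulative probabilities are increasing as well, and the category probabilities are positive and sum to 1.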
Return an r by c or size (sz) matrix of random samples from the Beta distribution with parameters a and b. Both a and b must be scalar or of size r by c.
If r and c are omitted, the size of the result matrix is the common size of a and b.
Return an r by c or size (sz) matrix of random samples from the binomial distribution with parameters n and p. Both n and p must be scalar or of size r by c.
If r and c are omitted, the size of the result matrix is the common size of n and p.
Return an r by c or size (sz) matrix of random samples from the Cauchy distribution with parameters lambda and sigma, which must both be scalar or of size r by c.
If r and c are omitted, the size of the result matrix is the common size of lambda and sigma.
Return an r by c or size (sz) matrix of random samples from the chi-square distribution with n degrees of freedom. n must be a scalar or of size r by c.
If r and c are omitted, the size of the result matrix is the size of n.
If r and c are given create a matrix with r rows and c columns. Or if sz is a vector, create a matrix of size sz.
If r and c are given create a matrix with r rows and c columns. Or if sz is a vector, create a matrix of size sz.
The arguments can be of common size or scalar.
If r and c are omitted, the size of the result matrix is the size of lambda.
If r and c are omitted, the size of the result matrix is the common size of m and n.
Return an r by c or size (sz) matrix of random samples from the Gamma distribution with parameters a and b. Both a and b must be scalar or of size r by c.
If r and c are omitted, the size of the result matrix is the common size of a and b.
If r and c are given create a matrix with r rows and c columns. Or if sz is a vector, create a matrix of size sz.
The parameters m, t, and n must be positive integers with m and n not greater than t.
The parameters m, t, and n must be positive integers with m and n not greater than t.
The arguments must be of common size or scalar.
If r and c are given create a matrix with r rows and c columns. Or if sz is a vector, create a matrix of size sz.
The parameters m, t, and n must be positive integers with m and n not greater than t.
         Inf
Q(x) =   SUM    (-1)^k exp(-2 k^2 x^2)
       k = -Inf
for x > 0.
The optional parameter tol specifies the precision up to which the series should be evaluated; the default is tol = eps.
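The series for Q(x) converges very quickly, so a direct summation is a reasonable sketch. Since the terms for k and -k are equal, the sum is 1 plus twice the sum over k >= 1. Stopping once a term falls below tol is an assumption about how the precision is applied.

```octave
# Sketch: truncated evaluation of Q(x) for a single x > 0.
x = 1.0;
tol = eps;
q = 1;                               # the k = 0 term
k = 1;
do
  # Terms for k and -k coincide, hence the factor 2.
  term = 2 * (-1) ^ k * exp (-2 * k ^ 2 * x ^ 2);
  q += term;
  k++;
until (abs (term) < tol)
printf ("%.6f\n", q);
```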
If a random variable follows this distribution, its logarithm is normally distributed with mean log (a) and variance v.
Default values are a = 1, v = 1.
If a random variable follows this distribution, its logarithm is normally distributed with mean log (a) and variance v.
Default values are a = 1, v = 1.
If a random variable follows this distribution, its logarithm is normally distributed with mean log (a) and variance v.
Default values are a = 1, v = 1.
If r and c are omitted, the size of the result matrix is the common size of a and v.
Default values are m = 0, v = 1.
Default values are m = 0, v = 1.
Default values are m = 0, v = 1.
Return an r by c or size (sz) matrix of random samples from the normal distribution with parameters m and v. Both m and v must be scalar or of size r by c.
If r and c are omitted, the size of the result matrix is the common size of m and v.
The number of failures in a Bernoulli experiment with success probability p before the n-th success follows this distribution.
The number of failures in a Bernoulli experiment with success probability p before the n-th success follows this distribution.
The number of failures in a Bernoulli experiment with success probability p before the n-th success follows this distribution.
If r and c are omitted, the size of the result matrix is the common size of n and p. Or if sz is a vector, create a matrix of size sz.
If r and c are omitted, the size of the result matrix is the size of lambda.
Return an r by c or size (sz) matrix of random numbers from the standard normal distribution.
If r and c are omitted, the size of the result matrix is the size of n.
Default values are a = 0, b = 1.
Default values are a = 0, b = 1.
Default values are a = 0, b = 1.
Return an r by c or size (sz) matrix of random samples from the uniform distribution on [a, b]. Both a and b must be scalar or of size r by c.
If r and c are omitted, the size of the result matrix is the common size of a and b.
1 - exp(-(x/sigma)^alpha)
for x >= 0.
alpha * sigma^(-alpha) * x^(alpha-1) * exp(-(x/sigma)^alpha)
for x > 0.
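The two Weibull expressions above are consistent with each other: the density is the derivative of the distribution function. A quick numerical check, with invented parameter values, might look like this.

```octave
# Sketch: check that the Weibull density is the slope of the CDF.
alpha = 2;                           # shape parameter (hypothetical value)
sigma = 3;                           # scale parameter (hypothetical value)
x = 1.5;
F = 1 - exp (-(x / sigma) ^ alpha);
f = alpha * sigma ^ (-alpha) * x ^ (alpha - 1) * exp (-(x / sigma) ^ alpha);
h = 1e-6;
F_h = 1 - exp (-((x + h) / sigma) ^ alpha);
slope = (F_h - F) / h;               # forward-difference approximation of dF/dx
printf ("%g %g\n", f, slope);
```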
If r and c are omitted, the size of the result matrix is the common size of alpha and sigma.
The optional parameter n gives the number of summands used for simulating the process over an interval of length 1. If n is omitted, n = 1000 is used.