This function considers the general problem of inference from the permutation distribution when comparing parameters from k populations. The test statistics are based on differences of estimators that are asymptotically linear. For illustration, only the two-sample case is described here, but the function works for k samples.

Difference of means: Here, the null hypothesis is of the form \(H_0: \mu(P)-\mu(Q)=0\), and the corresponding test statistic is given by $$T_{m,n}=\frac{N^{1/2}(\bar{X}_m-\bar{Y}_n)}{\sqrt{\frac{N}{m}\sigma^2_m(X_1,\dots,X_m)+ \frac{N}{n}\sigma^2_n(Y_1,\dots,Y_n)}}$$ where \(N=m+n\) is the pooled sample size, \(\bar{X}_m\) and \(\bar{Y}_n\) are the sample means from population \(P\) and population \(Q\), respectively, and \(\sigma^2_m(X_1,\dots,X_m)\) is a consistent estimator of \(\sigma^2(P)\) when \(X_1,\dots,X_m\) are i.i.d. from \(P\). Consistency is also assumed under \(Q\).
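
The statistic above can be sketched in base R as follows. This is an illustrative sketch, not the package's internal code: the helper name `stat_means` is hypothetical, and the usual sample variance `var()` is assumed as the consistent variance estimator.

```r
# Hypothetical helper: studentized difference of means for two samples
stat_means <- function(x, y) {
  m <- length(x)
  n <- length(y)
  N <- m + n
  num <- sqrt(N) * (mean(x) - mean(y))
  # Assumption: var() plays the role of the consistent variance estimator
  den <- sqrt((N / m) * var(x) + (N / n) * var(y))
  num / den
}

set.seed(1)
x <- rnorm(50, mean = 1, sd = 1)
y <- rnorm(50, mean = 1, sd = 2)
T_obs <- stat_means(x, y)
T_obs
```

With this choice of variance estimator the factors of \(N\) cancel, so the value coincides with the Welch statistic reported by `t.test(x, y)`.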

Difference of medians: Let \(F\) and \(G\) be the CDFs corresponding to \(P\) and \(Q\), and denote by \(\theta(F)\) the median of \(F\), i.e. \(\theta(F)=\inf\{x:F(x)\ge1/2\}\). Assume that \(F\) is continuously differentiable at \(\theta(P)\) with derivative \(F'\) (and the same with \(F\) replaced by \(G\)). Here, the null hypothesis is of the form \(H_0: \theta(P)-\theta(Q)=0\), and the corresponding test statistic is given by $$T_{m,n}=\frac{N^{1/2}\left(\theta(\hat{P}_m)-\theta(\hat{Q}_n)\right)}{\hat{\upsilon}_{m,n}}$$ where \(\hat{\upsilon}_{m,n}\) is a consistent estimator of \(\upsilon(P,Q)\), with $$\upsilon^2(P,Q)=\frac{1}{\lambda}\frac{1}{4(F'(\theta))^2}+\frac{1}{1-\lambda}\frac{1}{4(G'(\theta))^2}$$ and \(\lambda\) the limiting fraction \(m/N\) of observations drawn from \(P\). Choices of \(\hat{\upsilon}_{m,n}\) include the kernel estimator of Devroye and Wagner (1980), the bootstrap estimator of Efron (1992), and the smoothed bootstrap of Hall et al. (1989), to list a few. For further details, see Chung and Romano (2013). The current implementation uses the bootstrap estimator of Efron (1992).
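
A minimal sketch of the median statistic follows, under stated assumptions: the helper names `med`, `boot_var_median`, and `stat_medians` are hypothetical, and the bootstrap variance is approximated by Monte Carlo resampling with `B` draws, which may differ from how the package evaluates the Efron (1992) estimator.

```r
# Sample median as the generalized inverse of the empirical CDF,
# i.e. inf{x : F(x) >= 1/2} (quantile type 1)
med <- function(x) quantile(x, probs = 0.5, type = 1, names = FALSE)

# Monte Carlo approximation to the bootstrap variance of the sample median
boot_var_median <- function(x, B = 999) {
  reps <- replicate(B, med(sample(x, replace = TRUE)))
  var(reps)
}

# Hypothetical helper: studentized difference of medians
stat_medians <- function(x, y, B = 999) {
  m <- length(x)
  n <- length(y)
  N <- m + n
  # m * (bootstrap variance) estimates 1 / (4 F'(theta)^2);
  # N / m stands in for 1 / lambda, and similarly for the second sample
  v2 <- (N / m) * (m * boot_var_median(x, B)) +
    (N / n) * (n * boot_var_median(y, B))
  sqrt(N) * (med(x) - med(y)) / sqrt(v2)
}

set.seed(1)
x <- rnorm(100, mean = 0)
y <- rnorm(100, mean = 3)   # clearly shifted median
T_med <- stat_medians(x, y)
T_med
```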

Difference of variances: Here, the null hypothesis is of the form \(H_0: \sigma^2(P)-\sigma^2(Q)=0\), and the corresponding test statistic is given by $$T_{m,n}=\frac{N^{1/2}(\hat{\sigma}_m^2(X_1,\dots,X_m)-\hat{\sigma}_n^2(Y_1,\dots,Y_n))}{\sqrt{\frac{N}{m}(\hat{\mu}_{4,m}-\frac{(m-3)}{(m-1)}(\hat{\sigma}_m^2)^2)+\frac{N}{n}(\hat{\mu}_{4,n}-\frac{(n-3)}{(n-1)}(\hat{\sigma}_n^2)^2)}}$$ where \(\hat{\mu}_{4,m}\) is the sample analog of \(E(X-\mu)^4\) based on an i.i.d. sample \(X_1,\dots,X_m\) from \(P\), and similarly for \(\hat{\mu}_{4,n}\).
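
The variance statistic can be sketched the same way. The helper name `stat_variances` is hypothetical, and `var()` together with the plain average of fourth centered powers are assumed for \(\hat{\sigma}^2\) and \(\hat{\mu}_4\); the package may compute these differently.

```r
# Hypothetical helper: studentized difference of variances
stat_variances <- function(x, y) {
  m <- length(x)
  n <- length(y)
  N <- m + n
  s2x <- var(x)
  s2y <- var(y)
  # Sample analogs of the fourth central moment E(X - mu)^4
  mu4x <- mean((x - mean(x))^4)
  mu4y <- mean((y - mean(y))^4)
  # Estimated asymptotic variances of the two sample variances
  vx <- mu4x - (m - 3) / (m - 1) * s2x^2
  vy <- mu4y - (n - 3) / (n - 1) * s2y^2
  sqrt(N) * (s2x - s2y) / sqrt((N / m) * vx + (N / n) * vy)
}

set.seed(1)
x <- rnorm(50, mean = 1, sd = 1)
y <- rnorm(50, mean = 1, sd = 2)
T_var <- stat_variances(x, y)
T_var
```

Swapping the two samples only flips the sign of the statistic, which is a quick sanity check on any implementation.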

RPT(formula, data, test = "means", n.perm = 499, na.action)

Arguments

formula

a formula object, in which the response variable is on the left of a ~ operator, and the groups on the right.

data

a data.frame containing the named variables needed for the formula. If this argument is missing, then the variables in the formula should be on the search list.

test

testing problem. One of "means" (difference of means), "medians" (difference of medians), or "variances" (difference of variances). When testing for a difference of medians, the Efron (1992) bootstrap estimator is used to estimate the variances. For further details, see Chung and Romano (2013).

n.perm

Numeric. Number of permutations needed for the stochastic approximation of the p-values. The default is n.perm=499.

na.action

a function to filter missing data. This is applied to the model.frame. The default is na.omit, which deletes observations that contain one or more missing values.
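
The stochastic approximation controlled by n.perm can be sketched as below. This is not the package's code: `perm_pvalue` and `stat_means` are hypothetical helpers, and the two-sided p-value with the observed statistic counted among the permutations is an assumed convention.

```r
# Studentized difference of means (algebraically the same as the "means" statistic
# when var() is used as the variance estimator)
stat_means <- function(x, y) {
  (mean(x) - mean(y)) / sqrt(var(x) / length(x) + var(y) / length(y))
}

# Hypothetical helper: Monte Carlo permutation p-value for a two-sample statistic
perm_pvalue <- function(x, y, stat, n.perm = 499) {
  pooled <- c(x, y)
  m <- length(x)
  T_obs <- stat(x, y)
  T_perm <- replicate(n.perm, {
    idx <- sample(length(pooled), m)   # random relabelling of the pooled sample
    stat(pooled[idx], pooled[-idx])
  })
  # Assumed convention: two-sided, observed statistic counted once,
  # so the p-value is never exactly zero
  p <- (1 + sum(abs(T_perm) >= abs(T_obs))) / (n.perm + 1)
  list(T.obs = T_obs, T.perm = T_perm, pvalue = p)
}

set.seed(1)
res <- perm_pvalue(rnorm(50, 1, 1), rnorm(50, 1, 2), stat_means)
res$pvalue
```

Because the statistic is recomputed, including its studentizing denominator, on every relabelled sample, the cost grows linearly with n.perm.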

Value

An object of class "RPT" is a list containing at least the following components:

description

Type of test, can be Difference of Means, Medians, or Variances.

n_populations

Number of groups.

N

Sample Size.

T.obs

Observed test statistic.

pvalue

P-value.

T.perm

Vector. Test statistics calculated from the permutations of the data.

n_perm

Number of permutations.

parameters

Estimated parameters.

sample_sizes

Sample size of groups.

References

Chung, E. and Romano, J. P. (2013). Exact and asymptotically robust permutation tests. The Annals of Statistics, 41(2):484–507.

Chung, E. and Romano, J. P. (2016). Asymptotically valid and exact permutation tests based on two-sample U-statistics. Journal of Statistical Planning and Inference, 168:97–105.

Devroye, L. P. and Wagner, T. J. (1980). The strong uniform consistency of kernel density estimates. In Multivariate Analysis V: Proceedings of the fifth International Symposium on Multivariate Analysis, volume 5, pages 59–77.

Efron, B. (1992). Bootstrap methods: another look at the jackknife. In Breakthroughs in statistics, pages 569–593. Springer.

Hall, P., DiCiccio, T. J., and Romano, J. P. (1989). On smoothing and the bootstrap. The Annals of Statistics, pages 692–704.

Examples

# NOT RUN {
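set.seed(1)  # make the simulated data reproducible
# Two groups with equal means but different standard deviations (1 vs. 2)
male <- rnorm(50, 1, 1)
female <- rnorm(50, 1, 2)
dta <- data.frame(group = c(rep(1, 50), rep(2, 50)), outcome = c(male, female))
# Robust permutation test for the difference of variances
rpt.var <- RPT(dta$outcome ~ dta$group, test = "variances")
summary(rpt.var)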

# }