This function considers the general problem of inference from the permutation distribution when comparing parameters from k populations. The test statistics are based on differences of estimators that are asymptotically linear. For illustration, only the two-sample case is described here, but the function works for k samples.
Difference of means: Here, the null hypothesis is of the form \(H_0: \mu(P)-\mu(Q)=0\), and the corresponding test statistic is given by $$T_{m,n}=\frac{N^{1/2}(\bar{X}_m-\bar{Y}_n)}{\sqrt{\frac{N}{m}\sigma^2_m(X_1,\dots,X_m)+ \frac{N}{n}\sigma^2_n(Y_1,\dots,Y_n)}}$$ where \(N=m+n\) is the pooled sample size, \(\bar{X}_m\) and \(\bar{Y}_n\) are the sample means from population \(P\) and population \(Q\), respectively, and \(\sigma^2_m(X_1,\dots,X_m)\) is a consistent estimator of \(\sigma^2(P)\) when \(X_1,\dots,X_m\) are i.i.d. from \(P\). Consistency is also assumed under \(Q\).
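As a rough illustration of the formula above, the statistic can be computed by hand in base R. This is a minimal sketch and not the package's internal code: the helper name `t_means` is hypothetical, and it assumes \(N=m+n\) and that `var()` is used as the consistent variance estimator.

```r
# Hypothetical helper: studentized difference-of-means statistic (two samples).
# Assumptions: N = m + n, and var() plays the role of sigma^2_m and sigma^2_n.
t_means <- function(x, y) {
  m <- length(x)
  n <- length(y)
  N <- m + n
  num <- sqrt(N) * (mean(x) - mean(y))
  den <- sqrt((N / m) * var(x) + (N / n) * var(y))
  num / den
}

set.seed(1)
x <- rnorm(50, mean = 1, sd = 1)
y <- rnorm(50, mean = 1, sd = 2)
t_means(x, y)
```

Because the factor \(N\) cancels between numerator and denominator, under these assumptions the value coincides with the Welch statistic reported by `t.test(x, y)`.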
Difference of medians: Let \(F\) and \(G\) be the CDFs corresponding to \(P\) and \(Q\), and denote by \(\theta(F)\) the median of \(F\), i.e. \(\theta(F)=\inf\{x:F(x)\ge1/2\}\). Assume that \(F\) is continuously differentiable at \(\theta(P)\) with derivative \(F'\) (and the same with \(F\) replaced by \(G\)). Here, the null hypothesis is of the form \(H_0: \theta(P)-\theta(Q)=0\), and the corresponding test statistic is given by $$T_{m,n}=\frac{N^{1/2}\left(\theta(\hat{P}_m)-\theta(\hat{Q}_n)\right)}{\hat{\upsilon}_{m,n}}$$ where \(\theta(\hat{P}_m)\) and \(\theta(\hat{Q}_n)\) are the sample medians and \(\hat{\upsilon}_{m,n}\) is a consistent estimator of \(\upsilon(P,Q)\), defined by $$\upsilon^2(P,Q)=\frac{1}{\lambda}\frac{1}{4(F'(\theta))^2}+\frac{1}{1-\lambda}\frac{1}{4(G'(\theta))^2}$$ with \(\lambda\) the limit of \(m/N\). Choices of \(\hat{\upsilon}_{m,n}\) include the kernel estimator of Devroye and Wagner (1980), the bootstrap estimator of Efron (1992), and the smoothed bootstrap of Hall et al. (1989), to list a few. For further details, see Chung and Romano (2013). The current implementation uses the bootstrap estimator of Efron (1992).
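One way to picture the bootstrap studentization is sketched below in base R. This is an assumption-laden illustration rather than the package's implementation: the helpers `boot_var_median` and `t_medians` and the number of bootstrap draws `B` are hypothetical, and the scaling assumes that \(m\) times the bootstrap variance of the sample median estimates \(1/(4F'(\theta)^2)\).

```r
# Hypothetical helper: bootstrap estimate of the variance of the sample median.
boot_var_median <- function(x, B = 500) {
  meds <- replicate(B, median(sample(x, replace = TRUE)))
  var(meds)
}

# Hypothetical helper: studentized difference-of-medians statistic.
# Assumption: (N / m) * (m * v_x) + (N / n) * (n * v_y) estimates the squared
# scale in the denominator, where v_x and v_y are the bootstrap variances.
t_medians <- function(x, y, B = 500) {
  m <- length(x)
  n <- length(y)
  N <- m + n
  v_x <- boot_var_median(x, B)
  v_y <- boot_var_median(y, B)
  upsilon_hat <- sqrt((N / m) * (m * v_x) + (N / n) * (n * v_y))
  sqrt(N) * (median(x) - median(y)) / upsilon_hat
}

set.seed(1)
x <- rnorm(50, mean = 1, sd = 1)
y <- rnorm(50, mean = 1, sd = 2)
t_med <- t_medians(x, y)
t_med
```

The result changes slightly from run to run unless the seed is fixed, since the denominator is itself a Monte Carlo quantity.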
Difference of variances: Here, the null hypothesis is of the form \(H_0: \sigma^2(P)-\sigma^2(Q)=0\), and the corresponding test statistic is given by $$T_{m,n}=\frac{N^{1/2}(\hat{\sigma}_m^2(X_1,\dots,X_m)-\hat{\sigma}_n^2(Y_1,\dots,Y_n))}{\sqrt{\frac{N}{m}\left(\hat{\mu}_{4,m}-\frac{(m-3)}{(m-1)}(\hat{\sigma}_m^2)^2\right)+\frac{N}{n}\left(\hat{\mu}_{4,n}-\frac{(n-3)}{(n-1)}(\hat{\sigma}_n^2)^2\right)}}$$ where \(\hat{\mu}_{4,m}\) is the sample analog of \(E(X-\mu)^4\) based on an i.i.d. sample \(X_1,\dots,X_m\) from \(P\), and similarly for \(\hat{\mu}_{4,n}\).
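The variance statistic can likewise be sketched directly from its formula. The helper `t_variances` below is hypothetical; it assumes `var()` (divisor \(m-1\)) for \(\hat{\sigma}_m^2\) and the fourth central sample moment with divisor \(m\) for \(\hat{\mu}_{4,m}\), which may differ from the conventions used inside the package.

```r
# Hypothetical helper: studentized difference-of-variances statistic.
# Assumptions: var() for the variance estimators, and the fourth central
# sample moment (divisor m or n) for mu4.
t_variances <- function(x, y) {
  m <- length(x)
  n <- length(y)
  N <- m + n
  s2_x <- var(x)
  s2_y <- var(y)
  mu4_x <- mean((x - mean(x))^4)
  mu4_y <- mean((y - mean(y))^4)
  num <- sqrt(N) * (s2_x - s2_y)
  den <- sqrt((N / m) * (mu4_x - (m - 3) / (m - 1) * s2_x^2) +
              (N / n) * (mu4_y - (n - 3) / (n - 1) * s2_y^2))
  num / den
}

set.seed(1)
x <- rnorm(50, mean = 1, sd = 1)
y <- rnorm(50, mean = 1, sd = 2)
t_var <- t_variances(x, y)
t_var
```

Swapping the two samples only flips the sign of the statistic, which is a quick sanity check on the implementation.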
RPT(formula, data, test = "means", n.perm = 499, na.action)
| Argument | Description |
|---|---|
| formula | a formula object, in which the response variable is on the left of a ~ operator, and the groups on the right. |
| data | a data.frame containing the named variables needed for the formula. If this argument is missing, then the variables in the formula should be on the search list. |
| test | the testing problem. Accepts "means" to test for a difference of means, "medians" for a difference of medians, and "variances" for a difference of variances. When testing for a difference of medians, the Efron (1992) bootstrap estimator is used to estimate the variances (for further details, see Chung and Romano (2013)). |
| n.perm | Numeric. Number of permutations needed for the stochastic approximation of the p-values. The default is n.perm=499. |
| na.action | a function to filter missing data. This is applied to the model.frame. The default is na.omit, which deletes observations that contain one or more missing values. |
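To make the role of n.perm concrete, the following base R sketch approximates a permutation p-value by random relabelling of the pooled sample. It is only an illustration under stated assumptions: the helpers `welch_stat` and `perm_pvalue` are hypothetical, and the two-sided rule with the "+1" correction is one common convention, not necessarily the one RPT uses.

```r
# Hypothetical studentized statistic used for the illustration.
welch_stat <- function(x, y) {
  (mean(x) - mean(y)) / sqrt(var(x) / length(x) + var(y) / length(y))
}

# Hypothetical helper: Monte Carlo approximation of a permutation p-value.
# Assumption: two-sided p-value with the "+1" correction.
perm_pvalue <- function(x, y, stat, n.perm = 499) {
  m <- length(x)
  pooled <- c(x, y)
  t_obs <- stat(x, y)
  t_perm <- replicate(n.perm, {
    idx <- sample(length(pooled))          # random relabelling of the pooled data
    stat(pooled[idx[seq_len(m)]], pooled[idx[-seq_len(m)]])
  })
  p_value <- (1 + sum(abs(t_perm) >= abs(t_obs))) / (n.perm + 1)
  list(t_obs = t_obs, t_perm = t_perm, p_value = p_value)
}

set.seed(1)
x <- rnorm(50, mean = 1, sd = 1)
y <- rnorm(50, mean = 1, sd = 2)
res <- perm_pvalue(x, y, welch_stat, n.perm = 499)
res$p_value
```

With n.perm = 499 the approximated p-value is a multiple of 1/500; a larger n.perm gives a finer grid at the cost of more computation.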
An object of class "RPT" is a list containing at least the following components:
Type of test, can be Difference of Means, Medians, or Variances.
Number of groups.
Sample Size.
Observed test statistic.
P-value.
Vector. Test statistics calculated from the permutations of the data.
Number of permutations.
Estimated parameters.
Sample size of groups.
Chung, E. and Romano, J. P. (2013). Exact and asymptotically robust permutation tests. The Annals of Statistics, 41(2):484–507.
Chung, E. and Romano, J. P. (2016). Asymptotically valid and exact permutation tests based on two-sample U-statistics. Journal of Statistical Planning and Inference, 168:97–105.
Devroye, L. P. and Wagner, T. J. (1980). The strong uniform consistency of kernel density estimates. In Multivariate Analysis V: Proceedings of the fifth International Symposium on Multivariate Analysis, volume 5, pages 59–77.
Efron, B. (1992). Bootstrap methods: another look at the jackknife. In Breakthroughs in Statistics, pages 569–593. Springer.
Hall, P., DiCiccio, T. J., and Romano, J. P. (1989). On smoothing and the bootstrap. The Annals of Statistics, pages 692–704.
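# NOT RUN {
# Two samples with equal means but different standard deviations
male <- rnorm(50, 1, 1)
female <- rnorm(50, 1, 2)
dta <- data.frame(group = c(rep(1, 50), rep(2, 50)), outcome = c(male, female))
# Robust permutation test for a difference of variances
rpt.var <- RPT(dta$outcome ~ dta$group, test = "variances")
summary(rpt.var)
# }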