Package 'MissMech'

Title: Testing Homoscedasticity, Multivariate Normality, and Missing Completely at Random
Description: To test whether the missing data mechanism, in a set of incompletely observed data, is one of missing completely at random (MCAR). For detailed description see Jamshidian, M. Jalal, S., and Jansen, C. (2014). "MissMech: An R Package for Testing Homoscedasticity, Multivariate Normality, and Missing Completely at Random (MCAR)", Journal of Statistical Software, 56(6), 1-31. <https://www.jstatsoft.org/v56/i06/> <doi:10.18637/jss.v056.i06>.
Authors: Mortaza Jamshidian [aut], Siavash Jalal [aut], Camden Jansen [aut], Mao Kobayashi [cre]
Maintainer: Mao Kobayashi <[email protected]>
License: GPL (>= 2)
Version: 1.0.4.9000
Built: 2024-11-01 05:39:23 UTC
Source: https://github.com/indenkun/missmech

Help Index


Testing Homoscedasticity, Multivariate Normality, and Missing Completely at Random

Description

The main purpose of this package is to test whether the missing data mechanism, in a set of incompletely observed data, is one of missing completely at random (MCAR). As a by-product, however, this package can impute incomplete data, is able to perform a test to determine whether data have a multivariate normal distribution or whether the covariances for several populations are equal. The test of MCAR follows the methodology proposed by Jamshidian and Jalal (2010). It is based on testing equality of covariances between groups consisting of identical missing data patterns. The data are imputed, using two options of normality and distribution free, and the test of equality of covariances between groups with identical missing data patterns is performed also with options of assuming normality (Hawkins test) or non-parametrically. The user, can optionally use her own method of data imputation as well. Multiple imputation is an option as a diagnostic tool to help identify cases or variables that contribute to rejection of MCAR, when the MCAR test is rejecetd (See Jamshidian and Jalal, 2010 for details). As explained in Jamshidian, Jalal, and Jansen (2014), this package can also be used for imputing missing data, test of multivariate normality, and test of equality of covariances between several groups when data are complete.

Author(s)

Mortaza Jamshidian, Siavash Jalal, and Camden Jansen

References

Jamshidian, M. and Jalal, S. (2010). “Tests of homoscedasticity, normality, and missing at random for incomplet multivariate data,” Psychometrika, 75, 649-674, doi:10.1007/s11336-010-9175-3.

Jamshidian, M. Jalal, S., and Jansen, C. (2014). “ MissMech: An R Package for Testing Homoscedasticity, Multivariate Normality, and Missing Completely at Random (MCAR),” Journal of Statistical Software, 56(6), 1-31. https://www.jstatsoft.org/v56/i06/, doi:10.18637/jss.v056.i06."

See Also

Useful links:


Montpetit and Bergeman Longitudinal Study on Aging Data

Description

The data consist of 521 cases and 7 variables with 280 of the cases being complete. The variables are Education, Income, Perceived satisfaction of social support, Social coping, Total life events scale, Depression scale, and Self-rated help.

Usage

agingdata

Format

A data frame with 521 observations on the following 7 variables.

References

Montpetit A, Bergeman CS (2007). “Dimensions of control: Mediational analyses of the stress Chealth relationship.” Personality and Individual Differences, 43, 2237 - 2248, doi:10.1016/j.paid.2007.07.003.


K-Sample Anderson Darling Test

Description

This is a non-parametric K-sample test that tests equality of distribution of a variable between k populations based on samples from each of the populations.

Usage

AndersonDarling(data, number.cases)

Arguments

data

A single vector consisting of concatenation of the k samples data being used for the test.

number.cases

A vector consisting of the number of cases in samples 1, 2, ..., k, respectively.

Details

The data is a vector including all the k samples to be used for the test. The j-th element of number.cases is the number of cases in sample j (included in data), for j= 1,...,k.

Value

pn

The test's p-value.

adk.all

The Anderson Darling test statistic corresponding to each group.

adl

The sum of elements of adk.all.

var.sdk

The variance of the finite sample distribution of the Anderson Darling test statistic under the null.

Note

The test does not adjust for tie observations.

Author(s)

Mortaza Jamshidian, Siavash Jalal, and Camden Jansen

References

Scholz, F.W. and Stephens, M.A. (1987). ”K-Sample Anderson-Darling Tests,”Journal of the American Statistical Association, 82, 918-924, doi:10.2307/2288805.

Examples

#---- Example 1
set.seed(50)
n1 <- 30
n2 <- 45
n3 <- 60
v1 <- rnorm(n1)
v2 <- runif(n2)
v3 <- rnorm(n3, 2, 3)
AndersonDarling(data = c(v1, v2, v3), number.cases=c(n1, n2, n3))

#---- Example 2
set.seed(50)
n1 <- 30
n2 <- 45
n3 <- 60
v1 <- rt(n1,4)
v2 <- rt(n2,4)
v3 <- rt(n3,4)
AndersonDarling(data=c(v1, v2, v3), number.cases=c(n1, n2, n3))

Hessian of the observed datat Multivariate Normal Log-Likelihood with Incomplete Data

Description

The Hessian of the normal-theory observed data log-likelihood function, evaluated at a given value of the mean vector and the covariance matrix, when data are incomplete. The output is a symmetric matrix with rows/columns corresponding to elements in the mean vector and lower diagonal of the covariance matrix.

Usage

Ddf(data, mu, sig)

Arguments

data

A matrix consisting of at least two columns. Values must be numerical with missing data indicated by NA.

mu

A row matrix consisting of the values of the mean at which points the Hessian of the log-likelihood is to be computed.

sig

A symmetric covariance matrix at at which points the Hessian of the log-likelihood is to be computed.

Details

While mu is a vector, it has to be input as a matrix object. See example nelow.

Value

dd

The resulting Hessian matrix

se

Negative of the inverse of the Hessian matrix

Note

There must be no rows in data that contain no observations.

Author(s)

Mortaza Jamshidian, Siavash Jalal, and Camden Jansen

References

Jamshidian, M. and Bentler, P. M. (1999). “ML estimation of mean and covariance structures with missing data using complete data routines.” Journal of Educational and Behavioral Statistics, 24, 21-41, doi:10.2307/1165260.

Examples

set.seed <- 50
n <- 200
p <- 4
pctmiss <- 0.2
y <- matrix(rnorm(n * p),nrow = n)
missing <- matrix(runif(n * p), nrow = n) < pctmiss
y[missing] <- NA
mu <- c(0,0,0,0)
sig <- matrix(c(1,0,0,0, 0,1,0,0, 0,0,1,0, 0,0,0,1),4,4)
Ddf(data=y, as.matrix(mu), sig)

Removes groups with identical missing data patterns having at most a given number of cases

Description

Removes groups of missing data patterns with number of cases less than or equal to a specified value (ncases).

Usage

DelLessData(data, ncases = 0)

Arguments

data

A matrix consisting of at least two columns. Values must be numerical with missing data indicated by NA.

ncases

Missing data pattern groups with ncases number of cases or less will be removed from the data set.

Value

data

A matrix of reaaranged data, according to missing data patterns, with missing data patterns having less than ncases number of cases removed.

patused

A matrix indicating the missing data patterns in the data set; observed variable(s) are indiated by 1's' and missing variables are indicated by NA's.

patcnt

A vector consisting of the number of cases corresponding to each pattern in patused.

spatcnt

Cumulative sum of elements of patcnt.

g

Number of missing data patterns.

caseorder

A mapping of case number indices from output data (rearranged data) to input data.

removedcases

The index of cases that were removed from the original data set

Author(s)

Mortaza Jamshidian, Siavash Jalal, and Camden Jansen

Examples

set.seed <- 50
n <- 200
p <- 4
pctmiss <- 0.2
y <- matrix(rnorm(n * p),nrow = n)
missing <- matrix(runif(n * p), nrow = n) < pctmiss
y[missing] <- NA
out <- DelLessData(data=y, ncases = 4)
dim(y)
dim(out$data)

Test Statistic for the Hawkins Homoscedasticity Test

Description

Produces the F_ij's and A_ij's that are used in the Hawkins' test of homogeneoity of covariances. See Hawkins (1981) and Jamshidian and Jalal (2010) for more details.

Usage

Hawkins(data, spatcnt)

Arguments

data

A matrix consisting of at least two columns and two rows.

spatcnt

The cumulative sum of the number of cases corresponding to each group.

Value

fij

A vector of F_ij statistics, see Hawkins (1981) reference.

a

A list containg A_ij statistics, see Hawkins (1981) reference.

ni

A vector consisting of the number of cases in each group.

Note

There must be no rows in data that contain no observations.

Author(s)

Mortaza Jamshidian, Siavash Jalal, and Camden Jansen

References

Hawkins, D. M. (1981). “A new test for multivariate normality and homoscedasticity,” Technometrics, 23, 105-110, doi:10.2307/1267983.

Jamshidian, M. and Jalal, S. (2010). “Tests of homoscedasticity, normality, and missing at random for incomplete multivariate data,” Psychometrika, 75, 649-674, doi:10.1007/s11336-010-9175-3.

Examples

set.seed <- 50
n <- 200
p <- 4
pctmiss <- 0.2
y <- matrix(rnorm(n * p),nrow = n)
spatcnt <- c(20, 50, 70, 200)
h <- Hawkins(data=y, spatcnt)

Parametric and Non-Parameric Imputation

Description

This function imputes the data using two methods.

method 'Normal' - Imputes the data assuming that the data come from a multivariate normal distribution with mean mu and covariance sig. If mu or sig are not inputted, then their maximum likelihood estimate is used. The imputed values are based on the conditional distribution of the missing given the observed and mu and sigma; see Jamshidian and Jalal (2010) for more details.

method 'Dist.Free' - This method imputes the data nonparametrically using the method of Sirvastava and Dolatabadi (2009). Also see Jamshidian and Jalal (2010).

Usage

Impute(data, mu = NA, sig = NA, imputation.method = "Normal", resid = NA)

Arguments

data

A matrix consisting of at least two columns. Values must be numerical with missing data indicated by NA.

mu

A vector, consisting of population means, used to impute the data. As a default the maximum likelihood estimates based on the observed data is used.

sig

The population covariance matrix used to impute the data. As a default the maximum likelihood estimates based on the observed data is used.

imputation.method

'Normal' uses the normal imputation method. 'Dist.free uses the the method. See Jamshidian and Jalal (2010) and Sirvastava and Dolatabadi (2009).

resid

User defined residual vector to be used in place of the residuals proposed by the Sirvastava and Dolatabadi (2009) method.

Details

This routine uses OrderMissing to order data accordinng to missing data patterns. The output consists of imputed data both in its original order as well as post ordering by OrderMissing.

Value

yimp

The imputed data set (in the order of the original data) after rwos with no datum (if any) have been deleted.

yimpOrdered

The imputed data set ordered by OrderMissing according to missing data pattern

caseorder

A mapping of case number indices from OrderedData to the original data. More specifically, the j-th row of the OrderedData is the caseorder[j]-th (the j-th element of caseorder) row of the original data.

patused

A matrix indicating the missing data patterns in the data set, using 1's' (for observed) and NA's (for missing).

patcnt

A vector consisting the number of cases corresponding to each pattern in patused.

Note

In the above descriptions "original data" refers to the input data after deletion of the rows consisting of all NA's (if any) .

Author(s)

Mortaza Jamshidian, Siavash Jalal, and Camden Jansen

References

Srivastava, M. S. and Dolatabadi, M. (2009). “Multiple imputation and other resampling scheme for imputing missing observations,” Journal of Multivariate Analysis, 100, 1919-1937, doi:10.1016/j.jmva.2009.06.003.

Jamshidian, M. and Jalal, S. (2010). “Tests of homoscedasticity, normality, and missing at random for incomplete multivariate data,” Psychometrika, 75, 649-674, doi:10.1007/s11336-010-9175-3.

Examples

set.seed <- 50
n <- 200
p <- 4
pctmiss <- 0.2
y <- matrix(rnorm(n * p),nrow = n)
missing <- matrix(runif(n * p), nrow = n) < pctmiss
y[missing] <- NA

yimp1 <- Impute(data=y, mu = NA, sig = NA, imputation.method = "Normal", resid = NA)
yimp2 <- Impute(data=y, mu = NA, sig = NA, imputation.method = "Dist.Free", resid = NA)

Evaluating Legendre's Polynomials of Degree 1, 2, 3, or 4

Description

This function evaluates the values of Legendre polynomials of degrees 1 to 4 on [0,1] at a value(s) x.

Usage

LegNorm(x)

Arguments

x

A scaler or vector of values at which the Legendre's ploynomials are to be evaluated.

Value

The returned list has the following elements:

p1

p1 is value(s) of the Legendre's polynomial of degree 1 at x

p2

p2 is value(s) of the Legendre's polynomial of degree 2 at x

p3

p3 is value(s) of the Legendre's polynomial of degree 3 at x

p4

p4 is value(s) of the Legendre's polynomial of degree 4 at x

Note

Legendre's polynomias on [0,1] are calculated.

Author(s)

Mortaza Jamshidian, Siavash Jalal, and Camden Jansen

References

David, F. N. (1939). “On Neyman's "smooth" test for goodness of fit: I. Distribution of the criterion Psi2 when the hypothesis tested is true,” Biometrika, 31, 191-199, doi:10.1093/biomet/31.1-2.191.

Examples

p <- LegNorm(c(5.2,11,15))
p$p3

ML Estimates of Mean and Covariance Based on Incomplete Data

Description

Normal theory maximum likelihood estimates of mean and covariance matrix is obtained when data are incomplete, using EM algorithm (see Jamshidian and Bentler 1999). If the option Hessian is set to TRUE, then the observed information containing the standard errors of the parameter estimates is also computed.

Usage

Mls(data, mu = NA, sig = NA, tol = 1e-06, Hessian = FALSE)

Arguments

data

A matrix consisting of at least two columns. Values must be numerical with missing data indicated by NA.

mu

EM iteration initial value for mu (the mean vector). The default is a zero vector.

sig

EM iteration initial value for sigma (the covariance matrix). The default is the identity matrix.

tol

The tolerance value used in the convergence criteria described in Jamshidian and Bentler (1999) for stopping the EM algorithm.

Hessian

Hessian of the log-likelihood function, see Jamshidian and Bentler (1999).

Value

mu

The maximum likelihood estimate of the mean vector.

sig

The maximum likelihood estimates of the covariance matrix.

hessian

The Hessian of the observed data log-likelihood function.

stderror

The negative of the inverse of the Hessian of the observed data log-likelihood function. The diagonal elements of stderror are the variance of the parameters based on the observed informaion matrix.

iteration

The number of iterations used in the EM iterative process.

Author(s)

Mortaza Jamshidian, Siavash Jalal, and Camden Jansen

References

Jamshidian, M. and Bentler, P. M. (1999). “ML estimation of mean and covariance structures with missing data using complete data routines.” Journal of Educational and Behavioral Statistics, 24, 21-41, doi:10.2307/1165260.

Examples

set.seed <- 50
n <- 200
p <- 4
pctmiss <- 0.2 # Generate 20\% missing data
y <- matrix(rnorm(n * p),nrow = n)
missing <- matrix(runif(n * p), nrow = n) < pctmiss
y[missing] <- NA
ml <- Mls(data=y, mu = NA, sig = NA, tol = 1e-6, Hessian=FALSE)
ml

Order Missing Data Pattern

Description

This function rearranges the data based on their missing data patterns. Morever, missing data patterns consisting of fewer than the user specified number in del.lesscases is deleted from the dataset.

Usage

OrderMissing(data, del.lesscases = 0)

Arguments

data

A matrix or data frame consisting of at least two columns. Values must be numerical with missing data indicated by NA.

del.lesscases

Missing data patterns consisting of del.lesscases number of cases or less will be removed from the data set.

Value

An object of class orderpattern - a list including elements:

data

A matrix containing the rearranged data set

caseorder

A mapping of case number indices from output (ordered) data to input data. More specifically, the j-th row of the output (ordered) data is the caseorder[j]-th (the j-th element of caseorder) row of the input data.

patused

A matrix indicating the missing data patterns in the data set, using 1's' (for observed) and NA's (for missing)

patcnt

A vector consisting the number of cases corresponding to each pattern in patused

spatcnt

Cumulative sum of elements of patcnt

g

Number of missing data patterns

removedcases

The index of cases that were removed from the original data set

Note

If you run the following command line: out <- OrderMissing(data=y, del.lesscases=0), then y[out$caseorder,] is equal to the output data (i.e. out$data). Also out$data[order(out$caseorder),] is equal to the original data y. Note that if del.lesscases is greater than zero, the command out$data[order(out$caseorder),] will result in a data set that has no case order correspondnce to the original data.

Author(s)

Mortaza Jamshidian, Siavash Jalal, and Camden Jansen

Examples

set.seed <- 50
n <- 200
p <- 4
pctmiss <- 0.2
y <- matrix(rnorm(n * p),nrow = n)
missing <- matrix(runif(n * p), nrow = n) < pctmiss
y[missing] <- NA
out <- OrderMissing(y, del.lesscases = 0)
a <- out$caseorder
z = out$data
y[a,] # Reverting the original data to the new output order
z[order(a),] # Reverting the ordered datat to the original order

Testing Homoscedasticity, Multivariate Normality, and Missing Completely at Random

Description

The main purpose of this package is to test whether the missing data mechanism, for an incompletely observed data set, is one of missing completely at random (MCAR). As a by product, however, this package has the capabilities of imputing incomplete data, performing a test to determine whether data have a multivariate normal distribution, performing a test of equality of covariances for groups, and obtaining normal-theory maximum likelihood estimates for mean and covariance when data are incomplete. The test of MCAR follows the methodology proposed by Jamshidian and Jalal (2010). It is based on testing equality of covariances between groups having identical missing data patterns. The data are imputed, using two options of normality and distribution free, and the test of equality of covariances between groups with identical missing data patterns is performed also with options of assuming normality (Hawkins test) or non-parametrically. Users can optionally use their own method of data imputation as well. Multiple imputation is an additional feature of the program that can be used as a diagnostic tool to help identify cases or variables that contribute to rejection of MCAR, when the MCAR test is rejecetd (See Jamshidian and Jalal, 2010 for details). As explained in Jamshidian, Jalal, and Jansen (2014), this package can also be used for imputing missing data, test of multivariate normality, and test of equality of covariances between several groups when data are completly observed.

Usage

TestMCARNormality(
  data,
  del.lesscases = 6,
  imputation.number = 1,
  method = "Auto",
  imputation.method = "Dist.Free",
  nrep = 10000,
  n.min = 30,
  seed = 110,
  alpha = 0.05,
  imputed.data = NA
)

Arguments

data

A matrix or data frame consisting of at least two columns. Values must be numerical with missing data indicated by NA.

del.lesscases

Missing data patterns consisting of del.lesscases number of cases or less will be removed from the data set.

imputation.number

Number of imputations to be used, if data are to be multiply imputed.

method

method is an option that allows the user to select one of the methods of Hawkins or nonparametric for the test. If the user is certain that data have multivariate normal distribution, the method="Hawkins" should be selected. On the other hand if data are not normally distributed, then method="Nonparametric" should be used. If the user is unsure, then the default value of method="Auto" will be used, in which case both the Hawkins and the nonparametric tests will be run, and the default output follows the recommendation by Jamshidian and Jalal (2010) outlined in their flowchart given in Figure 7 of their paper.

imputation.method

"Dist.Free": Missing data are imputed nonparametrically using the method of Sirvastava and Dolatabadi (2009); also see Jamshidian and Jalal (2010).

"Normal": Missing data are imputed assuming that the data come from a multivariate normal distribution. The maximum likelihood estimate of the mean and covariance obtained from Mls is used for generating imputed values. The imputed values are based on the conditional distribution of the missing variables given the observed variables; see Jamshidian and Jalal (2010) for more details.

nrep

Number of replications used to simulate the Neyman distribution to determine the cut off value for the Neyman test in the program SimNey. Larger values increase the accuracy of the Neyman test.

n.min

The minimum number of cases in a group that triggers the use of asymptotic Chi distribution in place of the emprical distribution in the Neyman test of uniformity.

seed

An initial random number generator seed. The default is 110 that can be reset to a user selected number. If the value is set to NA, a system selected seed is used.

alpha

The significance level at which tests are performed.

imputed.data

The user can optionally provide an imputed data set. In this case the program will not impute the data and will use the imputed data set for the tests performed. Note that the order of cases in the imputed data set should be the same as that of the incomplete data set.

Details

Theoretical, technical and prcatical details about this program and its uses can be found in Jamshidian and Jalal (2010) and Jamshidian, Jalal, and Jansen (2014).

Value

analyzed.data

The data that were used in the analysis. If del.lesscases=0, this is the same as the orginal data inputted. If del.lesscases > 0, then this is the data with cases removed.

imputed.data

The analyzed.data after imputation. If imputation.number > 1, the first imputed data set is returned.

ordered.data

The analyzed.data ordered according to missing data pattern, usin the function OrderMissing.

caseorder

A mapping of case number indices from ordered.data to the original data. More specifically, the j-th row of the ordered.data is the caseorder[j]-th (the j-th element of caseorder) row of the original data.

pnormality

p-value for the nonparametric test: When imputation.number > 1, this is a vector with each element corresponding to each of the imputed data sets.

adistar

A matrix consisting of the Anderson-Darling test statistic for each group (columns) and each imputation (rows).

adstar

Sum of adistar: When imputation.number >1, this is a vector with each element corresponding to each of the imputed data sets.

pvalcomb

p-value for the Hawkins test: When imputation.number >1, this is a vector with each element corresponding to each of the imputed data sets.

pvalsn

A matrix consisting of Hawkins test statistics for each group (columns) and each imputation (rows).

g

Number of patterns used in the analysis.

combp

Hawkins test statistic: When imputation.number > 1, this is a vector with each element corresponding to each of the imputed data sets.

alpha

The significance level at which the hypothesis tests are performed.

patcnt

A vector consisting the number of cases corresponding to each pattern in patused.

patused

A matrix indicating the missing data patterns in the data set, using 1 and NA's.

imputation.number

A value greater than or equal to 1. If a value larger than 1 is used, data will be imputed imputation.number times.

mu

The normal-theory maximum likelihood estimate of the variables means.

sigma

The normal-theory maximum likelihood estimate of the variables covariance matrix.

Note

Note 1: In the above descriptions "original data" refers to the input data after deletion of the rows consisting of all NA's (if any)

Note 2: The normal theory maximum likelihood estimate of mean and covariance is obtained using the EM algorithm, as described in Jamshidian and Bentler (1999). The standard errors for these estimates, based on the observed information matrix, can be obtained via the function Ddf, included in this package.

Author(s)

Mortaza Jamshidian, Siavash Jalal, and Camden Jansen

References

Jamshidian, M. and Bentler, P. M. (1999). “ML estimation of mean and covariance structures with missing data using complete data routines.” Journal of Educational and Behavioral Statistics, 24, 21-41, doi:10.2307/1165260.

Jamshidian, M. and Jalal, S. (2010). “Tests of homoscedasticity, normality, and missing at random for incomplete multivariate data,” Psychometrika, 75, 649-674, doi:10.1007/s11336-010-9175-3.

Jamshidian, M. Jalal, S., and Jansen, C. (2014). “MissMech: An R Package for Testing Homoscedasticity, Multivariate Normality, and Missing Completely at Random (MCAR),” Journal of Statistical Software, 56(6), 1-31, doi:10.18637/jss.v056.i06.

Examples

#-- Example 1: Data are MCAR and normally distributed

n <- 300
p <- 5
pctmiss <- 0.2
set.seed(1010)
y <- matrix(rnorm(n * p),nrow = n)
missing <- matrix(runif(n * p), nrow = n) < pctmiss
y[missing] <- NA
out <- TestMCARNormality(data=y)
print(out)

# --- Prints the p-value for both the Hawkins and the nonparametric test
summary(out)

# --- Uses more cases
out1 <- TestMCARNormality(data=y, del.lesscases = 1)
print(out1)

#---- performs multiple imputation
Out <- TestMCARNormality (data = y, imputation.number = 10)
summary(Out)
boxplot(Out)

#-- Example 2: Data are MCAR and non-normally distributed (t distributed with d.f. = 5)

n <- 300
p <- 5
pctmiss <- 0.2
set.seed(1010)
y <- matrix(rt(n * p, 5), nrow = n)
missing <- matrix(runif(n * p), nrow = n) < pctmiss
y[missing] <- NA
out <- TestMCARNormality(data=y)
print(out)

# Perform multiple imputation
Out_m <- TestMCARNormality (data = y, imputation.number = 20)
boxplot(Out_m)

#-- Example 3: Data are MAR (not MCAR), but are normally distributed

n <- 300
p <- 5
r <- 0.3
mu <- rep(0, p)
sigma <- r * (matrix(1, p, p) - diag(1, p))+ diag(1, p)
set.seed(110)
eig <- eigen(sigma)
sig.sqrt <- eig$vectors %*%  diag(sqrt(eig$values)) %*%  solve(eig$vectors)
sig.sqrt <- (sig.sqrt + sig.sqrt) / 2
y <- matrix(rnorm(n * p), nrow = n) %*%  sig.sqrt
tmp <- y
for (j in 2:p){
  y[tmp[, j - 1] > 0.8, j] <- NA 
}
out <- TestMCARNormality(data = y, alpha =0.1)
print(out)

#-- Example 4: Multiple imputation; data are MAR (not MCAR), but are normally distributed

n <- 300
p <- 5
pctmiss <- 0.2
set.seed(1010)
y <- matrix (rnorm(n * p), nrow = n)
missing <- matrix(runif(n * p), nrow = n) < pctmiss
y[missing] <- NA
Out <- OrderMissing(y)
y <- Out$data
spatcnt <- Out$spatcnt
g2 <- seq(spatcnt[1] + 1, spatcnt[2])
g4 <- seq(spatcnt[3] + 1, spatcnt[4])
y[c(g2, g4), ] <- 2 * y[c(g2, g4), ]
out <- TestMCARNormality(data = y, imputation.number = 20)
print(out)
boxplot(out)

# Removing Groups 2 and 4
y1= y[-seq(spatcnt[1]+1,spatcnt[2]),]
out <- TestMCARNormality(data=y1,imputation.number = 20)
print(out)
boxplot(out)

#-- Example 5: Test of homoscedasticity for complete data

n <- 50
p <- 5
r <- 0.4
sigma <- r * (matrix(1, p, p) - diag(1, p)) + diag(1, p)
set.seed(1010)
eig <- eigen(sigma)
sig.sqrt <- eig$vectors %*%  diag(sqrt(eig$values)) %*%  solve(eig$vectors)
sig.sqrt <- (sig.sqrt + sig.sqrt) / 2
y1 <- matrix(rnorm(n * p), nrow = n) %*%  sig.sqrt
n <- 75
p <- 5
y2 <- matrix(rnorm(n * p), nrow = n)
n <- 25
p <- 5
r <- 0
sigma <- r * (matrix(1, p, p) - diag(1, p)) + diag(2, p)
y3 <- matrix(rnorm(n * p), nrow = n) %*%  sqrt(sigma)
ycomplete <- rbind(y1 ,y2 ,y3)
y1 [ ,1] <- NA
y2[,c(1 ,3)] <- NA
y3 [ ,2] <- NA
ygroup <- rbind(y1, y2, y3)
out <- TestMCARNormality(data = ygroup, method = "Hawkins", imputed.data = ycomplete)
print(out)

# ---- Example 6, real data

data(agingdata)
TestMCARNormality(agingdata, del.lesscases = 1)

Test of Goodness of Fit (Uniformity)

Description

This routine tests whether the values in a vector x is distributed as uniform (0,1). The Neyman's smooth test of fit, as described by Ladwina (1994) is used. The p-values are obtained based on a resampling method from uniform (0,1).

Usage

TestUNey(x, nrep = 10000, sim = NA, n.min = 30)

Arguments

x

A vector of values, each in the interval [0,1].

nrep

The number of replications used to simulate the Neyman distribution.

sim

A vector of simulated values from the Neyman distribution. If sim = NA this vector is generated by the function SimNev, otherwise the vector inputted is used.

n.min

The minimum sample size that triggers the use of asymptotic Chi distribution in place of the emprical distribution in the Neyman test of uniformity.

Value

pn

The p-value for the test.

n4

The value of the test statistics.

Author(s)

Mortaza Jamshidian, Siavash Jalal, and Camden Jansen

References

Ledwina, T. (1994). “Data-driven version of neyman's smooth test of fit,” Journal of the American Statistical Association, 89, 1000-1005, doi:10.2307/2290926.

Examples

# Example 1
x <- runif(100)
TestUNey(x, nrep = 10000, sim = NA)

# Example 2
x <- runif(30,2,5)
x <- (x-min(x))/(max(x)-min(x))
TestUNey(x, nrep = 10000, sim = NA)

# Example 3
x <- c(0.6,0.6,0.5,0.7,0.3,0.4,0.5,0.4,0.2,0.4,0.2,0.5,0.7,0.1,0.7,0.1,0.5,0.5,0.4,0.6,0.3)
TestUNey(x, nrep = 10000, sim = NA)