Factorial designs under baseline parameterization and space-filling designs with applications to big data

Date created: 
Centered L2-discrepancy
Computer experiment
Minimum aberration, orthogonal array
Sub-data selection

This dissertation reports research on three topics: two-level factorial designs under the baseline parameterization, space-filling designs, and sub-data selection for big data. In the study of two-level factorial designs, factorial effects are usually defined through the orthogonal parameterization. When each factor has an intrinsic baseline level, however, the baseline parameterization is a more appropriate alternative. We obtain a relationship between these two parameterizations and show that certain design properties are invariant. The relationship also allows us to construct an attractive class of robust baseline designs. We then consider two classes of space-filling designs driven by very different considerations: uniform projection designs and strong orthogonal arrays (SOAs). The former are obtained by minimizing the uniform projection criterion, while the latter are a special kind of orthogonal array. We express the uniform projection criterion in terms of the stratification characteristics related to an SOA. This new expression is then used to show that certain SOAs are optimal or nearly optimal under the uniform projection criterion. Finally, we consider the problem of selecting a representative sub-dataset from a big dataset so that statistical analyses can be carried out without massive computation. In the nonparametric regression setting, we present a two-phase selection method that embodies two important ideas. First, the sub-dataset should be a space-filling subset of the full dataset. Second, more data points should be selected in areas where the response surface is more rugged. Simulations are conducted to demonstrate the usefulness of our method.
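As a rough illustration of the uniform projection criterion mentioned above, the following sketch assumes the criterion is the average squared centered L2-discrepancy over all two-column projections of a design, with level v of an s-level factor mapped to (v + 0.5)/s. The closed form used for the centered L2-discrepancy is the standard one due to Hickernell. The function names `centered_l2_sq` and `uniform_projection` are hypothetical and are not taken from the dissertation.

```python
from itertools import combinations


def centered_l2_sq(points):
    """Squared centered L2-discrepancy of points in [0, 1]^s (closed form)."""
    n = len(points)
    s = len(points[0])
    term1 = (13.0 / 12.0) ** s

    # Single sum over the design points.
    term2 = 0.0
    for x in points:
        prod = 1.0
        for xk in x:
            d = abs(xk - 0.5)
            prod *= 1.0 + 0.5 * d - 0.5 * d * d
        term2 += prod

    # Double sum over all ordered pairs of design points.
    term3 = 0.0
    for x in points:
        for y in points:
            prod = 1.0
            for xk, yk in zip(x, y):
                prod *= (1.0 + 0.5 * abs(xk - 0.5) + 0.5 * abs(yk - 0.5)
                         - 0.5 * abs(xk - yk))
            term3 += prod

    return term1 - 2.0 * term2 / n + term3 / (n * n)


def uniform_projection(design, levels):
    """Average squared centered L2-discrepancy over all two-column projections.

    Assumption: `design` is a list of rows with integer levels 0..levels-1,
    and each level v is mapped to (v + 0.5) / levels before evaluation.
    """
    scaled = [[(v + 0.5) / levels for v in row] for row in design]
    m = len(design[0])
    pairs = list(combinations(range(m), 2))
    total = sum(
        centered_l2_sq([[row[i], row[j]] for row in scaled]) for i, j in pairs
    )
    return total / len(pairs)
```

Under this sketch, a smaller value indicates that the design's two-dimensional projections are, on average, more uniformly spread. For example, a four-run design whose columns are all identical places every projection on a diagonal and should score worse than a design whose columns are distinct permutations of the levels.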

Document type: 
This thesis may be printed or downloaded for non-commercial research and scholarly purposes. Copyright remains with the author.
Boxin Tang
Science: Department of Statistics and Actuarial Science
Thesis type: (Thesis) Ph.D.