Transformation And Additivity For Big Data Subsampling
Kujabi, Adama
Abstract
This thesis introduces a new subsampling framework that combines data transformation, additive modeling, and influence diagnostics to make estimation faster and more accurate. Applying an appropriate transformation to the regression response can make the model assumptions more plausible by simplifying a complex mean structure so that it is closer to additive in the effects of the inputs. Building on this idea, we develop a scalable transformation-assisted subsampling framework for large-scale additive modeling. The method combines Box–Cox transformations with influence-guided subsampling, in which informative observations are selected using leverage scores computed from a penalized spline smoother. We refer to this approach as Transformation-Assisted Leverage Subsampling (TALS). To further improve computational efficiency, we introduce a fast sketch-based variant called Transformation-Assisted Approximate Leverage Subsampling (TAALS). Simulation studies and real-data applications, including computer experiment benchmarks, demonstrate that TAALS achieves substantial gains in predictive accuracy, robustness, and computational efficiency compared with some existing methods.
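
As a rough illustration of the pipeline outlined in the abstract, the following Python sketch transforms the response, computes leverage scores from a penalized spline smoother, draws a leverage-guided subsample, and refits on it. It is a minimal sketch under stated assumptions and not the thesis's implementation: all function names are hypothetical, and the single input, the cubic truncated-power basis, the ridge-type penalty on knot coefficients, the fixed Box–Cox parameter, sampling with replacement in proportion to leverage, and the Gaussian sketch used for the approximate scores are choices made only for this example.

```python
import numpy as np

def box_cox(y, lam):
    """Box-Cox transform; requires a strictly positive response."""
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("Box-Cox needs y > 0")
    return np.log(y) if lam == 0 else (y**lam - 1.0) / lam

def spline_design(x, knots):
    """Cubic truncated-power basis: 1, x, x^2, x^3, (x - k)_+^3 per knot."""
    x = np.asarray(x, dtype=float)
    poly = np.vander(x, 4, increasing=True)
    trunc = np.clip(x[:, None] - knots[None, :], 0.0, None) ** 3
    return np.hstack([poly, trunc])

def augmented(B, penalty, n_unpenalized=4):
    """Stack B on sqrt(penalty) * identity rows for the knot coefficients,
    so that Xa.T @ Xa equals B.T @ B + penalty * D (assumed ridge penalty)."""
    ridge = np.sqrt(penalty) * np.eye(B.shape[1])[n_unpenalized:]
    return np.vstack([B, ridge])

def exact_leverage(B, penalty):
    """Diagonal of the smoother matrix B (B'B + penalty*D)^{-1} B' via QR."""
    Q, _ = np.linalg.qr(augmented(B, penalty))
    return np.sum(Q[: B.shape[0]] ** 2, axis=1)

def sketched_leverage(B, penalty, sketch_rows, rng):
    """Approximate leverage from a Gaussian sketch of the augmented design
    (an assumed sketch type; the thesis may use a different one)."""
    Xa = augmented(B, penalty)
    S = rng.standard_normal((sketch_rows, Xa.shape[0])) / np.sqrt(sketch_rows)
    _, R = np.linalg.qr(S @ Xa)
    # Squared row norms of B @ inv(R), computed without forming the inverse.
    return np.sum(np.linalg.solve(R.T, B.T) ** 2, axis=0)

def leverage_subsample(scores, size, rng):
    """Sample indices with probability proportional to the scores and
    return them with inverse-probability weights."""
    prob = scores / scores.sum()
    idx = rng.choice(scores.size, size=size, replace=True, p=prob)
    return idx, 1.0 / (size * prob[idx])

def fit_weighted(B, z, w, penalty):
    """Weighted penalized least squares on the subsample."""
    sw = np.sqrt(w)
    Xa = augmented(B * sw[:, None], penalty)
    za = np.concatenate([z * sw, np.zeros(Xa.shape[0] - B.shape[0])])
    coef, *_ = np.linalg.lstsq(Xa, za, rcond=None)
    return coef

# Synthetic data whose mean is additive on the log scale (lam = 0).
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(0.0, 1.0, n)
truth = np.sin(2 * np.pi * x)
y = np.exp(truth + 0.1 * rng.standard_normal(n))

z = box_cox(y, lam=0.0)
knots = np.quantile(x, np.linspace(0, 1, 10)[1:-1])
B = spline_design(x, knots)

h = exact_leverage(B, penalty=1e-3)                      # exact scores
h_fast = sketched_leverage(B, 1e-3, 20 * B.shape[1], rng)  # sketched scores
idx, w = leverage_subsample(h, size=200, rng=rng)
coef = fit_weighted(B[idx], z[idx], w, penalty=1e-3)
rmse = np.sqrt(np.mean((B @ coef - truth) ** 2))
print(f"subsample size: {idx.size}, RMSE on transformed scale: {rmse:.3f}")
```

In this sketch the exact scores require a QR factorization of the full augmented design, whereas the sketched scores factor only a much smaller projected matrix, which is the kind of saving an approximate-leverage variant aims for. In practice the Box–Cox parameter would be estimated from the data rather than fixed in advance.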
