Tuesday, Aug 5: 2:00 PM - 3:50 PM
0718
Topic-Contributed Paper Session
Music City Center
Room: CC-104D
Applied: Yes
Main Sponsor: Section on Statistical Learning and Data Science
Presentations
Analyzing multivariate time series networks is popular in many fields, from neuroscience to seismology. The inverse spectral density is a common choice for time series network analysis because it captures the frequency-domain correlation between two variables after removing the best linear predictor based on all other variables. In many applications, the goal is to study how these networks change across different conditions. For example, in neuroscience, one might be interested in how the brain connectivity network changes before and after stimulation. With this in mind, we develop a direct estimate of the difference between two high-dimensional inverse spectral densities. By leveraging recent advances in multivariate time series analysis, we establish consistency of our estimator under only mild dependence conditions. Using a new convergence rate for high-dimensional spectral density estimators, we obtain a flexible convergence rate for the proposed direct estimator that allows for both varying smoothing spans and dependence in the data. Leveraging this convergence rate and new results on the form of the asymptotic distribution of the spectral density estimator, we also develop a valid inference procedure that handles asymptotic distributions with arbitrary scaling. Finally, to make the procedure computationally tractable, we utilize previously overlooked estimating equations to implement an efficient algorithm. The method is illustrated on synthetic data experiments, on experiments with electroencephalography data, and on experiments with optogenetic stimulation and micro-electrocorticography data.
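For orientation, the following is a minimal plug-in sketch of the target quantity only: it estimates each condition's spectral density matrix with a Welch-type smoothed cross-periodogram and differences the inverses. It is not the direct estimator described above (which avoids separately inverting each high-dimensional estimate); all function names and parameters are illustrative.

```python
# Plug-in baseline for the difference of two inverse spectral densities.
# Assumes NumPy/SciPy; X_pre, X_post are (T x p) multivariate series from
# the two conditions (e.g., before/after stimulation).
import numpy as np
from scipy.signal import csd

def spectral_density_matrix(X, fs=1.0, nperseg=256):
    """Welch-type estimate of the p x p spectral density matrix at each
    frequency, built from pairwise cross-spectral densities."""
    T, p = X.shape
    freqs, _ = csd(X[:, 0], X[:, 0], fs=fs, nperseg=nperseg)
    S = np.zeros((len(freqs), p, p), dtype=complex)
    for i in range(p):
        for j in range(p):
            _, S[:, i, j] = csd(X[:, i], X[:, j], fs=fs, nperseg=nperseg)
    return freqs, S

def inverse_spectral_difference(X_pre, X_post, **kw):
    """Naive plug-in estimate of S_post(f)^{-1} - S_pre(f)^{-1}."""
    freqs, S_pre = spectral_density_matrix(X_pre, **kw)
    _, S_post = spectral_density_matrix(X_post, **kw)
    return freqs, np.linalg.inv(S_post) - np.linalg.inv(S_pre)

# toy usage: two conditions of a 5-channel series
rng = np.random.default_rng(0)
X_pre, X_post = rng.standard_normal((1024, 5)), rng.standard_normal((1024, 5))
freqs, diff = inverse_spectral_difference(X_pre, X_post, nperseg=256)
```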
Representation multi-task learning (MTL) has achieved tremendous success in practice. However, the theoretical understanding of these methods is still lacking. Most existing theoretical works focus on cases where all tasks share the same representation and claim that MTL almost always improves performance. Nevertheless, as the number of tasks grows, assuming all tasks share the same representation is unrealistic. Furthermore, empirical findings often indicate that a shared representation does not necessarily improve single-task learning performance. In this paper, we aim to understand how to learn from tasks with \textit{similar but not exactly the same} linear representations, while dealing with outlier tasks. Assuming a known intrinsic dimension, we propose a penalized empirical risk minimization method and a spectral method that are \textit{adaptive} to the similarity structure and \textit{robust} to outlier tasks. Both algorithms outperform single-task learning when representations across tasks are sufficiently similar and the proportion of outlier tasks is small. Moreover, they always perform at least as well as single-task learning, even when the representations are dissimilar. We provide information-theoretic lower bounds to demonstrate that both methods are nearly \textit{minimax} optimal in a large regime, with the spectral method being optimal in the absence of outlier tasks. Additionally, we introduce a thresholding algorithm to adapt to an unknown intrinsic dimension. We conduct extensive numerical experiments to validate our theoretical findings.
Speaker
Ye Tian, Columbia University, Department of Statistics
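The following sketch illustrates the generic spectral-method pipeline for linear representation MTL with a known intrinsic dimension r; it is not the paper's adaptive and outlier-robust estimator, only the basic recipe it builds on: estimate each task's coefficients, extract a shared r-dimensional subspace by SVD, then refit each task inside that subspace.

```python
# Generic spectral method for linear representation MTL (illustrative only).
import numpy as np

def spectral_mtl(tasks, r):
    """tasks: list of (X_t, y_t) with X_t of shape (n_t, p). Returns the
    estimated shared basis A_hat (p x r) and per-task coefficients."""
    # step 1: single-task least-squares estimates, stacked column-wise
    B = np.column_stack([np.linalg.lstsq(X, y, rcond=None)[0] for X, y in tasks])
    # step 2: top-r left singular subspace of the stacked estimates
    U, _, _ = np.linalg.svd(B, full_matrices=False)
    A_hat = U[:, :r]
    # step 3: refit each task within the learned subspace
    betas = [A_hat @ np.linalg.lstsq(X @ A_hat, y, rcond=None)[0]
             for X, y in tasks]
    return A_hat, betas

# toy usage: 10 tasks sharing a rank-2 representation in p = 20 dimensions
rng = np.random.default_rng(1)
p, r, T, n = 20, 2, 10, 100
A = np.linalg.qr(rng.standard_normal((p, r)))[0]
tasks = []
for _ in range(T):
    X = rng.standard_normal((n, p))
    beta = A @ rng.standard_normal(r)
    tasks.append((X, X @ beta + 0.1 * rng.standard_normal(n)))
A_hat, betas = spectral_mtl(tasks, r)
```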
In measurement-constrained problems, despite the availability of a large dataset, we may only be able to afford observing labels for a small portion of it. This poses the critical question of which data points are most beneficial to label given a budget constraint. In this paper, we focus on estimating the optimal individualized threshold in a measurement-constrained M-estimation framework. Our goal is to estimate a high-dimensional parameter $\theta$ in a linear threshold $\theta^T Z$ for a continuous variable $X$ such that the discrepancy between whether $X$ exceeds the threshold $\theta^T Z$ and a binary outcome $Y$ is minimized. We propose a novel $K$-step active subsampling algorithm to estimate $\theta$, which iteratively samples the most informative observations and solves a regularized M-estimator. The theoretical properties of our estimator demonstrate a phase transition phenomenon with respect to $\beta \geq 1$, the smoothness of the conditional density of $X$ given $Y$ and $Z$. For $\beta > (1+\sqrt{3})/2$, we show that the two-step algorithm yields an estimator with the parametric convergence rate $O_p((s \log d / N)^{1/2})$ in the $\ell_2$ norm. This rate is strictly faster than the minimax optimal rate with $N$ i.i.d. samples drawn from the population. For the other two scenarios, $1 < \beta \leq (1+\sqrt{3})/2$ and $\beta = 1$, the estimator from the two-step algorithm is sub-optimal: the former requires running $K > 2$ steps to attain the same parametric rate, whereas in the latter case only a near-parametric rate can be obtained. Furthermore, we formulate a minimax framework for the measurement-constrained M-estimation problem and prove that our estimator is minimax rate optimal up to a logarithmic factor. Finally, we demonstrate the performance of our method in simulation studies and apply it to analyze a large diabetes dataset.
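A stylized version of the active-subsampling loop is sketched below. It substitutes a smoothed logistic surrogate with a ridge penalty for the paper's regularized (sparse) M-estimator, and uses distance to the current threshold as the informativeness criterion; both substitutions are assumptions for illustration, not the authors' algorithm.

```python
# Schematic K-step active subsampling for threshold estimation.
# Z: (n x d) covariates, X: (n,) continuous variable, Y: (n,) binary labels
# (conceptually, Y_i is revealed only when observation i is sampled).
import numpy as np
from scipy.optimize import minimize

def fit_threshold(Z, X, Y, h=0.1, lam=1e-2):
    """Smoothed surrogate: logistic loss of the margin (2Y-1)(X - theta^T Z)
    plus a small ridge penalty (the paper uses a sparsity-inducing penalty)."""
    def loss(theta):
        m = (2 * Y - 1) * (X - Z @ theta)
        return np.mean(np.logaddexp(0.0, -m / h)) + lam * np.sum(theta ** 2)
    return minimize(loss, np.zeros(Z.shape[1]), method="L-BFGS-B").x

def k_step_active(Z, X, Y, K=3, batch=100, seed=0):
    """Iteratively label the points nearest the current estimated threshold."""
    rng = np.random.default_rng(seed)
    labeled = np.zeros(len(X), dtype=bool)
    labeled[rng.choice(len(X), batch, replace=False)] = True  # pilot sample
    theta = fit_threshold(Z[labeled], X[labeled], Y[labeled])
    for _ in range(K - 1):
        score = np.abs(X - Z @ theta)       # proximity to current threshold
        score[labeled] = np.inf
        labeled[np.argsort(score)[:batch]] = True  # spend budget on closest
        theta = fit_threshold(Z[labeled], X[labeled], Y[labeled])
    return theta
```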
This paper studies the convergence rates of optimal transport (OT) map estimators, a topic of growing interest in statistics, machine learning, and various scientific fields. Despite recent advancements, existing results rely on regularity assumptions that are very restrictive in practice and much stricter than those in Brenier's Theorem, including compactness and convexity of the probability support and the bi-Lipschitz property of the OT maps. We aim to broaden the scope of OT map estimation and fill this gap between theory and practice. Under a strong convexity assumption on Brenier's potential, we first establish non-asymptotic convergence rates for the original plug-in estimator without requiring restrictive assumptions on the probability measures. Additionally, we introduce a sieve plug-in estimator and establish its convergence rates without the strong convexity assumption on Brenier's potential, covering widely used cases such as the rank functions of normal or $t$-distributions. We also establish new Poincaré-type inequalities, proved under sufficient conditions on the local boundedness of the probability density and mild topological conditions on the support; these new inequalities enable us to achieve faster convergence rates for the Donsker function class. Moreover, we develop scalable algorithms to efficiently solve the OT map estimation using neural networks and present numerical experiments to demonstrate their effectiveness and robustness.
Keywords
Optimal Transport
Poincaré Inequality
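For a concrete baseline, the sketch below estimates an OT map between samples via the barycentric projection of an entropic coupling computed with the POT library. This is neither the paper's sieve estimator nor its neural-network algorithm, only a standard plug-in illustration of the estimand; it assumes `pip install pot`.

```python
# Entropic plug-in OT map estimate via barycentric projection (illustrative).
import numpy as np
import ot  # Python Optimal Transport (POT)

def entropic_map_estimate(xs, xt, reg=0.05):
    """Map estimate T(x_i) = barycentric projection of source sample x_i."""
    n, m = len(xs), len(xt)
    a, b = np.full(n, 1 / n), np.full(m, 1 / m)     # empirical measures
    M = ot.dist(xs, xt)                             # squared Euclidean costs
    P = ot.sinkhorn(a, b, M / M.max(), reg)         # entropic coupling
    return (P @ xt) / P.sum(axis=1, keepdims=True)  # conditional means

# toy usage: map a standard normal sample toward a shifted one
rng = np.random.default_rng(2)
xs, xt = rng.standard_normal((300, 2)), rng.standard_normal((300, 2)) + 2.0
T_hat = entropic_map_estimate(xs, xt)  # rows approximate the OT map at xs
```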
For the 2024 U.S. presidential election, would digital ads against Donald Trump impact voter turnout in Pennsylvania (PA), a key "tipping point" state? The gold standard to address this question, a randomized experiment in which voters are randomized to different ads, yields unbiased estimates of the ad effect, but is very expensive. Instead, we propose a less-than-ideal, but significantly cheaper and likely faster, framework based on transfer learning, where we transfer knowledge from a past ad experiment in 2020 to evaluate ads for 2024. A key component of our framework is a sensitivity analysis that quantifies the unobservable differences between past and future elections, which can be calibrated in a data-driven manner. We propose two estimators of the 2024 ad effect: a simple regression estimator with bootstrap, which we recommend for practitioners in this field, and an estimator based on the efficient influence function for broader applications. Using our framework, we estimate the effect of a digital ad campaign against Trump on voter turnout in PA for the 2024 election. Our results indicate effect heterogeneity across counties of PA and among important subgroups stratified by gender, urbanicity, and educational attainment.
Keywords
Causal inference
Sensitivity analysis
Generalizability
Transportability
Exponential tilting
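A schematic version of the recommended regression-with-bootstrap estimator follows. It assumes the 2020 experiment provides covariates W, treatment A, and turnout Y, with 2024 covariates available for transport, and reduces the paper's exponential-tilting sensitivity analysis to a single user-set tilt on the transported covariates; all variable names are illustrative.

```python
# Transported ad-effect estimate with bootstrap CIs and an exponential tilt
# (schematic; assumes NumPy arrays and scikit-learn).
import numpy as np
from sklearn.linear_model import LinearRegression

def transported_ate(W_past, A_past, Y_past, W_future, gamma=0.0, score=None):
    """Fit E[Y | W, A] on the past trial, transport to future covariates,
    and tilt by exp(gamma * score(W)) to probe unobserved 2020->2024 shifts."""
    model = LinearRegression().fit(np.column_stack([W_past, A_past]), Y_past)
    mu1 = model.predict(np.column_stack([W_future, np.ones(len(W_future))]))
    mu0 = model.predict(np.column_stack([W_future, np.zeros(len(W_future))]))
    s = score(W_future) if score else np.zeros(len(W_future))
    w = np.exp(gamma * s)
    w /= w.sum()                       # exponential tilting weights
    return np.sum(w * (mu1 - mu0))     # tilted average ad effect

def bootstrap_ci(W_p, A_p, Y_p, W_f, B=500, gamma=0.0, seed=0):
    """Percentile bootstrap over both the past trial and future covariates."""
    rng, n, m = np.random.default_rng(seed), len(Y_p), len(W_f)
    ests = []
    for _ in range(B):
        i = rng.integers(0, n, n)      # resample the 2020 experiment
        j = rng.integers(0, m, m)      # resample the 2024 covariates
        ests.append(transported_ate(W_p[i], A_p[i], Y_p[i], W_f[j], gamma))
    return np.percentile(ests, [2.5, 97.5])
```

Sweeping `gamma` over a calibrated range mimics, in spirit, the data-driven sensitivity analysis described in the abstract: the effect estimate is reported as a band over plausible unobserved shifts rather than a single number.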