We typically have seminars on Wednesdays at noon in Malone 228. For Fall 2021, we will be having seminars virtually using Zoom in this room. All seminar announcements will be sent to the theory mailing list.

Speaker: Amitabh Basu

Affiliation: JHU

Title: Admissibility of solution estimators for stochastic optimization

Abstract:

We look at stochastic optimization problems through the lens of statistical decision theory. In particular, we address admissibility, in the statistical decision theory sense, of the natural sample average estimator for a stochastic optimization problem (which is also known as the empirical risk minimization (ERM) rule in learning literature). It is well known that for general stochastic optimization problems, the sample average estimator may not be admissible. This is known as Stein’s paradox in the statistics literature. We show in this paper that for optimizing stochastic linear functions over compact sets, the sample average estimator is admissible.

Speaker: Daniel Reichman

Affiliation: Princeton University

Title: Contagious sets in bootstrap percolation

Abstract:

Consider the following activation process in undirected graphs: a vertex is active either if it belongs to a set of initially activated vertices or if at some point it has at least r active neighbors. This process (commonly referred to as bootstrap percolation) has been studied in several fields including combinatorics, computer science, statistical physics and viral marketing. A contagious set is a set whose activation results with the entire graph being active. Given a graph G, let m(G,r) be the minimal size of a contagious set.

I will survey upper and lower bounds for m(G,r) in general graphs and for graphs with special properties (random and pseudo-random graphs, graphs without short cycles) and discuss many unresolved questions.

Based on joint work with Amin Coja-Oghlan, Uriel Feige and Michael Krivelevich.

Speaker: Xuan Wu

Affiliation: Johns Hopkins

Title: Coreset for Ordered Weighted Clustering

Abstract:

Ordered k-Median is a generalization of classical clustering problems such as k-Median and k-Center, that offers a more flexible data analysis, like easily combining multiple objectives (e.g., to increase fairness or for Pareto optimization). Its objective function is defined via the Ordered Weighted Averaging (OWA) paradigm of Yager (1988), where data points are weighted according to a predefined weight vector, but in order of their contribution to the objective (distance from the centers).

Coreset is a powerful data-reduction technique which summarizes the data set into a small (weighted) point set while approximately preserving the objective value of the data set to all centers. When there are multiple objectives (weights), the above standard coreset might have limited usefulness, whereas in a \emph{simultaneous} coreset, which was introduced recently by Bachem and Lucic and Lattanzi (2018), the above approximation holds for all weights (in addition to all centers). Our main result is the first construction of simultaneous coreset for the Ordered k-Median problem of small size.

In this talk, I will introduce the basics of coreset construction for the clustering problem and the main ideas of our new results. Finally, we discuss some remaining open problems.

This talk is based on joint work with Vladimir Braverman, Shaofeng Jiang, and Robert Krauthgamer.

Speaker: Christopher Musco

Affiliation: NYU

Title: Structured Covariance Estimation

Abstract:

Given access to samples from a distribution D over d-dimensional vectors, how many samples are necessary to learn the distribution’s covariance matrix, T? Moreover, how can we leverage a priori knowledge about T’s structure to reduce this sample complexity?

I will discuss this fundamental statistical problem in the setting where T is known to have Toeplitz structure. Toeplitz covariance matrices arise in countless signal processing applications, from wireless communications, to medical imaging, to time series analysis. In many of these applications, we are interested in learning algorithms that only view a subset of entries in each d-dimensional vector sample from D. We care about minimizing two notions of sample complexity 1) the total number of vector samples taken and 2) the number of entries accessed in each vector sample. The later goal typically equates to minimizing equipment or hardware requirements.

I will present several new non-asymptotic bounds on these sample complexity measures. We will start by taking a fresh look at classical and widely used algorithms, including methods based on selecting entries from each sample according to a “sparse ruler”. Then, I will introduce a novel sampling and estimation strategy that improves on existing methods in many settings. Our new approach for learning Toeplitz structured covariance utilizes tools from random matrix sketching, leverage score sampling for continuous signals, and sparse Fourier transform algorithms. It fits into a broader line of work which seeks to address fundamental problems in signal processing using tools from theoretical computer science and randomized numerical linear algebra.

Bio:

Christopher Musco is an Assistant Professor in the Computer Science and Engineering department at NYU’s Tandon School of Engineering. His research focuses on the algorithmic foundations of data science and machine learning. Christopher received his Ph.D. in Computer Science from the Massachusetts Institute of Technology and B.S. degrees in Applied Mathematics and Computer Science from Yale University.

Speaker: Guy Kortsarz

Affiliation: Rutgers Universty – Camden

Title: A survey on the Directed Steiner tree problem

Abstract:

The directed Steiner problem is one of the most important problems in optimization, and in particular is more general than Group Steiner and other problems.

I will discuss the (by now classic) 1/\epsilon^3 n^epsilon approximation for the problem by Charikar et al (the algorithm was invented by Kortsarz and Peleg and is called recursive greedy. A technique who people in approximation should know). The running time is more than n^{1/epsilon}. One of the most important open questions in Approximation Algorithms is if there is a polynomial time polylog ratio for this problem. This is open from 1997.

I will discuss the Group Steiner problem ( a special case of the Directed Steiner problem) and the Directed Steiner Forest (a generalization of the Directed Steiner problem) and many more related problems.

Speaker: Jiapeng Zhang

Affiliation: Harvard University

Title:An improved sunflower bound

Abstract:

Speaker: Robert Krauthgamer

Affiliation: Weizmann Institute of Science

Title: On Solving Linear Systems in Sublinear Time

Abstract:

I will discuss sublinear algorithms that solve linear systems locally. In

the classical version of this problem, the input is a matrix S and a vector

b in the range of S, and the goal is to output a vector x satisfying Sx=b.

We focus on computing (approximating) one coordinate of x, which potentially

allows for sublinear algorithms. Our results show that there is a

qualitative gap between symmetric diagonally dominant (SDD) and the more

general class of positive semidefinite (PSD) matrices. For SDD matrices, we

develop an algorithm that runs in polylogarithmic time, provided that S is

sparse and has a small condition number (e.g., Laplacian of an expander

graph). In contrast, for certain PSD matrices with analgous assumptions, the

running time must be at least polynomial.

Joint work with Alexandr Andoni and Yosef Pogrow.

Speaker: Yasamin Nazari

Affiliation: Johns Hopkins University

Title: Sparse Hopsets in Congested Clique

Abstract:

Speaker: Richard Shea

Affiliation: Applied and Computational Math program, Johns Hopkins University

Title: Progress towards building a Dynamic Hawkes Graph

Abstract:

Speaker: Aditya Krishnan

Affiliation: Johns Hopkins University

Title: Schatten Norms in Matrix Streams: The Role of Sparsity

Abstract:

Spectral functions of large matrices contain important structural information about the underlying data, and are thus becoming increasingly important to efficiently compute. Many times, large matrices representing real-world data are *sparse* or *doubly sparse* (i.e., sparse in both rows and columns), and are accessed as a *stream* of updates, typically organized in *row-order*. In this setting, where space (memory) is the limiting resource, all known algorithms require space that is polynomial in the dimension of the matrix, even for sparse matrices. We address this challenge by providing the first algorithms whose space requirement is *independent of the matrix dimension*, assuming the matrix is doubly-sparse and presented in row-order.

In addition, we prove that multiple passes are unavoidable in this setting and show extensions of our primary technique, including a trade-off between space requirements and number of passes. Our algorithms approximate the Schatten p-norms, which we use in turn to approximate other spectral functions, such as logarithm of the determinant, trace of matrix inverse, and Estrada index.

Joint work with Vladimir Braverman, Robert Krauthgamer and Roi Sinoff.

Speaker: Arnold Filtser

Affiliation: Columbia University

Title: TBD

Abstract: TBD

Speaker: Jasper Lee

Affiliation: Brown University

Title: Real-Valued Sub-Gaussian Mean Estimation, Optimal to a (1+o(1)) Multiplicative Factor

Abstract:

We revisit one of the most fundamental problems in statistics: given access to independent samples from a 1D random variable (with finite but unknown mean and variance), what is the best way to estimate the mean, in terms of error convergence with respect to sample size? The conventional wisdom is to use the sample mean as our estimate. However it is known that the sample mean has optimal convergence if and only if the underlying distribution is (sub-)Gaussian. Moreover, it can even be exponentially slower than optimal for certain heavy-tailed distributions. On the other hand, the median-of-means estimator (invented and reinvented in various literature) does have sub-Gaussian convergence for all finite-variance distributions, albeit in the big-O sense with a sub-optimal multiplicative constant. The natural remaining question then, is whether it is possible to bridge the gap, to have an estimator that has optimal sub-Gaussian concentration with an optimal constant, for all finite-variance distributions.

In this talk, we answer the question affirmatively by giving an estimator that converges with the optimal constant inside the big-O, up to a (1+o(1)) multiplicative factor. Our estimator is furthermore computable in time linear in the sample size. The convergence analysis involves deriving tail bounds using tools from linear and convex programming, which may be of independent interest.

Joint work with Paul Valiant.

Speaker: Edinah Gnang

Affiliation: Johns Hopkins University

Title: On the Kotzig-Ringel-Rosa conjecture

Abstract:

In this talk we describe and motivate the K.R.R. conjecture graph partitioning and describe a functional approach enabling us to tackle the K.R.R. conjecture via a new composition lemma. If time permits I will also discuss algorithmic aspects.

Speaker: Aditya Krishnan

Affiliation: Johns Hopkins University

Title: Schatten Norms in Matrix Streams: The Role of Sparsity.

Abstract: Spectral functions of large matrices contain important structural information about the underlying data, and are thus becoming increasingly important to efficiently compute. Many times, large matrices representing real-world data are \emph{sparse} or \emph{doubly sparse} (i.e., sparse in both rows and columns), and are accessed as a \emph{stream} of updates, typically organized in \emph{row-order}. In this setting, where space (memory) is the limiting resource, all known algorithms require space that is polynomial in the dimension of the matrix, even for sparse matrices. We address this challenge by providing the first algorithms whose space requirement is \emph{independent of the matrix dimension}, assuming the matrix is doubly-sparse and presented in row-order.

In addition, we prove that multiple passes are unavoidable in this setting and show extensions of our primary technique, including a trade-off between space requirements and number of passes. Our algorithms approximate the Schatten p-norms, which we use in turn to approximate other spectral functions, such as logarithm of the determinant, trace of matrix inverse, and Estrada index.

Joint work with Vladimir Braverman, Robert Krauthgamer and Roi Sinoff.

Speaker: Brian Brubach

Affiliation: Wellesley College

Title: Online matching under three layers of uncertainty

Abstract:

Online matching problems have become ubiquitous with the rise of the internet and e-commerce. From the humble beginnings of a single problem 30 years ago, the theoretical study of online matching now encompasses dozens of variations inspired by diverse applications. We’ll take a tour through the landscape of online matching problems. As we go, we’ll venture deeper and deeper into the jungle of stochasticity and uncertainty. Finally, we’ll cover some very recent work introducing new algorithms and models. Along the way, I’ll highlight techniques and tricks I’ve found useful in studying these problems.

Speaker: Joshua Grochow

Affiliation: University of Colorado

Title: Isomorphism of tensors, algebras, and polynomials, oh my!

Abstract: We consider the problems of testing isomorphism of tensors, p-groups, cubic polynomials, quantum states, algebras, and more, which arise from a variety of areas, including machine learning, group theory, and cryptography. Despite Graph Isomorphism now being in quasi-polynomial time, and having long had efficient practical software, for the problems we consider no algorithm is known that is asymptotically better than brute force, and state-of-the-art software cannot get beyond small instances. We approach this situation in two ways:

– Complexity-theoretic: we show that all these problems are polynomial-time equivalent, giving rise to a class of problems we call Tensor Isomorphism-complete (TI-complete). Perhaps surprising here is that we show that isomorphism of d-tensors for any fixed d (at least 3) is equivalent to testing isomorphism of 3-tensors. These equivalences let us focus on just one problem rather than dozens, or a whole infinite hierarchy, separately.

– Algorithmic: Adapting the Weisfeiler-Leman method from Graph Isomorphism to tensors, trying to understand tensor isomorphism by taking advantage of isomorphisms of small sub-tensors. This leads to tensor analogues of the Graph Reconstruction conjecture and related questions.

Based on joint works with Vyacheslav V. Futorny and Vladimir V. Sergeichuk (Lin. Alg. Appl., 2019; arXiv:1810.09219), with Peter A. Brooksbank, Yinan Li, Youming Qiao, and James B. Wilson (arXiv:1905.02518), and with Youming Qiao (arXiv:1907.00309).

Speaker: Jingfeng Wu

Affiliation: Johns Hopkins University

Title: Direction Matters: On the Implicit Regularization Effect of Stochastic Gradient Descent with Moderate Learning Rate

Abstract:

Understanding the algorithmic regularization effect of stochastic gradient descent (SGD) is one of the key challenges in modern machine learning and deep learning theory. Most of the existing works, however, focus on very small or even infinitesimal learning rate regime, and fail to cover practical scenarios where the learning rate is moderate and annealing. In this paper, we make an initial attempt to characterize the particular regularization effect of SGD in the moderate learning rate regime by studying its behavior for optimizing an overparameterized linear regression problem. In this case, SGD and GD are known to converge to the unique minimum-norm solution; however, with the moderate and annealing learning rate, we show that they exhibit different directional bias: SGD converges along the large eigenvalue directions of the data matrix, while GD goes after the small eigenvalue directions. Furthermore, we show that such directional bias does matter when early stopping is adopted, where the SGD output is nearly optimal but the GD output is suboptimal. Finally, our theory explains several folk arts in practice used for SGD hyperparameter tuning, such as (1) linearly scaling the initial learning rate with batch size; and (2) overrunning SGD with high learning rate even when the loss stops decreasing.