Projective splitting is general framework for generating decomposition algorithms for convex optimization and monotone inclusion problems, with flexibility not present in the better-known operator splitting schemes such as the Douglas-Rachford splitting, and its special case of the ADMM.
This talk presents a new variant of projective splitting in which one of the operators is Lipschitz continuous and represented only by a noisy random oracle, as used in algorithms such as stochastic gradient descent (SGD). For example, the actual operator might be the gradient of the average of a loss function over a huge dataset, while the oracle might return the same loss gradient averaged over a random “minibatch” of observations.
Our results show that projective splitting still converges in such a setting if we replace its usual coordination step of projecting onto a separating hyperplane with a step in the direction of the hyperplane, using a vanishing stepsize. Thus, we have effectively hybridized projective splitting with the stochastic gradient descent family of algorithms.
We present an application of this new algorithm to an L1-regularized distributionally robust learning problem, which has an “adversarial” minimax objective including several nonsmooth terms. We compare the performance of the new method to several possible competing approaches, all of which must use a “monotone + skew” product-space reformulation. After tuning each algorithm, we find competitive performance on one dataset and superior performance on the remaining ones; we attempt to provide some intuition for these results.
Joint work with Patrick Johnstone (lead author), Thomas Flynn, and Shinjae Yoo of Brookhaven National Laboratory.