Massive mixed models in Julia
Tuesday, Aug 5: 11:00 AM - 11:25 AM
Invited Paper Session
Music City Center
Traditional approaches to mixed-effects models based on generalized least squares or expectation-maximization struggle to scale to datasets with many thousands of observations and hundreds of levels of a single blocking variable. Achieving acceptable computational performance requires special-casing the nesting or crossing of random effects, but this special casing often makes it difficult to handle less-than-idealized cases, such as partial crossing or multiple levels of nesting. In contrast, an approach based on penalized least squares can take advantage of sparse matrix methods to scale to models with millions of observations, and it handles nesting and crossing of random effects in a general way. This approach was first demonstrated by the lme4 package in R.
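As a minimal sketch of this interface (the data frame df and its columns are hypothetical; the fit call is the documented MixedModels.jl entry point):

```julia
using MixedModels

# Hypothetical data frame `df` with response `y`, covariate `x`, and two
# partially crossed grouping factors `subj` and `item`. The same formula
# syntax handles nested, fully crossed, or partially crossed factors
# without special casing.
fm = fit(MixedModel, @formula(y ~ 1 + x + (1 | subj) + (1 | item)), df)
```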
More recently, the MixedModels.jl package in Julia has expanded upon this foundation. Julia helps to solve the "two language problem": the entirety of MixedModels.jl is written in Julia, unlike lme4, which mixes R and C++ code. Keeping everything in one language makes it much easier to experiment with potential computational enhancements, such as optimizing the storage of various matrices and intermediate quantities. It also makes it much easier to onboard other developers, such as myself, as productive collaborators and maintainers. Finally, Julia's multiple dispatch is particularly useful: we are able to use specialized methods for the particular patterns of sparsity that arise in the penalized least squares formulation, which offers an additional performance improvement over relying on generic sparse matrix methods.
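To illustrate how multiple dispatch supports this (a sketch only, not the actual MixedModels.jl internals; the function name scaleblock! is hypothetical):

```julia
using LinearAlgebra, SparseArrays

# Julia selects the most specific method for the concrete matrix type,
# so structured blocks get specialized code paths while everything else
# falls back to a generic implementation.
scaleblock!(A::Diagonal, s::Real) = (A.diag .*= s; A)
scaleblock!(A::SparseMatrixCSC, s::Real) = (nonzeros(A) .*= s; A)
scaleblock!(A::AbstractMatrix, s::Real) = (A .*= s; A)  # generic fallback
```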
As a demonstration of the capabilities that Julia unlocks in this domain, I will present a model fit to the MovieLens data (Harper and Konstan, 2015), containing 32 million observations and partially crossed random effects with 200,948 and 87,585 levels. With scalar-valued random effects, it is possible to fit this model in a few hours on a computer with sufficient memory. It is also possible to fit the model with vector-valued random effects, although the corresponding fit time increases substantially.
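A sketch of the two model specifications, assuming a ratings table with columns rating, userId, and movieId (genre is a hypothetical covariate used only to illustrate the vector-valued case):

```julia
using MixedModels

# Scalar-valued random effects: one random intercept per user and per movie.
m1 = fit(MixedModel,
         @formula(rating ~ 1 + (1 | userId) + (1 | movieId)),
         ratings)

# Vector-valued random effects: a correlated random intercept and slope
# per user, which enlarges the covariance structure to be estimated and
# substantially increases fitting time.
m2 = fit(MixedModel,
         @formula(rating ~ 1 + genre + (1 + genre | userId) + (1 | movieId)),
         ratings)
```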
Although we are already very excited to be able to fit such large models at all, we want to fit them even faster, and Julia enables us to continue algorithmic development in a coherent way. For example, we can consider alternative representations of triangular matrices in order to have a more compact representation in memory, thus both reducing the memory burden and potentially increasing BLAS performance. Given the reliance on sparse matrix methods, MixedModels.jl has been developed with a focus on the CPU rather than the GPU, because of the historically poor performance of GPUs with sparse matrices. However, the Julia ecosystem provides powerful tooling capable of supporting a hybrid approach combining CPU and GPU, without the need to call out to another language.
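As a sketch of the kind of representation change we have in mind (illustrative only; packlower is a hypothetical helper, not part of MixedModels.jl):

```julia
# Pack the lower triangle of a square matrix into a contiguous vector,
# column by column, roughly halving the memory footprint relative to
# storing the full dense matrix.
function packlower(A::AbstractMatrix{T}) where {T}
    n = size(A, 1)
    v = Vector{T}(undef, n * (n + 1) ÷ 2)
    k = 1
    for j in 1:n, i in j:n
        v[k] = A[i, j]
        k += 1
    end
    return v
end
```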
Keywords: mixed models, penalized least squares, crossed random effects, big data