What is Model-Based Machine Learning?
Published:
Originally published on the yhat blog.
Introduction
If you haven’t had your head in the sand, you’ll realize that Machine Learning is no longer a niche research area, it is now a mainstream technology being applied by engineers right across the spectrum.
“I think it’s the dawn of an exciting new era of info and computer science … It’s a new world in which the ability to understand the world and people and draw conclusions will be really quite remarkable… It’s a fundamentally different way of doing computer science.” -Steve Ballmer
However, in practice there are pitfalls and dangers, especially when employing machine learning for the first time. It’s easy to become swamped by the sheer number of methods, there is a whole new vocabulary to learn, and it is often difficult to choose an algorithm for a given problem. Often, especially when data is unstructured (which is increasingly the case) it’s hard to work out which off-the-shelf method fits the problem, and instead one has to resort to coercing the data to fit the problem. It’s also often not clear how do to deal with noisy, or missing or corrupted data.
This post is about a different viewpoint called “model-based machine learning” 1, which tackles these difficulties, can solve problems in a wide range of application domains, and covers most existing machine learning algorithms as special cases. It will also allow us to deal with uncertainty that we encounter in real-world applications in a principled manner.
Since the invention of the Perceptron algorithm 2, huge numbers of algorithms have been created to solve various specialist tasks. A common approach to solving problems involves trying a few different algorithms, often in practice guided by familiarity, or due to the presence of a particular toolbox in the language being used, rather than it being the most appropriate for the problem. Each algorithm will also have parameters that often require careful tuning, and these often don’t map to intuitive concepts. As a result, practitioners often resort to using the defaults set by the authors of a given toolbox, or exhaustive searches of the parameter space.
If you can’t find an algorithm that fits your problem, you are left with two options: modify your problem until it fits some standard framework, or invent a new algorithm. Whilst the latter may get you a NIPS or ICML paper, this is not a viable option for most. Model-based machine learning instead offers tailored solutions.
The central idea underpinning the model-based approach to directly encode any problem-specific assumptions, with any available prior knowledge, in the form of a (mathematical) model. These include the number and types of variables in the problem domain, and the factors that determine their interaction. A model-specific algorithm is then (automatically) generated. This approach can be used to do any standard machine learning task, such as classification, regression, or clustering, whilst improving understanding and control over how these tasks are accomplished.
If we take the example of a network of different kinds of sensors in a real world environment, these will introduce different sources of uncertainty. We might have sensors that are simply not working, or that are giving incorrect readings. More generally, a given sensor will at any given time have a particular signal to noise ratio, and the types of noise that are corrupting the signal might also vary.
As a result we need a principled framework for quantifying and computing with uncertainty. In the model-based approach we build a model of how the data was generated, which can directly incorporate the noise models for each of the sensors. It is easy to see that probabilistic Bayesian graphical models are a natural fit to the model-based framework.
“The subjectivist states his judgements, whereas the objectivist sweeps them under the carpet by calling assumptions knowledge, and he basks in the glorious objectivity of science.” -I.J. Good
The key insight that is often overlooked is that any off-the-shelf algorithm has underlying assumptions of its own (intentionally or not), although often these are ill-defined. The result is that the algorithm behaves like a “black box”, meaning empirical comparisons are necessary, for example by nested cross-validation. This is laborious and inefficient, and there are many pitfalls to this approach 3. If no algorithm gives adequately good results the way forward is even more unclear.
From a model-based viewpoint, to make predictions using the model we need to plug the observed data into model, and compute the probabilities of the possible values a variable can take after the relevant evidence is taken into account - a process known as “inference”.
Exact and Approximate Inference
By separating model and inference in this manner, the same method of inference can be applied to a wide variety of models, or alternatively different inference methods can be used for the same model. Perhaps the simplest approaches to inference are simulation based approaches such as Markov chain Monte Carlo (MCMC), which can shown to be correct in the long run, but slow to converge. For any “realistic” model with more than a trivial quantity of data, we may hit the limits of computational tractability. Deterministic approximate inference methods, such as Expectation Propagation and Variational Bayes, make it possible to learn models by trading off computation time for accuracy.
“An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem.” -John Tukey
Comparing Models
Because we have clearly laid out the assumptions when creating models, it becomes easier to compare them both qualitatively and quantitatively. This is especially important if there are parts of the modelling process that are difficult to pin down.
“Statisticians, like artists, have the bad habit of falling in love with their models.” -George Box
If we can compute the probability of a model given the the data (not assuming any particular model parameters), this is the “Model Evidence” (or marginal likelihood). This quantity can then be used to compute the so-called “Bayes factor”, which can be thought of as a Bayesian alternative to classical hypothesis testing 45, by taking the ratio of the evidence for each model. For models where the evidence is too costly to evaluate numerically, approximate Bayesian computation can be used instead.
Case Study: The SPHERE Project6
Many countries are experiencing the effects of an ageing population, which coupled with a rise in chronic health conditions is encouraging a shift towards the managing health related issues in the home. The SPHERE (a Sensor Platform for HEalthcare in a Residential Environment) project [6] has designed a multimodal sensor system and analytics platform for this purpose. Naturally, the SPHERE setting presents many sources of uncertainty. Firstly, we are dealing with multiple sensor modalities (environmental, body-worn, video), each of which will have different noise profiles and failure modes. Secondly, annotated or labelled data is expensive and intrusive to acquire, and the resulting labels are potentially noisy and inaccurate. Lastly, patterns of human behaviour are subject to many factors that may or may not be attributed to the particular health context of a given individual. In this project we are making use of model-based methods throughout.
Tools for Model-Based Machine Learning
Because of the separation of the model from the method of inference, it also becomes possible (if by no means trivial) to create software that is able to take the model, as specified using some form of modelling language or API, and then automatically generate inference routines (possibly even by automatically generating source code!) to solve a wide variety of models. This allows a new breed of engineer - effectively a “modeller”, who does not need to know about the specifics about the inference method being used. Some examples of software packages that seek to achieve this are:
- Infer.NET. A software framework developed at Microsoft Research Cambridge for running Bayesian inference in graphical models. It can also be used for probabilistic programming.
- BUGS. A Bayesian modelling framework using MCMC methods.
- Church. A probabilistic programming language designed for expressive description of generative models .
- Stan. A probabilistic programming language implementing full Bayesian statistical inference with MCMC sampling, approximate Bayesian inference with Variational inference and penalized maximum likelihood estimation.
- GPy. Gaussian processes framework in python, from the Sheffield machine learning group.
- PyMC. A python module that implements Bayesian statistical models and fitting algorithms, including MCMC.
See the ‘early access’ model-based machine learning book at http://www.mbmlbook.com.
Summary
Machine learning is being successfully applied to a large number of real-world problems as you read this post. In some cases, all that is needed is the “best guess answer”, and there also happens to be an off-the-shelf tool for the job. However, there are also a large class of other scenarios: where either no such tool exists, where quantifying the uncertainty is of great importance or where you want to be able to introspect on the answers given by the system. Model-based machine learning provides a compelling path to tackling all of these scenarios.
References
Winn, J., Bishop, C.M., Diethe, T. (2015). Model-Based Machine Learning. Microsoft Research Cambridge. http://www.mbmlbook.com. ↩
Rosenblatt, F. (1957). The perceptron, a perceiving and recognizing automaton. Report 85-460-1. Cornell Aeronautical Laboratory. ↩
Krstajic, D., Buturovic, L. J., Leahy, D. E., & Thomas, S. (2014). Cross-validation pitfalls when selecting and assessing regression and classification models. Journal of cheminformatics, 6(1), 1-15. ↩
Goodman, S. N. (1999). Toward evidence-based medical statistics. 1: The P value fallacy. Annals of internal medicine, 130(12), 995-1004. ↩
Goodman, S. N. (1999). Toward evidence-based medical statistics. 2: The Bayes factor. Annals of internal medicine, 130(12), 1005-1013. ↩
Zhu, N., Diethe, T., Camplani, M., Tao, L., Burrows, A., Twomey, N., Kaleshi, D., Mirmehdi, M., Flach, P. & Craddock, I. (2015). Bridging e-Health and the Internet of Things: The SPHERE Project. Intelligent Systems, IEEE, 30(4), 39-46. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=7156004 ↩