Menu:

Latest news:

December 19, 2008:
Add a section on online learning, with some references + recent literature.

September 06, 2008:
Update publication's list (IEICE transaction acceptance).

May 23, 2008:
Add notes section + early draft for multivariate calculus

April 29, 2008:
Update publication's list

July 31, 2007:
Update pyem doc, update publication list.

Stochastic approximation

Since learning can often be seen as the optimization of a function of the observations, online learning can often be seen as an online optimization problem. The problem has seen considerable attention from the control theory field, for example. Most of the theory I am aware of is based on stochastic approximation, which can be sucintely described as the optimization of an unknown function, when only noisy observations are available.

The first paper seems to be the Robbins Monro paper in 1951. See Wikipedia for references.

There is a lot of litterature on the related problems from a probability theory POV. In particular, Robbins Monro-like methods can be analysed through martingale theory, which is very rich

  • Probability with Martingale, by David Williams. A first course in probability based on Martingale. Highly recommended, very insightful !
  • See also here for more references

There is a book "Stochastic approximation and Recursive Algorithms And Applications" by Kushner and Yin, which I find difficult to read. I don't know if it is the writing, or my lack of background in martingale theory.

Online learning

Online EM

EM is very popular in Machine learning, maybe the most well known algorithm. Several online variant have been suggested. The best paper I have come across in this vein is Online EM for latent data models, by Cappé et al, available on arxiv. The paper is still readable for someone like me without a strong theoretical background, while still proving the convergence properties with some insights on practical issues (local averaging, maybe better known as Polyak averaging in the signal processing field).

Other papers:

  • On-line EM Algorithm for the Normalized Gaussian Network, by Sato and Ishi, 1999. Available on Citeseer. This one is lighter on theoretical considerations.
  • 'On-line EM and Quasi-Bayes or: How I Learned to Stop Worrying and Love Stochastic Approximation', By Kejie Bao, master thesis. If I remember correctly, that's how I got interested in online techniques at all. Consider several online-EM inspired algorithms, applied to a music retrieval task. K. Bao also coauthored other papers, with some convergence consideration.

Online gradient

A very good overview can be seen in the following paper by Bottou:

  • 'Online Algorithms and Stochastic Approximations', in Online Learning and Neural Networks, Edited by David Saad, Cambridge University Press, Cambridge, UK, 1998.

I am still not clear about the similarites and differences between the kind of online gradient descent as presented in this paper, and online EM. A key point of online-EM (and EM, for that matter), is to consider some 'completed' data to make the optimization (be it MLE, MAP, etc...) easier to deal with compared to the observation only, and online-EM as described by Cappé IS different than online descent in the parameter space of the expected log-likelihood (the usual optimized function in EM for the MLE), which is the method presented by Bottou. Differences between direct optimization in the parameter space and online EM is discussed in the paper by Cappé et al. On the other hand, the convergence properties are based on the same theory (The 'noise' is proved to be 'like' martingale in the large number of samples limit), so I may just have missed something here. This may be a topic worth being pursued here.

Online SVM

Know nothing about this. Keep the reference here when I will have more time: there is a paper by Antoine Bordes on online SVM:

  • 'The Huller: a Simple and Efficient Online SVM', by Antoine Bordes, Léon Bottou. in Machine Learning: ECML 2005,505-512. Springer Verlag. (LNAI 3720)