Machine Learning Seminar by Jan N. van Rijn – Automating the Data Science Pipeline: AutoML, Meta-learning and OpenML

Short Bio

As one of the founders of OpenML.org, Jan van Rijn developed an experiment database for Machine Learning research. By storing the results of earlier Machine Learning experiments, we attempt to model which algorithms work well on which data. These meta-studies can be applied to many areas of Machine Learning, such as General Data Science, Data Stream Research and Subgroup Discovery.

Abstract

Data Science and Machine Learning are at the basis of many scientific discoveries across various scientific domains. By employing Machine Learning models, we are able to discover and research more complicated patterns than is possible when relying on human expertise alone. However, machine learning techniques are not easy to wield: they require substantial data and training time to be applied adequately and to have their results interpreted correctly.

In this talk, I will address several developments that aim to further automate the data science pipeline and assist data owners in applying appropriate models to their data, in particular AutoML, meta-learning and OpenML. The field of Automated Machine Learning (AutoML) develops tools that help domain scientists and experts apply machine learning to their data. The field of Meta-learning develops techniques that leverage knowledge from previous experience and allow adequate models to be built from less data. During my PhD, we developed OpenML, an online experiment database for storing results from previous Machine Learning experiments. In this talk, I will elaborate on the knowledge that we can gain from these stored results, and on how it can be applied to further automate the data science pipeline across research domains.