When:
Friday, September 24
Where:
Aula Magna, Casa Convalescència
The ECML PKDD 2010 Industrial Session will consist of invited presentations on selected topics in machine learning and data mining from an industry perspective, and a panel with leading experts from industry on future research challenges and opportunities in data mining and machine learning.
Industrial Session Invited Speakers
- Rakesh Agrawal (Microsoft Search Labs)
- Mayank Bawa (Aster Data)
- Ignasi Belda (Intelligent Pharma)
- Michael Berthold (KNIME)
- José Luis Flórez (NeoMetrics)
- Thore Graepel (Microsoft Research)
- Alejandro Jaimes (Yahoo! Research)
Industrial Session Chairs
- Taneli Mielikäinen (Nokia)
- Hugo Zaragoza (Yahoo! Research)
Industrial Session Program
Machine Learning in Microsoft's Online Services: TrueSkill, AdPredictor, and Matchbox
Thore Graepel, MSR
Machine Learning plays a crucial role in Microsoft's online services. In this talk, I will describe three powerful applications of machine learning.
- TrueSkill is Xbox Live's ranking and matchmaking system; it ensures that gamers online get balanced and exciting matches against equally skilled opponents.
- AdPredictor is the system that estimates click-through rates (CTR) for ad selection and pricing within Microsoft's search engine Bing.
- Matchbox is a large-scale Bayesian recommender system that combines aspects of collaborative filtering and content-based recommendation. It is currently used for tweet recommendation within projectemporia.com.
All three systems are based on techniques from graphical models and approximate Bayesian inference, yet operate at large scale. I will discuss the underlying models and algorithms as well as application-specific insights and findings. Time permitting, I will demonstrate the three systems in action. This is joint work with Ralf Herbrich, David Stern, Thomas Borchert, Tom Minka, and Joaquin Quiñonero Candela.
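The abstract does not spell out the update equations. For readers who want a concrete picture of the approximate Bayesian inference these systems share, here is a minimal Python sketch of the Gaussian moment-matching updates, following the published TrueSkill equations (two players, one game, no draws, no skill drift over time) and AdPredictor-style Bayesian probit regression over binary features. The TrueSkill default beta = 25/6 matches its usual defaults (mu = 25, sigma = 25/3); the AdPredictor prior (0, 1), its beta = 1.0, and the feature names are assumptions for illustration. This is not Microsoft's production code.

```python
import math

# Shared helpers: standard normal density/CDF and the truncated-Gaussian
# correction factors v(t) = N(t) / Phi(t) and w(t) = v(t) * (v(t) + t).
def norm_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def norm_cdf(t):
    # erfc keeps more precision than 1 + erf(...) for negative t
    return 0.5 * math.erfc(-t / math.sqrt(2.0))

def v(t):
    return norm_pdf(t) / norm_cdf(t)

def w(t):
    vt = v(t)
    return vt * (vt + t)

# --- TrueSkill sketch: two players, one game, no draws, no dynamics (tau) ---
def trueskill_win(winner, loser, beta=25.0 / 6.0):
    """winner and loser are (mu, sigma^2) skill beliefs; returns both updated."""
    (mu_w, s2_w), (mu_l, s2_l) = winner, loser
    c2 = 2.0 * beta ** 2 + s2_w + s2_l  # variance of the performance difference
    c = math.sqrt(c2)
    t = (mu_w - mu_l) / c
    new_winner = (mu_w + s2_w / c * v(t), s2_w * (1.0 - s2_w / c2 * w(t)))
    new_loser = (mu_l - s2_l / c * v(t), s2_l * (1.0 - s2_l / c2 * w(t)))
    return new_winner, new_loser

# --- AdPredictor-style probit regression over binary (one-hot) features ---
PRIOR = (0.0, 1.0)  # assumed prior (mean, variance) for unseen feature weights

def _total(weights, active, beta):
    # Sum of weight means and variances over the active features, plus noise beta^2.
    mu = sum(weights.get(f, PRIOR)[0] for f in active)
    s2 = beta ** 2 + sum(weights.get(f, PRIOR)[1] for f in active)
    return mu, s2

def predict_ctr(weights, active, beta=1.0):
    mu, s2 = _total(weights, active, beta)
    return norm_cdf(mu / math.sqrt(s2))

def update(weights, active, y, beta=1.0):
    """y = +1 for a click, -1 for no click; updates weights in place."""
    mu, s2 = _total(weights, active, beta)  # computed once, before any weight changes
    s = math.sqrt(s2)
    t = y * mu / s
    for f in active:
        m_f, s2_f = weights.get(f, PRIOR)
        weights[f] = (m_f + y * s2_f / s * v(t), s2_f * (1.0 - s2_f / s2 * w(t)))

# Usage: two equally rated players, then one observed click.
alice = bob = (25.0, (25.0 / 3.0) ** 2)
alice, bob = trueskill_win(alice, bob)  # alice's mean rises, bob's falls

weights = {}
features = ["query=shoes", "ad=123"]  # hypothetical feature ids
update(weights, features, y=+1)  # observe one click
```

In both cases a win or click moves the means toward the observation, and every observation shrinks the variances, which is what lets these models learn quickly from few observations while staying cheap enough to run at large scale.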
Flexible QSAR: functional machine learning in computational chemistry
Ignasi Belda, Intelligent Pharma
QSAR (Quantitative Structure-Activity Relationship) modelling is a common step in drug discovery. QSAR methods use statistical and machine learning tools to extract the significant relationships between the molecular structure of drug candidates and their biological profiles. To this end, researchers usually describe each molecule by an array of physico-chemical properties, such as total molecular charge, molecular weight, and number of hydrogen-bond donors. However, the predictive accuracy of statistical and machine learning tools in QSAR has typically been low, and more advanced tools are needed before QSAR can be used more widely in drug discovery processes. For this reason, at Intelligent Pharma we have been researching functional data mining, that is, mining data described by functions rather than only by fixed properties. Functional data mining lets us handle physico-chemical parameters such as molecular volume, which is not fixed but varies with the energy of the system and the flexibility of the molecule. Using these approaches, more accurate predictive models can be built for QSAR and drug discovery. The machine learning tool used in this research is the support vector machine (SVM).
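The abstract does not say how functional descriptors are fed to the SVM. One common approach, assumed here, is a kernel defined on functions: sample each molecule's property curve (for example, volume as a function of energy) on a shared grid, approximate the L2 distance between two curves with the trapezoidal rule, and apply a Gaussian (RBF) kernel to that distance. The resulting Gram matrix can be passed to any SVM implementation that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")` or `SVR(kernel="precomputed")`. The grid, curves and `gamma` value below are made-up toy inputs, not Intelligent Pharma's data or method.

```python
import math

def trapezoid(ys, xs):
    # Trapezoidal-rule integral of samples ys taken at grid points xs.
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2.0
               for i in range(len(xs) - 1))

def functional_distance_sq(f, g, grid):
    # Approximate squared L2 distance between two curves sampled on the same grid.
    return trapezoid([(a - b) ** 2 for a, b in zip(f, g)], grid)

def functional_rbf(f, g, grid, gamma=1.0):
    # Gaussian kernel on the function-space distance.
    return math.exp(-gamma * functional_distance_sq(f, g, grid))

def gram_matrix(curves, grid, gamma=1.0):
    # Pairwise kernel values, suitable as a precomputed SVM kernel matrix.
    return [[functional_rbf(f, g, grid, gamma) for g in curves] for f in curves]

# Toy example: hypothetical volume curves for three molecules,
# each sampled at the same three (normalized) energy levels.
energy_grid = [0.0, 0.5, 1.0]
volume_curves = [
    [100.0, 105.0, 110.0],
    [101.0, 106.0, 111.0],  # close to the first molecule
    [120.0, 130.0, 140.0],  # far from both
]
K = gram_matrix(volume_curves, energy_grid, gamma=0.01)
```

Because the whole curve enters the distance, two molecules with the same volume at one energy level but different flexibility (different curve shapes) still get distinct kernel values, which a single fixed descriptor cannot capture.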
KNIME. Integrating Data, Tools, and Science.
Michael Berthold, KNIME
It has long been well known that data analysis projects spend only a small fraction of their time on the actual analysis; much more time goes into gathering, integrating and preparing the data. Still, many data analysis tools focus on the analytical part only. In this talk we will present the core technology behind KNIME, an open source integration and analysis platform. In addition to offering comprehensive built-in ETL, analysis and visualization methods, KNIME's open API facilitates the integration of other tools. The underlying modular architecture enables a coherent and transparent fusion of the diverse data sources spread across the corporate IT environment, while also integrating existing legacy tools and other data processing and analysis methods. We will show real-world examples of KNIME deployed successfully as an integration and analysis backbone, and how, at the same time, it can be used to quickly deploy new science, such as new methods for data analysis and exploration. We will also briefly describe how the graphical, modular representation of a data workflow allows complex data processing and analysis procedures to be documented, archived and communicated.
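To make the idea of a modular data workflow concrete, here is a minimal Python sketch (Python 3.9+ for `graphlib`) of a workflow as a directed acyclic graph: each node is a function whose inputs are the outputs of upstream nodes, and the engine runs nodes in topological order. The `Workflow` class, node names and toy tables are hypothetical and do not reflect KNIME's actual API.

```python
from graphlib import TopologicalSorter

class Workflow:
    """Hypothetical minimal workflow engine: nodes are functions, edges carry data."""

    def __init__(self):
        self.nodes = {}   # node name -> callable
        self.inputs = {}  # node name -> list of upstream node names

    def add_node(self, name, func, inputs=()):
        self.nodes[name] = func
        self.inputs[name] = list(inputs)

    def run(self):
        # TopologicalSorter takes {node: predecessors}; static_order() yields each
        # node after all of its predecessors and raises CycleError on cycles.
        results = {}
        for name in TopologicalSorter(self.inputs).static_order():
            upstream = [results[u] for u in self.inputs[name]]
            results[name] = self.nodes[name](*upstream)
        return results

def join_on_id(customers, orders):
    # Inner join: attach each order's customer region via the shared "id".
    by_id = {c["id"]: c for c in customers}
    return [{**o, "region": by_id[o["id"]]["region"]} for o in orders if o["id"] in by_id]

def total_by_region(rows):
    # Aggregate order amounts per region.
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals

wf = Workflow()
wf.add_node("customers", lambda: [{"id": 1, "region": "EU"}, {"id": 2, "region": "US"}])
wf.add_node("orders", lambda: [{"id": 1, "amount": 30.0}, {"id": 1, "amount": 12.5},
                               {"id": 2, "amount": 7.0}])
wf.add_node("join", join_on_id, inputs=("customers", "orders"))
wf.add_node("report", total_by_region, inputs=("join",))
results = wf.run()
```

Since every step is a separately named node with explicit inputs, the graph itself records how the report was produced, which is the property that makes such workflows easy to document, archive and share.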
Data Externality.
Rakesh Agrawal, MSR