When: Friday, September 24, starting at 10:30
Where: Aula Magna, Casa Convalescència
The ECML PKDD 2010 Industrial Session will consist of invited presentations on selected topics in machine learning and data mining from an industry perspective, and a panel of leading experts from industry on future research challenges and opportunities in data mining and machine learning.
10:30 : Welcome and Introduction
[Updated on Tuesday 21st: Pau Agulló replaces José Luis Flórez for NeoMetrics. We regret to announce that Mayank Bawa (Aster Data) cannot attend the conference.]
Industrial Session Chairs
Rakesh Agrawal, MSR
The thesis of this talk is that the Internet is enabling applications that become smarter the more they are used, because of the data generated in the process, and that data mining is the key technology for building such applications. We will provide examples from web search in support of this assertion.
Human-Centered Pattern Analysis
Alejandro Jaimes, Yahoo! Research
Human behavior is repetitive in nature and thus lends itself nicely to the application of a variety of pattern analysis algorithms. This includes social behavior and interaction with media, with the environment, and with the web, among others. In this talk I will argue that in order to obtain higher value from pattern analysis, both for business goals and for improving user experience, it is critical to take a Human-Centered approach that takes into consideration a variety of human factors and work in several disciplines. I will briefly give some examples in social and media interaction and place a particular focus on the analysis of large-scale search query logs. Query logs provide a very sparse picture of users' actions, but they can be a valuable resource for gaining insights into what people are doing, how they are doing it, and why they are doing it. I will discuss strategies for query-log analysis and explain why a Human-Centered approach is required, highlighting the implications for algorithm and user interface design. Finally, I will discuss future directions and challenges and describe how integrating multiple sources of data (e.g., demographics and context) can help fill in the gaps to gain a better understanding of users and have a stronger impact.
KNIME. Integrating Data, Tools, and Science
Michael Berthold, KNIME
It has long been known that data analysis projects spend only a small fraction of their time on actual analysis. Much more time is spent gathering, integrating and preparing the data for analysis. Still, many data analysis tools focus on the analytical parts only. In this talk we will present the core technology behind KNIME, an open source integration and analysis platform. In addition to offering comprehensive built-in ETL, analysis and visualization methods, KNIME's open API facilitates the integration of other tools. The underlying modular architecture enables a coherent and transparent fusion of the diverse data sources spread out over the corporate IT environment, while at the same time integrating existing legacy tools and other data processing and analysis methods. We will show real-world examples of KNIME being successfully deployed as an integration and analysis backbone, and how it can be used at the same time to quickly deploy new science, e.g. new methods for the analysis and exploration of data. We will also give a brief overview of how the graphical, modular representation of a data workflow enables complex data processing and analysis procedures to be documented, archived and communicated.
Mining social networks
Pau Agulló, NeoMetrics
Traditional consumer behaviour analysis considers customers as individual observations, independent of everyone else. In this framework, future behaviour depends only on profile and past behaviour. Reality, however, is more complex. People interact with each other to exchange information and opinions on products and services. These contacts have an impact on customers' future behaviour. Social network analysis allows organizations to identify the connections among customers, the communities that exist and the role each user plays in those communities. Certain users influence others, while others are influenced. Managing customer relationships correctly using this new dimension will provide a sustainable competitive advantage to those who succeed in mastering it.
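As a rough illustration of the idea above (not NeoMetrics' method), the following Python sketch builds a contact graph from pairs of customers, treats connected components as a crude stand-in for communities, and uses the number of direct contacts as a simple proxy for a user's role. The customer names, the contact list and the function names are all hypothetical.

```python
from collections import defaultdict


def build_graph(contacts):
    """Build an undirected adjacency map from (customer, customer) pairs."""
    graph = defaultdict(set)
    for a, b in contacts:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def communities(graph):
    """Return connected components, a crude stand-in for communities."""
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:  # depth-first traversal from the start node
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(graph[node] - group)
        seen |= group
        groups.append(group)
    return groups


def most_connected(graph, group):
    """Pick the group member with the most direct contacts (highest degree)."""
    return max(sorted(group), key=lambda name: len(graph[name]))


# Hypothetical contact records, e.g. derived from calls between customers.
contacts = [
    ("ana", "ben"), ("ana", "carla"), ("ana", "dani"),
    ("ben", "carla"), ("eva", "fran"),
]
graph = build_graph(contacts)
groups = communities(graph)
hubs = [most_connected(graph, group) for group in groups]
for group, hub in zip(groups, hubs):
    print(sorted(group), "most connected:", hub)
```

Real deployments would typically use richer community detection and influence measures; degree and connected components are used here only because they are the simplest instances of both notions.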
Flexible QSAR: functional machine learning in computational chemistry
Ignasi Belda, Intelligent Pharma
QSAR (Quantitative Structure-Activity Relationship) modelling is a common step in drug discovery. QSAR methods use statistical and machine learning tools to draw out the significant relationships between the molecular structure of the drug candidates (the molecules) and their biological profile. To achieve this goal, researchers usually describe the molecules with arrays of physico-chemical properties, such as total molecular charge, molecular weight and number of hydrogen bond donors. However, the predictive accuracy of statistical and machine learning tools in QSAR has typically been very low, and more advanced tools are needed before QSAR can be used more widely in drug discovery processes. For this reason, at Intelligent Pharma we have been conducting research in the field of functional data mining, that is, data mining of information described through functions rather than only through fixed properties. By using functional data mining approaches, we can deal with physico-chemical parameters such as the volume of a molecule, which is not a fixed value but varies with the energy of the system and the molecule's flexibility. More accurate predictive models can therefore be built by using these approaches in the field of QSAR and drug discovery. The machine learning tool used in this research is support vector machines.
Machine Learning in Microsoft's Online Services: TrueSkill, AdPredictor, and Matchbox
Thore Graepel, MSR
Machine Learning plays a crucial role in Microsoft's online services. In this talk, I will describe three powerful applications of machine learning.
TrueSkill is Xbox Live's Ranking and Matchmaking system and ensures that gamers online have balanced and exciting matches with equally skilled opponents.
All three systems have in common that they are based on techniques from graphical models and approximate Bayesian inference, yet operate at large scale. I will discuss the underlying models and algorithms as well as application-specific insights and findings. Time permitting, I will show the three systems in action. This is based on joint work with Ralf Herbrich, David Stern, Thomas Borchert, Tom Minka, and Joaquin Quiñonero Candela.
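To give a flavour of the Bayesian skill update behind a ranking system such as TrueSkill, here is a minimal Python sketch of the two-player, no-draw special case of the published TrueSkill update. It is an illustration under simplifying assumptions, not Microsoft's production system: the draw margin, team games and skill dynamics are omitted, and the default values for the prior (mean 25, standard deviation 25/3) and the performance noise `BETA` are assumed here rather than taken from the abstract.

```python
import math
from dataclasses import dataclass


@dataclass
class Rating:
    mu: float = 25.0            # assumed prior mean skill
    sigma: float = 25.0 / 3.0   # assumed prior skill uncertainty


BETA = 25.0 / 6.0  # assumed performance noise per game


def _pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def update(winner, loser, beta=BETA):
    """Return new (winner, loser) ratings after a decisive two-player game."""
    c2 = 2.0 * beta ** 2 + winner.sigma ** 2 + loser.sigma ** 2
    c = math.sqrt(c2)
    t = (winner.mu - loser.mu) / c
    # Truncated-Gaussian correction terms; the floor avoids division by
    # zero when the observed outcome was extremely unlikely.
    v = _pdf(t) / max(_cdf(t), 1e-12)
    w = v * (v + t)
    new_winner = Rating(
        winner.mu + winner.sigma ** 2 / c * v,
        winner.sigma * math.sqrt(1.0 - winner.sigma ** 2 / c2 * w),
    )
    new_loser = Rating(
        loser.mu - loser.sigma ** 2 / c * v,
        loser.sigma * math.sqrt(1.0 - loser.sigma ** 2 / c2 * w),
    )
    return new_winner, new_loser


# Two new players with identical priors; alice wins the game.
alice, bob = update(Rating(), Rating())
print(alice, bob)
```

After one game the winner's mean rises and the loser's falls by the same amount, and both uncertainties shrink, which is what lets a matchmaking system pair players whose skill estimates are close.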