University of Minnesota
Software Engineering Center

Vladimir Cherkassky: CodeFreeze 2014

Prof. Cherkassky has published more than 120 technical papers and book chapters in the areas of computer networks, modeling and optimization, statistical learning, and artificial neural networks. His current research concerns the theory and applications of methods for predictive learning from data, and he co-authored the monograph Learning From Data, published by Wiley in 1998.

Prof. Cherkassky is a senior member of the IEEE and a member of the International Neural Network Society (INNS). He served on the Governing Board of INNS from 1996 to 1998 and as an Associate Editor of IEEE Transactions on Neural Networks (TNN) in 1998. He currently serves on the editorial boards of Neural Networks (the official journal of INNS), Natural Computing: An International Journal, and Neural Processing Letters. He was a Guest Editor of the IEEE TNN Special Issue on VC Learning Theory and Its Applications, published in September 1999.

Dr. Cherkassky was the organizer and Director of the NATO Advanced Study Institute (ASI) From Statistics to Neural Networks: Theory and Pattern Recognition Applications, held in France in 1993. He has presented numerous tutorials and invited talks on statistical and neural network methods for learning from data at conferences and scientific meetings in Europe, North America, and Asia. He received the IBM Faculty Partnership Award in 1996 and 1997 for his work on learning methods for data mining.

From Big Data to Little Knowledge
The main intellectual appeal of ‘Big Data’ is its promise to generate knowledge from data. This talk will provide a critical evaluation of the popular view that ‘more_data = more_knowledge’, using both philosophical and technical arguments. In the philosophy of science, data-driven knowledge discovery is known as the problem of induction (or inductive inference); it has been studied by scientists and philosophers for centuries and thoroughly investigated in Western philosophy of science. In the 20th century, two different technical methodologies for making mathematically rigorous inferences from data were developed: by Ronald Fisher (classical statistics) and by Vladimir Vapnik (VC theory). The recent growth of digital data has produced many data-analytic techniques, developed by mathematicians, statisticians, engineers, biologists, computer scientists, economists, and others. Yet the current understanding, among both practitioners and researchers, of the important methodological aspects of these data-analytic algorithms remains rudimentary or non-existent.
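As background (not part of the original abstract), the flavor of VC-theoretical inference can be conveyed by Vapnik's classic generalization bound for classification, quoted here in a standard textbook form (constants vary slightly across formulations). For a hypothesis class of VC dimension h and a training sample of size n, with probability at least 1 - eta:

```latex
% Vapnik's generalization bound for 0/1-loss classification (standard form;
% constants differ slightly across textbooks). R(f) is the true risk,
% R_emp(f) the empirical (training) risk, h the VC dimension, n the sample size.
R(f) \;\le\; R_{\mathrm{emp}}(f)
  + \sqrt{\frac{h\left(\ln\frac{2n}{h} + 1\right) - \ln\frac{\eta}{4}}{n}}
```

The bound is distribution-free: it holds for any data distribution, which is the sense in which VC theory supports predictive inference without the distributional assumptions of classical parametric statistics.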

My talk will expand on:
- The philosophical aspects of data-driven knowledge discovery, e.g., the difference between classical scientific knowledge and modern data-analytic knowledge.
- The difference between classical statistics and the predictive (VC-theoretical) methodology. The VC-theoretical methodology is better suited to estimating predictive models, and it has a clear philosophical interpretation that is very different from that of classical statistics. Unfortunately, confusion often arises when machine learning algorithms (which implement the VC-theoretical framework) are presented or interpreted through the classical statistical framework.
- The practical importance of the VC-theoretical methodology for data mining applications, including (a) formalization of application-domain requirements, (b) parameter tuning (also known as model complexity control), and (c) interpretation of predictive (black-box) models; a brief code sketch of point (b) follows this list.
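As a rough illustration of point (b), here is a minimal Python sketch of model complexity control, using an SVM (Vapnik's own learning algorithm) whose complexity parameters are tuned by cross-validation. The example is not from the talk: the synthetic dataset, the scikit-learn API choices, and the parameter grid are illustrative assumptions.

```python
# A minimal sketch (not from the talk) of model complexity control:
# an SVM whose flexibility is governed by the regularization parameter C
# and the RBF kernel width gamma, tuned by resampling rather than by
# fitting the training data as closely as possible.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic binary classification data standing in for a real application.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Larger C and gamma permit more complex decision boundaries; 5-fold
# cross-validation selects the complexity level that generalizes best.
grid = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid={"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1]},
    cv=5,
)
grid.fit(X_train, y_train)

print("selected complexity parameters:", grid.best_params_)
print("held-out accuracy: %.3f" % grid.score(X_test, y_test))
```

Choosing C and gamma by held-out performance rather than by training fit is the resampling-based form of complexity control that the abstract refers to as parameter tuning.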

All philosophical and methodological points presented in this talk will be illustrated using application examples ranging from image recognition to financial engineering and life sciences.
