Social Media, Data Mining & Machine Learning: Machine Learning: Two interesting trends

There are two general Machine Learning articles that I think are very relevant for understanding current and possible future trends in Machine Learning:

[1] Dietterich, Thomas G., Machine Learning Research: Four Current Directions. AI Magazine, Volume 18, pp. 97-136, 1997.

[2] Domingos, Pedro, Machine Learning. In W. Klosgen and J. Zytkow (eds.), Handbook of Data Mining and Knowledge Discovery (pp. 660-670), 2002. New York: Oxford University Press.

It is now 2006, almost 10 years after the first article was published, but some of its arguments still hold today. Still, when thinking about current and future trends in Machine Learning, I prefer to start from Pedro Domingos's point of view. Machine Learning has found its business partner in Data Mining (KDD), and the meeting has been crucial because KDD means more money for Machine Learning research and also real applications (not only theoretical research).

So what in Machine Learning is, or could be, useful for KDD? First of all, the preprocessing step in KDD can be considered as important as (or even more important than) the knowledge discovery stage, which is why new, effective and fast Feature Extraction, Construction and Selection ([3]) algorithms seem to be as important as new classification or clustering algorithms. From my point of view, there is a need for theoretical and practical papers comparing the performance and scalability of current Feature Selection algorithms, including filter, wrapper, embedded and even genetic-based approaches.
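To make the filter/wrapper distinction concrete, here is a minimal sketch in plain Python. Everything in it is an assumption chosen for illustration: the toy data generator, the Pearson-correlation filter score, the nearest-centroid classifier and the function names are all hypothetical, and none of them comes from [3]. A filter scores each feature without looking at any classifier, while a wrapper searches over feature subsets by repeatedly training and evaluating a classifier.

```python
import random
from statistics import mean

def make_data(n=200, seed=0):
    """Toy data (assumed): features 0 and 1 carry signal, 2 and 3 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([label * 2.0 + rng.gauss(0, 0.5),
                  label * 1.0 + rng.gauss(0, 0.5),
                  rng.gauss(0, 1),
                  rng.gauss(0, 1)])
        y.append(label)
    return X, y

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((z - mb) ** 2 for z in b)
    return cov / (va * vb) ** 0.5

def filter_rank(X, y):
    """Filter: rank features by |correlation with the label|, no classifier involved."""
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j])

def centroid_accuracy(X, y, feats, Xv, yv):
    """Train a nearest-centroid classifier on (X, y) using only `feats`,
    then return its accuracy on the validation set (Xv, yv)."""
    centroids = {}
    for c in set(y):
        rows = [row for row, label in zip(X, y) if label == c]
        centroids[c] = [mean(row[j] for row in rows) for j in feats]
    hits = 0
    for row, label in zip(Xv, yv):
        pred = min(centroids, key=lambda c: sum(
            (row[j] - m) ** 2 for j, m in zip(feats, centroids[c])))
        hits += pred == label
    return hits / len(yv)

def wrapper_forward(X, y, Xv, yv, k):
    """Wrapper: greedy forward selection, adding the feature that most
    improves validation accuracy of the classifier at each step."""
    selected = []
    while len(selected) < k:
        candidates = [j for j in range(len(X[0])) if j not in selected]
        best = max(candidates,
                   key=lambda j: centroid_accuracy(X, y, selected + [j], Xv, yv))
        selected.append(best)
    return selected

X, y = make_data()
X_train, y_train, X_val, y_val = X[:100], y[:100], X[100:], y[100:]
print("filter ranking:", filter_rank(X_train, y_train))
print("wrapper choice:", wrapper_forward(X_train, y_train, X_val, y_val, k=2))
```

The trade-off that a comparison paper would have to measure is already visible here: the filter makes one pass over the features, whereas the wrapper retrains the classifier once per candidate feature at every step, which is where scalability becomes the issue.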

There are already many classification, clustering and regression algorithms, and they give good results, which is why I think developing new algorithms is not a crucial task. However, as [1] explains, Ensemble Learning can be very useful because it studies how to combine existing algorithms to achieve better results. Even more useful is the study of distributed multi-agent ensembles of classifiers (as many problems are intrinsically distributed).
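A rough sketch of why combining existing classifiers can help, under the strong and rarely satisfied assumption that the base classifiers make independent errors on a binary problem. The function names are hypothetical, and majority voting is only one of several possible combination schemes.

```python
from collections import Counter
from math import comb

def majority_vote(predictions):
    """Combine base classifiers by returning the most common predicted label."""
    return Counter(predictions).most_common(1)[0][0]

def majority_accuracy(p, n):
    """Probability that a majority of n classifiers is right, assuming each is
    correct with probability p independently of the others (n odd, two classes)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(n // 2 + 1, n + 1))

# Three base classifiers disagree on one example; the ensemble follows the majority.
print(majority_vote(["spam", "ham", "spam"]))

# Accuracy of the vote for a growing number of independent 70%-accurate classifiers.
for n in (1, 3, 21):
    print(n, majority_accuracy(0.7, n))
```

With three such classifiers the vote is right about 78.4% of the time (0.7^3 + 3 * 0.7^2 * 0.3), and the figure keeps growing as more are added. In practice the errors of real classifiers are correlated, so the gain is smaller, which is why diversity among the combined algorithms matters.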

[3] JMLR Special Issue on Variable and Feature Selection

2 comments:

Anonymous, January 17, 2007 at 1:54:00 PM GMT+1
"From my point of view, there is a need for theoretical/practical papers comparing the performance and scalability of actual Feature Selection algorithms including filters, wrappers, embedded and even genetic based approach."

During my studies, I've done such a comparison, but there's a need to do it better.

The article:
http://emotion.inrialpes.fr/bibemotion/2005/DBS05/
JoSeK, January 28, 2007 at 4:16:00 PM GMT+1
Thanks a lot for the reference, Pierre :)

Wednesday, December 06, 2006
