Wednesday, June 25, 2008

The Need for Open Source Software in Machine Learning

While reading the Undirect Grad blog, I found an interesting paper about the need for more open source software in machine learning. The abstract:
Open source tools have recently reached a level of maturity which makes them suitable for building large-scale real-world systems. At the same time, the field of machine learning has developed a large body of powerful learning algorithms for diverse applications. However, the true potential of these methods is not used, since existing implementations are not openly shared, resulting in software with low usability, and weak interoperability. We argue that this situation can be significantly improved by increasing incentives for researchers to publish their software under an open source model. Additionally, we outline the problems authors are faced with when trying to publish algorithmic implementations of machine learning methods. We believe that a resource of peer reviewed software accompanied by short articles would be highly valuable to both the machine learning and the general scientific community.
I think this paper addresses a very interesting problem, and not only for the ML community. As the paper says, "Open Source model allows better reproducibility of the results, quicker detection of errors, innovative applications, faster adoption of ML methods in other disciplines". It also avoids constant reinvention of the wheel, and it is a fairer model: if most research is funded by public money, why should researchers block access to the code?

The same happens with publications. Open Access should be a necessary condition for all publicly funded research. Luckily, there are several initiatives around the globe trying to spread the benefits of the Open Access model, such as Harvard's adoption of Open Access or the support of the Comunidad de Madrid (a Spanish region) for several Open Access initiatives (sorry for the link in Spanish).

In recent years, the ML community has improved in these respects. We have a very good open source ML framework in Weka, a top Open Access journal in JMLR, which also supports open source ML software, and a very good open source software repository in MLOSS.

3 comments:

  1. Very interesting. I've also had good experience with the open source libraries Joone (neural networks) and RapidMiner (wide variety of data mining / machine learning tasks).

  2. I haven't used RapidMiner. Some time ago I tested Yale (I think RapidMiner is the successor to Yale), but I had some problems getting it to work on my Mac. I've been watching the RapidMiner tutorial and it seems they've worked hard on it. I'll spend some time testing it again, as it seems very interesting.

    Thanks for the reference :)

  3. Wonderful information. Very impressive and interesting blog. Keep posting nice articles like this...
