Machine Learning for Newbies

Saturday, June 10, 2006

In [1], Pedro Domingos compiles some Machine Learning resources that can be useful for a beginner in the field. He lists the following journals:

  1. Machine Learning
  2. Journal of Artificial Intelligence Research
  3. Neural Computation
  4. IEEE Transactions on Pattern Analysis and Machine Intelligence
  5. Journal of Machine Learning Research
  6. Journal of Machine Learning Gossip
A list of conferences
  1. International Conference on Machine Learning
  2. European Conference on Machine Learning
  3. International Joint Conference on Artificial Intelligence
  4. National Conference on Artificial Intelligence
  5. European Conference on Artificial Intelligence
  6. Annual Conference on Neural Information Processing Systems
  7. International Workshop on Multistrategy Learning
  8. International Workshop on Artificial Intelligence and Statistics
  9. International Conference on Computational Learning Theory (COLT)
  10. European Conference on Computational Learning Theory
Other resources
  1. UCI repository of machine learning databases
  2. Online bibliographies of several subareas of machine learning
  3. Machine Learning List
  4. AI and Statistics List
  5. MLC++
  6. Weka
There are other useful resources for a researcher; here are my favourites:
  1. Citeseer
  2. DBLP
  3. Rexa
  4. DataSets for Data Mining and Knowledge Discovery
  5. DataMining Conferences
  6. Machine Learning (Theory)

[1] Pedro Domingos, "E4 - Machine Learning", in W. Klosgen and J. Zytkow (eds.), Handbook of Data Mining and Knowledge Discovery, pp. 660-670. New York: Oxford University Press, 2002.

Bias in Machine Learning


In statistics, the term bias is used in two different ways:

  1. A biased sample is a statistical sample whose members do not all have the same probability of being chosen.
  2. A biased estimator is an estimator that systematically over- or underestimates the quantity being estimated.
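
As an illustration of the second meaning, here is a minimal sketch comparing two estimators of a variance: one divides the sum of squared deviations by n, the other by n - 1. The function name, sample size, number of trials and seed are my own choices for the example, not something taken from the references.

```python
import random

def average_variance_estimates(n=5, trials=20000, sigma=1.0, seed=0):
    """Average two variance estimators over many small Gaussian samples.

    Returns (mean of divide-by-n estimates, mean of divide-by-(n-1) estimates).
    """
    rng = random.Random(seed)
    biased_total = 0.0
    unbiased_total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(sample) / n
        squared_dev = sum((x - mean) ** 2 for x in sample)
        biased_total += squared_dev / n          # systematically too small
        unbiased_total += squared_dev / (n - 1)  # correct on average
    return biased_total / trials, unbiased_total / trials

biased, unbiased = average_variance_estimates()
print(f"divide by n:     {biased:.3f}")
print(f"divide by n - 1: {unbiased:.3f}")
```

With a true variance of 1 and n = 5, the divide-by-n estimator should average close to (n - 1)/n = 0.8, while the divide-by-(n - 1) estimator should average close to 1. The first one underestimates the quantity on average, which is exactly what makes it biased.
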
In Machine Learning, the term bias is closer to the notion of a biased estimator, as it is applied to classifiers. In [1], the bias term is described as reflecting sensitivity to the target function f(x): it represents "how closely on average the estimate is able to approximate the target". The bias directly affects the prediction error, which can be decomposed into bias and variance components [2].
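
For reference, this is the usual textbook form of these quantities for squared-error loss. It is only a sketch: the notation (a training set T, an estimate \hat{f} learned from T, and noise \varepsilon independent of T) is my assumption and may not match the exact formulas in [1] or [2], and [2] deals with more general loss functions than the squared error shown here.

```latex
\begin{align}
  y &= f(x) + \varepsilon, \qquad
      \mathbb{E}[\varepsilon] = 0,\ \operatorname{Var}(\varepsilon) = \sigma^2 \\
  \operatorname{bias}(x) &= \mathbb{E}_T\big[\hat{f}(x)\big] - f(x) \\
  \operatorname{var}(x) &= \mathbb{E}_T\Big[\big(\hat{f}(x) - \mathbb{E}_T[\hat{f}(x)]\big)^2\Big] \\
  \mathbb{E}_{T,\varepsilon}\Big[\big(y - \hat{f}(x)\big)^2\Big]
      &= \sigma^2 + \operatorname{bias}(x)^2 + \operatorname{var}(x)
\end{align}
```

The first term is the irreducible noise, the second measures how far the average estimate is from the target, and the third measures how much the estimate changes from one training set to another.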

[1] J. H. Friedman, "On bias, variance, 0/1 loss, and the curse-of-dimensionality", Data Mining and Knowledge Discovery 1, no. 1, 55-77, 1997.
[2] G. M. James, "Variance and Bias for General Loss Functions", Machine Learning 51, no. 2, 115-135, 2003.

Mixture of Experts


Mixture of Experts is based on the "divide and conquer" principle. The problem is divided into manageable parts, each expert learns locally from one part of the problem domain, and the outputs of the experts are then combined to produce a global output.

Mixture of Experts is usually associated with neural networks: each expert is a neural network that learns only from a part of the problem, and the outputs are combined using human knowledge or a gating network. However, Mixture of Experts seems to be an abstract paradigm that could be applied with other classifiers as well.
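
To make the combination step concrete, here is a minimal sketch with two experts and a soft gate. Everything in it is hypothetical and hand-written for illustration: the target is |x|, the experts are fixed functions rather than trained networks, and the gate is a softmax over hand-picked scores. In a real Mixture of Experts both the experts and the gate would be learned from data.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Two hypothetical experts for the target |x|; each is right on one half-line.
def expert_left(x):
    return -x   # correct for x <= 0

def expert_right(x):
    return x    # correct for x >= 0

def gate(x, sharpness=5.0):
    """Hypothetical gate: favours expert_left for x < 0, expert_right for x > 0."""
    return softmax([-sharpness * x, sharpness * x])

def mixture_predict(x):
    """Global output: gate-weighted sum of the experts' local outputs."""
    weights = gate(x)
    outputs = [expert_left(x), expert_right(x)]
    return sum(w * o for w, o in zip(weights, outputs))

for x in (-2.0, -0.1, 0.0, 0.1, 2.0):
    print(x, round(mixture_predict(x), 4))
```

Far from zero the gate gives almost all the weight to the expert responsible for that region, so the mixture is close to |x|. Near zero the two experts are blended, and the approximation is worse.
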

Basic References
[1] Jong-Hoon Oh and Kukjin Kang, "Experts or an Ensemble? A Statistical Mechanics Perspective of Multiple Neural Network Approaches" (Download)
[2] Jordan, M. I. and Jacobs, R. A., "Hierarchical Mixtures of Experts and the EM algorithm", Neural Computation 6, 181-214, 1994.

Linear Transformation Methods


I'm interested in Linear Transformation Methods that transform an initial representation into another one whose components are, in some way, independent. Our work with FBL (a wrapper that improves Naive Bayes by deleting dependent attributes) tries to do something similar, and I would like to compare all of these methods to FBL. The Linear Transformation Methods I've found are:

  1. Principal Component Analysis
  2. Factor Analysis
  3. Projection Pursuit
  4. Independent Component Analysis
  5. Independent Factor Analysis
  6. Generalized Additive Models
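
As a small illustration of the first method in the list, here is a sketch of Principal Component Analysis restricted to two attributes, where the rotation onto the principal axes has a closed form. The helper names and the synthetic data are my own assumptions for the example; a real comparison would use a linear algebra library and more than two dimensions.

```python
import math
import random

def covariance(xs, ys):
    """Sample covariance of two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def pca_2d(xs, ys):
    """Centre 2-D data and rotate it onto its principal axes."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cxx, cyy, cxy = covariance(xs, xs), covariance(ys, ys), covariance(xs, ys)
    # Angle of the first principal axis of the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)
    c, s = math.cos(theta), math.sin(theta)
    pc1 = [c * (x - mx) + s * (y - my) for x, y in zip(xs, ys)]
    pc2 = [-s * (x - mx) + c * (y - my) for x, y in zip(xs, ys)]
    return pc1, pc2

# Synthetic, strongly correlated attributes (an assumption for the demo).
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(500)]
ys = [0.8 * x + 0.3 * rng.gauss(0.0, 1.0) for x in xs]

pc1, pc2 = pca_2d(xs, ys)
print("covariance before:", covariance(xs, ys))
print("covariance after: ", covariance(pc1, pc2))
```

The transformed components are uncorrelated, with the first one carrying the larger variance. Note that uncorrelated is weaker than independent: PCA only removes linear correlation, whereas methods such as Independent Component Analysis aim at statistical independence.
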