Reference: “The Role of Occam's Razor in Knowledge Discovery.” Data Mining and Knowledge Discovery, 3, 409-425, 1999.
I found this paper, by Pedro Domingos, very interesting. It analyzes the role played by Occam’s Razor in KDD and examines whether applying it is correct, citing many papers where Occam’s Razor is applied explicitly or implicitly.
Pedro Domingos says that Occam’s Razor in KDD can be read in two ways:
- “Given two models with the same generalization error, the simpler one should be preferred because simplicity is desirable in itself” or...
- ...“Given two models with the same training-set error, the simpler one should be preferred because it is likely to have lower generalization error.”
The second version of Occam’s Razor is clearly mistaken: a low training-set error is often a sign of overfitting, which comes with a high generalization error. Pedro Domingos gives arguments for and against this second version, concluding that “the second version is provably and empirically false”. The first version seems right, but it uses simplicity as a proxy for comprehensibility, resulting in something quite different from Occam’s Razor.
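To make the gap between training-set error and generalization error concrete, here is a minimal sketch. It is not taken from the paper and does not reproduce any of its experiments. The synthetic data (one feature in [0, 1), true label `x > 0.5`, 20% label noise) and the helper names `make_data`, `fit_threshold`, `fit_memorizer` and `error` are all my own assumptions, chosen for illustration. It compares a one-parameter threshold rule with a 1-nearest-neighbour model that memorizes the training set.

```python
import random

def make_data(n, rng, noise=0.2):
    """Draw n points x in [0, 1); the true label is x > 0.5, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = x > 0.5
        if rng.random() < noise:
            y = not y  # label noise (an assumption of this sketch)
        data.append((x, y))
    return data

def fit_threshold(train):
    """Simple model: choose the cut t that minimises training error for 'predict x > t'."""
    candidates = [0.0] + [x for x, _ in train]
    best = min(candidates, key=lambda t: sum((x > t) != y for x, y in train))
    return lambda x: x > best

def fit_memorizer(train):
    """Complex model: 1-nearest-neighbour, which returns the label of the closest training point."""
    def predict(x):
        _, label = min(train, key=lambda p: abs(p[0] - x))
        return label
    return predict

def error(model, data):
    """Fraction of points in `data` that the model misclassifies."""
    return sum(model(x) != y for x, y in data) / len(data)

rng = random.Random(0)          # fixed seed so the run is repeatable
train = make_data(200, rng)
test = make_data(2000, rng)     # held-out sample, used as an estimate of generalization error

simple = fit_threshold(train)
memo = fit_memorizer(train)

for name, model in [("threshold", simple), ("memorizer", memo)]:
    print(f"{name:10s} train error = {error(model, train):.3f}  "
          f"test error = {error(model, test):.3f}")
```

The memorizer reaches zero training error because each training point is its own nearest neighbour, yet it still makes mistakes on the held-out sample, since it has also memorized the flipped labels. The threshold rule cannot fit that noise, so its training error stays above zero, and one would expect its held-out error to be close to its training error. This only shows that training-set error is a poor guide to generalization error. It says nothing by itself about whether the simpler of two models with equal training-set error generalizes better, which is the claim the paper argues against.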