AUC as Performance Metric in ML

Friday, July 04, 2008

ROC analysis is a classic methodology from signal detection theory used to depict the tradeoff between the hit rates and false alarm rates of classifiers (Egan 1975, Swets 2000). ROC graphs have also been commonly used in medical diagnosis to visualize and analyze the behavior of diagnostic systems (Swets 1998). Spackman (Spackman 1989) was one of the first machine learning researchers to show interest in using ROC curves. Since then, the machine learning community's interest in ROC analysis has increased, due in part to the realization that simple classification accuracy is often a poor metric for measuring performance (Provost 1997, Provost 1998).

The ROC curve compares a classifier's performance across the entire range of class distributions and error costs (Provost 1997, Provost 1998). A ROC curve is a two-dimensional representation of classifier performance; it is useful for depicting some characteristics of a classifier, but it makes comparisons against other classifiers difficult. A common method to reduce ROC performance to a scalar value, which is easier to handle, is to calculate the area under the ROC curve (AUC) (Fawcett 2005). Since the ROC curve is drawn in the unit square, the AUC value is always between 0.0 and 1.0, the best classifiers being those with higher AUC values. Random guessing produces the diagonal line between (0,0) and (1,1), which has an area of 0.5, so no realistic classifier should have an AUC lower than 0.5.
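Given the points of a ROC curve, the AUC can be computed with the trapezoidal rule. The following sketch (function name and input format are illustrative) shows this, and confirms that the random-guessing diagonal from (0,0) to (1,1) has an area of 0.5:

```python
def auc_trapezoid(points):
    """Area under a ROC curve given as (fpr, tpr) points, via the trapezoidal rule."""
    pts = sorted(points)  # order the points by increasing false positive rate
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0  # trapezoid between consecutive points
    return area

# The diagonal produced by random guessing has area 0.5:
print(auc_trapezoid([(0.0, 0.0), (1.0, 1.0)]))  # -> 0.5
```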

Fig. 1. Example of ROC graphs, figure extracted from (Fawcett 2005). Subfigure a shows the ROC curves and AUC of two different classifiers. Subfigure b compares the curve of a scoring classifier, B, with that of a discrete simplification of the same classifier, A.

Figure 1a shows two ROC curves representing two classifiers, A and B. Classifier B obtains a higher AUC than classifier A and is therefore expected to perform better. Figure 1b compares a scoring classifier (B) with a binary version of the same classifier (A): classifier A represents the performance of B when it is used with a fixed threshold. Although they represent almost the same classifier, A's performance as measured by AUC is inferior to B's. As we have seen, a full ROC curve cannot be generated from a discrete classifier, which results in a less accurate performance analysis. For this reason, in this paper we focus on scoring classifiers, although there are some attempts to create scoring classifiers from discrete ones (Domingos 2000, Fawcett 2001).
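The full ROC curve of a scoring classifier is obtained by sweeping a threshold over its scores; each threshold yields one (FPR, TPR) point, whereas a discrete classifier yields only a single point. A minimal sketch, assuming binary labels (1 = positive) and no tied scores:

```python
def roc_points(scores, labels):
    """ROC points (FPR, TPR) obtained by sweeping a threshold over classifier scores.

    Assumes labels are 0/1 and, for simplicity, that no two scores are tied
    (tied scores would need to be grouped into a single step).
    """
    pos = sum(labels)
    neg = len(labels) - pos
    pts = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    tp = fp = 0
    # Each prefix of the instances sorted by decreasing score is one threshold setting.
    for score, y in sorted(zip(scores, labels), reverse=True):
        if y == 1:
            tp += 1
        else:
            fp += 1
        pts.append((fp / neg, tp / pos))
    return pts

# A perfect ranking climbs to (0, 1) before any false positives appear:
print(roc_points([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))
```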

Hand and Till (Hand 2001) present a simple approach to calculating the AUC of a given classifier.





REFERENCES

Performance Metrics

Monday, June 30, 2008

Performance metrics are values calculated from a classifier's predictions that allow us to validate the classifier's model. These metrics are usually defined in terms of a confusion matrix. Figure 1 shows a confusion matrix for a two-class problem, which serves as an example for describing the basic performance metrics. In the figure:
  • π0 denotes the a priori probability of class (+).
  • π1 denotes the a priori probability of class (-); π1 = 1 - π0.
  • p0 denotes the proportion of times the classifier predicts class (+).
  • p1 denotes the proportion of times the classifier predicts class (-); p1 = 1 - p0.
  • TP is the number of instances of class (+) that the classifier correctly classifies as class (+).
  • TN is the number of instances of class (-) that the classifier correctly classifies as class (-).
  • FP is the number of instances of class (-) that the classifier incorrectly classifies as class (+).
  • FN is the number of instances of class (+) that the classifier incorrectly classifies as class (-).


Fig. 1
: Confusion matrix that generates the needed values for standard performance metrics


Precision is the percentage of true positive instances among all the instances classified as positive by the classifier: precision = TP/(TP+FP). Recall is the percentage of positive instances that the classifier correctly identifies: recall = TP/(TP+FN). Accuracy is the percentage of correctly classified instances: accuracy = (TP+TN)/(TP+TN+FP+FN). There are other approaches to estimating a classifier's performance that are used when dealing with a large set of classes. One of them is Fβ, which tries to compensate for the effect of a non-uniform distribution of instances among the classes. Fβ is calculated as follows:

Fβ = (1 + β²) · precision · recall / (β² · precision + recall)

Van Rijsbergen (vanRijsbergen 1979) states that Fβ measures the effectiveness of retrieval with respect to a user who attaches β times as much importance to recall as to precision. The most common instance of Fβ is F1, the harmonic mean of precision and recall. Traditionally, evaluation metrics such as recall, precision and Fβ have been widely used by the Information Retrieval community, while classification accuracy has been the standard performance estimator in Machine Learning for years. Recently, the area under the ROC (Receiver Operating Characteristics) curve, or simply AUC, traditionally used in medical diagnosis, has been proposed as an alternative measure for evaluating the predictive ability of learning algorithms.
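All of these metrics follow directly from the four confusion-matrix counts. A minimal sketch (the function name and the β = 1 default are illustrative choices, not from the original text):

```python
def metrics(tp, fp, fn, tn, beta=1.0):
    """Precision, recall, accuracy and F_beta from confusion-matrix counts.

    With beta=1 the last value is F1, the harmonic mean of precision and recall.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f_beta = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
    return precision, recall, accuracy, f_beta

# A balanced example where every metric works out to 0.8:
print(metrics(8, 2, 2, 8))
```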

REFERENCES