Title:
On the Stratification of Multi-Label Data
Author(s):
K. Sechidis, G. Tsoumakas, I. Vlahavas.
Availability:
PDF file (Acrobat Reader), 15 pages.
Appeared in:
Proceedings of ECML PKDD 2011, Athens, Greece, 2011.
Abstract:
Stratified sampling is a sampling method that takes into account the existence of disjoint groups within a population and produces samples in which the proportions of these groups are maintained. In single-label classification tasks, groups are differentiated based on the value of the target variable. In multi-label learning tasks, however, where there are multiple target variables, it is not clear how stratified sampling could or should be performed. This paper investigates stratification in the multi-label data context. It considers two stratification methods for multi-label data and empirically compares them, along with random sampling, on a number of datasets using a number of evaluation criteria. The results reveal some interesting conclusions with respect to the utility of each method for particular types of multi-label datasets.
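
For readers unfamiliar with the single-label case the abstract starts from, the following is a minimal sketch of a stratified train/test split. It is an illustration only and is not taken from the paper: the function name `stratified_split`, its parameters, and the toy labels are all hypothetical. It does not implement either of the multi-label methods the paper studies.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.25, seed=0):
    """Split example indices so each label keeps roughly its original proportion.

    Hypothetical helper for illustration; single-label case only.
    """
    rng = random.Random(seed)

    # Group example indices by the value of the (single) target variable.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        # Take the same fraction from every group.
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Toy data: 80 examples of class "a" and 20 of class "b".
labels = ["a"] * 80 + ["b"] * 20
train_idx, test_idx = stratified_split(labels, test_fraction=0.25)
```

With these toy labels, the test split keeps the 80/20 class ratio of the full set. In the multi-label setting each example carries a set of labels rather than one value, so this per-value grouping no longer applies directly, which is the difficulty the abstract points to.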