François Waldner1, Yang Chen2, Roger Lawes2, Zvi Hochman1
1 CSIRO Agriculture and Food, 306 Carmody Rd, St Lucia QLD 4067, Australia; email@example.com
2 CSIRO Agriculture and Food, Underwood Ave, Floreat WA 6014, Australia; firstname.lastname@example.org
Most cropping systems around the world are organised around a few dominant crops and a larger number of less frequent crops. Maps of infrequent crops derived from satellite data are generally inaccurate, largely owing to the class imbalance problem. Class imbalance occurs when only a few instances of some classes are available for classifier training, and it leads to large classification errors for the infrequent classes. Here, we assessed the magnitude of the class imbalance problem in crop classification and evaluated data-level treatments that combat it by creating synthetic minority instances. We generated 18 unbalanced data sets from Sentinel-2 time series and crop type observations in Victoria, Australia. These data sets covered a wide range of complexity, number of classes, number of samples per class and spectral separability. Classification accuracy was assessed with two metrics: the Overall Accuracy (OA), which gives more weight to majority classes, and the G-Mean accuracy (GM), which is more sensitive to minority classes. We found that data-level treatments boosted GM by 0.1-0.35 and that the price for increasing the accuracy of minority classes was a drop in OA. While oversampling methods have clear potential to improve the classification of minority crop types, more control over the loss of overall accuracy is needed before these methods can be transitioned to operations.
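To illustrate why the two metrics respond differently to class imbalance, the following minimal sketch computes both on a toy example. It assumes the common definition of G-Mean as the geometric mean of per-class recalls, which the text above does not spell out; the function names and the crop labels are hypothetical and chosen for illustration only.

```python
from collections import Counter
from math import prod


def overall_accuracy(y_true, y_pred):
    """Fraction of all samples labelled correctly (dominated by majority classes)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls (assumed definition of GM).

    A single class with zero recall drives the score to zero, which is
    why this metric is sensitive to minority classes.
    """
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [hits[c] / totals[c] for c in totals]
    return prod(recalls) ** (1 / len(recalls))


# Hypothetical toy labels: 8 majority samples and 2 minority samples.
y_true = ["wheat"] * 8 + ["lentil"] * 2
# One of the two minority samples is misclassified as the majority class.
y_pred = ["wheat"] * 8 + ["wheat", "lentil"]

oa = overall_accuracy(y_true, y_pred)  # 9 of 10 samples correct
gm = g_mean(y_true, y_pred)            # recalls are 1.0 (wheat) and 0.5 (lentil)
print(f"OA = {oa:.3f}, GM = {gm:.3f}")
```

In this toy case a single minority error barely moves OA but lowers GM substantially, and a classifier that always predicts the majority class would keep an OA of 0.8 while its GM falls to zero.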