Forecasting Yield of Coffee Crop Varieties C×R, Sln3 and Sln5B: A Stochastic Machine Learning Model Based on Agro-Ecological Factors using Multivariate Feature Selection Approach
Document Type
Article
Publication Title
Organic Farming
Abstract
Accurate forecasting of coffee crop yield is essential for enhancing agricultural decision-making, ensuring food security, and mitigating environmental risks. India cultivates both Arabica and Robusta across more than one hundred registered varieties. In this study, yield forecasts were developed for three representative varieties (C×R, Sln3, and Sln5B) using agro-ecological data collected from 2015 to 2022 at the Central Coffee Research Institute (CCRI), Coffee Research Station, Balehonnur, Karnataka, India. A stochastic machine learning framework was employed to identify and evaluate the most influential agro-ecological predictors through a multivariate feature selection approach coupled with correlation matrix analysis. Optimal predictors were organized into three distinct parameter groups, which were then used as inputs to four regression models: Extra Trees (ET), Gradient Boosting (GB), Random Forest (RF), and Decision Tree (DT). Independent testing revealed that the ET model consistently provided the highest accuracy. For C×R, yield was most accurately predicted using Group-1 parameters, such as coffee leaf rust (CLR), minimum temperature (Tmin), maximum temperature (Tmax), relative humidity (Rh), rainfall (Rf), organic carbon (OC), phosphorus (P), potassium (K), pH, plant spacing (Sp), and plant age (Ag), achieving a coefficient of determination (R²) of 0.98 with a Root Mean Square Error (RMSE) of 8.61 kg ha⁻¹. For Sln3, Group-3 parameters, such as CLR, Tmin, Tmax, Rh, Rf, OC, P, K, pH, Ag, Sp, minimum sunshine hours (SSmin), maximum sunshine hours (SSmax), vapor (Vp), and dew point (Dp), produced an R² of 0.98 with an RMSE of 8.27 kg ha⁻¹, while for Sln5B, Group-3 parameters yielded an R² of 0.97 with an RMSE of 7.79 kg ha⁻¹. These results show that the ET algorithm outperformed the GB, RF, and DT models on all three varieties.
Simulation outcomes further revealed that plant age, rainfall, and the incidence of CLR were among the most decisive agro-ecological determinants of yield. These findings underscore the potential of stochastic machine learning models, particularly the ET model, for enhancing yield prediction and identifying agro-ecological drivers of coffee productivity.
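The abstract reports model accuracy as a coefficient of determination (R²) and an RMSE in kg ha⁻¹. As a minimal sketch of how these two metrics are commonly computed, the following assumes the usual definition R² = 1 − SS_res / SS_tot; the abstract does not state which definition the authors used. The function names and the yield values are hypothetical and invented for illustration only; they are not data from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, in the same units as the inputs."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination, assuming R^2 = 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical yields in kg/ha (illustrative values, NOT from the study).
observed = [900.0, 1000.0, 1100.0, 1200.0]
predicted = [910.0, 990.0, 1110.0, 1190.0]

print(f"RMSE = {rmse(observed, predicted):.2f} kg/ha")
print(f"R^2  = {r_squared(observed, predicted):.3f}")
```

RMSE is expressed in the units of the target (here kg ha⁻¹), so it reads as a typical forecast error, whereas R² is unitless and describes the share of yield variance the model accounts for. Reporting both, as the abstract does, allows models to be compared on absolute and relative terms.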
First Page
203
Last Page
226
DOI
10.56578/of110305
Publication Date
9-1-2025
Recommended Citation
Santhosh, Chandagalu Shivalingaiah; Umesh, Kattekyathanahalli Kalegowda; Hemanth, Venkatesh; and Narendra, Khatri, "Forecasting Yield of Coffee Crop Varieties C×R, Sln3 and Sln5B: A Stochastic Machine Learning Model Based on Agro-Ecological Factors using Multivariate Feature Selection Approach" (2025). Open Access archive. 12662.
https://impressions.manipal.edu/open-access-archive/12662