Forecasting Yield of Coffee Crop Varieties C×R, Sln3 and Sln5B: A Stochastic Machine Learning Model Based on Agro-Ecological Factors using Multivariate Feature Selection Approach

Document Type

Article

Publication Title

Organic Farming

Abstract

Accurate forecasting of coffee crop yield is essential for improving agricultural decision-making, ensuring food security, and mitigating environmental risks. India cultivates both Arabica and Robusta coffee across more than one hundred registered varieties. In this study, yield forecasts were developed for three representative varieties (C×R, Sln3, and Sln5B) using agro-ecological data collected from 2015 to 2022 at the Central Coffee Research Institute (CCRI), Coffee Research Station, Balehonnur, Karnataka, India. A stochastic machine learning framework was employed to identify and evaluate the most influential agro-ecological predictors through a multivariate feature selection approach coupled with correlation matrix analysis. The selected predictors were organized into three parameter groups, which were then used as inputs to four regression models: Extra Trees (ET), Gradient Boosting (GB), Random Forest (RF), and Decision Tree (DT). Independent testing showed that the ET model consistently provided the highest accuracy. For C×R, yield was most accurately predicted using the Group-1 parameters, which include coffee leaf rust (CLR), minimum temperature (Tmin), maximum temperature (Tmax), relative humidity (Rh), rainfall (Rf), organic carbon (OC), phosphorus (P), potassium (K), pH, plant spacing (Sp), and plant age (Ag), achieving a coefficient of determination (R²) of 0.98 with a Root Mean Square Error (RMSE) of 8.61 kg ha⁻¹. For Sln3, the Group-3 parameters, which include CLR, Tmin, Tmax, Rh, Rf, OC, P, K, pH, Ag, Sp, minimum sunshine hours (SSmin), maximum sunshine hours (SSmax), vapor (Vp), and dew point (Dp), produced an R² of 0.98 with an RMSE of 8.27 kg ha⁻¹, while for Sln5B, the Group-3 parameters yielded an R² of 0.97 with an RMSE of 7.79 kg ha⁻¹. These results show that the ET algorithm outperformed the GB, RF, and DT models on all three varieties.
Simulation outcomes further showed that plant age, rainfall, and the incidence of CLR were among the most decisive agro-ecological determinants of yield. These findings underscore the potential of stochastic machine learning models, particularly the ET model, for improving yield prediction and identifying agro-ecological drivers of coffee productivity.
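
The abstract reports model accuracy as R² and RMSE. As a minimal sketch of how these two metrics are conventionally computed (the standard textbook definitions, not necessarily the exact procedure used in the study), the following uses only the Python standard library. The function names `rmse` and `r2` and the sample yields are hypothetical and chosen purely for illustration; they are not data from the article.

```python
import math


def rmse(observed, predicted):
    """Root Mean Square Error, expressed in the same units as the inputs."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(observed))


def r2(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical yields in kg/ha, invented for illustration only.
observed_yield = [900.0, 1000.0, 1100.0, 1200.0]
predicted_yield = [910.0, 990.0, 1110.0, 1190.0]

print(f"RMSE = {rmse(observed_yield, predicted_yield):.2f} kg/ha")
print(f"R2   = {r2(observed_yield, predicted_yield):.3f}")
```

RMSE is expressed in the units of the target (here kg ha⁻¹), so it is most meaningful when read against the typical yield level, whereas R² is unitless and describes the share of yield variance explained by the model.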

First Page

203

Last Page

226

DOI

10.56578/of110305

Publication Date

9-1-2025
