Crop yield prediction is critical for agricultural insurance, risk management, and efficient production strategies. In this study, we propose a novel machine-learning framework to predict county-level corn yield from daily weather data, weather-derived features based on zigzag topology persistence, and static soil information. Our approach integrates a Long Short-Term Memory (LSTM) network, which captures sequential weather patterns, with a shallow feed-forward network that processes the yearly weather topology persistence features. The outputs of the two networks are concatenated with the soil data and fed into a deep feed-forward network that predicts corn yield for each county and year. The model was trained on 1391 county-year pairs and tested on 50 pairs. Its performance was compared with that of a Convolutional Graph Neural Network (CGNN), a Transformer, Support Vector Regression with a radial basis function kernel (SVR-RBF), Extreme Gradient Boosting (XGBoost), and a combined Vector Autoregression-SVR (VAR-SVR) model. Among these, the proposed LSTM-DNN and the CGNN performed best. The goodness of fit (R²), root mean square error (RMSE, Mg/ha), mean absolute error (MAE, Mg/ha), and mean absolute percentage error (MAPE, %) were 0.79, 21.2, 15.5, and 5.49, respectively, for the LSTM-DNN model, versus 0.65, 22.2, 17.4, and 7.09 for the nearest competitor, the CGNN model. The LSTM-DNN model was slightly more accurate and computationally simpler than the CGNN model. It explained around 79% of the variability in county-average yield from weather and soil data. A MAPE of 5.49% relative to observed yields indicates that the model is a reliable tool for yield estimation. Future studies could fine-tune the model with more accurate, higher-resolution data instead of county-level records.
https://ieeexplore.ieee.org/document/11118506
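
The four evaluation metrics named in the abstract can be computed as in the minimal sketch below. It assumes that R² is the coefficient of determination (1 minus the ratio of residual to total sum of squares); the abstract does not state which definition was used. The function name `regression_metrics` and the sample values are hypothetical and are not taken from the study's data.

```python
import math


def regression_metrics(observed, predicted):
    """Return R2, RMSE, MAE, and MAPE (%) for paired observed/predicted yields.

    Assumption: R2 is the coefficient of determination, 1 - SS_res / SS_tot.
    MAPE requires every observed value to be non-zero.
    """
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n

    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares

    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "mape": 100.0 * sum(abs(e / o) for e, o in zip(errors, observed)) / n,
    }


# Hypothetical yields for illustration only.
observed = [10.0, 12.0, 8.0, 10.0]
predicted = [9.0, 12.0, 9.0, 10.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

RMSE and MAE carry the units of the yield values, while R² and MAPE are unitless, which is why MAPE is convenient for comparing models across regions with different yield levels.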

Figure: Proposed contextualized LSTM-DNN model.
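
The data flow described in the abstract (an LSTM over daily weather, a shallow network over the yearly topology persistence features, concatenation with soil data, and a deep feed-forward regressor) might be sketched in PyTorch as follows. This is an illustration under stated assumptions, not the authors' implementation: the class name `LSTMDNNSketch`, all layer sizes, the feature counts, the activation functions, and the choice to summarize the sequence by the final LSTM hidden state are hypothetical, since the abstract does not specify them.

```python
import torch
from torch import nn


class LSTMDNNSketch(nn.Module):
    """Hypothetical three-branch model; every size below is an assumption."""

    def __init__(self, n_weather=6, n_topo=8, n_soil=10,
                 lstm_hidden=32, topo_hidden=16, dnn_hidden=64):
        super().__init__()
        # Branch 1: LSTM over the daily weather sequence.
        self.lstm = nn.LSTM(input_size=n_weather, hidden_size=lstm_hidden,
                            batch_first=True)
        # Branch 2: shallow feed-forward net over yearly topology features.
        self.topo_net = nn.Sequential(nn.Linear(n_topo, topo_hidden), nn.ReLU())
        # Head: deep feed-forward net over [LSTM summary, topology, soil].
        self.head = nn.Sequential(
            nn.Linear(lstm_hidden + topo_hidden + n_soil, dnn_hidden), nn.ReLU(),
            nn.Linear(dnn_hidden, dnn_hidden), nn.ReLU(),
            nn.Linear(dnn_hidden, 1),
        )

    def forward(self, weather, topo, soil):
        # weather: (batch, days, n_weather); topo: (batch, n_topo); soil: (batch, n_soil)
        _, (h_n, _) = self.lstm(weather)
        seq_summary = h_n[-1]                 # final hidden state of the last layer
        topo_summary = self.topo_net(topo)
        fused = torch.cat([seq_summary, topo_summary, soil], dim=1)
        return self.head(fused).squeeze(-1)   # one yield value per county-year


# Random inputs standing in for four county-year samples.
model = LSTMDNNSketch()
weather = torch.randn(4, 365, 6)
topo = torch.randn(4, 8)
soil = torch.randn(4, 10)
yield_pred = model(weather, topo, soil)
print(yield_pred.shape)  # torch.Size([4])
```

Soil data enters only at the fusion step because it is static for a county, whereas the weather branch needs a recurrent layer to handle the day-by-day sequence.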