Document Type

Thesis - Open Access

Award Date


Degree Name

Master of Science (MS)

Department / School

Agronomy, Horticulture, and Plant Science

First Advisor

Kristopher Osterloh


Soil sampling and analysis play a crucial role in optimizing nutrient management and enhancing crop productivity. However, collecting representative samples across diverse landscapes is challenging because of knowledge gaps about the spatial variability of soil properties, large field sizes, the number of samples required, and analysis costs. Collecting soil samples based on management zones can help farmers obtain precise information about soil properties with fewer samples. Recent developments in precision agriculture and machine learning offer new opportunities to address these challenges. This study aimed to develop machine learning models that learn from, analyze, and refine landscape and soil property data to automatically select soil sampling zones and generate prediction maps. Accordingly, random forest regression and classification models were built using data from four individual fields, each with 12 features, with five management zones per field as the target for model training and testing. A generalized model was then developed by combining data from seven corn fields and seven soybean fields to improve predictive performance, and it was evaluated on two new fields. For the four fields, the classification model achieved overall accuracies of 0.71, 0.61, 0.75, and 0.69; kappa scores of 0.69, 0.58, 0.65, and 0.60; and F1 scores of 0.70, 0.58, 0.75, and 0.59, respectively. The regression model yielded R² values of 0.71, 0.67, 0.83, and 0.76 and RMSE values of 6.7, 7.94, 5.45, and 2.4, respectively. The generalized model achieved overall accuracies of 0.80 and 0.75, kappa scores of 0.71 and 0.59, and F1 scores of 0.93 and 0.95 for the soybean and corn fields, respectively. Despite these higher scores, the generalized models failed to predict the management zones accurately, possibly because of limited model transferability and adaptability to varying field conditions. This highlights the need to create and use high-resolution data with greater spatial variability, which would provide a comprehensive dataset for model training.
Adding other features, such as environmental variables and biomass indices that are more strongly correlated with yield, could help improve the predictive performance of the models. Overall, this work establishes a foundational framework for novel applications of remote sensing data and machine learning techniques to soil sampling challenges.
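The abstract reports overall accuracy, kappa, and F1 for the zone classifiers and R² and RMSE for the regressors. As an illustration of how these metrics are typically defined, the following is a minimal plain-Python sketch. It assumes Cohen's kappa, macro-averaged F1 (the abstract does not state the averaging used), and unweighted RMSE; the function names and the toy labels and values are hypothetical and are not data from this thesis.

```python
from collections import Counter
from math import sqrt


def classification_metrics(y_true, y_pred):
    """Return overall accuracy, Cohen's kappa, and macro-averaged F1."""
    n = len(y_true)
    labels = sorted(set(y_true) | set(y_pred))
    # Observed agreement p_o is the overall accuracy.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement p_e from the marginal frequencies of each zone label.
    p_e = sum(true_counts[c] * pred_counts[c] for c in labels) / (n * n)
    # If p_e == 1, both label lists contain a single identical class, so agreement is perfect.
    kappa = (p_o - p_e) / (1 - p_e) if p_e < 1 else 1.0
    # Per-class F1 = 2TP / (2TP + FP + FN), then an unweighted mean over classes.
    f1_scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = pred_counts[c] - tp
        fn = true_counts[c] - tp
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    macro_f1 = sum(f1_scores) / len(f1_scores)
    return p_o, kappa, macro_f1


def regression_metrics(y_true, y_pred):
    """Return the coefficient of determination (R²) and RMSE."""
    n = len(y_true)
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    rmse = sqrt(ss_res / n)
    return r2, rmse


if __name__ == "__main__":
    # Hypothetical zone labels (1-5) for ten sample points.
    zones_true = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
    zones_pred = [1, 1, 2, 3, 3, 3, 4, 5, 5, 5]
    acc, kappa, f1 = classification_metrics(zones_true, zones_pred)
    print(round(acc, 3), round(kappa, 3), round(f1, 3))  # prints 0.8 0.75 0.787

    # Hypothetical observed vs. predicted values for a continuous target.
    r2, rmse = regression_metrics([10, 20, 30, 40], [12, 18, 33, 39])
    print(round(r2, 3), round(rmse, 3))  # prints 0.964 2.121
```

Kappa discounts the agreement expected by chance from the label frequencies, which is why it is usually lower than overall accuracy for the same predictions.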


South Dakota State University



Rights Statement

In Copyright