Document Type

Thesis - Open Access

Award Date


Degree Name

Master of Science (MS)

Department / School


First Advisor

Matthew Elliott


For the last fifty years, the USDA survey-based Quarterly Agriculture Stocks (QAS) reports have been the primary source of information on the relative supply of U.S. corn, soybeans, and wheat. Research has examined USDA stock reports and their relevance to the market (e.g., Isengildina-Massa et al., 2021). In addition, private industry analysts estimate expected quarterly grain stocks before USDA releases its reports. Market information firms such as Bloomberg and Reuters publish a subset of these estimates a few days before the USDA reports. Previous research has found that when industry analysts' stock expectations differ significantly from the grain stocks USDA reports, market prices adjust rapidly to the USDA survey results. Many media outlets and previous studies attribute the differences in expectations and the resulting changes in market prices to a "market surprise" (e.g., Karali et al., 2020). Market analysts, USDA officials, and researchers have offered four reasons for market surprises in the grain stocks reports. First, USDA surveys may not fully account for grain in transit when surveying stocks. Second, the market often uses weight (e.g., 60 lbs per bushel) to determine supply, while the surveys ask how much volume (e.g., bushels) is on the farm or in commercial storage. When the average weight of a commodity deviates from the standard in a given season, surveyed stocks can differ from actual stocks by weight. Third, errors in estimating what portion of existing stocks comes from old or new crop production may cause surprises in the final annual report before a change in the marketing year. For example, the USDA survey asks how much old crop corn is on hand on September 1st, although some of the grain taken in by grain wholesalers can be new crop by this date. Discrepancies can arise because the survey respondent must accurately segregate the new and old crop amounts.
Fourth, USDA survey-based stock reports contain survey noise, and market analysts may not account for this noise in sequential estimates. This paper seeks to use AI methods and large datasets on grain movement to understand the primary reason market analysts are frequently surprised by USDA QAS reports. Given the recent surge in grain movement data, available grain quality data, and data on the output of significant demand sources of grain, particularly at the state level, it is possible to use advances in analyzing high-dimensional data (e.g., random forest, gradient boosting) to develop an objective artificial intelligence (AI) market analyst. This paper explores additional public data sources related to commodity demand and supply in the corn, wheat, and soybean markets and applies AI techniques to determine whether data analytics improves the prediction of the QAS reports released by USDA for corn, soybeans, and wheat compared to market analysts' estimates. Our primary research objective is to determine whether AI can predict USDA QAS estimates more accurately than the surveys of market analysts that Bloomberg and Reuters have historically provided. Our secondary objective is to decompose the surprise by its source. In this effort, we used the Extreme Gradient Boosting (XGBoost) machine learning model to predict the stock estimates of the three major commodities (corn, soybeans, and wheat). We used grain stocks and production by state, carry-over stock from the previous year, weekly grain loaded on trains and barges, weekly ethanol production, monthly ethanol crushed, weekly accumulated exports, and market analysts' estimates from Bloomberg and Reuters, covering 2007 through the 4th quarter of 2022. We aggregated all these features by quarter to match the quarterly stock estimates. After aggregating the features, we cross-checked the values against the national reports for the corresponding years and found them to be consistent.
This means the features reflect actual values for each quarter and can support an accurate estimate of the stock. We also grouped each feature according to 10 Agricultural Regions. Our machine learning model identified production as the most important feature for estimating quarterly stocks, with carry-over stock and accumulated exports as the second and third most important features. We also found that ethanol production and grain exports are inversely related to quarterly grain stocks.
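
As an illustration of two quantities discussed in the abstract, the following minimal Python sketch computes a market surprise and a weight-adjusted stock figure. It assumes the surprise is measured as the percentage gap between the USDA figure and the analysts' estimate, which is one common convention and not necessarily the definition used in this thesis. The function names and all numbers are hypothetical, and the 60 lbs per bushel standard is taken from the example in the abstract.

```python
# Illustrative sketch only: function names and figures are hypothetical,
# not taken from the thesis.

STANDARD_LB_PER_BU = 60.0  # example standard weight quoted in the abstract


def market_surprise(usda_stocks: float, analyst_estimate: float) -> float:
    """Percentage gap between the USDA figure and the analyst estimate.

    Assumed convention: positive values mean USDA reported more stock
    than analysts expected.
    """
    if analyst_estimate <= 0:
        raise ValueError("analyst_estimate must be positive")
    return 100.0 * (usda_stocks - analyst_estimate) / analyst_estimate


def weight_equivalent_bushels(
    volume_bu: float,
    test_weight_lb: float,
    standard_lb: float = STANDARD_LB_PER_BU,
) -> float:
    """Convert surveyed volume bushels into standard-weight bushels."""
    if standard_lb <= 0:
        raise ValueError("standard_lb must be positive")
    return volume_bu * test_weight_lb / standard_lb


# Hypothetical figures in million bushels.
surprise_pct = market_surprise(usda_stocks=1030.0, analyst_estimate=1000.0)

# A light-test-weight season: 57 lbs per volume bushel instead of 60.
adjusted_stocks = weight_equivalent_bushels(volume_bu=1000.0, test_weight_lb=57.0)

print(f"surprise: {surprise_pct:.1f}%")
print(f"weight-adjusted stocks: {adjusted_stocks:.1f} million bushels")
```

In this hypothetical season, a volume-based survey and a weight-based market estimate would differ by about 5 percent even if both were measured without error, which is the kind of gap the second reason above describes.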


South Dakota State University


Rights Statement

In Copyright