Session Details

Plenary and Panel Sessions

Plenary I:

Plenary II:

Plenary III:

Plenary IV: Agency Leadership on Policy and Practice

Moderated Panel: Setting and Communicating Boundaries for Appropriate Use of AI

Session Date/Time: Wednesday, November 20th, 9:30am-10:20am
Session Location: Hall of Champions at Kyle Field

Session Description:

Moderated Panel: AI Opportunities at Federal Agencies

Session Date/Time: Session Location:

Session Description:

Plenary V: Scientific Closing

Oral Presentations

Applied Tools

Session Date: Wednesday, November 20th
Session Time: 10:30am-12:00pm
Session Location: Ross
Session Moderator:


1. (10:30-11:00) - Raster Tools: Using Predictive AI To Create Useful Information
Hogland, John, USFS
Jesse Johnson, Fredrick Bunt
Big Data streams and predictive AI are fundamentally changing the way resource management decisions can be made. The use of remotely sensed data, ever-expanding computer technology, and enhanced processing techniques can provide natural resource managers with depictions of ecosystems at unprecedented spatial and temporal resolutions. While these sources of information are being leveraged to some extent to inform decision making, the sheer amount of data currently being collected has outpaced our abilities to efficiently manipulate and use those data to their fullest for decision making. Newer tools, algorithms, and processing approaches are needed to realize the potential of predictive AI coupled with the volume, variety, and velocity of big data streams for natural resources. Important questions related to data scale, relevance, and transformation, as well as the types of tools needed to efficiently extract useful information for decision making, are at the forefront of data science and natural resource management. To that end, we have developed a Python-based geospatial processing library called Raster Tools that automates delayed reading and parallel processing while seamlessly integrating popular machine learning libraries and predictive modeling techniques through Python's software ecosystem. To demonstrate the utility of our processing paradigm, we highlight two case studies that use Raster Tools to perform spatial, statistical, and predictive AI. Within a natural resource setting, representative training data can be expensive to collect. Our first case study uses Raster Tools, Landsat 8 remotely sensed imagery, and USGS's national elevation dataset to address this issue by creating a well-spread and balanced sample. Our spread-and-balance technique encodes multidimensional predictor variable space into one dimension using pseudo-Hilbert space-filling curve distances, orders those distances, and systematically selects sample observations from the ordered distances.
To illustrate these concepts, we present a Jupyter notebook on Google's Colab. Our second case study uses Raster Tools and the sample collected in our first case study to build an ensemble of K-nearest neighbor (EKNN) models. We then further demonstrate how to use our EKNN with Raster Tools to estimate percent forest cover and standard error for every 30 m² cell in the area around Custer Gallatin National Forest in Montana. Like our first case study, we demonstrate our processing approach and further evaluate our outputs using Colab. At the forefront of data-driven decision making is the development of spatial, statistical, and machine learning techniques that fully leverage existing hardware and adopt newer processing strategies to integrate big data sources seamlessly and easily with the decision-making process. Packages such as Raster Tools facilitate this integration while also providing functionality that can be used to further our understanding of natural resources while simultaneously providing the computational framework to optimize and justify management decisions at both scale and extent. While these tools facilitate Big Data analytics, they also necessitate a broader understanding of the role of spatial data and analyses within decision making. Moreover, they highlight the need for easy access to and integration with various open-source and proprietary software systems.
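The spread-and-balance idea described in the abstract, encoding predictor space onto a Hilbert curve, ordering the distances, and sampling systematically along the curve, can be sketched in plain NumPy. This is an illustration only, not the Raster Tools implementation; the function names, the two-predictor restriction, and the grid resolution are our assumptions.

```python
import numpy as np

def hilbert_distance(side, x, y):
    """Distance of integer cell (x, y) along a Hilbert curve on a side-by-side
    grid (side must be a power of two); classic iterative xy-to-d algorithm."""
    d = 0
    s = side // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate/reflect the quadrant so the curve stays continuous
            if rx == 1:
                x, y = side - 1 - x, side - 1 - y
            x, y = y, x
        s //= 2
    return d

def spread_balanced_sample(predictors, n, side=256):
    """Encode two predictor columns to Hilbert distances, order them, and
    systematically pick n observations along the ordered curve."""
    lo = predictors.min(axis=0)
    span = np.where(np.ptp(predictors, axis=0) == 0, 1.0, np.ptp(predictors, axis=0))
    grid = ((predictors - lo) / span * (side - 1)).astype(int)
    dist = np.array([hilbert_distance(side, int(gx), int(gy)) for gx, gy in grid])
    order = np.argsort(dist)
    step = len(order) / n
    return order[(np.arange(n) * step).astype(int)]
```

Because nearby points on the curve are nearby in predictor space, taking every k-th observation along the ordered distances spreads the sample across predictor space rather than clustering it.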

2. (11:00-11:30) - AIFARMS / CropWizard: Generative AI Techniques and Applications for Agriculture
Adve, Vikram S., AIFARMS National AI Institute, University of Illinois
AIFARMS Team
I will start with a very brief overview of the research in the AIFARMS national AI institute for agriculture, funded by USDA NIFA. In the rest of the talk, I will present the CropWizard project in AIFARMS, which is exploring ways in which generative AI can be used for a wide range of agricultural tasks. Today, the CropWizard system supports interactive question answering and research for agricultural professionals by consulting a large database of over 400,000 technical documents, including Extension publications and open-access research documents. CropWizard can also answer questions about images and invoke computational tools for quantitative questions; these capabilities are being improved and enhanced for accuracy and scope. Several external users and companies are exploring uses for CropWizard in production settings. Ongoing research is exploring ways in which generative AI can be used for more advanced reasoning, planning, and data discovery, in order to support advanced quantitative decision making for complex, open-ended questions in modern agriculture. More information about these topics is available at https://aifarms.illinois.edu/ and https://uiuc.chat/cropwizard-1.5/.

3. (11:30-12:00) - An AI-driven decision support tool for real-time Integrated Pest Management in agriculture
Singh, Arti, Iowa State University
Soumik Sarkar, Baskar Ganapathysubramanian, Muhammad Arbab Arshad, Hossein Zaremehrjerdi, Timilehin Ayanlade, Shivani Chiranjeevi, Lucas Nerone Rillo, Venkata Naresh Boddepalli, Yanben Shen, Talukder Jubery, Asheesh K Singh, Adarsh Krishnamurth.
We present an end-to-end artificial intelligence-driven decision support tool that revolutionizes Integrated Pest Management (IPM) by combining state-of-the-art computer vision models with an intelligent conversational agent. Our InsectID model, trained on 16 million images across 4,000 insect species, achieves robust identification capabilities with 97.2% accuracy under field conditions. The companion WeedID system, leveraging 15 million training images spanning 1,581 weed species, demonstrates 96.8% accuracy in diverse agricultural settings. These deep learning models incorporate uncertainty quantification and out-of-distribution detection to ensure reliable real-world performance. The PestIDBot decision support tool integrates the InsectID and WeedID applications with our specialized AgLLM, trained on comprehensive IPM literature and expert knowledge bases. PestIDBot provides context-aware responses to farmers' queries about pest management strategies, chemical interventions, biological control options, and economic thresholds. The platform delivers real-time insights through a unified mobile interface, enabling farmers to make rapid, informed decisions about pest management interventions. Our solution bridges the gap between advanced AI models and practical agricultural applications, demonstrating how integrated technological solutions can enhance sustainable pest management practices while remaining accessible to end-users.
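Maximum-softmax-probability thresholding is one common baseline for the uncertainty-based out-of-distribution flagging this abstract mentions; the sketch below is illustrative and not necessarily the authors' method (the threshold value and function names are assumptions).

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def classify_with_ood(logits, threshold=0.5):
    """Return (predicted class, confidence, in-distribution flag) per sample.
    A sample whose maximum softmax probability falls below `threshold` is
    flagged as out-of-distribution, e.g. to defer it to a human expert."""
    probs = softmax(logits)
    conf = probs.max(axis=-1)
    pred = probs.argmax(axis=-1)
    return pred, conf, conf >= threshold
```

A sharply peaked logit vector such as `[8, 0, 0]` is accepted with high confidence, while a nearly flat vector is flagged for review instead of being forced into one of the known species classes.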

Collaboration and Education

Session Date: Wednesday, November 20th
Session Time: 1:00pm-2:20pm
Session Location: Corps
Session Moderator:


1. (1:00-1:20) - Smart Demo Farm Sites and Testbeds for AI enabled Technology Education
Khot, Lav, Washington State University
Bernardita Sallato, Markus Keller, R. Troy Peters, Manoj Karkee
This presentation will cover our team's efforts, tied to the Washington (WA) Tree Fruit Research Commission, the WA Wine Commission, and the USDA NIFA-funded AgAID Institute, to establish Smart Demo Farm Sites. These sites (e.g., Smart Apple Orchard, Smart Vineyard) not only generate use-case-specific data for developing AI models but also serve as testbeds for testing, evaluating, and validating emerging smart agricultural technologies through synergistic public-private partnerships. The presentation will also cover how these testbeds help disseminate knowledge through K-12 student and teacher programs, undergraduate internship trainings, and grower education via on-site field days, workshops, and related events, to help realize meaningful adoption of relevant smart agriculture technologies.

2. (1:20-1:40) - Assessing the Performance of Generative AI in Retrieving Information against Manually-Curated Genetic and Genomic Data
Sen, Taner, ARS
Elly Poretsky, Victoria Blake, Carson Andorf
Curated resources at centralized repositories provide high-value service to users by enhancing data veracity. Curation, however, comes with a cost, as it requires dedicated time and effort from personnel with deep domain knowledge. In this paper, we investigate the performance of a Large Language Model (LLM), ChatGPT, in extracting and presenting data against a human curator. To accomplish this task, we used a small set of journal articles on wheat genetics, focusing on traits, such as salinity tolerance and disease resistance, that are becoming more important as climate change continues to impact agriculture globally. The 21 papers were then curated by a professional curator for the GrainGenes database (https://wheat.pw.usda.gov). In parallel, we developed a ChatGPT-based retrieval-augmented generation (RAG) question-answering (QA) system and compared how ChatGPT performed in answering questions about traits and quantitative trait loci (QTLs). Our findings show that, on average, GPT-4 correctly categorized manuscripts 90% of the time and correctly extracted 82% of traits and 63% of marker-trait associations (MTAs). Furthermore, we assessed the ability of a ChatGPT-based DataFrame agent to filter and summarize curated wheat genetics data, showing the potential of human and computational curators working side by side. In one case study, our findings show that GPT-4 was able to retrieve up to 91% of disease-related, human-curated QTLs across the whole genome and up to 96% across a specific genomic region through prompt engineering. Also, we observed that across most tasks, GPT-4 consistently outperformed GPT-3.5 while generating fewer hallucinations, suggesting that improvements in LLM models will make generative AI a much more accurate companion for curators in extracting information from scientific literature.
Despite their limitations, LLMs demonstrated a potential to extract and present information to curators and users of biological databases, as long as users are aware of potential inaccuracies and the possibility of incomplete information extraction.
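The core retrieval step of a RAG QA system like the one described, rank passages by similarity to the question and prepend the top hits to the LLM prompt, can be sketched with a toy TF-IDF retriever. The real system uses ChatGPT over a full document store; everything below, including the prompt wording, is a simplified stand-in.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def tfidf(docs):
    """Build simple TF-IDF weight dicts for a small corpus."""
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter(t for c in counts for t in c)
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    return [{t: c[t] * idf[t] for t in c} for c in counts], idf

def retrieve(query, docs, k=1):
    """Rank documents by cosine similarity to the query; return the top k."""
    vecs, idf = tfidf(docs)
    q = Counter(tokenize(query))
    qv = {t: q[t] * idf.get(t, 0.0) for t in q}

    def cos(a, b):
        num = sum(a[t] * b.get(t, 0.0) for t in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return num / (na * nb) if na and nb else 0.0

    ranked = sorted(range(len(docs)), key=lambda i: cos(qv, vecs[i]), reverse=True)
    return [docs[i] for i in ranked[:k]]

def build_prompt(query, docs):
    """Assemble the retrieved passages and the question into one LLM prompt."""
    context = "\n".join(retrieve(query, docs, k=1))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

Grounding the model's answer in retrieved curated text is what lets the system be compared passage-for-passage against a human curator.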

3. (1:40-2:00) - Building an AI and Climate-Smart Agriculture and Forestry Community of Best Practice: A Collaborative Approach
Haag, Shawn, University of Minnesota
This presentation proposes a framework for creating a community of best practice focused on integrating AI tools into climate-smart agriculture and forestry. The initiative aims to foster collaboration among USDA researchers, external partners, and practitioners to share insights, align efforts, and accelerate the adoption of impactful AI applications. Through this community, stakeholders will engage in peer learning, applied research exchanges, and knowledge-sharing activities that promote practical AI use cases, such as carbon sequestration monitoring, soil health prediction, and precision farming. This session will explore how USDA and partner organizations can collaborate to build this community, and I will seek input on the community’s structure, implementation, and alignment with USDA priorities.

4. (2:00-2:20) - Uses of Artificial Intelligence in Agricultural Statistics
Abreu, Denise, NASS
Luca Sartore, Linda J. Young
Artificial intelligence (AI) is revolutionizing the agricultural sector in many impactful ways: from precision farming that helps farmers make data-driven decisions by analyzing information from sensors, drones, and satellites; to drones equipped with AI and computer vision technology that can monitor crop health and detect diseases; to autonomous tractors that plant, water, and harvest crops with minimal human intervention, thereby increasing efficiency and reducing labor costs for farmers. The National Agricultural Statistics Service (NASS) is the statistical arm of the United States Department of Agriculture (USDA) with a mission to provide timely, accurate, and useful statistics in service to U.S. agriculture. To achieve this mission, the Agency has been exploring the multifaceted uses of AI in statistical techniques with an emphasis on innovating both research and production activities in the development and dissemination of official statistics. The Agency's subject matter experts are using AI techniques, including neural networks, random forests, XGBoost, and quantum-inspired neural networks, to provide accurate crop predictions, aid in the development of environmental policy making, automate routine tasks, streamline workflows, and enhance quality control measures, among other tasks. This presentation will showcase multiple research and production endeavors at NASS and discuss the Agency's vision for the future.

Computer Vision: Detection of Foreign Objects

Session Date: Tuesday, November 19th
Session Time: 3:00pm-4:20pm
Session Location: Traditions
Session Moderator:


1. (3:00-3:20) - Getting Deep in the Weeds (with DeepWeeds)
McCollam, Gerald A., ARS

This presentation addresses a significant obstacle to adoption of automated weed control: the robust classification of weed species in their natural environment. We consider a multiclass image dataset of weed species known as DeepWeeds. While most datasets capture their targets under ideal lab conditions, DeepWeeds intentionally captures its subject as it occurs in nature. Comprising 17,509 images, DeepWeeds is a relatively small dataset that reflects the natural variation found between weeds and crops in situ. Our work seeks to improve performance in this context through deep transfer learning and by means of various data enhancement techniques. In addition to image repetition and transformation, we explore multiple Generative Adversarial Network (GAN) models to generate synthetic images. Our findings with GANs show that while high visual quality can be achieved for low-resolution, unlabeled images, only mediocre visual quality is possible for high-resolution images. We used Fréchet Inception Distance (FID) to assess our synthetic images. The generated sets scored poorly on this metric (lower is better) and ultimately proved unsuitable for model training. A surprising result was the high accuracy achieved by our adapted ResNet-50 model using k-fold cross-validation in the absence of augmentation. We attempt to show how this finding ultimately coheres with a more nuanced understanding of the class distribution and design goals of the DeepWeeds dataset. In addition, we provide a path forward towards improving our approach.
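For readers unfamiliar with the metric, FID compares the mean and covariance of Inception-network features extracted from real and generated images. A minimal NumPy version, assuming the feature vectors have already been extracted (the real metric computes them with an Inception model, which is omitted here), is:

```python
import numpy as np

def frechet_inception_distance(feats_real, feats_fake):
    """FID between two sets of feature vectors (rows = samples):
    ||mu_r - mu_f||^2 + Tr(C_r + C_f - 2 (C_r C_f)^(1/2))."""
    mu_r, mu_f = feats_real.mean(0), feats_fake.mean(0)
    c_r = np.cov(feats_real, rowvar=False)
    c_f = np.cov(feats_fake, rowvar=False)
    # Eigenvalues of C_r @ C_f are real and non-negative for PSD covariances,
    # so Tr((C_r C_f)^(1/2)) is the sum of their square roots.
    eig = np.linalg.eigvals(c_r @ c_f)
    covmean_trace = np.sqrt(np.clip(eig.real, 0.0, None)).sum()
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(c_r) + np.trace(c_f) - 2.0 * covmean_trace)
```

Identical distributions score near zero, and the score grows as the generated features drift from the real ones, which is why a high FID flagged the synthetic sets as unsuitable for training.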

2. (3:20-3:40) - Semi-Automated Training of High-Speed Embedded AI Vision-Transformer Model
Holt, Greg, ARS
Mathew Pelletier, John Wanjura
This presentation details our advancements in machine-vision systems for detecting and removing plastic contamination from cotton, a critical issue for the U.S. cotton industry. The primary source of plastic contamination in marketable cotton bales stems from plastic module wrap used by John Deere round module harvesters. Despite efforts by cotton ginning personnel to remove these plastics during module unwrapping, contamination persists within the gin's processing system. To address this, we initially developed a machine-vision detection and removal system using low-cost color cameras. The system identifies plastic on the gin-stand feeder apron and activates air jets to remove it from the cotton stream. However, the system required extensive manual calibration and tuning, involving 30-50 computers running on low-cost ARM Linux systems, a challenging task for typical gin workers due to its technical complexity. To streamline this, we developed AI models that eliminate the need for manual calibration, significantly reducing manual input and improving system performance. Robust AI models require an extensive number of training images; the GIT and BLIP foundational vision-to-caption models, for example, utilized over a million images in their training. Even with transfer learning techniques, a robust model still requires tens of thousands of images, each of which must be manually classified by a human technician. Given the inordinate time and cost of manually annotating that many images, we developed a novel approach that leverages GIT and BLIP by running them in parallel and passing their outputs into a custom semantic classifier (trained on 500 images and validated on 2,500 images). The result was a slow but highly accurate AI image classifier capable of automatically classifying our larger image dataset.
This approach allowed us to automatically classify 20,000 images, which were then used to develop a high-speed Vision Transformer (ViT) model designed to autonomously detect difficult-to-identify plastics and "Hand-Intrusion-Events" (HID). The high-speed AI ViT model enables real-time image classification for our plastic detection system, giving it the capability to eliminate false positives and thereby perform self-calibration; this eliminates the need for skilled personnel, simplifies system operation, and enables wider stakeholder adoption.

3. (3:40-4:00) - Artificial Intelligence (AI)-Driven Approaches to Phytoplasma Disease Diagnosis in Agriculture
Shao, Jonathan, ARS
Wei Wei
Phytoplasmas are unculturable, plant-pathogenic bacteria responsible for severe crop diseases, such as little cherry and grapevine yellows. These diseases lead to significant crop losses, affecting agricultural productivity, farmers' livelihoods, and global food security. Currently, diagnosing phytoplasma infections relies on labor-intensive molecular techniques, which are impractical for farmers and growers in the field due to the need for expert knowledge and specialized equipment. In response to these limitations, this study aims to develop an AI-based diagnostic system to improve early detection of phytoplasma infections. The study employs tomato plants infected with potato purple top (PPT) phytoplasma as a test case. A training dataset of 8000 images (4000 healthy and 4000 PPT phytoplasma-infected) and a testing/unseen dataset of 1600 images (800 healthy and 800 PPT phytoplasma-infected) were collected. The 8000 training images were used in TensorFlow to train five convolutional neural network (CNN) models: four pre-trained architectures (VGG-16, Google Inception V3, NASNet, DenseNet201) using transfer learning techniques, and a custom CNN model. In differentiating healthy from phytoplasma-infected tomato plants, the pre-trained models achieved accuracy rates between 94% and 99%, while the custom model achieved 90%. Validation using unseen data (1600 images) demonstrated strong performance, and ensemble learning is being explored to further enhance the model's accuracy. This research highlights the potential for AI to revolutionize the diagnosis of phytoplasma diseases, making detection faster, more accurate, and accessible to farmers in the field.

Computer Vision II

Session Date: Wednesday, November 20th
Session Time: 1:00pm-2:20pm
Session Location: Corps
Session Moderator:


1. (1:00-1:20) - Automatic Species Classification and Diversity Analysis Using Physical Features and Machine Learning
Serfa Juan, Ronnie O., ARS
Lester O. Pordesimo, Alison R. Gerken
In agricultural environments, even post-harvest, effective pest management relies on timely identification and monitoring of insect species to minimize crop damage and economic losses. This study proposes an automated approach to species classification and diversity analysis by leveraging machine learning (ML) and physical feature extraction from high-resolution images. Several species of beetles are major pests in stored grain environments. Here, we develop a system that uses image processing techniques to extract key morphological features such as body shape, antennae structure, elytra texture, and coloration. These features are then fed into advanced ML models, including support vector machines, kNN, random forests, Naïve Bayes, and decision trees, to classify species with high accuracy. A significant challenge in pest management is that multiple similar-looking insect species often coexist, contributing to infestations at varying levels of severity. To address this, the proposed system not only classifies individual species but also performs diversity analysis. By identifying and quantifying species richness and abundance, it enables the monitoring of mixed populations, which may require different pest control strategies. Features such as body aspect ratio, circularity, texture patterns, and segment length are automatically detected from images and used to distinguish between species, even in complex mixed-population scenarios. Techniques such as contour detection and segmentation are employed to isolate and measure specific body parts, which serve as key identifiers for each species. Machine learning models are trained on annotated datasets of five common stored-product pest beetle species: Maize Weevil, Rusty Grain Beetle, Sawtoothed Grain Beetle, Red Flour Beetle, and Lesser Grain Borer.
By incorporating both species-specific morphological characteristics and machine learning techniques, the system achieves robust classification performance. Additionally, diversity analysis is applied to monitor temporal shifts in species composition, aiding farmers in understanding changes in pest populations over time. This not only helps in early detection of new pest species but also tracks the effectiveness of pest control methods by observing species abundance before and after treatments. The integration of physical feature extraction with machine learning offers an efficient, automated solution for pest species identification and diversity analysis. The system can be deployed in real-time to aid farmers in making informed pest control decisions, enhancing sustainable agricultural practices and reducing the impact of insect infestations on crop yield.
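The pipeline above, extract morphological descriptors and then classify with a conventional ML model, can be sketched as follows. The aspect-ratio and circularity definitions are the standard ones; the synthetic species clusters and the choice of kNN are illustrative only, not the study's data or tuned models.

```python
import numpy as np

def shape_features(area, perimeter, length, width):
    """Two of the morphological descriptors named in the abstract:
    body aspect ratio and circularity (circularity = 1.0 for a perfect circle)."""
    aspect_ratio = length / width
    circularity = 4.0 * np.pi * area / perimeter ** 2
    return np.array([aspect_ratio, circularity])

def knn_predict(train_X, train_y, x, k=3):
    """Plain k-nearest-neighbour vote on standardized feature vectors."""
    mu, sd = train_X.mean(0), train_X.std(0)
    d = np.linalg.norm((train_X - mu) / sd - (x - mu) / sd, axis=1)
    nearest = np.argsort(d)[:k]
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[counts.argmax()]
```

With well-separated clusters in (aspect ratio, circularity) space, even this minimal classifier distinguishes an elongated borer-like silhouette from a rounder weevil-like one.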

2. (1:20-1:40) - Turfgrass Germplasm Improvement by Leveraging AI-Based High-Throughput Phenotyping Technologies
Barnaby, Jinyoung, ARS
Yonghyun Kim
Evaluating stress responses in large breeding populations is both labor-intensive and time-consuming, often leading to phenotypic assessments being conducted only at the final stage of stress. However, stress progression rates can vary significantly by genotype. The development of a low-cost, automated, greenhouse-based red/green/blue (RGB) imaging system, coupled with a machine learning-based image processing platform, has enabled a time- and labor-efficient evaluation of daily drought progression to assess genetic performance in drought tolerance within a bentgrass hybrid population. This extensive temporal phenotypic data, integrated with genomic information, supports quantitative trait loci (QTL) mapping, an essential first step toward identifying candidate genes associated with physiological drought tolerance mechanisms. This will ultimately lead to more effective and rapid selection of drought-resilient turfgrass germplasm. Tiller number is a key yield indicator in all grass species. Many of our bentgrass hybrid lines produce over 1,000 tillers, necessitating the development of an efficient method for counting them. The creation of a bentgrass tiller counting model using the YOLOv8 CNN framework has made tiller counting a feasible yield metric in large-scale analyses of bentgrass species, which was previously impractical due to labor inefficiencies. Both machine learning-based image phenotyping systems developed will greatly enhance the efficiency of plant breeders. Moreover, the data generated from these systems will also aid in the genetic mapping of bentgrass breeding populations, facilitating the identification of genomic regions linked to drought tolerance and yield improvements. This will ultimately support the development of improved turfgrass varieties for consumers.

3. (1:40-2:00) - Spatial predictions of soil moisture across a longitudinal gradient in semiarid ecosystems using UAVs and RGB sensors
Duarte, Efrain, ARS
Alexander Hernandez, Peter Porter and Holden Brecht
Unmanned aerial vehicles (UAVs) offer an efficient method for assessing and monitoring physical phenomena, including soil moisture (SM), particularly in semiarid regions. UAV-based RGB sensors were used to collect high-resolution imagery, and hundreds of SM samples were gathered concurrently with the UAV flights across nine study sites over a large latitudinal gradient in the western USA. We evaluated the predictive power of RGB bands, texture metrics, and vegetation indices for estimating SM using machine learning algorithms. The model showed moderately acceptable predictive accuracy (R² = 0.63 using cross validation; R² = 0.53 using a fully independent validation). Texture metrics such as "mean" and "entropy," as well as the Excess Green (ExG) vegetation index, showed the greatest predictive power, while raw RGB bands showed minimal performance. The resulting spatial predictions showed high reliability (α < 0.01) for the states of Utah and California but poorer performance for Idaho and Montana. We provide linear equations for the conversion of raw digital number (DN) values to reflectance, facilitating remote sensing applications that benefit from simple, highly affordable UAV RGB imagery. Our protocol provides a robust pathway to modeling SM with cost-effective solutions for monitoring semiarid ecosystems.
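Two of the predictors highlighted above have simple closed forms. The sketch below uses the standard ExG definition on normalized chromatic coordinates and a generic linear DN-to-reflectance map; the actual gain and offset coefficients come from the study's calibration equations, which are not reproduced here, so the values shown are placeholders.

```python
import numpy as np

def excess_green(r, g, b):
    """Excess Green Index on normalized chromatic coordinates:
    ExG = 2g' - r' - b', where r', g', b' sum to 1 per pixel."""
    total = r + g + b
    total = np.where(total == 0, 1.0, total)  # guard against all-black pixels
    rn, gn, bn = r / total, g / total, b / total
    return 2.0 * gn - rn - bn

def dn_to_reflectance(dn, gain, offset):
    """Linear conversion of raw digital numbers to reflectance; gain and
    offset would be fitted from calibration targets imaged during the flight."""
    return gain * dn + offset
```

A strongly green pixel yields a high ExG value while bare soil sits near zero, which is consistent with ExG outperforming the raw RGB bands as an SM predictor in vegetated scenes.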

DASH: Enterprising AI and Phenotyping through Digital Ag Systems Hub

Session Date: Tuesday, November 19th
Session Time: 3:00pm-4:30pm
Session Location: Oak
Session Moderator: Amanda Hulse-Kemp


1. (3:00-3:15) - Enterprising AI and Phenotyping through DASH
Hulse-Kemp, Amanda, ARS
Steven Mirsky, Chris Reberg-Horton
As new artificial intelligence and machine learning methods become available, it is critical to apply and integrate them into breeding and production agriculture. This integration accelerates breeding programs, enables more precise and sustainable agricultural management, and offers potential for mitigating and adapting to climate change. All activities in agriculture rely on being able to accurately phenotype, or measure, plant (or animal) characteristics in order to make decisions. Translation into the field currently lags in this space, leading to wasteful duplication of labor and resources and a mismatch between targets and the technology deployed. We are proud to launch the new Agricultural Research Service initiative, the Digital Ag Science Hub (DASH), to target enterprising AI and ML technologies for breeding and production agriculture. We will share some initial targets of DASH, including working with USDA's SciNet team to expand capacity and utilization of the system. We look forward to working with groups across the agency and across USDA to address stakeholder-driven challenges.

2. (3:15-3:30) - MLOps for deploying and improving models
Reberg-Horton, Chris, North Carolina State University
Steven Mirsky and Amanda Hulse-Kemp
Computer vision is transforming agricultural research, sparking interest across diverse fields such as pest management and plant breeding, with the potential to revolutionize automated plant phenotyping. However, the challenge remains to move from proof-of-concept projects to practical, everyday tools for researchers. The DASH initiative is addressing this gap by developing a unified architecture that facilitates the training, reproduction, improvement, and deployment of machine learning (ML) models. This framework will integrate with platforms such as SciNet for large-scale model training, cloud and edge devices for deployment, and partnerships with key ARS initiatives. DASH will collaborate with SciNet, the AI Center of Excellence, and the Partnerships for Data Innovation (PDI) to ensure that these models are effectively scaled and accessible for agricultural scientists.

3. (3:30-3:45) - PlantMap3D: DASH use-case
Mirsky, Steven, ARS

4. (3:45-4:00)- Ag Image Repository: A resource for the Ag Community
Kutugata, Matthew, ARS
Maria Laura Cangiano, Søren Kelstrup Skovsen, Muthu Bagavathiannan, Steven Mirsky, Chris Reberg-Horton
The Agricultural Image Repository (AgIR) is a resource designed to advance computer vision, artificial intelligence, and image-based phenotyping in precision agriculture. AgIR offers a diverse collection of annotated images covering key weeds, cover crops, and cash crops under various growth conditions. Images are categorized into two main scene types: Semi-Field, which allows high-throughput image collection using the BenchBot system, and real-world Field settings that capture natural growth conditions but are limited by manual collection. This combination provides scalability while ensuring real-world relevance, enabling the development of robust AI models for weed detection, crop monitoring, and plant trait analysis. AgIR’s collaborative effort spans nine states and ten research institutions, making it a valuable tool for researchers, developers, agronomists, and industry professionals working on data-driven solutions for sustainable agriculture.
(4:00-4:15) Demo - Visualize automated annotation pipelines

(4:15-4:30) Q&A Session

Data Integration and AI in Knowledge Management - a soil carbon use case

Session Date: Tuesday, November 19th
Session Time: 1:00pm-2:20pm
Session Location: Ross
Session Moderator: Dan Roberts


1. (1:00-1:20) - Ingestion and Integration of Soil Carbon Using PDI
Stewart, Cathy, ARS

2. (1:20-1:40) - Development of Controlled Vocabularies for Data Interoperability
Woodward-Greene, Jennifer, NAL

3. (1:40-2:00) - AI-Empowered Knowledge Graph for Accessing Soil Carbon Data
Li, Chengkai, University of Texas at Arlington

4. (2:00-2:20) - Knowledge Mesh Graphs at NOAA
Berkheimer, Ryan, NOAA

Disease Transmission Applications

Session Date: Tuesday, November 19th
Session Time: 1:00pm-2:20pm
Session Location: Traditions
Session Moderator:


1. (1:00-1:20) - Modeling Disease Transmission Trends: A Graph Neural Network Approach
Mooney, Amber, ARS
John Humphreys, Brian Stucky, Lee Cohnstaedt, Mel Boudreau, Chad Fautt, Amy Hudson
West Nile Virus (WNV) poses a significant public health threat, with its transmission dynamics influenced by various environmental factors. This study employs geospatial analysis and artificial intelligence techniques to identify and characterize WNV hotspots in the Southern Climate Region, including the states of Kansas, Oklahoma, Texas, Arkansas, Louisiana, and Mississippi, aiming to enhance our understanding of high-risk areas for targeted intervention strategies. We applied a comprehensive approach to identify areas with elevated transmission risk by utilizing spatial data on reported WNV cases, mosquito abundance, and climate variables. Our results reveal spatial clusters of WNV incidence, demonstrating the importance of environmental factors in shaping transmission patterns. The study also evaluates the contribution of specific vector species and climatic conditions to hotspot formation. Furthermore, we identify an artificial intelligence model that can preserve the spatial structure and temporal trends associated with WNV data. Identifying high-risk areas through geospatial artificial intelligence analysis provides valuable insights for public health authorities and agricultural and livestock management to implement proactive measures, including targeted mosquito control and public awareness campaigns. This research contributes to the ongoing efforts to mitigate the impact of WNV by refining our understanding of the spatial determinants of virus transmission and informing evidence-based interventions.
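A graph neural network preserves spatial structure by propagating features along an adjacency graph, for example between neighbouring counties. One graph-convolution layer can be sketched in NumPy as below; this is an illustrative sketch of the mechanism, not the study's model, and the example graph and covariate are invented.

```python
import numpy as np

def gcn_layer(adj, features, weights):
    """One graph-convolution layer: each node's representation becomes a
    degree-normalized mix of its own and its neighbours' features,
    followed by a learned linear map and a ReLU nonlinearity."""
    a_hat = adj + np.eye(adj.shape[0])            # add self-loops
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))      # symmetric normalization
    propagated = d_inv_sqrt @ a_hat @ d_inv_sqrt @ features
    return np.maximum(propagated @ weights, 0.0)  # ReLU
```

Because each layer mixes neighbouring nodes' signals, stacked layers let case counts and climate covariates in one county inform risk estimates in adjacent counties, which is the spatial-structure-preserving behaviour described above.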

2. (1:20-1:40) - Development and evaluation of a machine learning model to predict Rift Valley fever virus transmission risk for livestock
Willard, Lory, ARS
Heidi Tubbs, Karlyn Harrod, Bhaskar Bishnoi, Stephanie Schollaert Uz, Claudia Pittiglio, Assaf Anyamba, and Seth Gibson
Rift Valley fever (RVF) is a mosquito-borne viral hemorrhagic zoonosis largely confined to Africa and the Arabian Peninsula, which poses significant threats to public health, the agricultural economy, and food security worldwide. Rift Valley fever virus (RVFV) is transmitted to ungulate livestock primarily through the bite of infectious Aedes and Culex spp. mosquito vectors. Humans can become infected via handling or consumption of fluids or tissue of infectious livestock. Accurate forecasting of RVFV livestock transmission risk is crucial for mobilizing timely interventions and mitigating impacts. We previously implemented a threshold RVFV transmission model based on satellite-derived normalized difference vegetation index (NDVI) data that has been deployed for over a decade. Recently, to improve both the spatial and temporal accuracy of transmission risk, multiple machine learning models were developed utilizing a comprehensive suite of variables including satellite-derived NDVI and rainfall datasets, human population and livestock distributions, soils and hydrologic data, and records of historical RVFV livestock cases. Recognizing that RVF outbreaks have been associated with El Niño–Southern Oscillation (ENSO) events across Africa, our model considers the unique teleconnection patterns of ENSO (El Niño and La Niña) across three regions of Africa (Southern Africa, Eastern Africa, and the Sahel). Classification models based on the presence or absence of RVF livestock cases were developed using random forest, XGBoost, k-nearest neighbor, support vector machine, and neural network algorithms. Model performance was evaluated by comparing accuracy and ROC curves generated with an independent test set of livestock case data. All models have accuracy scores of 80% or greater.
Validation against historical livestock case data demonstrates the model’s capability to identify high-risk periods and regions with improved precision and lead time, increasing the time available for health officials to implement mitigation measures in a more precise location.
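
As an illustrative sketch of the evaluation workflow described above (training several classifiers and comparing their accuracy on an independent test set), the following uses synthetic data and two deliberately simple stand-in models; the actual study used random forest, XGBoost, k-nearest neighbor, support vector machine, and neural network algorithms on satellite, population, and livestock case data:

```python
import random

random.seed(1)

# Synthetic stand-in for the RVF data: two illustrative features (think
# NDVI and rainfall anomalies) and a presence/absence label for livestock cases
def make_point():
    label = random.randint(0, 1)
    features = (random.gauss(label, 0.6), random.gauss(label, 0.6))
    return features, label

train = [make_point() for _ in range(200)]
test = [make_point() for _ in range(100)]

def nearest_neighbor(x):
    # 1-nearest-neighbor: label of the closest training point
    _, label = min(train, key=lambda p: (p[0][0] - x[0]) ** 2 + (p[0][1] - x[1]) ** 2)
    return label

def threshold_rule(x):
    # Simple threshold classifier on the feature sum
    return 1 if x[0] + x[1] > 1.0 else 0

def accuracy(model):
    # Evaluate each model on the same independent test set, as in the study
    return sum(model(x) == y for x, y in test) / len(test)

scores = {"1-NN": accuracy(nearest_neighbor), "threshold": accuracy(threshold_rule)}
```

Comparing held-out accuracy (and, in the study, ROC curves) across models is what supports choosing among the candidate algorithms.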

3. (1:40-2:00) - Protein-protein interaction prediction and design in virus and antibody systems
Fenster, Jacob, ARS
Paul A. Azzinaro, Mark Dinhobl, Manuel V. Borca, Edward Spinard, Douglas P. Gladue
Recent AI protein structural prediction models (AlphaFold, RosettaFold, etc.) have enabled a significant increase in the success of computational prediction and design of protein-protein interactions. While these tools have high performance on many protein complexes, the successful prediction of virus-virus protein-protein interactions and antibody-antigen interactions remains challenging due in part to the lack of genetic evolutionary information that is present in proteins that undergo traditional evolution through clonal or sexual reproduction in cellular organisms. This talk presents benchmarking data on predicting genome-wide virus-virus protein-protein interactions in the model Vaccinia virus to gain insight into the performance of AlphaFold2 in African Swine Fever Virus. In addition, in silico success rates of de novo designed viral epitopes to bind neutralizing antibodies for subunit vaccine development will be discussed.

4. (2:00-2:20) - Use of machine vision in an existing fruit packing house system for a quarantine pest as part of a systems approach for export to the U.S.
Simmons, Gregory, APHIS

Food Science Applications

Session Date: Wednesday, November 20th
Session Time: 10:30am-11:50am
Session Location: Reveille
Session Moderator:


1. (10:30-10:50) - Machine Learning with Ingredient-Level Food Trees Reveals Contributors to Systemic Inflammation in the American Diet
Larke, Jules, ARS
Danielle Lemay
Background: Methods for modeling the relationship between self-reported diet records and inflammation are limited and lack the rigor to adequately assess dietary complexity. Machine learning (ML) combined with alternative representations of diet may help to improve predictions of health outcomes over traditional methods. Objective: To determine if hierarchical ingredient-level representations of diet improve predictive models of systemic inflammation from a cross-sectional analysis using data on US adults (N=19,460) from the National Health and Nutrition Examination Survey (NHANES). Analysis: Mixed meal disaggregation was performed to generate an ingredient-level representation of diet which was further annotated to produce a hierarchical data structure, or food tree. Hierarchical feature engineering selected the most informative food tree features for predicting the systemic inflammation marker C-reactive protein (CRP). ML models were used to assess the accuracy of predicting CRP from the food tree features compared with the Dietary Inflammatory Index (DII) score, and logistic regression was used to calculate marginal effects of ingredients identified from ML models. Results: Representation of diet as an ingredient-level food tree reduced dietary features from 6,412 unique foods to 566 unique ingredients. ML classifiers trained on food tree data predicted high versus low systemic inflammation (CRP tertile) with marginally higher accuracy (0.761) on held out data compared with models trained using DII scores (0.757). Individual dietary components revealed contributions towards increased inflammation including fruit punch, soda, and high-fat milk (marginal effects: 0.001 – 0.005, P < 0.05), and foods associated with decreased inflammation such as herbal tea, coffee, brown rice, and pasta (marginal effects: -0.08 – -0.001, P < 0.05). Conclusions: Specific ingredients, selected from a food tree, perform as well as the DII at predicting systemic inflammation. 
Choice of common beverages and staples associated with inflammation varied in magnitude and direction, implying specific dietary swaps (e.g. soda for tea/coffee, white rice for brown rice, etc.) have practical use for dietary guidance.
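
The ingredient-level roll-up behind the food tree idea can be sketched as follows; the food names, tree structure, and gram amounts here are invented for illustration (the real analysis mapped 6,412 NHANES foods to 566 ingredients through mixed meal disaggregation):

```python
# Hypothetical mini food tree: each reported food maps to an ingredient-level
# parent node (names invented for illustration)
food_tree = {
    "fruit_punch": "sweetened_beverage",
    "cola": "sweetened_beverage",
    "herbal_tea": "tea",
    "green_tea": "tea",
    "brown_rice": "whole_grain",
}

# One participant's reported gram amounts for individual foods
intake = {"fruit_punch": 240.0, "cola": 355.0, "green_tea": 200.0}

# Roll foods up to their ingredient-level nodes, shrinking thousands of
# food codes to a few hundred features suitable for ML models
features = {}
for food, grams in intake.items():
    node = food_tree[food]
    features[node] = features.get(node, 0.0) + grams
```

The aggregated `features` dictionary is the kind of compact, hierarchical representation the classifiers were trained on.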

2. (10:50-11:10) - AI-enabled detection of microbes in food systems
Nitin, Nitin, University of California Davis
Luyao Ma, Howard Park, Jiyoon Yi, Nicharee Wisuthiphaet
The presentation will focus on AI approaches using optical imaging to improve speed, sensitivity, and specificity for detecting microbes, including bacteria and yeast, in food systems and their applications to enhance food safety and quality. The optical imaging approaches will focus on low-cost imaging measurements to acquire microbial data, and the data analysis methods will discuss various AI/machine learning approaches to detect and quantify the presence of target microbes in food systems. The presentation will also discuss the opportunities for industrial applications by simulating the detection of bacteria and yeast in different food matrixes, including fresh produce, dairy, and meat products. The presentation will also discuss the future steps to develop these technologies and their translation to field applications.

3. (11:10-11:30) - Application of artificial intelligence to enhance potato breeding and genetics
Feldman, Max, ARS
Collins Wakholi, Devin Rippner, Mark Pavek, Manoj Karkee
Quantitative genetics and predictive breeding are data-intensive methods that associate haplotype inheritance and phenotypic characteristics in structured breeding populations. Scientists in the Temperate Tree Fruit and Vegetable Research Unit located in Prosser, WA and cooperators at Washington State University are using automation and machine vision to rapidly and inexpensively capture biologically important measurements from potato tubers. Our team developed an RGB-D imaging conveyor system that utilizes artificial intelligence to detect, track, capture, and extract measurements from images of individual potatoes. This platform was used to evaluate >75,000 individual tubers derived from ~1,300 samples (~32 breeding families). Our approach enables us to rigorously assess the inheritance of potato yield components (size of tubers, number of tubers), tuber shape descriptors, and potato skin color characteristics. We are currently working to train additional deep learning models to detect potato tuber defects including sprouting, growth cracks, secondary growth, and tuber greening.

4. (11:30-11:50) - Dietary polyphenol intake is associated with an altered gut microbiome and lower gastrointestinal inflammation and permeability

Wilson, Stephanie, Texas A&M University
Andrew Oliver (USDA ARS WHNRC), Danielle G. Lemay (USDA ARS WHNRC)
Background: Polyphenols are dietary bioactive compounds that can have anti-inflammatory and anti-oxidative properties. As most polyphenols reach the large intestine, they may influence inflammation within the gastrointestinal (GI) environment and impact the microbiome by shaping microbial community structure. However, few studies have assessed how polyphenol intake relates to GI health and the gut microbiome, particularly at a resolution higher than total polyphenol intake. Thus, we mapped diet data to FooDB to estimate intake of total polyphenols, polyphenol classes, and individual polyphenols in adults, then examined the relationship between polyphenol intake and markers of GI health and the gut microbiome. Methods: Healthy adults (n = 350) were recruited into an observational, cross-sectional study balanced for age, sex, and BMI (ClinicalTrials.gov: NCT02367287). We examined diet using multiple 24-hr dietary recalls (ASA24) then mapped ingredients to polyphenols within FooDB to estimate polyphenol intake. We paired intake data with microbial community profiles derived from fecal shotgun-sequenced metagenomes (n = 313). We analyzed whether dietary polyphenol intake at various resolution levels - total, class, compound - relates to systemic and GI inflammatory markers using standard and machine learning analyses. We also analyzed the relationship between microbial composition, engineered to remove redundant taxa, and polyphenol intake (lower vs upper intake quartiles) with PERMANOVA. Results: Mean total polyphenol intake was approximately 914 +/- 50 (SE) mg/1000 kcal per day with flavonoids as the greatest class contributor at 495 +/- 38 mg/1000 kcal per day. Total polyphenol intake negatively associated with the GI inflammation marker, fecal calprotectin (Beta =-0.004, p=0.04). 
At the class level, polyphenols classified as prenol lipids (Beta =-0.94, p<0.01) and phenylpropanoic acids (Beta =-0.92, p<0.01) negatively associated with lipopolysaccharide-binding protein, a measure of GI permeability. Using random forest regression, we found a positive relationship between C-reactive protein and the “cinnamic acids and derivatives” polyphenol class. Furthermore, we found that gut microbial composition differed between upper and lower quartiles of polyphenol intake (p=0.007), accounting for age, BMI, sex, and diet quality. The top differentiating taxa were a greater abundance of the family Clostridiaceae with high polyphenol intake and greater abundance of Bacteroides stercoris with low intake. Conclusion: Our results indicate that polyphenol consumption is associated with GI inflammation and permeability as well as gut microbial community structure in healthy adults.

Genomics I

Session Date: Tuesday, November 19th
Session Time: 1:00pm-2:20pm
Session Location: Reveille
Session Moderator:


1. (1:00-1:20) - Use of Artificial Intelligence for Vaccine Development Against Vector Pests: Challenges and Opportunities
Saelao, Perot, ARS
Bodine, D.M., Leucke, D., Bendele, K.G.
Conventional vaccine development can be costly and time intensive. In a field where candidate antigens are rapidly evolving, the pace of development and identification of vaccine targets needs to be quick and efficient. Machine Learning (ML) and Artificial Intelligence (AI) have quickly become essential resources to identify candidate antigens through methods such as reverse vaccinology. This presentation will provide a broad overview of the use of AI in vaccine development against pathogens and describe several examples from ARS research at the Veterinary Pest Genetics Research Unit. The overall goals of this presentation are: to describe a potential use of AI in genomic data, to foster ideas for collaboration in enhancing or expanding datasets for these methods, and to identify areas and other systems within ARS research that could benefit from these applications.

2. (1:20-1:40) - The Promises and Pitfalls of Deep Learning Methods for Plant Phenotype Prediction
Washburn, Jacob, ARS
Daniel Kick
Phenotype prediction is a grand challenge of 21st century biology! Predictive models and frameworks touch nearly every area of modern research and are particularly critical in agriculture for assessing crop loss risks, developing climate-smart and sustainable agricultural solutions, and informing breeding decisions. Within plant agriculture, the substantial influence of gene-by-environment effects and diverse growing conditions compound the challenge of prediction. Deep Learning offers a promising approach to phenotypic prediction as it allows for incorporation of large amounts of data and diverse data types into a single model. We will present recent findings in the application of deep learning for agriculture supporting both trait measurement and prediction as well as the limitations and promises of these techniques. The effects of different data types and qualities, “ensemble” methods, training methods, and the potential for biological insights from these models will be discussed. We demonstrate that deep learning models can, but do not always, outperform more traditional genomic prediction methods. We also show that ensembles of heterogeneous models, including both deep learning and traditional statistical models, can reduce error by approximately 7% relative to the best single model in a large multi-environment data set containing thousands of public maize hybrids. While not a panacea, this work suggests that deep learning is a promising tool to be used in conjunction with other methods to further phenotypic prediction in research and applied agriculture.
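
The ensembling idea can be sketched with illustrative numbers (not values from the maize dataset): averaging the predictions of heterogeneous members can produce lower error than the best single model, because the members' errors partially cancel:

```python
import math

# Hypothetical held-out yield predictions from two heterogeneous models;
# all numbers are illustrative only
y_true  = [10.0, 12.0, 9.0, 11.0]
pred_dl = [11.0, 13.0, 8.0, 10.0]   # deep learning model
pred_gs = [ 9.0, 11.5, 9.5, 12.0]   # traditional statistical/genomic model

def rmse(y, yhat):
    # Root mean squared error on held-out data
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))

# Unweighted ensemble: average the member models' predictions
pred_ens = [(a + b) / 2 for a, b in zip(pred_dl, pred_gs)]

errors = {
    "deep_learning": rmse(y_true, pred_dl),
    "statistical": rmse(y_true, pred_gs),
    "ensemble": rmse(y_true, pred_ens),
}
```

In this toy case the two models err in opposite directions on each hybrid, so the ensemble error is lower than either member's.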

3. (1:40-2:00) - Machine learning and analysis of genomic diversity of ‘Candidatus Liberibacter asiaticus’ strains
Chen, Jianchi, ARS
Adalberto A. Perez de Leon
We are currently performing active research projects to study genetic diversity of “Candidatus Liberibacter asiaticus” (CLas), an alpha-proteobacterium associated with citrus Huanglongbing (HLB). HLB is a highly destructive disease in citrus production around the world. Because CLas is not culturable in vitro, DNA sequencing and analyses are the primary tools to study the bacterial diversity, which is crucial for HLB management. Genome sequence analyses involve data sets of Mbp or Gbp level that generate multi-dimensional data outputs. Handling and interpreting such complex data sets are highly challenging through conventional approaches. Machine learning (ML) is a technology that uses computational statistics to resolve complex data problems. To illustrate this point, our recent publication of Huang et al. 2022, entitled Machine learning and analysis of genomic diversity of “Candidatus Liberibacter asiaticus” strains from 20 citrus production states in Mexico, Front. Plant Sci. 13:1052680. doi: 10.3389/fpls.2022.1052680, is discussed here. CLas samples were collected from 20 citrus-producing regions in Mexico and sequenced using the HiSeq platform with multi-Gbp data. An unsupervised ML approach was implemented through principal component analysis (PCA) on average nucleotide identities (ANIs) of CLas whole genome sequences, and a supervised ML approach was implemented through sparse partial least squares discriminant analysis (sPLS-DA) on single nucleotide polymorphisms (SNPs) of coding genes of CLas. Two CLas Geno-groups were established that extended the current classification system of CLas strains. We are currently looking for neural network-based algorithms to further evaluate the CLas population diversity.
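
A hedged sketch of the unsupervised step: PCA (computed here via SVD) applied to a small hypothetical ANI matrix standing in for the Mexican CLas genome data. Centering the matrix and projecting strains onto the first principal component separates two candidate geno-groups:

```python
import numpy as np

# Hypothetical pairwise average nucleotide identities (%) for five CLas
# genomes; the published analysis used ANIs from whole-genome sequences
ani = np.array([
    [100.0,  99.9,  99.9,  99.2,  99.1],
    [ 99.9, 100.0,  99.8,  99.2,  99.1],
    [ 99.9,  99.8, 100.0,  99.3,  99.2],
    [ 99.2,  99.2,  99.3, 100.0,  99.9],
    [ 99.1,  99.1,  99.2,  99.9, 100.0],
])

# PCA via SVD: center the columns, then project rows onto the first
# principal axis (the dominant direction of variation)
centered = ani - ani.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
pc1 = centered @ vt[0]

# Strains falling on opposite sides of PC1 suggest two geno-groups
groups = pc1 > 0
```

Here strains 0-2 and strains 3-4 land on opposite sides of PC1, mirroring how the study's PCA on ANIs resolved two CLas geno-groups.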

4. (2:00-2:20) - AI-ML in Genome-Phenome associations for Crop Resiliency
Hudson, Matthew, University of Illinois
Zhihai Zhang, Ryan Disney, Lucas Borges, Joao Viana, Kim Walden, Samuel Mintah, Todd Mockler, Andrew Leakey, Lisa Ainsworth, Andrea Eveland, Alex Lipka, Kaiyu Guan, Heng Ji
The Crop Resiliency thrust in AIFARMS focuses on the use of AI and machine learning to optimize our ability to predict, optimize and improve crop yields in a changing environment. In particular, we focus on improving genomic breeding methodologies and environmental sensing using AI, and several examples of this in different disciplines will be provided. Our ultimate goals are to improve genetic resiliency of crops and larger agricultural systems to environmental and weather extremes, to allow early prediction of yields and supply problems, and to measure carbon sequestration on local and regional scales.

Genomics II

Session Date: Wednesday, November 20th
Session Time: 10:30am-11:50am
Session Location: Oak
Session Moderator:


1. (10:30-10:50) - Quantify alfalfa digestibility with YOLOv8 and segment anything model (SAM)
Xu, Zhanyou, ARS
Brandon J. Weihs, Zhou Tang, Somshubhra Roy, Zezhong Tian, Deborah Jo Heuschele, Zhiwu Zhang, Zhou Zhang, Garett Heineck
The low digestibility of fiber in alfalfa (Medicago sativa L.) limits dry matter intake and energy availability in ruminant animal production systems. Previously, alfalfa plants were identified for low or high rapid (16 h) and low or high potential (96 h) in vitro neutral detergent fiber digestibility (IVNDFD) of plant stems. Here, two cycles of bidirectional selection for 16 h and 96 h IVNDFD were carried out. Two hundred fifty genotypes from the resulting populations were evaluated for solid vs. hollow stem characterization at three maturity stages. Each genotype was photographed with an RGB camera to record the number of stems, the size of each stem, and the area of the internal polygon hole. The number of stems was counted using You Only Look Once version 8 (YOLOv8) with more than 91% accuracy. Otsu’s automatic image thresholding algorithm and the segment anything model (SAM) were used to segment each stem into two parts: the hollow stem’s central polygon and the stem’s outer layers. The medoid of the central polygon was identified by 5-cluster k-means classification, and the total number of pixels for the polygon hole and each whole stem was counted. The percentage of hollow pixels was estimated and associated with digestibility and used as inputs for GWAS, genomic selection, and machine learning predictions. The application of AI changes breeders’ subjective phenotyping into digital phenotyping and precision agriculture.
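
The final pixel-counting step can be sketched with a toy labeled cross-section; in the study the labels would come from YOLOv8 detection followed by Otsu/SAM segmentation, whereas here they are hand-built for illustration:

```python
import numpy as np

# Toy labeled cross-section: 0 = background, 1 = stem tissue, 2 = hollow
# center. In practice these labels come from YOLOv8 plus Otsu/SAM.
section = np.zeros((8, 8), dtype=int)
section[1:7, 1:7] = 1   # outer stem wall
section[3:5, 3:5] = 2   # hollow central polygon

stem_pixels = int(np.count_nonzero(section == 1))
hole_pixels = int(np.count_nonzero(section == 2))

# Percentage of hollow pixels over the whole stem cross-section
percent_hollow = 100.0 * hole_pixels / (stem_pixels + hole_pixels)
```

The resulting hollow-pixel percentage is the per-stem trait that was associated with digestibility and fed into GWAS, genomic selection, and ML predictions.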

2. (10:50-11:10) - AI in Action: Case Studies in Genomic Annotation, Structural Variation Identification, and Classification Tasks
Perkin, Lindsey, ARS
Adama Tukuli, Zachary Cohen, Tyler Raszick, Gregory Sword, Charles Suh, Robert Jones, Xanthe Shirley, Julien Levy, Kiley Stout, Jayda Arriaga
Texas leads cotton production in the United States, generating over $2 billion annually. The largest historical threat to cotton production has been the boll weevil, Anthonomus grandis grandis (Agg), but a multi-state effort, the Boll Weevil Eradication Program, has nearly eliminated this pest from the U.S. except for persistent populations in south Texas (and Mexico). These populations threaten cotton production in Texas and other cotton-growing states. Pheromone-baited traps are used to monitor for Agg throughout the Cotton Belt. Unfortunately, these traps also catch other weevils as by-catch that look similar to Agg. This is particularly a problem with the Agg subspecies, Anthonomus grandis thurberiae (Agt), which is nearly identical to Agg but is not a threat to commercial cotton. To alleviate confusion around pest identification, we propose to use artificial intelligence (AI) models to improve genome annotation, identify structural variation, and predict protein structure to pinpoint genetic elements unique to Agg versus Agt. These data are crucial to design in-field molecular diagnostics, such as lateral flow devices (LFD), to differentiate Agg from Agt. We also work to develop an AI-driven image classification model to identify and eliminate non-suspect weevils quickly. This initiative directly supports eradication programs and cotton producers throughout the Cotton Belt. In this presentation we highlight improvements made by these AI tools and show how we integrated AI into our current research program. The models developed in this research can be applied to other systems and demonstrate the potential of AI to transform pest identification and management.

3. (11:10-11:30) - AI-infused Breeding To Enhance Fruit Quality and Nutrition
Colantonio, Vincent, ARS
Anna Hermanns, Jillian Belluck, Andrew Horgan, James Giovannoni
Fleshy fruits serve as a crucial source of healthy foods in our diet. Improving access, affordability, and nutritional content of fleshy fruit products requires plant breeders to prioritize fruit quality traits in their breeding programs. However, the highly complex, quantitative nature of many quality components, such as color, flavor, texture, and nutritional composition, has historically led to the de-prioritization of these traits. Fortunately, machine learning and AI approaches are proving to be particularly useful for the measurement, prediction, and genetic dissection of fruit quality characteristics. Here, we will explore the use of machine learning and AI algorithms for improving quality in the model fruit crop, tomato. Examples from successful applications will be presented, including the development of AI-based phenomic prediction models for enhancing fruit flavor, computer vision tools for the measurement of fruit quality components, and the use of machine learning algorithms for genomic characterization of nutritional composition. Lastly, we will discuss considerations for setting up AI experiments useful for the improvement of fruit quality traits and how to successfully deploy these models in plant breeding programs.

4. (11:30-11:50) - Ideotype Breeding v2.0
Schnable, Patrick, Iowa State University
Nasla Saleem, Mozhgan Hadadi, Yan Zhou, Yawei Li, Adarsh Krishnamurthy, Baskar Ganapathysubramanian
Breeders have been tremendously successful at improving the performance of crops via selection, and geneticists can now readily identify genes responsible for traits of interest and use these genes to modify those traits. The challenge today is determining which traits and trait values breeders should select for, particularly in a world facing climate change. This presentation explores the potential of ideotype breeding to provide data-driven answers to this question. An ideotype is an idealized plant model expected to exhibit improved performance relative to existing plants. When ideotype breeding was first proposed in the late 1960s, it was not possible to define an optimal plant, and to a large extent this remains challenging. In response, we propose “Ideotype Breeding v2.0”, in which we define the existing ranges of phenotypic variation for all characteristics of a trait such as canopy architecture. We then use HTP procedural modeling (an approach used to create 3D models from sets of rules) to create 3D models of plants with all possible architectures based on trait values from existing germplasm. To define breeding target(s), we model the efficiency of light capture by each type of virtual canopy. Subsequently, a genetic algorithm and further selection are used to optimize the canopy architecture.
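
A minimal, hypothetical sketch of the optimization step: a genetic algorithm searching over trait values (here two made-up canopy traits, leaf angle and leaf area index) against a stand-in light-capture objective; the real pipeline scores 3D procedural canopy models with a light-interception simulation instead:

```python
import random

random.seed(0)

# Toy "light capture" objective over two hypothetical canopy traits
# (leaf angle in degrees, leaf area index); illustrative only
def light_capture(angle, lai):
    return -((angle - 35.0) ** 2) - 4.0 * (lai - 3.5) ** 2

def evolve(pop, generations=50):
    for _ in range(generations):
        # Truncation selection: keep the fitter half as parents
        pop.sort(key=lambda p: light_capture(*p), reverse=True)
        parents = pop[: len(pop) // 2]
        children = []
        while len(children) < len(pop) - len(parents):
            a, b = random.sample(parents, 2)
            # Midpoint crossover plus small Gaussian mutation
            children.append(((a[0] + b[0]) / 2 + random.gauss(0, 1.0),
                             (a[1] + b[1]) / 2 + random.gauss(0, 0.2)))
        pop = parents + children
    return max(pop, key=lambda p: light_capture(*p))

# Initial population drawn from germplasm-like trait ranges
population = [(random.uniform(10, 70), random.uniform(1, 6)) for _ in range(30)]
best = evolve(population)
```

Over generations the population converges toward the trait combination that maximizes the objective, analogous to identifying a candidate ideotype within the range of existing phenotypic variation.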

Large Language Models

Session Date: Wednesday, November 20th
Session Time: 10:30am-11:50am
Session Location: Corps
Session Moderator:


1. (10:30 - 10:50) - All We Need to Know About LLMs: Towards Securely Harnessing the Power of Generative AI in USDA
Park, John Y., ARS
Large Language Models (LLMs) are revolutionizing the way we work, offering unprecedented potential for improved communication, productivity, and personalization. However, to fully harness this transformative technology while mitigating potential risks, a deeper understanding of LLMs is crucial. While commercial LLMs offer powerful capabilities, they raise concerns about data privacy and security. User interactions with these models inevitably involve the transmission and storage of potentially sensitive data, including Personally Identifiable Information (PII), creating opportunities for misuse or breaches. This risk underscores the importance of exploring alternative solutions, such as open-source LLMs. Open-source models empower users with transparency and control over their data. By deploying LLMs within organizational data centers and leveraging open-source code, organizations can minimize data leakage and ensure the privacy of sensitive information. This presentation offers a comprehensive exploration of Large Language Models (LLMs), demystifying their inner workings, data utilization, and associated policies. We’ll examine popular closed-source LLM services such as ChatGPT, Claude, and Gemini, alongside open-source alternatives like Llama and Mistral, delving into their development and deployment. Our analysis will cover core architecture, training datasets, model characteristics, and weighting mechanisms, all presented from a user-centric perspective. Furthermore, we’ll conduct a SWOT analysis to evaluate the strengths, weaknesses, opportunities, and threats associated with these technologies. Next, we’ll highlight key LLM functionalities and discuss how they can be adapted to meet the unique research needs of the USDA. Finally, we aim to provide a roadmap of the steps for utilizing open-source LLMs within the organization, including secure storage and customization of user data. 
By understanding the nuances of LLMs, including their potential benefits and risks, we can make informed decisions about their deployment and usage, ensuring a future where this powerful technology could possibly be utilized responsibly, ethically, and effectively to meet the unique needs of the USDA.

2. (10:50 - 11:10) - Navigating Genomes of Cereal Crops with Generative AI
Lazo, Gerard R., ARS
Devadharshini Ayyappan, North Carolina State University, Raleigh, NC 101010 USA,
Parva K. Sharma, and Vijay K. Tiwari, University of Maryland, College Park, College Park, MD 20742 USA.
The age of generative artificial intelligence (GenAI) is now upon us, and we are learning new ways to incorporate it into our daily life. The GrainGenes database has long housed information for the small grains (since 1992) on topics such as genomes, genes, and traits, and has evolved incrementally with technological advances along the way. We are working on enhanced methods to access information and wish to determine if GenAI can serve as a useful tool to propel our knowledge further. Improvements in high-throughput sequencing technologies have greatly increased the quality and depth of genome coverage and the breadth of species covered, allowing us to gauge the diversity of cereal germplasm. Within many of the species represented there are highly studied germplasm and progenitors which will become crucial for better understanding the biology of these systems for developing crop improvement strategies. There are now over seventy pseudo-molecules available, or soon to be available, for wheat, barley, rye, and oat. Having these genomes available will allow us to survey relatedness between species in a pan-genome sense utilizing the annotated transcriptomes of high-quality reference genomes. Early in the GenAI world (2023) there were constraints on the user with regard to required equipment, use fees, limited token lengths that reduced the parsed data volume, and the quality of training associated with the large language models (LLMs) then available. The ability to step up these efforts has incrementally blossomed over 2024 with the availability of open-source tools with enhanced capabilities, providing a plethora of adaptive approaches to incorporate and analyze locally sourced data collections. 
We have approached these studies on multiple fronts: through collections of research articles, building graphs based on database queries, and utilizing sequence annotations as paths for querying genome structure based on relevant genes associated with identifiable traits. Inter-specific crosses have played a role in bringing in new traits via gene introgression. The ability to survey cereals on a pan-genome scale may allow for new discoveries to aid this process. We have developed a capability of integrating the GFF3 files associated with genome descriptions as a roadmap for querying associated genes and traits using GenAI. We have also used collections of research papers resourced for a context-oriented topic of “cereal rust disease” to determine the extent to which it could deliver on problem solving. Specially crafted prompts provided interesting guides about the topic, were able to point to resources, and aided expanded discussions to enhance information discovery. Such prompts have been able to direct attention to molecular markers, chromosome locations, dominant and recessive alleles, and some descriptions based on the context provided. As this technology evolves, we are hopeful that further enhancements to GenAI will extend our capabilities. We plan to present the state of our findings and open discussions on how such tools might be useful for our future.

3. (11:10 - 11:30) - Use of AI in Agricultural Studies: Examples in leveraging LLMs to help enable scientists to conduct research more effectively
Lau, Jeekin, ARS

Large language models (LLMs) are the poster child for artificial intelligence; for example, much of the general public is familiar with ChatGPT. As a relatively new technology, LLMs are finding use cases in many fields, including agricultural research. Three use case scenarios will show how we can leverage LLMs in our agricultural research: 1) using LLMs to explore different mathematical models for agricultural data, 2) using LLMs to help analyze large, complex trials, including different annotations for different software packages, and 3) commenting and annotating code. These three showcase examples may help other researchers see potential new uses of LLMs in their own research.

4. (11:30-11:50) - Large Language Models (LLMs) for research data curation
Campbell, Jacqueline, ARS
Dr. Steven Cannon, Research Geneticist (Plants), Corn Insect & Crop Genetics Research Unit

USDA databases like SoyBase, MaizeGDB, GrainGenes, and the i5k play a critical role in providing well-organized, easily accessible biological data to researchers across the world. However, extracting the knowledge within a published manuscript from unstructured data into a structured, human- and machine-readable form, a process known as biocuration, is slow and painstaking. Large language models (LLMs) have emerged as a promising tool to accelerate biocuration by automatically extracting knowledge from published manuscripts. Several research groups have evaluated LLMs for biocuration in human medicine and genetics, and a number of NSF-funded databases have started using LLMs for biocuration. The databases within the USDA are an invaluable resource because each database has built, and continues to build, upon previous data in a well-organized and structured way. The large amount of curated data within each of these databases is optimal for AI-driven meta-analysis. I would like to present an introduction to two important topics at the USDA Forum on AI in Federal Agricultural Research: the responsible use of AI involving LLMs for biocuration, and future trends in AI research using large amounts of professionally curated data.

Modeling I

Session Date: Tuesday, November 19th
Session Time: 3:00pm-4:20pm
Session Location: Hullabaloo
Session Moderator:


1. (3:00-3:20) - Prediction of aflatoxin contamination outbreaks in Texas corn by using mechanistic and machine learning models
Castano-Duque, Lina, ARS
Angela Avila, Brian Mack, H. Edwin Winzeler, Joshua Blackstock, Matthew D. Lebar, Geromy G. Moore, Phillip Ray Owens, Hillary L. Mehl, James Lindsay, Kanniah Rajasekaran, and Jianzhong Su
Aflatoxins are carcinogenic and mutagenic mycotoxins produced by fungi that contaminate the food supply under field or storage conditions. To predict mycotoxin outbreaks, we employed an ensemble of models to estimate the probability of high or low aflatoxin contamination in corn (maize) at the county level across Texas. Our models utilized high-throughput dynamic geospatial data from remote sensing satellites, soil property maps, and county-level meteorological data. We developed three model ensemble analysis pipelines: two mechanistic models that used weekly aflatoxin risk indexes (ARI) as inputs, and one weather-dependent model. The ARI was determined using two approaches: (a) the AFLA-MAIZE mechanistic model, and (b) the Ratkowsky mechanistic model. The third model relied solely on weather input features (temperature, precipitation, and relative humidity). For the ARI-dependent models, the ARIs were weighted based on a corn phenological model that estimated planting times per growing season at the county level. The phenology model used satellite-acquired normalized difference vegetation index (NDVI) data to estimate corn growth curves via a third-degree polynomial. In the second stage of the pipelines, we trained, tested, and validated gradient boosting and neural network models using ARI-only or weather-only inputs together with soil properties and county latitude and longitude references. Our findings indicated that the AFLA-MAIZE and Ratkowsky mechanistic models had similar accuracy, sensitivity, and specificity for predicting aflatoxin outbreaks, and they were on par with the weather-only model. We recommend considering model sensitivity and specificity when evaluating mycotoxin outbreak models, as these metrics provide insight into false positive and false negative rates. 
Our study concluded that Texas exhibits significant geographical variability in ARI and ARI hotspots due to its diverse ecoregions (hot-dry, hot-humid, mixed-dry, and mixed-humid). This diversity leads to high temporal and latitudinal variability in weather and planting times, resulting in wide variation in the timing of corn development. For instance, peak corn flowering, which is crucial for predicting aflatoxin outbreaks (April, June, and July), occurs 2-3 months earlier in southern Texas than in northern Texas. Our weather-only-nnet model identified a correlation and hotspot prevalence in the hot-humid areas of Texas, where high relative humidity in March and October led to increased aflatoxin (AFL) events. Similarly, our Ratkowsky-ARI-GBM-standard model detected that high ARI, driven by high temperatures in the early-to-mid corn growing season, resulted in high AFL contamination. We found that, depending on the ecoregion in Texas, there is a positive correlation between aflatoxin outbreaks and soil organic matter, pH, soil erodibility, and soil sodium adsorption ratio. Conversely, there is a negative relationship between aflatoxin outbreaks and available water holding capacity (AWC) and cation exchange capacity. Our results demonstrate intricate relationships between AWC, fungal communities, and plant health. It is possible that soil fungal communities are more diverse, and plants healthier, at high AWC, leading to fewer AFL outbreaks. These findings suggest that any implementation of prediction and prevention strategies for mycotoxin outbreaks should consider these complex interactions across the geographical ecoregions of Texas.
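
The phenology step above fits corn growth curves to NDVI time series with a third-degree polynomial. A minimal sketch of that idea, using NumPy and illustrative synthetic NDVI values rather than the study's satellite data:

```python
import numpy as np

# Toy weekly NDVI observations over a growing season (illustrative values,
# not the study's satellite data): a green-up bump peaking mid-season.
week = np.arange(1, 21)
rng = np.random.default_rng(2)
ndvi = 0.2 + 0.6 * np.exp(-((week - 11) ** 2) / 30) + rng.normal(0, 0.02, 20)

# Fit a 3rd-degree polynomial growth curve, as in the phenology model.
coeffs = np.polyfit(week, ndvi, deg=3)
curve = np.polyval(coeffs, week)

# The week of maximum fitted NDVI approximates peak canopy development.
peak_week = int(week[np.argmax(curve)])
print(peak_week)
```

In the actual pipeline, the fitted curve would be used to estimate county-level planting and flowering windows for weighting the weekly ARIs.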

2. (3:20-3:40) - Field to Fiber: AI-Driven Cotton Yield Predictions
Mitra, Alakananda, ARS
Sahila Beegum, David Fleisher, Vangimalla R. Reddy, Wenguang Sun, Chittaranjan Ray, Dennis Timlin, Arindam Malakar
The United States cotton industry is committed to implementing sustainable production strategies that reduce water, land, and energy consumption while enhancing soil health and cotton yield. More climate-smart agriculture solutions are being developed across the globe to increase crop productivity and lower operational costs. However, accurate prediction of crop yield is complex due to the intricate and nonlinear interplay of factors such as cultivar, soil type, management, pests and disease, climate, and weather patterns. In this study (Mitra et al.), we used a machine learning (ML) method to predict yield, considering climatic change, soil diversity, cultivars, and fertilizer applications, to tackle this challenge. The study used two types of data: field data and synthetic data. Field data from the US southern cotton belt were collected during the 1980s and 1990s; synthetic data were generated using the process-based crop model GOSSYM to reflect the recent impacts of climate change from 2017 to 2022. The study areas were located in three southern states: Texas, Mississippi, and Georgia, with a total of nine locations selected based on cotton productivity. To reduce computation, the accumulated heat units (AHU) for each set of experimental data were used as a substitute for time-series weather data. Random Forest (RF) regression, Support Vector Regression (SVR), Light Gradient Boosting Machine (LightGBM) regression, Multiple Linear Regression (MLR), and neural networks were tested to find the best ML algorithm for the task, with cross-validation performed to avoid overfitting. The RF regressor performed best, achieving an accuracy of 97.75%, an R2 of roughly 0.98, and a root mean square error of 55.05 kg/ha. The results demonstrate that a simple and robust model may be developed and used to aid the cotton climate-smart effort.
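
The model-comparison step, testing a Random Forest regressor with cross-validation, can be sketched with scikit-learn. The synthetic features below (stand-ins for AHU, soil, cultivar, and fertilizer inputs) are illustrative, not the study's data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Illustrative stand-ins for the study's inputs: accumulated heat units,
# soil class, cultivar, and fertilizer rate, predicting yield in kg/ha.
n = 300
X = rng.uniform(size=(n, 4))
y = 2000 + 1500 * X[:, 0] - 400 * X[:, 3] + rng.normal(0, 50, n)

model = RandomForestRegressor(n_estimators=200, random_state=0)
# 5-fold cross-validation guards against overfitting, as in the study.
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(f"mean CV R^2: {scores.mean():.2f}")
```

The same loop would simply be repeated with SVR, LightGBM, MLR, and a neural network to pick the best-performing algorithm.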

3. (3:40-4:00) - A Case Study Of Predicting Midrange Precipitation Using The K-Nearest Neighbors Method In The Southern Plains
Guidry-Stanteen, Sean, University of Texas at Arlington
Jianzhong Su, John Zhang, Paul Flanagan
Weather, particularly precipitation, is a force on which agriculture is wholly dependent. Our hypothesis is that weather variables such as precipitation and temperature have intrinsic periodicity and repetition over time. The k-Nearest Neighbors (kNN) method attempts to exploit this by looking into the past to see whether what happened recently ever happened before, and if it did, what happened next. This is done by compiling “features,” relevant data expressed numerically, into “feature vectors,” and matching them against historical data sets. While looking at daily data tends to be the most accurate, it can be skewed by extreme bouts of precipitation, so a novel method of grouping data over several days was developed and tested. Overall, the method can generate desirable results given the correct number of days grouped in just the right way. The method was tested on two weather stations in the Southern Plains region (Oklahoma), and the predictions were found to be reliable.
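
The matching idea, compiling recent observations into a feature vector and searching history for its nearest neighbors, can be sketched in a few lines. This is a generic kNN analog forecast, not the authors' exact grouping scheme:

```python
import numpy as np

def knn_forecast(series, window, horizon, k=3):
    """Match the most recent `window` values against every historical
    window and average what followed the k closest matches."""
    recent = np.asarray(series[-window:], dtype=float)
    candidates, futures = [], []
    for i in range(len(series) - window - horizon + 1):
        candidates.append(series[i:i + window])
        futures.append(series[i + window:i + window + horizon])
    candidates = np.asarray(candidates, dtype=float)
    futures = np.asarray(futures, dtype=float)
    # Euclidean distance between the recent feature vector and history.
    d = np.linalg.norm(candidates - recent, axis=1)
    nearest = np.argsort(d)[:k]
    return futures[nearest].mean(axis=0)

# Toy periodic "precipitation" series: the forecast should echo the cycle.
history = [0, 1, 2, 3, 2, 1] * 20
print(knn_forecast(history, window=6, horizon=3, k=1))  # → [0. 1. 2.]
```

Grouping data over several days, as in the study, would amount to averaging the series into multi-day blocks before building the feature vectors.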

4. (4:00-4:20) - AI-Driven Continental-Scale Modeling of Soil Moisture Dynamics: A Vision Transformer-LSTM Approach for CONUS
Rahman, Mashrekur, ARS
Menberu B Meles, Scott Bradford, Grey S Nearing
Accurate modeling and prediction of soil moisture dynamics at continental scales remain significant challenges in hydrology and earth system sciences. These challenges stem from the complex interactions between soil properties, climate, vegetation, and human activities, as well as the spatial and temporal heterogeneity of these factors across large areas. Traditional approaches often struggle to capture these intricate relationships effectively, leading to limitations in our ability to provide reliable soil moisture forecasts for agricultural management, water resource planning, and climate change adaptation. Furthermore, the integration and interpretation of diverse data sources, including in-situ measurements, remote sensing products, and reanalysis datasets, present additional hurdles in developing comprehensive soil moisture models. This study addresses these challenges by developing a novel, AI-driven approach to continental-scale soil moisture dynamics modeling across the Continental United States (CONUS). Our ML model integrates multiple data sources, including in-situ measurements, remote sensing products, and reanalysis datasets, combining static attributes (e.g., soil properties, topography) with dynamic inputs (e.g., meteorological data, vegetation indices). Our approach utilizes an enhanced spatiotemporal architecture that fuses Vision Transformers (ViT) for capturing spatial dependencies with Long Short-Term Memory (LSTM) networks for temporal dynamics, incorporating a novel spatial attention mechanism to account for the influence of neighboring locations. The model employs multi-task learning to simultaneously predict soil moisture at multiple depths, leveraging inter-layer correlations to enhance overall performance. We apply advanced interpretability techniques to provide insights into the relative importance of different factors influencing soil moisture dynamics. 
Results demonstrate the model’s efficacy in capturing complex spatiotemporal patterns of soil moisture across diverse landscapes and climatic regions. The implications of this research extend to multiple domains, including agricultural decision support, drought monitoring and prediction, flood risk assessment, climate change impact studies, water resource management, and ecosystem services assessment. By providing high-resolution, interpretable soil moisture forecasts, this model offers a powerful tool for informed decision-making across various sectors, contributing to enhanced agricultural productivity, improved water resource management, and increased resilience to climate-related challenges across the CONUS region.
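
The spatial attention component can be illustrated with a generic scaled dot-product self-attention over location embeddings. This is a simplified NumPy sketch of the mechanism, not the authors' trained ViT-LSTM architecture:

```python
import numpy as np

def spatial_attention(features):
    """Scaled dot-product self-attention across locations.
    features: (n_locations, d) array of per-location embeddings."""
    n, d = features.shape
    q = k = v = features  # self-attention without learned projections
    scores = q @ k.T / np.sqrt(d)
    # Softmax over locations: each site attends to every other site,
    # letting neighboring conditions influence its representation.
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v  # neighbor-informed features, same shape as input

rng = np.random.default_rng(0)
out = spatial_attention(rng.normal(size=(6, 4)))
print(out.shape)
```

In the full model, the attention output for each location would feed the LSTM that handles temporal dynamics, with separate output heads per soil depth for the multi-task objective.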

Modeling II

Session Date: Wednesday, November 20th
Session Time: 10:30am-11:50am
Session Location: Traditions
Session Moderator:


1. (10:30-10:50) - Transparent artificial intelligence-based enviromic prediction: Predicting crop performance through understandable deep neural networks
Benke, Ryan, ARS
Linqian Han, Kimberly A. Garland-Campbell, Xianran Li
We tested the capacity of using artificial intelligence to predict crop performance based on whole-season external weather profiles. Deep neural networks were trained using weather parameters and corresponding spring wheat phenotype data from a variety testing program spanning over two decades across twenty locations. The trained models could accurately predict average spring wheat grain yields, plant heights, heading dates, and protein contents in all environments included in the program. To discern how the models function, we identified the weather conditions most important for prediction accuracy by monitoring the decay in model performance after permuting associations between select weather parameters and their corresponding trait values. This approach provided insights into the inner workings of the deep neural networks and identified key environmental conditions associated with phenotypic variation. Additionally, we demonstrated how these trait predictions could be applied with Finlay-Wilkinson regressions to forecast variety-level performance in novel environments. This study highlights the potential of leveraging historical data and deep learning to develop interpretable models that can accurately predict crop performance, offering guidance towards optimizing agricultural practices under changing climate conditions.
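
The permutation test described here, measuring the decay in performance after shuffling one weather parameter, follows a standard pattern. A minimal sketch with a small scikit-learn neural network and toy data in which only one feature actually drives the response:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(42)
# Toy "weather parameters": only column 0 drives the simulated trait.
X = rng.normal(size=(400, 5))
y = 3.0 * X[:, 0] + rng.normal(0, 0.1, 400)

model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                     random_state=0).fit(X, y)
baseline = r2_score(y, model.predict(X))

# Permute one feature at a time; the drop in R^2 measures how much the
# trained network relies on that feature's association with the trait.
importance = []
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])
    importance.append(baseline - r2_score(y, model.predict(Xp)))

print(int(np.argmax(importance)))  # feature 0 should dominate
```

In the study, the same decay measurement over weather parameters identifies the environmental conditions most associated with phenotypic variation.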

2. (10:50-11:10) - Improvements to Deep Isolation Forests for Identifying Anomalous Records
Sartore, Luca, NASS
Valbona Bejleri
The presence of outliers in a dataset can bias the results of statistical analyses. To correct for outliers in agricultural data collected through repeated surveys, micro edits are performed manually. A set of constraints and decision rules is used to simplify the editing process. However, agricultural data are characterized by complex relationships that make revision and vetting challenging. Also, the outlier detection methods used in survey data to identify the records that need editing do not address the mixed (i.e., continuous or categorical) nature of the variables. Isolation Forests (IF) have gradually increased in popularity as a distribution-free algorithm for screening high volumes of data with mixed-type variables. Although several variations have been proposed in the past decade, these improvements have seldom been tested together. In this paper, deep random architectures are used within generalized isolation forests to perform nonlinear dimensionality reduction and outlier detection at the record level. Nested complex-valued nonlinear transformations using activation functions are performed on random projections. The outputs of these nested processes are successively classified by improved generalized isolation trees and then combined using a scoring technique based on fuzzy logic. The performance of the proposed algorithm is tested on “raw” survey data for automatic early identification of anomalous records. Also, to assess the algorithm’s potential performance in a production environment, its outputs are compared to finalized “human-edited” data.
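
For readers unfamiliar with the base algorithm, a plain Isolation Forest (scikit-learn's implementation, without the deep or generalized extensions proposed in the paper) scores records by how easily random trees isolate them:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
# Simulated survey records: most cluster together, two are anomalous.
normal = rng.normal(loc=100, scale=5, size=(500, 3))
outliers = np.array([[300.0, 100.0, 100.0], [100.0, -50.0, 100.0]])
records = np.vstack([normal, outliers])

forest = IsolationForest(n_estimators=200, random_state=0).fit(records)
# score_samples: lower (more negative) means more easily isolated,
# i.e., more anomalous and a stronger candidate for editing.
scores = forest.score_samples(records)
flagged = np.argsort(scores)[:2]
print(sorted(flagged.tolist()))  # → [500, 501]
```

The paper's contribution layers random nonlinear projections, generalized isolation trees, and fuzzy-logic score combination on top of this basic isolation idea.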

3. (11:10-11:30) - Inverse modeling techniques for predicting airborne crop disease risk
Ulmer, Lucas, ARS
Walter F. Mahaffee
Outbreaks of airborne crop diseases involve significant spatiotemporal uncertainty. Many crop diseases are difficult to detect until it is too late, necessitating conservative fungicide spray programs. The ability to identify where and when disease is present or likely to appear can help minimize downsides associated with disease control programs, including fungicide resistance, financial cost, and human and environmental health risks. Combining field data with predictive models may help narrow this search. Aerial spore sensors (“spore traps”) passively sample large masses of air over crops. Atmospheric dispersion models can be used to invert trap data and gain information about the likely origins of intercepted spores, and thus the spatial extent of a current infection. This work investigates the feasibility of one class of inversion techniques known as source term estimation (STE) for identifying the origins of particles sampled during a plume release experiment in an Oregon vineyard. A probabilistic “risk map” of the artificial infection is constructed from the trap data using Bayesian inference. We explore the impact of sensor network density and spatial layout, as well as dispersion model quality, on the accuracy of this risk map. Future prospects for regional-scale disease monitoring networks will also be discussed. By integrating multiple data streams (e.g., trap data, manual scouting results, weather and rainfall data, and host phenology from remote sensing and grower-provided photos) with physics-based models for dispersion and host-pathogen interaction in a Kalman filter-like procedure, we may be able to provide growers with better estimates of the spatiotemporal distribution of current and nascent outbreaks. Additionally, prospects for accelerating the dispersion simulations with regression-based ML models will be discussed.

4. (11:30-11:50) - Modeling Climate Change Impacts on Agriculture: Integrating CEAP, APEX, and Machine Learning for Adaptive Strategies
Osorio-Leyton, Javier M., Texas A&M University
Karen Maguire, Siwa Msagni
This research proposal integrates the Conservation Effects Assessment Program (CEAP) and the Agricultural Policy Environment eXtender (APEX) model to project climate scenarios and assess their impacts on agriculture. By using the APEX model in conjunction with the Regional Environment and Agriculture Programming (REAP) model at the Economic Research Service (ERS), the study aims to improve our understanding of how agricultural markets and systems respond to climate change, providing insights into how farmers can adapt practices and land use strategies. This work will strengthen the REAP model’s ability to address key research questions related to agricultural adaptation. Agricultural systems are complex, and the APEX model helps simulate these dynamics. However, its effectiveness is limited by the availability of spatially specific management data. To address this, the research introduces a machine learning-based methodology using the k-nearest neighbors (k-NN) algorithm. This algorithm imputes missing management practices by matching environmental variables such as topography, soil properties, and climate between the CEAP data and target sites across the continental U.S. The study uses CEAP data from two survey periods, combining data from 2003-2006 (CEAP-I) and 2012-2016 (CEAP-II). By calculating Euclidean distances between target and donor sites, the k-NN algorithm identifies the most suitable management practices, enabling more realistic simulations in the APEX model. This approach helps expand the geographic applicability of CEAP data and provides crucial inputs for climate adaptation research. Overall, the project enhances the REAP model’s capacity to explore questions about agricultural adaptation to climate change and environmental outcomes, offering valuable insights for the USDA and other stakeholders to support sustainable agricultural practices in a changing climate.
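
The imputation step, matching target sites to CEAP donor sites by Euclidean distance over environmental covariates, can be sketched with scikit-learn. The covariates and the binary "practice" label below are illustrative placeholders:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
# Donor sites (e.g., CEAP survey points): environmental covariates such
# as slope, soil clay fraction, and mean annual precipitation (toy data).
donor_env = rng.uniform(size=(200, 3))
donor_practice = (donor_env[:, 2] > 0.5).astype(int)  # toy "tillage" label

# Target sites across the study region with missing management data.
target_env = rng.uniform(size=(5, 3))

# Standardize so Euclidean distance weights each covariate comparably.
scaler = StandardScaler().fit(donor_env)
nn = NearestNeighbors(n_neighbors=1).fit(scaler.transform(donor_env))
_, idx = nn.kneighbors(scaler.transform(target_env))

# Each target inherits the practice of its nearest environmental match,
# which then parameterizes the APEX simulation for that site.
imputed = donor_practice[idx.ravel()]
print(imputed)
```

Raising `n_neighbors` and taking a majority vote would be the natural extension for noisier covariates.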

Multimodal Learning

Session Date: Wednesday, November 20th
Session Time: 2:30pm-3:50pm
Session Location: Corps
Session Moderator:

1. (2:30-2:50) - On the Shoulders of Pre-trained Giants: Versatile Visual and Multimodal Learning with Less Supervision
Wang, Yuxiong, University of Illinois
Data fuels modern computer vision models. But the challenge of limited supervision from human-created annotations never ends. The traditional pre-training-and-fine-tuning paradigm becomes inadequate when developing models for fine-grained visual recognition and localization, generalist visual comprehension, and more. In this talk, I discuss our recent efforts towards bridging the gap between limited supervision and increasingly complex visual and multimodal tasks, based on a variety of foundation models including visual foundation models, large language models, large multimodal models, and generative models. We develop versatile strategies to adapt or transfer knowledge from these foundation models, minimizing dependency on expensive human supervision. We address several key questions that have recently arisen in computer vision: 1) Can we develop foundation models capable of tackling more complex tasks with reduced supervision? 2) Is there inherent synergy between models trained on different modalities, and can we further leverage such synergy to create a more powerful supermodel by composing heterogeneous foundation models? 3) How can we advance existing foundation models into the 3D world? Throughout the talk, I demonstrate the potential of scaling up in-the-wild visual and multimodal learning but with minimal human supervision.

2. (2:50-3:10) - AIIRA: AI Institute for Resilient Agriculture
Ganapathysubramanian, Baskar, Iowa State University
AI Institute for Resilient Agriculture: a case study of building models using large-scale multimodal data.

3. (3:10-3:30) - Advances in Multimodal AI and its Applications to Livestock and Beyond
Ahuja, Narendra, University of Illinois
AIFARMS Team
We will review some major contributions made at AIFARMS to the state of the art in AI methods, how these are motivated and validated by problems in agriculture, and how such co-developed methods simultaneously enhance the quality of the existing solutions available to agricultural scientists and farms.
Protein Structure Prediction Applications

Session Date: Tuesday, November 19th
Session Time: 3:00pm-4:20pm
Session Location: Oak
Session Moderator:

1. (3:00-3:20) - Exploiting AI/ML-based protein structure prediction tools for data-driven design of gluten digestive enzymes
Weigle, Austin, ARS
Chris P. Mattison, Gerard R. Lazo, Brenda Oppert
The status of AI-based molecular modeling tools has ushered in a new era of computational molecular design. Synthetic data can now be generated and annotated at high throughput to guide the experimental validation of valuable protein or molecular products. As a case study, we exploited multiple sequence alignment (MSA) subsampling in AlphaFold2 and RoseTTAFold2 to accurately predict diverse protein structure conformations. We applied this methodology to the structure-based design of an enzymatic digestive aid to remediate gluten sensitivity, which is currently managed primarily by a strict gluten-free diet. To this end, we selected the main digestive cathepsin L enzyme from Tribolium castaneum (Tc) as our molecular design target. We employed a structure-based engineering strategy to shortlist mutations that can (1) improve Tc cathepsin pH-optimum activity in the acidity of the human stomach for probiotic use; and (2) improve Tc cathepsin thermostability for commercial applications. From our resulting conformational ensemble, we observed transitionary conformations relevant to cysteine protease structural biology. Given the residues associated with functional conformational change, we prepared Tc cathepsin variants for in silico annotation with the desired pH optimum and thermostability. Our multi-objective approach will motivate experimental validation of Tc cathepsin variants with fine-tuned function for the digestion of consumed and commercially prepared gluten products.

2. (3:20-3:40) - Enhancing Gene Discovery With AlphaFold Multimer
Ingram, Thomas, ARS
Matthew N. Rouse, Matthew J. Moscou
Innate host resistance against wheat stem rust relies on recognition of the pathogen's secreted proteins (avirulence proteins, or Avrs) by host resistance genes (Sr genes). Discovery of Sr and Avr genes currently relies on DNA/RNA-based bioinformatics techniques to narrow down candidates for experimental validation. In organisms with large genomes, gene discovery is hampered by linkage disequilibrium and often generates too many candidate genes to practically screen. AlphaFold Multimer provides in silico prediction of protein-protein binding. In wheat, secreted pathogen Avrs bind to the leucine-rich repeat (LRR) of nucleotide-binding site-leucine-rich repeat (NLR) proteins. Pathogen protein recognition is followed by a defense cascade, often leading to cell death. AlphaFold Multimer 2.3.1 was used to predict the interaction between known LRR host proteins and their known Avr counterparts, with signal peptides removed, and also against control proteins. In four out of five gene-for-gene combinations, AlphaFold Multimer correctly predicted the host-pathogen protein interaction. In some scenarios the LRR region alone provided the best iptm+ptm (interface predicted template modeling) scores, while in others the LRR and C-terminus provided the best prediction. Most known Avr-Sr combinations consistently scored in the top 10% when screened against an arbitrary set of NBS-LRRs. Candidate Avr and Sr genes were screened using AlphaFold Multimer and contrasted with protoplast transfection assays.
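
Ranking candidate pairs by such scores is straightforward once the predictions exist. AlphaFold Multimer's ranking confidence weights the interface term as 0.8·ipTM + 0.2·pTM; the pair names and score values below are hypothetical placeholders for real model output:

```python
# Hypothetical per-pair confidence scores (ipTM, pTM) as AlphaFold
# Multimer would report them; real runs read these from its output files.
pairs = {
    ("Sr35", "AvrSr35"): (0.82, 0.74),
    ("Sr35", "control_1"): (0.31, 0.62),
    ("Sr50", "AvrSr50"): (0.77, 0.70),
    ("Sr50", "control_2"): (0.28, 0.65),
}

def ranking_confidence(iptm, ptm):
    # AlphaFold Multimer's ranking metric weights the interface term heavily.
    return 0.8 * iptm + 0.2 * ptm

ranked = sorted(pairs, key=lambda p: ranking_confidence(*pairs[p]),
                reverse=True)
for host, pathogen in ranked:
    # The true Avr-Sr pairs should rank above the controls.
    print(host, pathogen, round(ranking_confidence(*pairs[(host, pathogen)]), 3))
```

In practice, screening a candidate Avr against an arbitrary set of NBS-LRRs and checking whether the known pair lands in the top 10%, as described above, is the same ranking applied at scale.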
3. (3:40-4:00) - Empowering plant breeding: generative AI in the fight against plant diseases
Edwards, Jeremy, ARS
Li Wang, Yulin Jia
Plant diseases significantly threaten global food security, and there is a pressing need for effective resistance strategies in crops. Central to plant immunity is the interaction between pathogen avirulence (AVR) proteins and plant resistance (R) proteins, often following a gene-for-gene model. Identifying these specific protein interactions is challenging due to the limitations of traditional experimental methods. This presentation will explore the application of artificial intelligence, particularly AlphaFold 2, to predict protein structures and interactions between plant R proteins and pathogen AVR proteins. We demonstrate how AI accelerates basic discovery and mapping of gene networks and facilitates the discovery of new disease resistance alleles. Additionally, we discuss the potential of generative AI methods to achieve wide-spectrum durable resistance through design and optimization of novel R genes for deployment via gene editing. Our findings highlight the transformative potential of AI in advancing plant breeding and developing robust disease resistance.

Remote Sensing Applications

Session Date: Tuesday, November 19th
Session Time: 1:00pm-2:20pm
Session Location: Oak
Session Moderator:


1. (1:00-1:20) - Leveraging Cloud Computing to Forecast Rangeland Fuel Dynamics
Reeves, Matthew USFS
Robb Lankston
In this project we document our use of remote sensing and weather data to project fine fuel dynamics on a deployed AI platform for the Mojave Desert.

2. (1:20-1:40) - Using deep learning to map prairie dog colonies from remote sensing imagery
Kearney, Sean, ARS
Lauren Porensky, David Augustine, David Pellatz, Erika Peirce, Mikael Hiestand, Justin Derner
The ability of deep learning models to detect individual objects from air- and space-borne remote sensing imagery provides unprecedented opportunities to produce new map products for use by land managers at fine spatial scales. Much work remains to understand what kinds of objects can be detected, how well models can transfer across locations and time periods, and what image specifications (e.g., resolution, spectral ranges, pre-processing, etc.) provide optimal results. We present methods, results, and lessons learned from studies conducted in two US Forest Service National Grasslands in which we trained deep convolutional neural networks (dCNNs) to detect individual black-tailed prairie dog (Cynomys ludovicianus) burrows using imagery from unoccupied aerial vehicles (UAVs; i.e., drones). The black-tailed prairie dog is both a keystone species of conservation concern and an agricultural pest. Thus, it is a wildlife species for which detailed monitoring is a high priority, especially in areas where public and private land ownership converges. Cost-effective ground-based mapping is difficult to achieve due to the remote and vast landscapes occupied by prairie dogs. We set up several studies to analyze (1) how burrow-detection accuracy changes depending on image resolution (from 2-30 cm ground sampling distance) and inputs (e.g., red-green-blue channels [RGB], Normalized Difference Vegetation Index [NDVI] derived from multispectral channels, and a photogrammetrically-derived topographic position index [TPI]) and (2) how well models can transfer across seasons, years, and geographical regions with different plant communities. Specifically, we conducted UAV flights over multiple sites, seasons, and years totaling an area of more than 30,000 acres, and trained a dCNN in Python using the DeepLabV3+ architecture with a Resnet34 encoder initialized with pretrained weights from the ImageNet dataset. 
Burrow detection accuracy remained stable up to 5-10 cm image resolution, but declined substantially at coarser (>10 cm) resolutions. RGB and TPI together provided the best results; adding NDVI did not improve them. Models transferred better across time than across regions, although adding more data nearly always improved models, regardless of whether the additional data came from a new region or a new time period. In addition to presenting the results from our studies, we also discuss takeaways from our model training procedures and considerations for developing secondary map products from our dCNN object-detection output.
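
The topographic position index used as a model input is, in its simplest form, a cell's elevation minus the mean elevation of its neighborhood. A minimal sketch with SciPy, using a toy surface and an illustrative window size:

```python
import numpy as np
from scipy.ndimage import uniform_filter

rng = np.random.default_rng(3)
# Toy digital surface model in meters; a real one would come from
# UAV photogrammetry at centimeter resolution.
dsm = rng.normal(loc=1500.0, scale=0.5, size=(100, 100))
dsm[50, 50] -= 0.3  # a small local depression, as at a burrow entrance

# TPI: cell elevation minus the mean elevation of its neighborhood.
# Negative values indicate locally low cells, positive values local mounds.
tpi = dsm - uniform_filter(dsm, size=15)
print(tpi.shape)
```

Stacking this TPI band with the RGB channels produces the input combination that performed best in the studies above.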

3. (1:40-2:00) - Remote sensing of microbial quality of irrigation water for food safety: machine learning applications to the UAV-based imaging
Pachepsky, Yakov, ARS
Seokmin Hong, Mathew Stocker, Billie Morgan, Jaclyn Smith, Moon Kim
An increasing proportion of produce-associated illnesses has been linked to the microbial quality of irrigation water. Escherichia coli is used as the microbial water quality (MWQ) indicator. MWQ monitoring is complicated by the high spatial and temporal variation of E. coli concentrations and requires substantial resources, and yet results remain uncertain. UAV-based imaging provides dense spatiotemporal coverage of inland waters. Essential metrics of E. coli aquatic habitats, such as chlorophyll concentration, turbidity, and dissolved organic matter content, can be retrieved from remote sensing imagery, which points to the possibility of characterizing the spatial variation of E. coli habitats, discovering persistent spatial patterns, and designing more efficient monitoring. Our goal was to investigate the applicability of RGB (GoPro), multispectral (Micasense), and hyperspectral (Headwall) imaging for estimating spatial E. coli patterns. The study was performed on commercial irrigation ponds in Maryland and Georgia. Imaging of surface water sources can be complicated for various reasons, including spectral complexity, reflectance properties, calibration and validation needs, and cloud cover, and the data are typically imbalanced. In our work with irrigation ponds, we determined that the efficiency of E. coli estimation can be enhanced by pre- and postprocessing the data. Specifically, classifying a large number of multispectral images by quality with the successive projection algorithm and the random forest algorithm leads to an objective selection of imagery for further processing. With multispectral imagery, reflectance performed much worse as the input for E. coli estimation with the random forest (RF) algorithm than in situ water quality variables; replacing the reflectance with remote sensing indices led to a drastic improvement in estimation accuracy. 
Quantile splitting used to define the train and test datasets substantially improved E. coli retrieval results. Postprocessing of the modeling results, earlier proposed for nitrogen, was required due to the dependence of residuals on the absolute value of E. coli concentration; it also improved E. coli modeling results with RGB data and several machine learning algorithms, including the Gradient Boosting Machine and Extreme Gradient Boosting (XGB). New challenges in the MWQ arena arise with growing attention to antibiotic-resistant bacteria and cyanotoxins. Our pilot projects showed that the combination of UAV imaging and machine learning can effectively evaluate spatial distributions and find persistent spatial patterns of emerging MWQ pollutants.
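
One reading of the quantile splitting mentioned above is to stratify samples by concentration quantile so that both the train and test sets span the full range of values. This sketch is an interpretation of that idea, not the authors' exact procedure:

```python
import numpy as np

rng = np.random.default_rng(5)
# Log-normal E. coli concentrations (CFU/100 mL), typical of skewed MWQ data.
conc = rng.lognormal(mean=3.0, sigma=1.0, size=200)

# Assign each sample to a concentration quartile, then take every 4th
# sample within each quartile as the test set so both sets cover the range.
quartile = np.digitize(conc, np.quantile(conc, [0.25, 0.5, 0.75]))
test_mask = np.zeros(len(conc), dtype=bool)
for q in range(4):
    members = np.flatnonzero(quartile == q)
    test_mask[members[::4]] = True

train, test = conc[~test_mask], conc[test_mask]
print(len(train), len(test))
```

A purely random split on such skewed data can leave high concentrations out of the test set entirely, which is the failure mode this stratification avoids.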

4. (2:00-2:20) - Combining satellite and targeted UAS imagery for cost-effective precision farming at scale through super-resolution methods
Masrur, Arif, ESRI
Peder A. Olsen, Paul R. Adler, Matthew W. Myers, Nathan Sedghi, Ray R. Weil
Unmanned Aircraft Systems (UAS) and satellites are important tools for precision management practices. However, while satellite imagery is too coarse for targeted applications, UAS become impractical for large areas and/or frequent coverage, and a broad spectral range is currently only available with expensive UAS sensors. This study proposed and demonstrated novel methodological approaches using a super-resolution convolutional neural network (SRCNN) and statistical projection methods to fuse targeted UAS with satellite imagery across spatial, temporal, and spectral domains, allowing the best characteristics of both to be extracted and merged into a cost-effective platform for precision management. The main objective was to compare the performance of the reconstructed high-resolution Sentinel-2 and spectrally extended UAS RGB (red, green, blue) imagery for improved estimation of cover crop biomass and nitrogen (N), relative to low-resolution Sentinel-2 imagery and very high-resolution hyperspectral ground truth. Using an example case of winter cover cropping in Maryland, United States, our cross-validated random forest modeling results suggest that UAS data do not have to be collected at every location and time point for which satellite data are available; rather, a farmer could fly a subset of their fields with an RGB camera and leverage this to extend (1) the spectral range and resolution of available satellite imagery, (2) coverage across the whole farm, and (3) the growing season at the frequency of satellite flights. This would extend the spectral range of UAS-RGB over the critical red edge, near-infrared, and short-wave infrared regions, which improves biomass and N estimation (with error reductions between ~14% and ~68%) and could also increase detection sensitivity to weeds, disease, and insect infestations. 
A spectral resolution of about 100 nm is sufficient for characterizing both biomass and N; however, spectral range matters more for N than for biomass, whereas spatial resolution is very important for both. We have also shown that, without flying, by using the proposed Spatial-SRCNN, the biomass and N predictions are better than predictions from actual UAS-RGB data. Thus, the farmer could stop flying UAS once a specialized spatial extension model has been trained from targeted UAS-RGB data. The robustness of our proposed strategies should be further investigated for other potential applications (e.g., detecting weeds, disease, and insect infestations) in precision farming. Future studies should also focus on further improving spectral, spatial, and temporal extension models using datasets with larger spatiotemporal coverage and various combinations of spectral bands and sensors.

Responsible Use of AI In USDA Research

Session Date: Tuesday, November 19th
Session Time: 3:00pm-4:20pm
Session Location: Corps
Session Moderator:


1. (3:00-3:20) - Responsible Use of AI for Researchers
Parr, Cynthia, NAL
Ricardo Millan
USDA has a robust artificial intelligence strategy that drives innovation through workforce empowerment and capacity-building, and that provides risk-based frameworks to ensure ethical and responsible use of AI. But USDA is not the only stakeholder in the arena, and a plethora of federal and industry policies and resources challenges researchers to understand what all of this means for them. When should they be contributing to an AI inventory? When and from whom do they need approval to proceed? The answers often depend on the nature of the work (scholarly research versus research towards operations; generative AI versus other kinds of AI) and which component agency funds the work. Ultimately, researchers want to know how they can spend more time doing exciting research, with appropriate safeguards, and less time on bureaucracy. In this talk, we review current USDA policies, analyze definitions, and highlight their impact on the research enterprise. We compare policies and practices from other federal research funders, and identify additional needs for guidance, shared tools, and processes for USDA-funded researchers. What are considerations for proposal or peer-review preparation? How can we avoid the risks of bias, privacy loss, cybersecurity threats, and unsustainability? We provide nuggets answering these questions, complementing deeper discussion in training workshops. Finally, we describe opportunities to engage on any of these topics with USDA staff by participating in the USDA Center of Excellence, and with the international community by participating in the Research Data Alliance.

2. (3:20-3:40) - Data Leakage and Dataset Shift: the twin gremlins of AI
Rivers, Adam, ARS

Papers that apply predictive machine learning to biological questions are increasingly filling journals. How do we evaluate the quality of this research and catch potential errors in the AI methods we apply? While there are many ways to flub an AI analysis, most published mistakes arise from data leakage and dataset shift. These errors falsely inflate performance metrics, so they are “naturally selected” in our current publishing ecosystem. Data leakage arises when information used to evaluate the model is shared with it during training. This issue can occur in subtle and unintuitive ways, even when data is split into testing and training data sets. The second issue arises when the data a model was trained on differs from the data it was applied to. This talk will explain the different types of errors associated with data leakage and dataset shift and provide examples and practical advice for identifying these issues in scientific literature and in your experimental designs.
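The preprocessing form of data leakage described above can be illustrated in a few lines. The sketch below is a toy example (not from the talk, and using hypothetical values): normalization statistics computed on the pooled data make a shifted test set look deceptively similar to the training data, the kind of subtle leak that survives an apparently clean train/test split.

```python
# Illustrative toy (not the speaker's example): leakage via preprocessing.
# Computing normalization statistics on the full dataset before splitting
# lets the test distribution "leak" into training-time preprocessing.
import random
import statistics

random.seed(0)
# Training values near 0; held-out test values from a shifted distribution.
train = [random.gauss(0, 1) for _ in range(80)]
test = [random.gauss(5, 1) for _ in range(20)]

# LEAKY: mean/std computed on train + test together.
full = train + test
mu_leak, sd_leak = statistics.mean(full), statistics.stdev(full)

# CORRECT: statistics computed on the training split only.
mu, sd = statistics.mean(train), statistics.stdev(train)

scaled_test_leaky = [(x - mu_leak) / sd_leak for x in test]
scaled_test_clean = [(x - mu) / sd for x in test]

# With leakage, the shifted test set is pulled toward the training
# distribution, masking the dataset shift and inflating apparent performance.
print(statistics.mean(scaled_test_leaky))
print(statistics.mean(scaled_test_clean))
```

The leaky scaling hides most of the shift; the correct scaling exposes it, which is exactly the dataset-shift signal a model evaluation needs to see.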

3. (3:40-4:00) - Assessing Use of GenAI in Converting Legacy Software to Python and R
[Tarter, Alex], NASS
Linda J. Young
USDA’s National Agricultural Statistics Service (NASS) uses hundreds of programs in various computer languages with thousands of lines of code to draw samples, analyze the data collected through surveys, and produce official statistics for its reports. Various computer programming languages have been used over time, some of which are expensive, no longer supported, or otherwise not recommended for contemporary use, including SAS AF, Visual FoxPro, and Perl. The modernization process of converting NASS’s legacy production code to freeware, such as Python and R, is challenging and time consuming. Our research focuses on assessing whether Generative Artificial Intelligence (GenAI) can be used effectively to assist in the timely and accurate conversion of scripts into Python and R. As an early proof of concept, we are exploring the proportional time savings in the conversion process of the Genesis software from SAS AF into Python or R, in which study participants from across divisions of NASS are assisted by GitHub Copilot. Perceived time savings and challenges associated with the learning curve of the built-in language assistant are also explored to help establish best practices for future NASS-wide GenAI usage. Analysis of the completion-time metrics is intended to assess the viability of using Copilot to convert NASS’s legacy systems.

4. (4:00-4:20) - AI-assisted rapid reviews and summaries of ARS’s scientific research
Stucky, Brian, ARS
Lorna Drennen, Troy Hamilton, Haitao Huang, Stan Kosecki, Simon Liu, Joon Park, Cyndy Parr, Heather Savoy, Pam Starke-Reed
As USDA’s principal scientific research agency, the Agricultural Research Service (ARS) generates an enormous volume of research outputs every year. At any given time, ARS scientists are working on hundreds of active research projects which result in peer-reviewed publications, news releases, annual research accomplishments reports, and collaboration agreements with external research partners. ARS’s scientific leadership is frequently called upon to provide review-style syntheses or briefings of ARS’s scientific research on particular topics or questions. Writing scientific reviews is typically an extremely laborious process. We are investigating the efficacy of combining semantically indexed, text-based research products with generative large language models to reduce the labor required to write research reviews and summaries. In this talk, we will present our implementation approach, source datasets, and progress to date. We will also discuss methods for assessing and ensuring the quality of the results generated by our system.

Robotics and Sensors

Session Date: Wednesday, November 20th
Session Time: 1:00pm-2:20pm
Session Location: Oak
Session Moderator:


1. (1:00 - 1:20) - Enhancing Precision Agriculture: The Symbiosis of Sensors and AI for Smarter, Sustainable Farming
[Tabassum, Shawna], University of Texas at Tyler
Kai-Shu Ling, USDA-ARS, U.S. Vegetable Laboratory, Charleston, SC 29414
Precision agriculture technologies, including image sensors, unmanned aerial vehicles, and crop and soil sensors, rely on surface-level information and lack real-time chemical profiling, which makes them inadequate for detecting the early onset of crop or soil health issues. Advanced sensors can significantly improve artificial intelligence (AI) models in agriculture by enhancing the quality and accuracy of the data used for analysis. The Center for Smart Agriculture Technology (CeSAT) at the University of Texas at Tyler is developing low-cost, field-deployable sensors and integrated systems for monitoring crop biotic and abiotic stresses, water quality, and soil nutrient dynamics, which will support AI through accurate data collection, multi-modal data integration, and real-time monitoring and feedback. In collaboration with ARS scientists at the USDA-ARS U.S. Vegetable Laboratory in Charleston, SC, CeSAT is currently conducting research on mountable sensors to help develop an autonomous robotic crop production system for fruiting crops (tomato and strawberry) in controlled environment agriculture. Data from these sensors will be integrated into a machine learning model to develop an extensive database. Clean, accurate sensor data enhance AI model training by allowing the model to identify patterns more reliably. Well-trained models are less prone to overfitting or underfitting, thus improving generalization.

2. (1:20-1:40) - From Labor to Automation: AI Innovations in Horseradish Weed Management
Shajahan, Sunoj, University of Illinois
Abhinav Pagadala, Elizabeth Wahle, Dennis Bowman, John F Reid
Illinois has a rich history as a leading producer of horseradish, dating back 150 years. However, a persistent challenge in horseradish production is the lack of effective weed management solutions throughout the growing season. While AI-based commercial solutions for weed management, such as John Deere’s See & Spray, are emerging in commodity crop production, specialty crops like horseradish remain underserved. Horseradish requires season-long weed control, as it is planted from April to May and harvested from October through April of the following year. The application of post-emergent herbicides is restricted. Currently, weed control relies heavily on manual labor and tool implements that can only pull weeds taller than the crop. Compounding these challenges, increasing herbicide resistance among weeds, such as Waterhemp and Palmer Amaranth, further complicates management efforts. This project aims to develop AI vision-based weed control solutions explicitly tailored for horseradish producers, addressing the weed challenges they face. We are developing a mechanical weeding robot based on Farm-ng’s Amiga platform, integrating advanced technologies for improved weed management. Our approach comprises three stages: lane detection and autonomous navigation, weed detection through AI-based object detection, and an actuator mechanism for removing weeds between rows. Currently, the platform is in the development phase. We aim to have a fully functional weeding robot ready by next summer. We plan to perform frequent data collection in horseradish farms next year. These datasets will be valuable in developing a reliable AI model for object detection with improved inference times and the mechanical weeder’s response capabilities. By integrating advanced AI solutions, we hope to offer horseradish growers an effective alternative for weed management, ultimately improving productivity and sustainability in the industry.

3. (1:40-2:00) - Leveraging Anomaly Detection and Timeseries Data in Expert-In-The-Loop Self-Supervised Learning for Powdery Mildew Classification in Microscopy Imaging
Bidese, Rafael, ARS
Anna Underhill, Lance Cadle-Davidson, Yu Jiang
Blackbird is a microscopic imaging robot that significantly enhances the throughput of acquiring and analyzing microscopic images, originally developed for selecting disease-resistant grape cultivars. While this system has garnered interest across various crops and diseases, its reliance on supervised learning requires numerous labeled data instances for model training. Self-supervised learning (SSL) can leverage the available data but depends on the quality of the training dataset to learn the relevant features for a given task. Previous methods leverage techniques such as similarity to generate better datasets for the task. In this work, we leverage the spatiotemporal changes of disease infection to generate an optimized dataset for learning binary classifiers, enabling genetic insights into disease resistance mechanisms. We aim to develop an expert-in-the-loop self-supervised learning-based pipeline for binary classification of powdery mildew in microscopy imaging. Specifically, we evaluate anomaly detection (AD) to improve dataset selection for SSL and evaluate different input modalities to the SSL pipeline. Leaf disk images were collected using a leaf disk assay with Blackbird, capturing time-series images from 0 to 9 days after inoculation (DAI) at an interval of 1 day. Anomaly detection is trained on the first day of imaging, when the leaves are healthy, and the reconstruction error for subsequent days is used to prepare the dataset for SSL. After SSL, an expert selects the best clusters in the projected dataset for the extraction of pseudo-labels. The best SSL model can achieve performance similar to zero-shot inference from a fully supervised model. Pseudo-labels are then used for supervised training and further evaluation. We show that AD provides better embeddings for the extraction of pseudo-labels, leading to a higher F1-score.
Further, we demonstrate that when the SSL pipeline leverages the time series data for improved embedding generation, it generates a projection that provides superior pseudo-labels and encodes different disease progressions. Furthermore, the developed pipeline enables genetic analysis of PM infection dynamics, providing valuable insights into plant disease response mechanisms. This innovative approach holds promise for advancing plant pathology research and breeding programs.

Soil Science Applications

Session Date: Tuesday, November 19th
Session Time: 3:00pm-4:20pm
Session Location: Ross
Session Moderator:


1. (3:00-3:20) - Soil Health Classification Framework for Florida Soils using K-Means Clustering
Chatterjee, Amitava, ARS
Yaslin Gonjalez, and Gabriel Maltais-Landry
Soil health assessments aim to quantify soil health indicators linked to specific ecosystem services such as crop production, nutrient dynamics, and water storage. The main challenge in interpreting soil health indicators is distinguishing between properties responsive to management and inherent soil properties like topography. The goal of grouping similar soils is to increase the likelihood of detecting changes in soil health indicators linked to management by accounting for variation arising from inherent soil properties. This study explores the utility of cluster analysis to group similar soils and aid in soil health classification for the state of Florida. Soil property data from the Soil Survey Geographic Database (SSURGO) were used for the k-means cluster analysis. Components that covered at least 15% of a map unit were selected, and depth-weighted and component-weighted averages from 0-10 cm for selected soil properties were calculated. The Calinski-Harabasz index (variance ratio criterion) was used to compare clustering solutions by identifying the value of K at which the index reaches its absolute maximum. Clustering into six conceptual groups based on the top five soil properties was determined to be the optimal algorithm output. The clusters could differentiate between zones of topsoil variation at the field scale. This approach could be easily adopted at other locations and scales to produce conceptual soil groups and associated maps to support soil health sampling and interpretation at the field scale.
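The clustering workflow above can be sketched in a few lines. This is an illustrative pure-Python toy (synthetic points, not the SSURGO data or the authors' code): k-means over standardized soil-property points, choosing K where the Calinski-Harabasz index (between-cluster variance over within-cluster variance, scaled by degrees of freedom) peaks.

```python
# Toy k-means + Calinski-Harabasz sketch (not the study's implementation).
import random

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(g):
    return tuple(sum(col) / len(g) for col in zip(*g))

def init_centers(points, k, seed=0):
    # Farthest-point initialization: each new center is the point farthest
    # from all existing centers, so well-separated groups each get a center.
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    return centers

def kmeans(points, k, iters=30):
    centers = init_centers(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: dist2(p, centers[c]))
            groups[j].append(p)
        centers = [centroid(g) if g else centers[i] for i, g in enumerate(groups)]
    return centers, groups

def calinski_harabasz(points, centers, groups):
    # (between-cluster SS / (k-1)) / (within-cluster SS / (n-k))
    n, k = len(points), len(centers)
    grand = centroid(points)
    between = sum(len(g) * dist2(c, grand) for c, g in zip(centers, groups))
    within = sum(dist2(p, c) for c, g in zip(centers, groups) for p in g)
    return (between / (k - 1)) / (within / (n - k))

rng = random.Random(42)
# Three distinct hypothetical soil groups in standardized property space.
points = [(rng.gauss(cx, 0.3), rng.gauss(cy, 0.3))
          for cx, cy in [(0, 0), (4, 0), (2, 4)] for _ in range(30)]

scores = {k: calinski_harabasz(points, *kmeans(points, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(best_k)
```

With well-separated groups, the index attains its absolute maximum at the true number of clusters, which is how the abstract's optimal K was identified.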

2. (3:20-3:40) - Using a random forest regression model for determining plant available nutrient concentrations from portable x-ray fluorescence and Mehlich-III measurements
Blackstock, Joshua M., ARS
T. G. Brewer; M. Mancini, D. M. Miller; M. Berry; H. E. Winzeler; Z. Libohova; P. R. Owens
Plant available soil nutrient concentrations (PASNC) determined by the Mehlich-III (MIII) method are time- and resource-intensive to measure. Application of machine learning to develop pedotransfer functions (PDF) that use paired portable x-ray fluorescence (pXRF) and MIII measurements to build predictive PASNC models has been successfully demonstrated in tropical and temperate soil studies, but the accuracy of estimations varies depending on pedogenic characteristics, requiring further testing. In this study, we developed a random forest model (RF) to predict PASNC of Ca, K, Mg, P, S, and Zn from the total elemental concentrations of soils measured by pXRF. Soil samples were collected in the central U.S. state of Missouri at the Cornett Research Farm. Coefficient of determination (R2) values using RF for Ca, K, Mg, P, S, and Zn were 0.76, 0.19, 0.71, 0.008, 0.79, and 0.29, respectively. Root mean square error percent error (RMSEPE) values for Ca, K, Mg, P, S, and Zn were 9.9, 36.5, 22.9, 41.4, 17.2, and 35.6%, respectively. Model performance for Ca, Mg, and S was better than for the K, P, and Zn predictions. The most common important pXRF parameters were transition metals, silica, and aluminosilicates, and we inferred this relationship to be driven by parent material mineralogy and differential weathering across the study area. Preliminary findings from this study are encouraging and suggest that pXRF-MIII PDF could be used to predict PASNC in temperate soils. Increasing the number of samples and the spatial extent of temperate soils sampled would yield more accurate predictions. Increased accuracy will allow more rapid soil fertility and soil health assessments at finer spatial resolutions in temperate agricultural regions.
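The two accuracy metrics reported above can be computed as follows. The sketch uses hypothetical observed/predicted values (not the study's data) to show R2 and an RMSE expressed as a percent of the mean observed concentration.

```python
# Illustrative metric sketch (hypothetical values, not the study's data):
# coefficient of determination (R^2) and root mean square error expressed
# as a percent of the mean observed value.
import math

def r_squared(obs, pred):
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - ss_res / ss_tot

def rmse_percent(obs, pred):
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))
    return 100 * rmse / (sum(obs) / len(obs))

# Hypothetical Mehlich-III Ca concentrations (mg/kg) vs. model predictions.
obs = [1200.0, 950.0, 1400.0, 1100.0, 1300.0]
pred = [1150.0, 1000.0, 1350.0, 1180.0, 1240.0]
print(round(r_squared(obs, pred), 2), round(rmse_percent(obs, pred), 1))  # → 0.86 5.0
```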

3. (3:40-4:00) - Machine Learning Algorithms for complex distributed soil hydrology models and digital soil mapping
Libohova, Zamir, ARS
Jiaxiao Wei, Jainy Bhatt, Joshua Blackstock, Edwin H. Winzeler, Marcelo Mancini, Jianzhong Su, Phillip R. Owens, Kabindra Adhikari, Amanda Ashworth
Machine learning and geostatistics are widely used tools in digital soil mapping (DSM) to predict soils and their properties at higher resolutions suitable for precision agriculture and other applications. Soil water movement, one of the major drivers of soil development, is represented by terrain features derived from high-resolution digital elevation models (DEM) such as LiDAR and remote sensing (RS). DSM’s reliance on DEM and RS data, as well as on the quality and density of point field observations, often leads to high-resolution but static two-dimensional (2D) maps of soils and properties. Distributed hydrological models are process-based, physically driven models that capture soil water movement over and through the soil at fine spatial and temporal resolutions. Using such models provides an opportunity to add two other dimensions (depth and time) to mapping soils and their properties, leading to a four-dimensional (4D) digital soil mapping (4DSM) approach, but utilizing outputs from hydrological model simulations for DSM is not well understood owing to model output complexity. Use of a machine learning algorithm for time series classification of model cells, i.e., pixels, based on spatiotemporal variation among cells could elucidate patterns not recognized through simple visual or single-pixel time series analysis. The Distributed Hydrology Soil Vegetation Model (DHSVM) was applied to a pasture watershed to simulate soil moisture (SM) distribution with depth and water table depth (WTD) at a daily time step. The simulation generated moisture time series for three depths and multiple pixels across the watershed. Temporal patterns simulated by DHSVM matched measurements from moisture sensors and wells installed throughout the watershed. The multidimensional dataset of SM and WTD was analyzed using the Dynamic Time Warping (DTW) machine learning algorithm, which identified similar time series at varying timescales for each depth and clustered them annually and seasonally.
The distinct clusters among seasons and with depth showed that spatiotemporal soil variability is lost when soils are assessed statically. This approach provides a step toward the transformation of the US Soil Survey Geographic Database (SSURGO) from 2D to 4D. However, distributed hydrological models are data-demanding and comprise multiple feedbacks and links for representing the complexity of the physical laws and processes governing soil water movement. Their potential can be fully realized only when the impact of hundreds of parameters (individually or in combination) on prediction power and accuracy can be assessed. Such sensitivity analysis can be supported by machine learning to improve predictive ability concerning hydrological dynamics and soil variability. By employing refined hyperparameters, a model-independent neural network can be established for digital soil mapping, as demonstrated by some of the preliminary results presented here.
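The DTW algorithm at the core of the clustering step can be sketched in a few lines (an illustrative implementation, not the study's code). The key property is that DTW aligns two series that share a shape but differ in timing, which is why it can group soil-moisture pixels whose wetting fronts arrive on different days.

```python
# Minimal Dynamic Time Warping (DTW) sketch via dynamic programming.
import math

def dtw(a, b):
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible alignment moves.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])

# Hypothetical daily soil-moisture series: a wetting pulse, the same pulse
# delayed two days, and a flat (dry) series.
pulse = [0.1, 0.1, 0.3, 0.5, 0.3, 0.1, 0.1, 0.1]
delayed = [0.1, 0.1, 0.1, 0.1, 0.3, 0.5, 0.3, 0.1]
flat = [0.1] * 8
print(dtw(pulse, delayed) < dtw(pulse, flat))  # → True: shifted pulse is closer
```

A Euclidean point-by-point distance would penalize the two-day lag heavily; DTW warps the time axis and recognizes the two pulses as the same temporal pattern, which is the behavior the annual and seasonal clustering relies on.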

4. (4:00-4:20) - Leveraging Random Forest Models to Optimize Phosphorus Fertilization in Rainfed Corn: Insights from Texas Blackland Prairie Fields
Stevens, Bo Maxwell, ARS
Kabindra Adhikari, Peter J.A. Kleinman, Joshua M. McGrath, Kyle R. Mankin, Douglas R. Smith
Soil phosphorus (P) management is crucial for both crop yield optimization and environmental sustainability. In this study, we leveraged machine learning techniques, specifically random forest models, to analyze soil test data and optimize P fertilization strategies for rainfed corn (Zea mays L.) grown in Central Texas. The focus was on comparing predictions from two commonly used soil tests: Mehlich-3 and Haney (H3A) in five fields with varying soil P levels. Traditional recommendations often yield conflicting results, presenting challenges for precision agriculture. Random forest models were applied to four years of high-resolution yield data (2018-2022), including spatiotemporal soil characteristics, environmental variables, and soil test results. Our analysis demonstrated that random forest models, when incorporating multiple soil test metrics alongside environmental data, significantly outperformed traditional methods for predicting yield responses to P application. The Mehlich-3 test provided more consistent predictions of yield response, accurately identifying P needs 85% of the time, compared to 65% for H3A. A key insight from our analysis is the potential to reduce P fertilization rates by up to 50% without compromising yield in low-P fields. This approach not only optimizes yield but also minimizes environmental impacts associated with over-application of fertilizers.

Sustainability

Session Date: Wednesday, November 20th
Session Time: 10:30am-11:50am
Session Location: Hullabaloo
Session Moderator:


1. (10:30-10:50) - Integrating domain knowledge and artificial intelligence for sustainable agriculture
Peng, Bin, Assistant Professor, Crop Sciences, University of Illinois
Kaiyu Guan, Licheng Liu, Zhenong Jin

2. (10:50-11:10) - Prediction of turfgrass quality using multispectral UAV imagery and Ordinal Forests: Validation using a fuzzy approach
Hernandez, Alexander, Research Computational Biologist, ARS
Shaun Bushman, Paul Johnson, Matthew Robbins and Kaden Patten
Protocols to evaluate turfgrass quality rely on visual ratings that, depending on the rater’s expertise, can be subjective and susceptible to positive and negative drifts. We developed seasonal (spring, summer, and fall) as well as inter-seasonal machine learning predictive models of turfgrass quality using multispectral and thermal imagery collected with unmanned aerial vehicles over two years as a proof of concept. We chose ordinal regression to develop the models instead of conventional classification to account for the ranked nature of the turfgrass quality assessments. We implemented a fuzzy correction of the resulting confusion matrices to ameliorate the probable drift of the field-based visual ratings. The best seasonal predictions were rendered by the fall model (multiclass AUC: 0.774, original kappa: 0.139, corrected kappa: 0.707). However, the best overall predictions were obtained when observations across seasons and years were used for model fitting (multiclass AUC: 0.872, original kappa: 0.365, corrected kappa: 0.872), clearly highlighting the need to integrate inter-seasonal variability to enhance model accuracy. Vegetation indices such as the NDVI, GNDVI, RVI, and CGI and the thermal band can render as much information as a full array of predictors. Our protocol for modeling turfgrass quality can be followed to develop a library of predictive models that can be used in different settings where turfgrass quality ratings are needed.
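One simple way to soften a confusion matrix for ordinal ratings, in the same spirit as the fuzzy correction above (though not the authors' exact scheme), is a linearly weighted Cohen's kappa that gives partial credit when a prediction is off by only one quality class, as a drifting rater might be.

```python
# Illustrative weighted-kappa sketch (not the authors' fuzzy correction):
# disagreement is weighted by ordinal distance, so off-by-one predictions
# are penalized far less than off-by-many predictions.
def weighted_kappa(rated, predicted, n_classes):
    w = [[abs(i - j) / (n_classes - 1) for j in range(n_classes)]
         for i in range(n_classes)]
    obs = [[0.0] * n_classes for _ in range(n_classes)]
    for r, p in zip(rated, predicted):
        obs[r][p] += 1
    n = len(rated)
    row = [sum(obs[i]) for i in range(n_classes)]
    col = [sum(obs[i][j] for i in range(n_classes)) for j in range(n_classes)]
    # Observed vs. chance-expected weighted disagreement.
    num = sum(w[i][j] * obs[i][j] for i in range(n_classes) for j in range(n_classes))
    den = sum(w[i][j] * row[i] * col[j] / n for i in range(n_classes) for j in range(n_classes))
    return 1 - num / den

# Hypothetical 9-class quality ratings (0-8); predictions mostly off by at
# most one class.
rated = [4, 5, 6, 3, 7, 5, 4, 6, 2, 8]
predicted = [4, 4, 6, 4, 7, 5, 5, 7, 2, 8]
print(round(weighted_kappa(rated, predicted, 9), 3))
```

Exact-match kappa would treat every off-by-one prediction as a full error; the weighted version yields a substantially higher agreement score, analogous to the gap between the original and corrected kappas reported above.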

3. (11:10-11:30) - Development and implementation of AI tools for farm-to-table food safety
Wiedmann, Martin, Cornell University
Renata Ivanek, Jayadev Achary, Qing Zhao, Luke Qian
This presentation will review different applications of AI in food safety including food safety risk prediction and monitoring as well as food safety optimization throughout the supply chain. However, AI technologies in food safety lag behind in commercial development because of obstacles such as limited data sharing and limited collaborative research and development efforts. Future actions should be directed toward applying data privacy protection methods, improving data standardization, and developing a collaborative ecosystem to drive innovations in AI applications to food safety. While the causes of limited use of AI in food safety are multi-faceted, the lack of publicly available and well curated data sets that can be used by different groups to develop and validate AI data sets represents one likely barrier and contributor to this issue that has been addressed in other fields, including through the distribution of publicly available datasets that can be used to develop and train AI tools. To address this issue we have developed an initial dedicated Food Safety Machine learning Repository, which will also be discussed as part of this presentation.

4. (11:30-11:50) - Prediction of hydrological and sediment transport processes
[Livsey, Daniel], ARS

Systems-Level Applications

Session Date: Tuesday, November 19th
Session Time: 1:00pm-2:20pm
Session Location: Corps
Session Moderator:


1. (1:00 - 1:20) - Examples for AI use and Emerging Trends in Systems Dynamics Modeling
Papanicolaou, Thanos, ARS
Keshav Basneta, Peter L. O’Brien, Ken M. Wacha, Robert W. Malone, David W. Archer
Agroecosystems comprise environmental, economic, and social components with complex interactions that affect systemwide performance. Attempts to describe or predict how agroecosystems respond to management must account for these interconnected components, so approaches that are limited to a single discipline cannot capture the complexities necessary for a holistic understanding of performance. The goal of this research is to develop a system dynamics (SD) modeling framework that can provide quantitative measures of consequences of management on each component of an agroecosystem. An SD framework is proposed with a description of model components, as well as an illustration of methodological steps to evaluate model performance through calibration, validation, and sensitivity testing. The structure of the model relies on a complex web of (i) stocks that describe the system status, (ii) flows that represent the directionality and rates of change, and (iii) auxiliary parameters that provide quantitative values to each component. The capacity of the model to adequately evaluate agroecosystem response is demonstrated using a case study investigating environmental, economic, and social indicators while manipulating multiple management practices, including cover crops, tillage, and integration of crop and livestock operations. Importantly, the SD model identified tradeoffs in the three indicators that accurately reflect producer experiences when making management decisions.
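The stock/flow/auxiliary structure described above can be illustrated with a one-stock toy model (hypothetical parameters and variable names, not the authors' framework): a soil organic carbon stock updated by a residue inflow and a first-order decomposition outflow, integrated with a simple Euler step.

```python
# Toy stock-and-flow sketch (hypothetical parameters, not the authors' model):
# one stock (soil organic carbon, Mg/ha), one inflow (cover crop residue),
# one outflow (first-order decomposition), Euler-integrated annually.
def simulate(years, stock0=50.0, residue_in=1.2, decay_rate=0.02, dt=1.0):
    stock = stock0
    history = [stock]
    for _ in range(years):
        inflow = residue_in               # flow: residue carbon input
        outflow = decay_rate * stock      # flow: decomposition loss
        stock += dt * (inflow - outflow)  # Euler update of the stock
        history.append(stock)
    return history

# Compare two management scenarios via the auxiliary residue_in parameter.
with_cover = simulate(30, residue_in=1.2)
without = simulate(30, residue_in=0.6)
# Each trajectory approaches its equilibrium, inflow / decay_rate
# (60 vs. 30 Mg/ha here), so the management tradeoff appears directly
# in the simulated stock.
print(round(with_cover[-1], 1), round(without[-1], 1))
```

A full SD model chains many such stocks and flows across environmental, economic, and social components, but each link follows this same update pattern.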

2. (1:20-1:40) - Multi-Task Learning for Low-Data Problems: Case Study in Cold-Hardiness Prediction
Fern, Alan, Oregon State University
Aseem Saxena, Kristen Goebel, Paola Pesantez-Cabrera, Rohan Ballapragada, Kin-Ho Lam, Markus Keller
There are many prediction problems in agriculture where large datasets are not available. One approach to addressing this issue is to combine multiple small datasets for related problems via multi-task and transfer learning. This talk will describe two applications of these approaches to the problem of cold-hardiness prediction for grapes and cherries, which is critical for frost-mitigation decision making. The results demonstrate the utility of combining multiple small datasets and have produced models that are currently available to crop managers on AgWeatherNet.

3. (1:40-2:00) - Machine Learning Algorithms for Digital Soil Mapping of Soil Properties
Winzeler, H. Edwin, University of Texas at Arlington
Marcelo Mancini, Joshua M. Blackstock, Jianzhong Su, Phillip R. Owens, Amanda Ashworth, Zamir Libohova
Accounting for the spatial heterogeneity of soil properties relevant to soil fertility and crop management is a daunting task. Point samples must be extracted at an appropriate density to account for changes over geographic space. The extent to which soils vary over space at a site is often not fully known, leading to problems of over-sampling and/or under-sampling when producing continuous prediction surfaces or maps. Over-sampling occurs when an excess of soil samples is extracted, leading to expensive and time-consuming analysis of many samples that are so similar in value and geographic space to neighboring samples that they contribute little to understanding the variability. Under-sampling occurs when the number of samples is insufficient to account for spatial patterns of geographic heterogeneity, leading to holes or gaps in the understanding of a site’s variability. In this study we intentionally developed an over-sampling approach to characterizing the variability of soil properties at the Stoneville ARS Long-Term Agroecosystem Research (LTAR) site in the Lower Mississippi River Basin (LMRB) network. We extracted and analyzed 2,145 soil samples from a 250-ha research site and analyzed soil properties including soil organic matter (SOM) content. We then applied random forest (RF) machine learning on the full dataset and on subsequent subsets that were randomly generated by iteratively and progressively omitting data points and assessing model performance. The objective was to find the sampling density at which over-sampling grades into under-sampling. The machine learning models were trained using the point samples and reflectance data from the ESA Sentinel-2 program. To assess the usefulness of the machine learning models, we also applied the traditional geostatistical techniques of ordinary spherical kriging (OSK), Empirical Bayesian Regression Kriging (EBK), and inverse distance weighting (IDW).
We also assessed their performance with subsets of the original data to determine the inflection point or region at which over-sampling graded into under-sampling. The degradation of the performance of the RF models as the sample size decreased (progressively smaller datasets) was compared to the corresponding degradation in performance for the models made without machine learning (IDW, EBK, and OSK). This use-case study shows that models of soil properties made with machine learning informed by spatial statistics of satellite reflectance are much more robust than models made without machine learning, and therefore require a smaller sampling density to achieve accurate results. Additionally, the predicted spatial patterns of SOM reflected signatures of well-understood soil-landscape processes, indicating that incorporating prior knowledge into machine learning can further contribute to optimization of sample size.
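The subsampling experiment can be sketched with IDW, one of the baselines named above, on a synthetic field (not the Stoneville data): interpolation error grows as the sample set shrinks, tracing the transition from over- to under-sampling.

```python
# Simplified subsampling sketch using inverse distance weighting (IDW) on a
# synthetic "SOM" surface (illustrative, not the study's data or code).
import math
import random

def field(x, y):
    # Smooth hypothetical soil-property surface over a 1 x 1 field.
    return 2.0 + math.sin(3 * x) * math.cos(2 * y)

def idw(samples, x, y, power=2):
    num = den = 0.0
    for sx, sy, v in samples:
        d2 = (x - sx) ** 2 + (y - sy) ** 2
        if d2 == 0:
            return v  # query point coincides with a sample
        w = 1.0 / d2 ** (power / 2)
        num += w * v
        den += w
    return num / den

rng = random.Random(7)
locs = [(rng.random(), rng.random()) for _ in range(400)]
samples = [(x, y, field(x, y)) for x, y in locs]
grid = [(i / 19, j / 19) for i in range(20) for j in range(20)]

errors = {}
for n in (400, 100, 25):
    subset = samples[:n]  # progressively smaller sample sets
    rmse = math.sqrt(sum((idw(subset, x, y) - field(x, y)) ** 2
                         for x, y in grid) / len(grid))
    errors[n] = rmse
print({n: round(e, 3) for n, e in errors.items()})
```

Plotting such an error-versus-density curve for each interpolator (RF, IDW, EBK, OSK) is one way to locate the inflection region the abstract describes.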

4. (2:00-2:20) - Modeling the dynamic fire management environment with FireCon, a daily fire control suitability surface.
O’Connor, Christopher, USFS
Rahul Wadhwani, Eduardo Rodriguez, Matthew Whitley, Zack Holden
Successful wildfire containment is influenced by a range of physiographic and risk factors that can be known in advance, as well as dynamic factors that evolve throughout an incident. In the Western United States, a widely adopted approach utilizes the Potential Control Location suitability (PCL) model, an annually produced wildfire control probability surface that incorporates terrain, vegetation, and fire behavior under 90th-percentile fire weather conditions. Although this model demonstrates significant effectiveness, it does not account for changing weather, fuel, and fire behavior conditions. This presentation introduces a framework that integrates daily weather, soil moisture, vapor pressure deficit, and potential evapotranspiration into the existing PCL model. We use daily hold and breach locations from fires sampled from the years 2020 to 2022 to train a neural network that adjusts PCL maps up or down depending on current inputs. ROC statistics for the independent test set demonstrated improved differentiation of control and breach locations by up to 10% (ROC 0.47–0.59) over static PCL scores (ROC 0.49). We present a case study applied to the 2023 Six Rivers Complex fire in California, demonstrating the framework's potential to enhance situational awareness and inform fire management strategies through real-time data integration.
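The evaluation logic above can be illustrated with a small hedged sketch: synthetic hold/breach labels, a static suitability score, and a logistic regression standing in for the authors' neural network (the data, features, and model are illustrative assumptions, not the FireCon implementation):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(7)

# Synthetic stand-in: a static control-suitability score plus daily
# covariates (VPD, soil moisture); labels are hold (1) vs breach (0).
n = 2000
static_pcl = rng.uniform(0, 1, n)
vpd = rng.uniform(0, 1, n)
soil_moisture = rng.uniform(0, 1, n)
logit = 2.0 * static_pcl - 2.5 * vpd + 1.5 * soil_moisture - 0.5
held = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

# Baseline: discriminate hold vs breach with the static score alone.
auc_static = roc_auc_score(held, static_pcl)

# "Dynamic" model: adjust the static score using daily inputs.
X = np.column_stack([static_pcl, vpd, soil_moisture])
model = LogisticRegression().fit(X, held)
auc_dynamic = roc_auc_score(held, model.predict_proba(X)[:, 1])
print(f"static AUC={auc_static:.2f}, dynamic AUC={auc_dynamic:.2f}")
```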

Lightning Talks + Poster Sessions

Lightning Talk Session I

Session Date: Tuesday, November 19th
Session Time: 1:00pm-2:20pm
Session Location: Hullabaloo
Session Moderator:


1. (1:00-1:05) - Machine learning assisted functional protein identification focused on binding per- and poly-fluoroalkyl substances (PFAS)
Maul, Jude E., ARS
Clifton K. Fagerquist
Using a combination of open-source AI protein-folding models, next-generation space-filling molecular models, and machine-learning-assisted kinetic simulations, we will create libraries of predicted protein binding sites for all PFAS compounds from real and/or theoretical proteins.

2. (1:05-1:10) - PhosBoost: Improved phosphorylation prediction recall using gradient boosting and protein language models
Poretsky, Elly, ARS
Carson M. Andorf, Taner Z. Sen
Protein phosphorylation is a dynamic and reversible post-translational modification that regulates a variety of essential biological processes. The regulatory role of phosphorylation in cellular signaling pathways, protein–protein interactions, and enzymatic activities has motivated extensive research efforts to understand its functional implications. Experimental protein phosphorylation data in plants remains limited to a few species, necessitating a scalable and accurate prediction method. Here, we present PhosBoost, a machine-learning approach that leverages protein language models and gradient-boosting trees to predict protein phosphorylation from experimentally derived data. Trained on data obtained from a comprehensive plant phosphorylation database, qPTMplants, we compared the performance of PhosBoost to existing protein phosphorylation prediction methods, PhosphoLingo and DeepPhos. For serine and threonine prediction, PhosBoost achieved higher recall than PhosphoLingo and DeepPhos (0.78, 0.56, and 0.14, respectively) while maintaining a competitive area under the precision-recall curve (0.54, 0.56, and 0.42, respectively). PhosphoLingo and DeepPhos failed to predict any tyrosine phosphorylation sites, while PhosBoost achieved a recall score of 0.6. Despite the precision-recall tradeoff, PhosBoost offers improved performance when recall is prioritized while consistently providing more confident probability scores. A sequence-based pairwise alignment step improved prediction results for all classifiers by effectively increasing the number of inferred positive phosphosites. We provide evidence to show that PhosBoost models are transferable across species and scalable for genome-wide protein phosphorylation predictions. PhosBoost is freely and publicly available on GitHub.
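As a hedged illustration of the core idea above, gradient-boosted trees trained over fixed-length embedding vectors, the sketch below uses random synthetic "embeddings" and scikit-learn's GradientBoostingClassifier; it is not the PhosBoost code, and the data carry no biological meaning:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for per-residue protein-language-model embeddings:
# 64-dim vectors where phosphosites (label 1) are shifted in a few dims.
n, d = 1500, 64
X = rng.normal(size=(n, d))
y = (rng.uniform(size=n) < 0.3).astype(int)
X[y == 1, :5] += 1.0  # separable signal in the first 5 dimensions

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

# Gradient-boosted trees over the embedding features.
clf = GradientBoostingClassifier(random_state=1).fit(X_tr, y_tr)
recall = recall_score(y_te, clf.predict(X_te))
print(f"phosphosite recall: {recall:.2f}")
```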

3. (1:10-1:15) - Generalizing applications of the Blackbird microscopy robot
Cadle-Davidson, Lance, ARS
Rafael Bidese, Yu Jiang
For many foliar pathogens, uncontrolled variables in field phenotyping prevent detection of genetic loci of minor and moderate effect sizes. In the SCRI-funded VitisGen grape breeding project, we developed controlled, laboratory-based automated microscopy phenotyping pipelines to accurately quantify powdery mildew and downy mildew disease severity. Each microscopy robot (which we call Blackbird) images about 100 to 200 discs per hour, depending on sample topology. While patch-based convolutional neural network analysis of hyphae or spores outperformed experts at microscopes for QTL detection, saliency mapping further improved the precision and confidence of disease quantification. In some cases, repeated measures over a timecourse revealed resistance loci that were undetected at single timepoints. Blackbird is now commercially available, and our collaborators have drastically expanded the diversity of applications. To help biologists make progress on many new use cases, we are developing unsupervised machine learning strategies to guide model training.

4. (1:15-1:20) - Exploring putative enteric methanogenesis inhibitors using molecular simulations and a graph neural network
Chowdhury, Ratul, Iowa State University
Randy Aryee, Noor S. Mohammed, Supantha Dey, Arunraj B., Swathi Nadendla, Karuna Anna Sajeevan, Matthew R. Beck, A. Nathan Frazier, Jacek A. Koziel, Thomas J. Mansell
Atmospheric methane (CH4) is a key contributor to global warming. Because CH4 is a short-lived climate forcer (12-year atmospheric lifespan), its mitigation represents the most promising means to address climate change in the short term. Enteric CH4 (CH4 biosynthesized in the rumen of ruminants) represents 5.1% of total global greenhouse gas (GHG) emissions, 23% of emissions from agriculture, and 27.2% of global CH4 emissions. Therefore, it is imperative to investigate methanogenesis inhibitors and their underlying modes of action. We elucidate the detailed biophysical and thermodynamic interplay between anti-methanogenic molecules and cofactor F430 of methyl coenzyme M reductase and interpret the stoichiometric ratios and binding affinities of sixteen inhibitor molecules. We leverage this as a prior in a graph neural network to first functionally cluster these sixteen known inhibitors among ~54,000 bovine metabolites. We subsequently demonstrate a protocol to identify precursors to, and putative inhibitors of, methanogenesis based on Tanimoto chemical similarity and membrane permeability predictions. This work lays the foundation for computational and de novo design of inhibitor molecules that retain or reject one or more biochemical properties of the known inhibitors discussed in this study.
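Tanimoto similarity over binary fingerprints, used to rank candidate metabolites against known inhibitors, is straightforward to compute; the sketch below uses random 128-bit fingerprints as an illustrative assumption (not the authors' molecular descriptors):

```python
import numpy as np

def tanimoto(a: np.ndarray, b: np.ndarray) -> float:
    """Tanimoto (Jaccard) similarity between two binary fingerprints."""
    both = np.logical_and(a, b).sum()
    either = np.logical_or(a, b).sum()
    return float(both / either) if either else 1.0

rng = np.random.default_rng(3)

# Hypothetical 128-bit fingerprints: one known inhibitor and a small
# pool of candidate metabolites.
known = rng.integers(0, 2, 128)
pool = rng.integers(0, 2, (5, 128))
pool[0] = known.copy()
pool[0, :4] ^= 1  # candidate 0 differs from the inhibitor in 4 bits

# Rank candidates by similarity to the known inhibitor.
scores = [tanimoto(known, cand) for cand in pool]
best = int(np.argmax(scores))
print(f"best candidate: {best}, similarity {scores[best]:.2f}")
```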

5. (1:20-1:25) - Using AI, machine learning, and ‘omics’ methodologies to explore the ruminant microbiome, enteric methane emissions and mitigation interventions
Frazier, Anthony Nathan, ARS
Ratul Chowdhury, Matthew R. Beck, Aeriel D. Belk, Randy Aryee, Noor S. Mohammed, Supantha Dey, Arunraj B., Swathi Nadendla, Karuna A. Sajeevan, Thomas Mansell, Jacek A. Koziel
Agricultural food systems and livestock operations account for 30-40% of anthropogenic greenhouse gas (GHG) emissions. Enteric fermentation represents 46% of GHG emissions in the form of methane (CH4), primarily in ruminant livestock operations. Due to an increasing global population, there is a high probability that ruminant livestock numbers will increase to meet food demands. Given this, many research efforts have been made to combat enteric CH4 emissions. One area of recent interest is the ruminant microbiome. The rumen microbiota serves its host with the nutrients and energy required for performance and growth. However, these interactions also result in CH4 production and emissions. While many efforts have shown promise in reducing enteric CH4, there remains a need to integrate new methodologies into CH4 mitigation. Leveraging the strides made in artificial intelligence (AI) and computational methods is a potential way to design novel anti-methanogenic compounds that empirical data have yet to elucidate. In our preliminary work, machine learning (ML) frameworks reasoning over bovine metabolites associated with the rumen microbiome have been shown to extract important molecular features for the intended functionality. Past research has indicated the ability of complex calculations via AI, and of back-of-the-envelope Fermi calculations, to enhance our understanding of the microbiome’s relationship to animal production and to allow microbial biomarkers for CH4 emissions to be examined. Therefore, current collaborative efforts are exploring AI, ML, and ‘omics’ measurements to reduce CH4 emissions from ruminant livestock. These efforts are geared toward increasing our ability both to predict the microbial interactions that result in CH4 emissions and to explore the biochemical signatures of putative methanogenesis inhibitors.

6. (1:25-1:30) - Image-Based Honey Bee Larvae Viral and Bacterial Diagnosis Using Machine Learning
Copeland, Duan, ARS
Brendon M. Mott, Oliver L. Kortenkamp, Robert J. Erickson, and Kirk E. Anderson
Honey bees are essential pollinators of ecosystems and agriculture worldwide. It is estimated that 50-80% of crops are pollinated by honey bees, generating a market valuation of approximately $20 billion in the U.S. alone. However, commercial beekeepers often face an uphill battle, losing around 50% of their hives annually, and must effectively manage disease and parasites to remain economically viable. Colony losses are described as multifactorial, involving combinations of environmental factors and various disease agents including bacteria, viruses, and fungi. Strongly associated with colony decline, these disease agents largely target honey bee larvae and are commonly known as brood diseases. Antibiotics are used to combat brood disease and are effective against the bacterial pathogens causing European Foulbrood (EFB) and American Foulbrood (AFB), but AFB has evolved antibiotic resistance. Although efforts are in place to control and verify the use of antibiotics on honey bees, many undiagnosed brood diseases with a superficial resemblance to EFB (EFB-like disease caused by viruses) are often misdiagnosed due to similar symptomology. Under these circumstances, commercial beekeepers often prophylactically treat entire apiaries with antibiotics based on the field diagnosis of one or two weak colonies. This action results in dysbiosis of the native gut microbiome and, over the long term, continues selection for antibiotic resistance. Thus, correct field diagnosis of brood disease is challenging and requires years of experience to identify and differentiate various disease states according to subtle differences in larval symptomology. To explore the feasibility of an AI diagnosis tool, we collaborated with apiary inspectors and researchers from around the country to survey brood disease in their local apiaries. We photographed and sampled diseased larvae identified in the field as EFB or virus. 
Using next generation sequencing of the larval microbiome and molecular viral screening, we created a dataset of imaged larvae with correct diagnoses to generate a machine learning/AI algorithm. Our approach leveraged transfer learning techniques, utilizing deep convolutional neural networks pre-trained on large-scale datasets. These networks, originally designed for general image classification tasks, were fine-tuned to discriminate between EFB and viral infections in unclassified diseased honey bee larval images. This proof-of-concept study highlights the potential of AI-driven diagnostics in apiculture, offering a tool that could significantly improve the accuracy and speed of brood disease diagnosis in the field. By enabling more precise diagnoses, this technology could lead to more targeted treatments, reduce unnecessary antibiotic use, and slow the development of antibiotic resistance in bee pathogens. Furthermore, this approach opens possibilities for continuous learning and improvement as more data is collected, potentially leading to even more accurate and comprehensive diagnostic capabilities in the future.

7. (1:30-1:35) - Bayesian Learning for Predicting Causal Relationships Among Rice Traits Under Varying Growing Conditions
Richardson, Jared, University of Texas at Arlington
Dr. Shannon Pinson, Dr. Jeremy Edwards, Dr. Jianzhong Su
This collaborative study leverages data-driven agriculture and computational tools to predict causal relationships among rice traits under various growing conditions. Utilizing a combination of multi-omic datasets, we developed unique mixed models for the calculation of Best Linear Unbiased Predictors (BLUPs) and employed Bayesian Networks for the construction of Directed Acyclic Graphs (DAGs). By integrating these datasets with additional single nucleotide polymorphism (SNP) data, we created probabilistic graphical models that illustrate relationships between traits and genetic SNP markers. These findings provide potentially valuable insights for plant pathology and agronomy, with implications for optimizing growing conditions and enhancing our understanding of genomic-trait interactions.

8. (1:35-1:40) - Leveraging AI and High-Performance Computation for Structural Prediction and Dynamics Analysis of Foodborne Bacterial Colicin-Immunity Protein Complexes in Food Production and Safety
Koirala, Mahesh, ARS
Clifton K. Fagerquist
Artificial Intelligence (AI) is revolutionizing our understanding of foodborne bacteria and playing a growing role in enhancing food production and safety. Our work emphasizes the importance and use of AI-powered tools such as AlphaFold2 to predict the 3D structures of bacterial protein complexes, specifically colicins D, E3, and E8 and their immunity-protein cognates. Colicins are bacterial toxins produced by pathogenic Escherichia coli that target and kill competing bacteria, while immunity proteins protect the host bacteria from their own colicins. These protein complexes are crucial to microbial competition, survival, and regulation within bacterial communities, directly affecting the safety of food production environments. To explore the interaction dynamics of these protein complexes, molecular dynamics (MD) simulations were performed with GROMACS, using the computational power of SCINet’s Ceres and Atlas systems with GPU nodes. The MD simulations provided detailed insight into the structural stability and binding behavior of colicin-immunity complexes over time. The ability to model such dynamics allows a better understanding of how bacteria regulate intra- and inter-species competition, which is essential for controlling microbial populations in food-related environments. By integrating AI and high-performance computing, this research provides valuable knowledge of how colicin-immunity complexes function, offering potential strategies for managing microbial populations in food production, contributing to the advancement of food safety protocols and microbial control, and ultimately benefiting public health.

9. (1:40-1:45) - Developing AI-Enhanced Statistical Process Control Tools for Controlling Salmonella In Poultry
Stasiewicz, Matthew, University of Illinois
Cecil Barnett-Neefs, Minho Kim, Erin Kealey, Brad Yang, Renato Orsi, Cristina Resendiz Moctezuma, Martin Wiedmann
Food safety microbiology is a challenging area in which to apply AI tools because classic laboratory-based data streams are too slow, costly, and limited in scope to generate large datasets. Yet there are urgent needs for novel food safety approaches, particularly for this test case of Salmonella in poultry, where public health progress has largely stalled and the USDA is working through a process to modernize regulation, including requiring formal statistical process control (SPC) for microbiological indicators of sanitary processing. This project is an innovative collaboration between industry and academic partners to build toward an AI approach to poultry safety by applying modern SPC methods to poultry processing, using machine learning to identify additional promising features for process control, and ultimately clarifying the potential for AI in this domain. Specifically, one large US poultry processor provided at least daily Salmonella sampling data from one plant for about one year, as well as total bacterial count data and some processing parameter data from the same time frame. SPC methods identified at least 5 clusters of special causes of variation in the bacterial count data; two of these clusters corresponded with obvious process changes the company could explain, such as a switch in wash chemical formulation, yet none obviously correlated with changes in Salmonella. To address this apparent deficiency in the utility of SPC, we explored machine learning on the dataset. Different types of tree analyses identified two significant factors associated with higher Salmonella prevalence: long times between bird harvest and cutting into parts, and statistically out-of-control high chiller pH levels. For both features, the company collaborators could provide a clear mechanistic justification for the risk factor, suggesting these could be useful additional targets for food safety control. 
Future work in this area could include building out privacy-assured data sharing and model development approaches (such as federated learning at the plant level) to scale up this approach. Broadly, this work shows how academic-industry collaboration can build toward non-obvious use cases for AI.
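As a hedged illustration of the SPC step described above, a Shewhart individuals chart with 3-sigma limits can flag special-cause variation in a simulated bacterial-count series (the data and the process shift are invented for illustration, not the plant's records):

```python
import numpy as np

rng = np.random.default_rng(11)

# Synthetic daily log total-bacterial-count data: a stable baseline
# followed by an upward process shift (e.g., a wash-chemical change),
# the kind of special-cause variation SPC is meant to flag.
baseline = rng.normal(3.0, 0.2, 60)
shifted = rng.normal(3.9, 0.2, 20)
counts = np.concatenate([baseline, shifted])

# Shewhart individuals chart: flag points beyond 3-sigma limits
# estimated from an in-control reference window.
center = baseline.mean()
sigma = baseline.std(ddof=1)
ucl, lcl = center + 3 * sigma, center - 3 * sigma
out_of_control = np.where((counts > ucl) | (counts < lcl))[0]
print(f"UCL={ucl:.2f}, LCL={lcl:.2f}, flagged days: {out_of_control.tolist()}")
```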

Lightning Talk Session II

Session Date: Tuesday, November 19th
Session Time: 3:00pm-4:20pm
Session Location: Reveille
Session Moderator:


1. (3:00-3:05) - Exploring DeepVariant Space: Assessing the accuracy of custom-trained variant-calling models across species.
Arnold, Haley, ARS
Sheina Sim
Variant calling has become so ubiquitous in genetic studies that it has become rare to find one that does not rely on accurate calls. While there have been many advancements in this process over the years, machine learning now enables even greater accuracy in variant calling. DeepVariant not only has a highly accurate built-in variant-calling model for whole-genome sequences, but also has the ability to custom-train variant-calling models for specific datasets. DeepVariant has been shown to improve the accuracy of variant calls made by other software, with further gains possible through custom model training. Here, we explore the performance of custom models trained with DeepVariant with varying amounts of training data, parameters, and species, in order to assess the wider applicability of machine learning-trained variant-calling models.

2. (3:05-3:10) - Using Machine Learning to Predict Planting Dates
Avila, Angela, University of Texas at Arlington
Jianzhong Su, Lina Castano-Duque, Gary Marek, Prasanna Gowda
This study presents an approach to estimating crop planting dates by integrating ground-based time-series Leaf Area Index (LAI) measurements with satellite images, using machine learning. Eighteen years of time-series ground-measured LAI data, collected in Bushland, Texas, are used to represent the pure growth of crops. These data are costly and time-consuming to collect. To unify each year of LAI growth over time, we use third-degree polynomials. To leverage a computationally fast model, neural networks, in associating crop time-series growth with planting dates, more data are necessary. Data augmentation is used to simulate possible LAI growth curves for training the neural network. Training also requires the output, planting dates, which necessitates theoretical planting dates for the theoretical curves. Features of the curves are extracted and used in multiple linear regression to predict these theoretical planting dates. In a timing comparison, predicting 17,000 planting dates using feature extraction from growth curves would take approximately 30 minutes, compared to 1 second using a trained neural network. While the neural network model is based on pure LAI growth, satellite data are more practical and abundant. To link satellite information with LAI, we use Orthogonal Canonical Correlation Analysis (OCCA), which maps satellite data to LAI by finding optimal linear transformations that maximize the correlation between the two data views. The OCCA map combined with the trained neural network then allows us to estimate crop planting dates using purely time-series satellite images.
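The curve-unification step can be sketched as follows, with an invented LAI series standing in for the Bushland measurements; the polynomial degree matches the abstract, but the feature choices are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical LAI time series for one season: day-of-year samples
# and noisy LAI readings following a rise-and-fall growth curve.
days = np.arange(120, 260, 7)
true_lai = np.clip(-0.0006 * (days - 190) ** 2 + 3.0, 0, None)
lai = true_lai + rng.normal(0, 0.1, days.size)

# A third-degree polynomial unifies the season's growth curve; simple
# curve features can then feed a regression or neural network that
# predicts planting date.
coeffs = np.polyfit(days, lai, deg=3)
fitted = np.polyval(coeffs, days)

peak_day = days[np.argmax(fitted)]        # day of maximum LAI
green_up = days[np.argmax(fitted > 1.0)]  # first day LAI exceeds 1.0
features = np.array([peak_day, green_up, fitted.max()])
print(f"peak day {peak_day}, green-up day {green_up}, max LAI {fitted.max():.2f}")
```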

3. (3:10-3:15) - Identification of plant-parasitic nematodes using deep learning tools
Waldo, Benjamin, ARS
Vikram Rangarajan and Fereshteh Shahoveisi
Plant-parasitic nematodes are an important threat to turfgrass. Left unmanaged, they can cause serious reductions in the quality and playability of golf greens. Nematode diagnostics depend on accurate identification of individuals extracted from soil, but the expertise in nematology required to make reliable identifications is often limited in plant diagnostic laboratories. In this study, we evaluated the performance of the EfficientNet V2, MobileNetV3, ResNet101, and Swin Transformer V2 neural network architectures on classification of nematode images. Models were trained, tested, and validated on a dataset of 5,400 plant-parasitic nematode images across seven genera associated with turfgrass, obtained with inverted and compound microscopes. Images were cropped, and augmentation was performed using image rotations, random brightness, and Gaussian noise addition. Classification accuracy was highest for EfficientNet V2 and Swin Transformer V2 at 95% and lowest for ResNet101 at 86%. These findings demonstrate the potential of machine learning tools for accurate nematode identification to aid in diagnostics.
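The augmentation step described above (rotations, random brightness, Gaussian noise) can be sketched with plain NumPy on a synthetic image; the parameter ranges are illustrative assumptions, not the study's settings:

```python
import numpy as np

rng = np.random.default_rng(2)

def augment(img: np.ndarray) -> np.ndarray:
    """Apply a random 90-degree rotation, random brightness scaling,
    and additive Gaussian noise, then clip back to [0, 1]."""
    img = np.rot90(img, k=rng.integers(0, 4))
    img = img * rng.uniform(0.8, 1.2)           # random brightness
    img = img + rng.normal(0, 0.02, img.shape)  # Gaussian noise
    return np.clip(img, 0.0, 1.0)

# A hypothetical grayscale nematode micrograph, normalized to [0, 1].
image = rng.uniform(0, 1, (64, 64))
batch = np.stack([augment(image) for _ in range(8)])
print(batch.shape)
```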

4. (3:15-3:20) - Landscape Ecological Site Group Mapping Using Gradient Boosted Learning Applied to Climate and Soil Data
Meles, Menberu, ARS
D. Phillip Guertin
An Ecological Site Group (ESG) is a framework used in rangeland and ecosystem management to classify similar Ecological Sites (ESs) based on their responses to land management, disturbances, conservation practices, and environmental changes. Traditionally, ESGs have been assigned by expert groups. This study demonstrates the use of machine learning, specifically the XGBoost algorithm, to predict ESGs, replacing the need for expert-assigned classifications. Utilizing key soil properties and climate data, XGBoost accurately predicted ESGs at selected NRI points within Major Land Resource Areas (MLRAs) 65 and 69, achieving 95% and 99% accuracy and outperforming methods like decision trees and random forests. The model was further scaled to the landscape level using SSURGO soil map units, providing a broad, scalable, data-driven alternative to traditional expert-based approaches.
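A hedged sketch of the classification setup above: gradient-boosted trees (scikit-learn's implementation standing in for XGBoost) trained on invented soil and climate covariates with rule-based ESG labels; none of the variables or thresholds come from the study:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(9)

# Synthetic stand-in for NRI points: soil and climate covariates with
# three hypothetical ESGs whose boundaries follow precipitation and
# clay content.
n = 3000
clay = rng.uniform(5, 60, n)        # percent clay
precip = rng.uniform(200, 900, n)   # mm/year
awc = rng.uniform(0.05, 0.25, n)    # available water capacity
esg = np.where(precip < 450, 0, np.where(clay > 35, 1, 2))

X = np.column_stack([clay, precip, awc])
X_tr, X_te, y_tr, y_te = train_test_split(X, esg, random_state=0)

# Gradient-boosted trees learn the ESG assignment rules from the data.
clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
accuracy = clf.score(X_te, y_te)
print(f"held-out ESG accuracy: {accuracy:.2f}")
```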

5. (3:20-3:25) - Improving image resolution quality for small species identification (Diptera) in agricultural environments using artificial intelligence/machine learning algorithms
Gomez, Deyanira, ARS
Camera systems in agriculture are advancing, particularly in the identification of small species through camera traps. While existing literature often emphasizes the use of lower-resolution images and small pixels for larger species, it paradoxically suggests higher-resolution images for identifying smaller species from a distance. This study aims to enhance image resolution in camera trap systems for better small species identification. To address the challenge, we integrated suitable hardware and software while considering environmental constraints. We developed a ruggedized device capable of withstanding varied weather conditions, which is crucial for agricultural applications. Our approach is grounded in the Systems Engineering Handbook from NASA, specifically the “program/project life cycle” framework. This framework includes phases from conceptualization to final design, fabrication, testing, and operations. During the conceptual phase, we evaluated various development boards and camera compatibility, creating pseudocode to facilitate device communication. The research encompasses development, preliminary completion, and final design phases as we ruggedized the device and analyzed data from various camera systems. Our focus is on improving the resolution quality of outdoor camera trap systems and comparing them with high-resolution cameras, excluding those below 5MP and macro/micro lenses. The study is conducted in southeast-central Nebraska at the USDA-ARS-MARC facility, concentrating on horn flies during warm seasons like spring and summer. It involves both low-level and high-level programming efforts. Ultimately, this research aims to enhance outdoor camera trap designs, improve resolution quality, and advance AI/ML algorithms for identifying small species (Diptera) in agricultural settings. 
The paper is structured to summarize existing research, followed by a discussion of the systems engineering framework, device development and deployment, results, and conclusions.

6. (3:25-3:30) - Community-Led Assessments Using Remote Sensing: Monitoring the Impacts of Climate Change on Cloudberry (Rubus chamaemorus)
Kassama, Sire, ARS
Claire Friedrichsen, Sean Gleason, Lynn Marie Church, Grace Hunter, Bryan Jones Jr., Jaqueline Cleveland, Katie Pisarello, Warren Jones
Rural Indigenous Alaskan villages have wrestled with food insecurity since colonization due to early restrictions on subsistence food practices and today as a result of climate instability. Rising sea levels, coastal erosion, decreased winter snowpack, and increased rates of permafrost thaw threaten the growing range and increase variability in harvests of cloudberry. Cloudberry (Rubus chamaemorus) is a vital subsistence plant that is a significant source of fiber, Vitamin C, and other micronutrients. Harvest of this important plant includes travel into dangerous open waters via boat or through the tundra with an ATV. Both modes of transport rely on expensive gasoline and take considerable time for travel and hand harvest. The purpose of this research is to integrate oral histories of Indigenous community members with unmanned aerial system (UAS) imagery to monitor and predict future harvests of cloudberry. RGB UAS surveys were conducted in a small Indigenous village, Quinhagak, that has been particularly impacted by climate change in the last decade and has a strong research capacity through Nalaquq, a Yup’ik owned 14(h) ANCSA corporation subsidiary. Nalaquq has spearheaded efforts to train community members in UAS use for search and rescue efforts. Thus, at least 30 community members have extensive training in UAS data collection methods, which are transferable to other tasks. This empowers the community to independently contribute to monitoring the landscape. In the summer of 2024, approximately 25 ethnographic interviews were conducted to evaluate the location of major berry picking areas, attitudes to climate change and how it has altered berry harvests, and values associated with managing cloudberry. UAS surveys were conducted to include areas with variable elevation and proximity to major waterways to capture different landscapes and differing loads of cloudberry. 
Six out of eight of the sampled patches were identified from ethnographic interviews and maps of historic berry picking sites provided by Nalaquq. Field surveys were conducted to estimate berry size and weight per patch to assist in the development of a harvest index. From the UAS imagery, manual berry labeling and a combination of two convolutional neural network methods, ImageNet and ResNet-50, will be deployed to extract image features and automate detection of berries in the landscape. Together, these two data streams will inform predictive yield models. Results from the study will be useful to the local community to understand where to pick berries for an upcoming harvest season and ultimately reduce costs associated with subsistence practices.

7. (3:30-3:35) - Future Directions in Generative Social Science
Gillette, Shana, APHIS

I will review the future potential of generative artificial intelligence in understanding the impact of social systems and human behavior on animal health.

8. (3:35-3:40) - Imputing Measures of Diet Quality using Circana Scanner Data and Machine Learning
Stevens, Alexander, ARS
Abigail Okrent and Lisa Mancino
The Healthy Eating Index (HEI) summarizes how well a set of foods, be it a meal, a day’s worth of food, or a trip to the grocery store, aligns with the Dietary Guidelines for Americans (DGAs). It serves as a valuable metric of diet healthiness for researchers. Currently, the Economic Research Service (ERS) produces the Purchase to Plate Suite (PPS), which allows the HEI to be calculated using Circana (formerly IRI) household and retail scanner data. However, constructing the PPS is labor intensive and lags the current year by several years. This means that researchers are limited in their ability to use the most recent version of the Circana scanner data to examine policy-relevant questions related to diet quality. The ERS Food Product Groups (EFPGs), which classify foods into 90 food categories, are mapped to UPCs in the Circana scanner data. The EFPG-to-UPC mapping can be updated much more quickly than the PPS because only the descriptive text information in the Circana data is required. The question arises whether the EFPGs, in conjunction with other product label information, can be predictive of HEI as measured by the PPS. If this information is highly predictive of HEI, timely information on diet quality can be made available to researchers conducting nutrition and food policy research using scanner data. The goal of this research is to use EFPGs and product label information to gauge the feasibility and accuracy of constructing imputed HEI scores. The Circana Consumer Network contains a large number of variables. Currently, we are using a machine learning model, gradient boosted regression trees, to generate predictions for the 12 HEI adequacy and moderation components and using this relationship to impute the HEI.
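As a hedged sketch of the imputation idea above, one gradient-boosted regressor per component can be fit to invented food-category shares; the two simulated components and all parameters are illustrative assumptions, not the ERS data or models (the abstract imputes 12 components):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(4)

# Synthetic stand-in for scanner baskets: food-category shares per
# household, with two hypothetical HEI-style component scores that
# depend on those shares.
n, n_groups = 2000, 15
shares = rng.dirichlet(np.ones(n_groups), size=n)
comp_a = 10 * shares[:, 0] / (shares[:, 0] + shares[:, 1] + 1e-9)
comp_b = 10 * (1 - shares[:, 2])
Y = np.column_stack([comp_a, comp_b]) + rng.normal(0, 0.2, (n, 2))

X_tr, X_te, Y_tr, Y_te = train_test_split(shares, Y, random_state=0)

# One gradient-boosted regressor per component score.
model = MultiOutputRegressor(GradientBoostingRegressor(random_state=0))
model.fit(X_tr, Y_tr)
r2 = model.score(X_te, Y_te)
print(f"held-out R^2 across components: {r2:.2f}")
```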

Poster Displays

Trainings