06. AI-Driven Patent Data Extraction and Analysis for Agricultural Patents

Conference: Women in Statistics and Data Science 2025
11/13/2025: 2:30 PM - 4:00 PM EST
Speed 

Description

The AI-Driven Patent Data Extraction and Analysis System for Corteva Agriscience was a research project under The Data Mine at Purdue University. A team of nine undergraduate and graduate students designed the system to retrieve, extract, and analyze agricultural patents related to crop protection. The project combined large language models (LLMs) with tools for data extraction and structured search. These capabilities are intended to let scientists and researchers access, extract, and analyze patent data efficiently, supporting faster and better-informed decision-making.

Project Objectives:

1. Patent Retrieval Development – Developed a system for retrieving patents directly from Google Patents.
2. Automated Data Extraction – Developed a tool that extracts patent metadata and relevant content from the examples sections of patents, converts them into a structured table format, and makes the table downloadable for further analysis.
3. Interactive Chat Module – Implemented an LLM-based chatbot that helps scientists run IP-related queries.
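
As a rough illustration of the second objective, the sketch below shows one way extracted patent records could be turned into a downloadable table. It is not the project's implementation: the column names in FIELDS, the records_to_csv helper, and the sample records are all hypothetical, since the abstract does not describe the actual schema or tooling. It uses only the Python standard library.

```python
import csv
import io

# Hypothetical column names; the abstract does not list the actual fields.
FIELDS = ["patent_number", "title", "assignee", "example_text"]


def records_to_csv(records):
    """Serialize a list of patent-record dicts into CSV text.

    Missing fields are written as empty cells; unknown keys are ignored.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=FIELDS, extrasaction="ignore", restval=""
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()


# Placeholder records for illustration only (not real patents).
sample = [
    {
        "patent_number": "EXAMPLE-001",
        "title": "Placeholder herbicide composition",
        "assignee": "Example Assignee",
        "example_text": "Example 1: placeholder text.",
    },
    {
        "patent_number": "EXAMPLE-002",
        "title": "Placeholder fungicide formulation",
    },
]

csv_text = records_to_csv(sample)
print(csv_text)
```

Returning the table as text keeps the sketch independent of any particular download mechanism; the string could be written to a file or served by a web front end.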

Keywords

Patent Data Extraction

Large Language Models (LLMs)

Intellectual Property Analytics

Structured Data Retrieval

Student-Led Research

Data-Driven Decision Making 

Presenting Author

Srishti Maurya

First Author

Srishti Maurya

CoAuthor(s)

Anna Bajszczak
Lina Im, Student

Target Audience

Beginner

Tracks

Knowledge
Women in Statistics and Data Science 2025