A Comparative Analysis of Linguistic and Retrieval Diversity in LLM-Generated Search Queries.

Nov 1, 2025

Oleg Zendel

Sara Fahad Dawood Al Lawati

Lida Rashidi

Falk Scholer

Mark Sanderson

Abstract

Large Language Models (LLMs) are increasingly used to generate search queries for various Information Retrieval (IR) tasks. However, it remains unclear how these machine-generated queries compare to human-written ones, particularly in terms of diversity and alignment with real user behavior. This paper presents an empirical comparison of LLM- and human-generated queries across multiple dimensions, including lexical diversity, linguistic variation, and retrieval effectiveness. We analyze queries produced by several LLMs and compare them with human queries from two datasets collected five years apart. Our findings show that while LLMs can generate diverse queries, their patterns differ from those observed in human behavior. LLM queries typically exhibit higher surface-level uniqueness but rely less on stopword use and word form variation. They also achieve lower retrieval effectiveness when judged against human queries, suggesting that LLM-generated queries may not always reflect real user intent. These differences highlight the limitations of current LLMs in replicating natural querying behavior. We discuss the implications of these findings for LLM-based query generation and user behavior simulation in IR. We conclude that while LLMs hold potential, they should be used with caution.

Type

Conference paper

Publication

CIKM

Last updated on Nov 1, 2025