Technical Whitepaper
GenRank: A Methodology for Measuring Entity Visibility in Large Language Models
Abstract
This document presents the GenRank methodology—a systematic framework for quantifying entity visibility within large language model (LLM) recommendation outputs. As AI-driven discovery increasingly influences consumer behavior and market dynamics, understanding how LLMs prioritize and recommend entities becomes critical for researchers, marketers, and policymakers. GenRank employs a multi-model polling approach combined with market-weighted scoring to produce normalized visibility indices. This methodology document details our data collection protocols, mathematical scoring framework, model weight allocation procedures, and known limitations.
1. Introduction
1.1 Background
The proliferation of large language models (LLMs) as primary information discovery tools has fundamentally altered how consumers identify and evaluate products, services, and brands. Unlike traditional search engines that display ranked results based on explicit relevance signals, LLMs generate synthesized recommendations that reflect patterns learned during training and reinforcement learning processes [1].
This shift presents both opportunities and challenges. Entities that achieve favorable positioning in LLM outputs may benefit from increased visibility and consideration, while those absent from recommendations face potential market invisibility—a phenomenon we term AI-mediated discovery bias [2].
1.2 Research Objectives
GenRank addresses the following research objectives:
- Establish a reproducible framework for measuring entity visibility across multiple LLM platforms
- Develop a market-weighted scoring system that reflects real-world AI usage patterns
- Create transparent, open-access datasets for academic and commercial research
- Enable longitudinal analysis of recommendation patterns and temporal trends
1.3 Definitions
- Entity
- A distinct brand, product, service, organization, or concept that can be identified and tracked across LLM outputs.
- Visibility Score
- A normalized metric (0–100) representing an entity's prominence in LLM recommendations within a specific category.
- Market Weight
- A coefficient assigned to each LLM reflecting its estimated share of global AI assistant usage.
2. Data Collection
2.1 Query Design
Queries are designed to elicit ranked recommendation lists from LLMs. Each query undergoes validation against the following criteria (a minimal validation sketch follows the list):
- Specificity — Queries must target a defined category or use case
- Neutrality — Queries must not contain leading language or brand mentions
- Reproducibility — Queries must consistently elicit list-format responses
- Temporal stability — Queries should remain relevant across update cycles
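The sketch below shows one way the neutrality and reproducibility checks might be automated. The `KNOWN_ALIASES` set, the `parse_ranked_list` helper, and the 90% threshold are illustrative assumptions rather than the published validation pipeline.

```python
import re

# Illustrative alias set; in practice this would come from the alias database in Section 2.3.
KNOWN_ALIASES = {"notion", "chatgpt", "gemini"}

def parse_ranked_list(response: str) -> list[str]:
    """Extract numbered-list items from a raw model response (simplified parser)."""
    return [m.group(1).strip()
            for m in re.finditer(r"^\s*\d+[.)]\s+(.+)$", response, re.MULTILINE)]

def validate_query(query: str, sample_responses: list[str]) -> dict[str, bool]:
    """Check a candidate query against the neutrality and reproducibility criteria."""
    tokens = set(re.findall(r"[a-z0-9.\-]+", query.lower()))
    neutral = tokens.isdisjoint(KNOWN_ALIASES)              # no brand mentions in the query text
    list_rate = (sum(bool(parse_ranked_list(r)) for r in sample_responses)
                 / max(len(sample_responses), 1))
    return {"neutrality": neutral, "reproducibility": list_rate >= 0.9}
```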
2.2 Polling Parameters
Approved queries are submitted to each active LLM via official API endpoints. To minimize variance from stochastic sampling, each query-model pair is executed with controlled decoding parameters and repeated trials whose outputs are aggregated (see Section 5.2).
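A minimal sketch of such a polling loop is shown below, assuming a generic `client.complete` interface; the temperature value and trial count are illustrative placeholders, not the production configuration.

```python
from collections import defaultdict

TRIALS_PER_PAIR = 5      # assumed number of repeated executions per query-model pair
TEMPERATURE = 0.2        # assumed low sampling temperature to reduce variance

def poll(client, models: list[str], queries: list[str]) -> dict:
    """Collect raw responses for every query-model pair via a generic client interface."""
    raw = defaultdict(list)
    for query in queries:
        for model in models:
            for _ in range(TRIALS_PER_PAIR):
                # client.complete stands in for each provider's official API call
                response = client.complete(model=model, prompt=query, temperature=TEMPERATURE)
                raw[(query, model)].append(response)
    return raw
```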
2.3 Entity Resolution
Raw LLM outputs undergo entity resolution to map surface-form variations to canonical entity identifiers. This process employs the following steps (the first two are sketched after the list):
- Lexical normalization — Case folding, punctuation removal, whitespace normalization
- Alias mapping — Maintained database of known aliases and abbreviations
- Semantic clustering — LLM-assisted disambiguation for novel entity mentions
- Human validation — Manual review of low-confidence mappings
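A minimal sketch of the lexical-normalization and alias-mapping steps, assuming a small in-memory alias dictionary; the semantic-clustering and human-validation stages are omitted, and the alias entries shown are illustrative.

```python
import re
import unicodedata

def normalize(mention: str) -> str:
    """Lexical normalization: Unicode/case folding, punctuation removal, whitespace collapsing."""
    text = unicodedata.normalize("NFKC", mention).lower()
    text = re.sub(r"[^\w\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()

# Illustrative alias entries; the production alias database is maintained separately.
RAW_ALIASES = {"Notion": "notion", "Notion.so": "notion", "ChatGPT": "openai_chatgpt"}
ALIAS_DB = {normalize(alias): canonical for alias, canonical in RAW_ALIASES.items()}

def resolve(mention: str) -> str | None:
    """Map a surface form to a canonical entity ID; None falls through to later stages."""
    return ALIAS_DB.get(normalize(mention))
```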
3. Scoring Methodology
3.1 Theoretical Foundation
The GenRank scoring system is grounded in information retrieval theory, specifically adapting the logarithmic discounting principles used in normalized discounted cumulative gain (nDCG) metrics [3]. We employ logarithmic decay to model the diminishing marginal value of lower-ranked positions while ensuring all mentions contribute non-zero utility.
3.2 Mathematical Formulation
Definition 3.1 — Score Function
For an entity $e$ at rank position $r$ from model $m$ with weight $w_m$:

$$S(e, r, m) = \frac{1}{1 + \log_{10} r} \times w_m \times 100$$

The final GenRank score aggregates across all queries $Q$ and models $M$:

$$G(e) = \sum_{q \in Q} \sum_{m \in M} S(e,\, r_{q,m},\, m)$$

where $r_{q,m}$ is the rank assigned to entity $e$ by model $m$ for query $q$.
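A direct transcription of Definition 3.1 into code is sketched below. How the aggregate is mapped back onto the 0–100 Visibility Score range of Section 1.3 (for example, by averaging over queries) is not specified here and is left as an assumption.

```python
import math

def position_score(rank: int) -> float:
    """Logarithmic decay term 1 / (1 + log10(rank)); rank 1 yields 1.0."""
    return 1.0 / (1.0 + math.log10(rank))

def weighted_score(rank: int, model_weight: float) -> float:
    """S(e, r, m): position decay times w_m times 100."""
    return position_score(rank) * model_weight * 100.0

def genrank_score(observations: list[tuple[int, float]]) -> float:
    """G(e): sum of weighted scores over (rank, model_weight) observations across queries and models."""
    return sum(weighted_score(rank, weight) for rank, weight in observations)
```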
3.3 Score Decay
The logarithmic decay function produces the following position-relative values:
| Position | Raw Score | Relative Value | Decay vs. previous row |
|---|---|---|---|
| 1 | 1.000 | 100.0% | — |
| 2 | 0.769 | 76.9% | −23.1% |
| 3 | 0.677 | 67.7% | −9.2% |
| 5 | 0.588 | 58.8% | −8.9% |
| 10 | 0.500 | 50.0% | −8.8% |
| 20 | 0.435 | 43.5% | −6.5% |
3.4 Example Calculation
Consider entity "Notion" receiving the following rankings for the query "What are the best productivity applications?":
| Model | Weight | Rank | Raw | Weighted |
|---|---|---|---|---|
| GPT-5.2 | 0.15 | 1 | 1.000 | 15.00 |
| Gemini 3 Flash | 0.13 | 2 | 0.769 | 9.99 |
| Claude Sonnet 4.5 | 0.10 | 3 | 0.677 | 6.77 |
| **Total (single query)** | | | | **31.76** |
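Using the functions from the Section 3.2 sketch, the single-query total can be reproduced directly from the (rank, weight) observations:

```python
observations = [(1, 0.15), (2, 0.13), (3, 0.10)]    # (rank, model weight) per responding model
print(round(genrank_score(observations), 2))         # 31.76
```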
4. Model Weight Allocation
4.1 Methodology
Model weights are derived from a composite index incorporating three primary data sources, each addressing a different aspect of real-world AI influence (a worked sketch of the blending follows the list):
- Consumer market share (40%) — Monthly active users and web traffic data from Statcounter [7], SimilarWeb, and company disclosures
- Enterprise adoption (35%) — Deployment metrics from industry reports and API revenue estimates [5]
- Developer API usage (25%) — Token throughput from aggregated gateways [6]
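The sketch below shows one way the three sources could be blended into normalized model weights using the 40/35/25 split; the per-source shares in the example are made-up placeholders, not the measured Q1 2026 inputs.

```python
# Fixed source weights from Section 4.1.
SOURCE_WEIGHTS = {"consumer": 0.40, "enterprise": 0.35, "developer": 0.25}

def composite_weights(sub_scores: dict[str, dict[str, float]]) -> dict[str, float]:
    """Blend per-source shares into a composite index, then normalize so weights sum to 1."""
    models = {m for shares in sub_scores.values() for m in shares}
    raw = {
        model: sum(SOURCE_WEIGHTS[src] * shares.get(model, 0.0)
                   for src, shares in sub_scores.items())
        for model in models
    }
    total = sum(raw.values())
    return {model: value / total for model, value in raw.items()}

# Illustrative (made-up) per-source shares for two models.
example = {
    "consumer":   {"GPT-5.2": 0.40, "Gemini 3 Flash": 0.30},
    "enterprise": {"GPT-5.2": 0.35, "Gemini 3 Flash": 0.25},
    "developer":  {"GPT-5.2": 0.30, "Gemini 3 Flash": 0.35},
}
print(composite_weights(example))
```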
4.2 Current Allocation (Q1 2026)
The following table presents active model weights as of January 2026. Weights are recalibrated quarterly.
| Provider | Model | Weight |
|---|---|---|
| OpenAI | GPT-5.2 | 0.15 |
| OpenAI | GPT-5 Mini | 0.09 |
| OpenAI | GPT-4o-mini | 0.04 |
| OpenAI | GPT-OSS-120B | 0.03 |
| Google | Gemini 3 Flash | 0.13 |
| Google | Gemini 2.5 Flash Lite | 0.04 |
| Google | Gemini 2.0 Flash | 0.03 |
| Anthropic | Claude Sonnet 4.5 | 0.10 |
| Anthropic | Claude Haiku 4.5 | 0.07 |
| xAI | Grok 4.1 Fast | 0.10 |
| Perplexity | Sonar | 0.07 |
| DeepSeek | DeepSeek V3.2 | 0.05 |
| DeepSeek | DeepSeek V3 0324 | 0.02 |
| Meta | Llama 4 Maverick | 0.03 |
| MiniMax | MiniMax M2 | 0.03 |
| Z.AI | GLM 4.7 | 0.02 |
| **Total (n=16 active models)** | | **1.00** |
4.3 Provider Distribution
Aggregate weights by provider:

| Provider | Aggregate Weight |
|---|---|
| OpenAI | 0.31 |
| Google | 0.20 |
| Anthropic | 0.17 |
| xAI | 0.10 |
| Perplexity | 0.07 |
| DeepSeek | 0.07 |
| Meta | 0.03 |
| MiniMax | 0.03 |
| Z.AI | 0.02 |
5. Limitations
Users should consider the following methodological limitations when interpreting results:
5.1 Temporal Variability
LLM outputs may vary over time due to model updates, fine-tuning, and reinforcement learning from human feedback. GenRank captures point-in-time snapshots and should not be interpreted as static ground truth.
5.2 Stochastic Sampling
Despite controlled parameters, LLM outputs exhibit inherent randomness. Identical queries may produce different rankings across executions. Statistical aggregation mitigates but does not eliminate this variance.
5.3 Market Weight Estimation
Model weights are derived from publicly available market data which may not fully reflect actual usage patterns. Enterprise API usage is estimated from secondary sources and may contain measurement error.
5.4 Query Design Bias
The selection and phrasing of queries may influence recommendation outcomes. Query design inherently reflects researcher assumptions about relevant use cases and natural language patterns.
5.5 Entity Resolution Errors
Automated entity resolution may introduce errors through incorrect alias mappings or failure to distinguish between similarly-named entities.
6. References
1. Ouyang, L., et al. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35, 27730–27744.
2. Shah, C., & Bender, E. M. (2024). Envisioning Information Access Systems: What Makes for Good Tools and a Healthy Web? ACM Transactions on the Web.
3. Järvelin, K., & Kekäläinen, J. (2002). Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems, 20(4), 422–446.
4. Wilkinson, M. D., et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3, 160018.
5. Menlo Ventures. (2025). 2025 Mid-Year LLM Market Update: Foundation Model Landscape + Economics.
6. OpenRouter. (2025). LLM Rankings: Model Usage Statistics.
7. Statcounter. (2025). AI Chatbot Market Share Worldwide.