Technical Whitepaper
GenRank: A Methodology for Measuring Entity Visibility in Large Language Models
Abstract
This document presents the GenRank methodology—a systematic framework for quantifying entity visibility within large language model (LLM) recommendation outputs. As AI-driven discovery increasingly influences consumer behavior and market dynamics, understanding how LLMs prioritize and recommend entities becomes critical for researchers, marketers, and policymakers. GenRank employs a multi-model polling approach combined with market-weighted scoring to produce normalized visibility indices. This methodology document details our data collection protocols, mathematical scoring framework, model weight allocation procedures, and known limitations.
1. Introduction
1.1 Background
The proliferation of large language models (LLMs) as primary information discovery tools has fundamentally altered how consumers identify and evaluate products, services, and brands. Unlike traditional search engines that display ranked results based on explicit relevance signals, LLMs generate synthesized recommendations that reflect patterns learned during training and reinforcement learning processes [1].
This shift presents both opportunities and challenges. Entities that achieve favorable positioning in LLM outputs may benefit from increased visibility and consideration, while those absent from recommendations face potential market invisibility—a phenomenon we term AI-mediated discovery bias [2].
1.2 Research Objectives
GenRank addresses the following research objectives:
- Establish a reproducible framework for measuring entity visibility across multiple LLM platforms
- Develop a market-weighted scoring system that reflects real-world AI usage patterns
- Create transparent, open-access datasets for academic and commercial research
- Enable longitudinal analysis of recommendation patterns and temporal trends
1.3 Definitions
- Entity
- A distinct brand, product, service, organization, or concept that can be identified and tracked across LLM outputs.
- Visibility Score
- A normalized metric (0–100) representing an entity's prominence in LLM recommendations within a specific category.
- Market Weight
- A coefficient assigned to each LLM reflecting its estimated share of global AI assistant usage.
2. Data Collection
2.1 Query Design
Queries are designed to elicit ranked recommendation lists from LLMs. Each query undergoes validation against the following criteria (a minimal validation sketch follows the list):
- Specificity — Queries must target a defined category or use case
- Neutrality — Queries must not contain leading language or brand mentions
- Reproducibility — Queries must consistently elicit list-format responses
- Temporal stability — Queries should remain relevant across update cycles
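The sketch below shows one way the neutrality and reproducibility checks might be automated. The `KNOWN_ALIASES` set, the `parse_ranked_list` helper, and the 90% threshold are illustrative assumptions rather than the published validation pipeline.

```python
import re

# Illustrative alias set; in practice this would come from the alias database in Section 2.3.
KNOWN_ALIASES = {"notion", "chatgpt", "gemini"}

def parse_ranked_list(response: str) -> list[str]:
    """Extract numbered-list items from a raw model response (simplified parser)."""
    return [m.group(1).strip()
            for m in re.finditer(r"^\s*\d+[.)]\s+(.+)$", response, re.MULTILINE)]

def validate_query(query: str, sample_responses: list[str]) -> dict[str, bool]:
    """Check a candidate query against the neutrality and reproducibility criteria."""
    tokens = set(re.findall(r"[a-z0-9.\-]+", query.lower()))
    neutral = tokens.isdisjoint(KNOWN_ALIASES)              # no brand mentions in the query text
    list_rate = (sum(bool(parse_ranked_list(r)) for r in sample_responses)
                 / max(len(sample_responses), 1))
    return {"neutrality": neutral, "reproducibility": list_rate >= 0.9}
```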
2.2 Polling Parameters
Approved queries are submitted to each active LLM via official API endpoints. To minimize variance from stochastic sampling, each query-model pair is executed with controlled decoding parameters and repeated trials whose outputs are aggregated (see Section 5.2).
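A minimal sketch of such a polling loop is shown below, assuming a generic `client.complete` interface; the temperature value and trial count are illustrative placeholders, not the production configuration.

```python
from collections import defaultdict

TRIALS_PER_PAIR = 5      # assumed number of repeated executions per query-model pair
TEMPERATURE = 0.2        # assumed low sampling temperature to reduce variance

def poll(client, models: list[str], queries: list[str]) -> dict:
    """Collect raw responses for every query-model pair via a generic client interface."""
    raw = defaultdict(list)
    for query in queries:
        for model in models:
            for _ in range(TRIALS_PER_PAIR):
                # client.complete stands in for each provider's official API call
                response = client.complete(model=model, prompt=query, temperature=TEMPERATURE)
                raw[(query, model)].append(response)
    return raw
```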
2.3 Entity Resolution
Raw LLM outputs undergo entity resolution to map surface-form variations to canonical entity identifiers. This process employs the following steps (the first two are sketched after the list):
- Lexical normalization — Case folding, punctuation removal, whitespace normalization
- Alias mapping — Maintained database of known aliases and abbreviations
- Semantic clustering — LLM-assisted disambiguation for novel entity mentions
- Human validation — Manual review of low-confidence mappings
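A minimal sketch of the lexical-normalization and alias-mapping steps, assuming a small in-memory alias dictionary; the semantic-clustering and human-validation stages are omitted, and the alias entries shown are illustrative.

```python
import re
import unicodedata

def normalize(mention: str) -> str:
    """Lexical normalization: Unicode/case folding, punctuation removal, whitespace collapsing."""
    text = unicodedata.normalize("NFKC", mention).lower()
    text = re.sub(r"[^\w\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()

# Illustrative alias entries; the production alias database is maintained separately.
RAW_ALIASES = {"Notion": "notion", "Notion.so": "notion", "ChatGPT": "openai_chatgpt"}
ALIAS_DB = {normalize(alias): canonical for alias, canonical in RAW_ALIASES.items()}

def resolve(mention: str) -> str | None:
    """Map a surface form to a canonical entity ID; None falls through to later stages."""
    return ALIAS_DB.get(normalize(mention))
```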
3. Scoring Methodology
3.1 Theoretical Foundation
The GenRank scoring system is grounded in information retrieval theory, specifically adapting the logarithmic discounting principles used in normalized discounted cumulative gain (nDCG) metrics [3]. We employ logarithmic decay to model the diminishing marginal value of lower-ranked positions while ensuring all mentions contribute non-zero utility.
3.2 Mathematical Formulation
Definition 3.1 — Score Function
For an entity $e$ at rank position $r$ from model $m$ with weight $w_m$:

$$S(e, r, m) = \frac{1}{1 + \log_{10} r} \times w_m \times 100$$

The final GenRank score aggregates across all queries $Q$ and models $M$:

$$G(e) = \sum_{q \in Q} \sum_{m \in M} S(e,\, r_{q,m},\, m)$$

where $r_{q,m}$ is the rank assigned to entity $e$ by model $m$ for query $q$.
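A direct transcription of Definition 3.1 into code is sketched below. How the aggregate is mapped back onto the 0–100 Visibility Score range of Section 1.3 (for example, by averaging over queries) is not specified here and is left as an assumption.

```python
import math

def position_score(rank: int) -> float:
    """Logarithmic decay term 1 / (1 + log10(rank)); rank 1 yields 1.0."""
    return 1.0 / (1.0 + math.log10(rank))

def weighted_score(rank: int, model_weight: float) -> float:
    """S(e, r, m): position decay times w_m times 100."""
    return position_score(rank) * model_weight * 100.0

def genrank_score(observations: list[tuple[int, float]]) -> float:
    """G(e): sum of weighted scores over (rank, model_weight) observations across queries and models."""
    return sum(weighted_score(rank, weight) for rank, weight in observations)
```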
3.3 Score Decay
The logarithmic decay function produces the following position-relative values:
| Position | Raw Score | Relative Value | Decay vs. previous row |
|---|---|---|---|
| 1 | 1.000 | 100.0% | — |
| 2 | 0.769 | 76.9% | −23.1% |
| 3 | 0.677 | 67.7% | −9.2% |
| 5 | 0.588 | 58.8% | −8.9% |
| 10 | 0.500 | 50.0% | −8.8% |
| 20 | 0.435 | 43.5% | −6.5% |
3.4 Example Calculation
Consider entity "Notion" receiving the following rankings for the query "What are the best productivity applications?":
| Model | Weight | Rank | Raw | Weighted |
|---|---|---|---|---|
| GPT-5.2 | 0.15 | 1 | 1.000 | 15.00 |
| Gemini 3 Flash | 0.13 | 2 | 0.769 | 9.99 |
| Claude Sonnet 4.5 | 0.10 | 3 | 0.677 | 6.77 |
| **Total (single query)** | | | | **31.76** |
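Using the functions from the Section 3.2 sketch, the single-query total can be reproduced directly from the (rank, weight) observations:

```python
observations = [(1, 0.15), (2, 0.13), (3, 0.10)]    # (rank, model weight) per responding model
print(round(genrank_score(observations), 2))         # 31.76
```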
4. Model Weight Allocation
4.1 Methodology
Model weights are derived from a composite index incorporating three primary data sources, each addressing a different aspect of real-world AI influence (a worked sketch of the blending follows the list):
- Consumer market share (40%) — Monthly active users and web traffic data from Statcounter [7], SimilarWeb, and company disclosures
- Enterprise adoption (35%) — Deployment metrics from industry reports and API revenue estimates [5]
- Developer API usage (25%) — Token throughput from aggregated gateways [6]
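The sketch below shows one way the three sources could be blended into normalized model weights using the 40/35/25 split; the per-source shares in the example are made-up placeholders, not the measured Q1 2026 inputs.

```python
# Fixed source weights from Section 4.1.
SOURCE_WEIGHTS = {"consumer": 0.40, "enterprise": 0.35, "developer": 0.25}

def composite_weights(sub_scores: dict[str, dict[str, float]]) -> dict[str, float]:
    """Blend per-source shares into a composite index, then normalize so weights sum to 1."""
    models = {m for shares in sub_scores.values() for m in shares}
    raw = {
        model: sum(SOURCE_WEIGHTS[src] * shares.get(model, 0.0)
                   for src, shares in sub_scores.items())
        for model in models
    }
    total = sum(raw.values())
    return {model: value / total for model, value in raw.items()}

# Illustrative (made-up) per-source shares for two models.
example = {
    "consumer":   {"GPT-5.2": 0.40, "Gemini 3 Flash": 0.30},
    "enterprise": {"GPT-5.2": 0.35, "Gemini 3 Flash": 0.25},
    "developer":  {"GPT-5.2": 0.30, "Gemini 3 Flash": 0.35},
}
print(composite_weights(example))
```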
4.2 Current Allocation (Q1 2026)
The following table presents active model weights as of January 2026. Weights are recalibrated quarterly.
| Provider | Model | Weight |
|---|---|---|
| OpenAI | GPT-5.2 | 0.15 |
| OpenAI | GPT-5 Mini | 0.09 |
| OpenAI | GPT-4o-mini | 0.04 |
| OpenAI | GPT-OSS-120B | 0.03 |
| Google | Gemini 3 Flash | 0.13 |
| Google | Gemini 2.5 Flash Lite | 0.04 |
| Google | Gemini 2.0 Flash | 0.03 |
| Anthropic | Claude Sonnet 4.5 | 0.10 |
| Anthropic | Claude Haiku 4.5 | 0.07 |
| xAI | Grok 4.1 Fast | 0.10 |
| Perplexity | Sonar | 0.07 |
| DeepSeek | DeepSeek V3.2 | 0.05 |
| DeepSeek | DeepSeek V3 0324 | 0.02 |
| Meta | Llama 4 Maverick | 0.03 |
| MiniMax | MiniMax M2 | 0.03 |
| Z.AI | GLM 4.7 | 0.02 |
| **Total (n=16 active models)** | | **1.00** |
4.3 Provider Distribution
Aggregate weights by provider:

| Provider | Aggregate Weight |
|---|---|
| OpenAI | 0.31 |
| Google | 0.20 |
| Anthropic | 0.17 |
| xAI | 0.10 |
| Perplexity | 0.07 |
| DeepSeek | 0.07 |
| Meta | 0.03 |
| MiniMax | 0.03 |
| Z.AI | 0.02 |
5. Limitations
Users should consider the following methodological limitations when interpreting results:
5.1 Temporal Variability
LLM outputs may vary over time due to model updates, fine-tuning, and reinforcement learning from human feedback. GenRank captures point-in-time snapshots and should not be interpreted as static ground truth.
5.2 Stochastic Sampling
Despite controlled parameters, LLM outputs exhibit inherent randomness. Identical queries may produce different rankings across executions. Statistical aggregation mitigates but does not eliminate this variance.
5.3 Market Weight Estimation
Model weights are derived from publicly available market data which may not fully reflect actual usage patterns. Enterprise API usage is estimated from secondary sources and may contain measurement error.
5.4 Query Design Bias
The selection and phrasing of queries may influence recommendation outcomes. Query design inherently reflects researcher assumptions about relevant use cases and natural language patterns.
5.5 Entity Resolution Errors
Automated entity resolution may introduce errors through incorrect alias mappings or failure to distinguish between similarly-named entities.
6. References
1. Ouyang, L., et al. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35, 27730–27744.
2. Shah, C., & Bender, E. M. (2024). Envisioning Information Access Systems: What Makes for Good Tools and a Healthy Web? ACM Transactions on the Web.
3. Järvelin, K., & Kekäläinen, J. (2002). Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems, 20(4), 422–446.
4. Wilkinson, M. D., et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3, 160018.
5. Menlo Ventures. (2025). 2025 Mid-Year LLM Market Update: Foundation Model Landscape + Economics.
6. OpenRouter. (2025). LLM Rankings: Model Usage Statistics.
7. Statcounter. (2025). AI Chatbot Market Share Worldwide.