Technical Whitepaper

GenRank: A Methodology for Measuring Entity Visibility in Large Language Models

Version 2.0 · January 2026
Simon Kim, Jeahong Lee

Abstract

This document presents the GenRank methodology—a systematic framework for quantifying entity visibility within large language model (LLM) recommendation outputs. As AI-driven discovery increasingly influences consumer behavior and market dynamics, understanding how LLMs prioritize and recommend entities becomes critical for researchers, marketers, and policymakers. GenRank employs a multi-model polling approach combined with market-weighted scoring to produce normalized visibility indices. This methodology document details our data collection protocols, mathematical scoring framework, model weight allocation procedures, and known limitations.

1. Introduction

1.1 Background

The proliferation of large language models (LLMs) as primary information discovery tools has fundamentally altered how consumers identify and evaluate products, services, and brands. Unlike traditional search engines that display ranked results based on explicit relevance signals, LLMs generate synthesized recommendations that reflect patterns learned during training and reinforcement learning processes [1].

This shift presents both opportunities and challenges. Entities that achieve favorable positioning in LLM outputs may benefit from increased visibility and consideration, while those absent from recommendations face potential market invisibility—a phenomenon we term AI-mediated discovery bias [2].

1.2 Research Objectives

GenRank addresses the following research objectives:

  1. Establish a reproducible framework for measuring entity visibility across multiple LLM platforms
  2. Develop a market-weighted scoring system that reflects real-world AI usage patterns
  3. Create transparent, open-access datasets for academic and commercial research
  4. Enable longitudinal analysis of recommendation patterns and temporal trends

1.3 Definitions

Entity
A distinct brand, product, service, organization, or concept that can be identified and tracked across LLM outputs.
Visibility Score
A normalized metric (0–100) representing an entity's prominence in LLM recommendations within a specific category.
Market Weight
A coefficient assigned to each LLM reflecting its estimated share of global AI assistant usage.

2. Data Collection

2.1 Query Design

Queries are designed to elicit ranked recommendation lists from LLMs. Each query undergoes validation against the following criteria:

  • Specificity — Queries must target a defined category or use case
  • Neutrality — Queries must not contain leading language or brand mentions
  • Reproducibility — Queries must consistently elicit list-format responses
  • Temporal stability — Queries should remain relevant across update cycles

2.2 Polling Parameters

Approved queries are submitted to each active LLM via official API endpoints. To minimize variance from stochastic sampling, each query–model pair is executed with the following controlled parameters (an illustrative API call is sketched after the listing):

temperature: 0.7
top_p: 1.0
max_tokens: 2048
frequency_penalty: 0.0
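As an illustration of this step, the sketch below submits a single query–model pair with the controlled parameters above. It assumes an OpenAI-compatible chat completions endpoint via the official Python client; the poll_model helper and the model identifier string are illustrative, not part of a published GenRank implementation.

# Illustrative sketch only: assumes an OpenAI-compatible endpoint and the official
# openai Python client; the helper name and model identifier are hypothetical.
from openai import OpenAI

client = OpenAI()  # reads the API key from the environment

def poll_model(model: str, query: str) -> str:
    """Submit one validated query to one model using the controlled parameters above."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": query}],
        temperature=0.7,
        top_p=1.0,
        max_tokens=2048,
        frequency_penalty=0.0,
    )
    return response.choices[0].message.content

# Example usage with a query from the validated pool:
# raw_output = poll_model("gpt-5.2", "What are the best productivity applications?")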

2.3 Entity Resolution

Raw LLM outputs undergo entity resolution to map surface-form variations to canonical entity identifiers. This process employs four steps, the first two of which are sketched in code after the list:

  1. Lexical normalization — Case folding, punctuation removal, whitespace normalization
  2. Alias mapping — Maintained database of known aliases and abbreviations
  3. Semantic clustering — LLM-assisted disambiguation for novel entity mentions
  4. Human validation — Manual review of low-confidence mappings
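As an illustration of steps 1 and 2, the snippet below applies lexical normalization followed by a lookup against an alias table; the alias entries shown are hypothetical, and the maintained GenRank alias database is not reproduced here.

# Illustrative sketch of lexical normalization and alias mapping; alias entries are hypothetical.
import re
import string

ALIASES = {
    "notion so": "notion",              # hypothetical surface-form mappings
    "ms teams": "microsoft teams",
}

def normalize(mention: str) -> str:
    """Case folding, punctuation-to-space replacement, and whitespace normalization."""
    mention = mention.casefold()
    mention = mention.translate(str.maketrans(string.punctuation, " " * len(string.punctuation)))
    return re.sub(r"\s+", " ", mention).strip()

def resolve(mention: str) -> str:
    """Map a raw mention to a canonical identifier when a known alias exists."""
    normalized = normalize(mention)
    return ALIASES.get(normalized, normalized)

# resolve("Notion.so") -> "notion" via the hypothetical alias table;
# unmatched mentions fall through to semantic clustering and human validation.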

3. Scoring Methodology

3.1 Theoretical Foundation

The GenRank scoring system is grounded in information retrieval theory, specifically adapting the logarithmic discounting principles used in normalized discounted cumulative gain (nDCG) metrics [3]. We employ logarithmic decay to model the diminishing marginal value of lower-ranked positions while ensuring all mentions contribute non-zero utility.

3.2 Mathematical Formulation

Definition 3.1 — Score Function

For an entity e at rank position r from model m with weight w_m:

S(e, r, m) = [1 / (1 + log10(r))] × w_m × 100

The final GenRank score aggregates across all queries Q and models M:

G(e) = Σ_{q∈Q} Σ_{m∈M} S(e, r_{q,m}, m), where r_{q,m} is the rank assigned to entity e by model m for query q; query–model pairs in which e is not mentioned contribute no score.
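As a minimal sketch of Definition 3.1, the following reproduces the raw decay term and the weighted per-mention contribution; the function names are illustrative and do not come from a published GenRank implementation.

# Minimal sketch of Definition 3.1; function names are illustrative.
import math

def raw_score(rank: int) -> float:
    """Logarithmic decay term: 1 / (1 + log10(rank))."""
    return 1.0 / (1.0 + math.log10(rank))

def weighted_score(rank: int, model_weight: float) -> float:
    """S(e, r, m) = raw_score(r) × w_m × 100."""
    return raw_score(rank) * model_weight * 100

# Reproduces the Section 3.3 values: raw_score(1) = 1.000,
# raw_score(2) ≈ 0.769, raw_score(10) = 0.500, raw_score(20) ≈ 0.435.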

3.3 Score Decay

The logarithmic decay function produces the following position-relative values:

Position    Raw Score    Relative Value    Decay
1           1.000        100.0%            –
2           0.769        76.9%             −23.1%
3           0.677        67.7%             −9.2%
5           0.588        58.8%             −8.9%
10          0.500        50.0%             −8.8%
20          0.435        43.5%             −6.5%

3.4 Example Calculation

Consider entity "Notion" receiving the following rankings for the query "What are the best productivity applications?" (a short verification sketch follows the table):

Model                   Weight    Rank    Raw      Weighted
GPT-5.2                 0.15      1       1.000    15.00
Gemini 3 Flash          0.13      2       0.769    10.00
Claude Sonnet 4.5       0.10      3       0.677    6.77
Total (single query)                               31.77
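The worked example can be checked directly from the score function; the snippet below assumes only the ranks and weights listed in the table and rounds the raw decay term to three decimals, as the table does, before weighting.

# Verifies the single-query example using S(e, r, m) = [1 / (1 + log10(r))] × w_m × 100.
import math

rankings = [          # (model_weight, rank) pairs from the example table
    (0.15, 1),        # GPT-5.2
    (0.13, 2),        # Gemini 3 Flash
    (0.10, 3),        # Claude Sonnet 4.5
]

total = sum(round(1 / (1 + math.log10(rank)), 3) * weight * 100
            for weight, rank in rankings)
print(round(total, 2))  # 31.77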

4. Model Weight Allocation

4.1 Methodology

Model weights are derived from a composite index incorporating three primary data sources, each addressing a different aspect of real-world AI influence (a weighting sketch follows the list):

  • Consumer market share (40%) — Monthly active users and web traffic data from Statcounter, SimilarWeb, and company disclosures
  • Enterprise adoption (35%) — Deployment metrics from industry reports and API revenue estimates [5]
  • Developer API usage (25%) — Token throughput from aggregated gateways [6]
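The composite index can be read as a simple weighted blend of the three components. The sketch below illustrates that blend for a single model; the component scores in the example are hypothetical placeholders, not published GenRank inputs, and any renormalization step that makes active-model weights sum to 1.00 (as in Section 4.2) is not shown.

# Sketch of the composite index; component scores here are hypothetical placeholders.
COMPONENT_SHARES = {
    "consumer_market_share": 0.40,
    "enterprise_adoption": 0.35,
    "developer_api_usage": 0.25,
}

def composite_weight(component_scores: dict[str, float]) -> float:
    """Blend normalized component scores (each in [0, 1]) into a single model weight."""
    return sum(COMPONENT_SHARES[name] * score for name, score in component_scores.items())

# Hypothetical model holding 20% of consumer usage, 12% of enterprise adoption,
# and 10% of developer API traffic:
# composite_weight({"consumer_market_share": 0.20,
#                   "enterprise_adoption": 0.12,
#                   "developer_api_usage": 0.10})  -> 0.147 (pre-renormalization)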

4.2 Current Allocation (Q1 2026)

The following table presents active model weights as of January 2026. Weights are recalibrated quarterly.

Provider      Model                    Weight
OpenAI        GPT-5.2                  0.15
OpenAI        GPT-5 Mini               0.09
OpenAI        GPT-4o-mini              0.04
OpenAI        GPT-OSS-120B             0.03
Google        Gemini 3 Flash           0.13
Google        Gemini 2.5 Flash Lite    0.04
Google        Gemini 2.0 Flash         0.03
Anthropic     Claude Sonnet 4.5        0.10
Anthropic     Claude Haiku 4.5         0.07
xAI           Grok 4.1 Fast            0.10
Perplexity    Sonar                    0.07
DeepSeek      DeepSeek V3.2            0.05
DeepSeek      DeepSeek V3 0324         0.02
Meta          Llama 4 Maverick         0.03
MiniMax       MiniMax M2               0.03
Z.AI          GLM 4.7                  0.02
Total (n = 16 active models)           1.00

4.3 Provider Distribution

Aggregate weights by provider (a short summation sketch follows the table):

Provider      Aggregate Weight
OpenAI        31%
Google        20%
Anthropic     17%
xAI           10%
Perplexity    7%
DeepSeek      7%
Meta          3%
MiniMax       3%
Z.AI          2%
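These totals follow directly from the Section 4.2 table. The sketch below simply sums the per-model weights by provider and reproduces the percentages above.

# Aggregates the Section 4.2 model weights by provider; values mirror the table above.
from collections import defaultdict

MODEL_WEIGHTS = {                     # (provider, model): weight, from Section 4.2
    ("OpenAI", "GPT-5.2"): 0.15, ("OpenAI", "GPT-5 Mini"): 0.09,
    ("OpenAI", "GPT-4o-mini"): 0.04, ("OpenAI", "GPT-OSS-120B"): 0.03,
    ("Google", "Gemini 3 Flash"): 0.13, ("Google", "Gemini 2.5 Flash Lite"): 0.04,
    ("Google", "Gemini 2.0 Flash"): 0.03,
    ("Anthropic", "Claude Sonnet 4.5"): 0.10, ("Anthropic", "Claude Haiku 4.5"): 0.07,
    ("xAI", "Grok 4.1 Fast"): 0.10, ("Perplexity", "Sonar"): 0.07,
    ("DeepSeek", "DeepSeek V3.2"): 0.05, ("DeepSeek", "DeepSeek V3 0324"): 0.02,
    ("Meta", "Llama 4 Maverick"): 0.03, ("MiniMax", "MiniMax M2"): 0.03,
    ("Z.AI", "GLM 4.7"): 0.02,
}

totals = defaultdict(float)
for (provider, _model), weight in MODEL_WEIGHTS.items():
    totals[provider] += weight

for provider, weight in sorted(totals.items(), key=lambda item: -item[1]):
    print(f"{provider}: {weight:.0%}")   # e.g. "OpenAI: 31%"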

5. Limitations

Users should consider the following methodological limitations when interpreting results:

5.1 Temporal Variability

LLM outputs may vary over time due to model updates, fine-tuning, and reinforcement learning from human feedback. GenRank captures point-in-time snapshots and should not be interpreted as static ground truth.

5.2 Stochastic Sampling

Despite controlled parameters, LLM outputs exhibit inherent randomness. Identical queries may produce different rankings across executions. Statistical aggregation mitigates but does not eliminate this variance.

5.3 Market Weight Estimation

Model weights are derived from publicly available market data, which may not fully reflect actual usage patterns. Enterprise API usage is estimated from secondary sources and may contain measurement error.

5.4 Query Design Bias

The selection and phrasing of queries may influence recommendation outcomes. Query design inherently reflects researcher assumptions about relevant use cases and natural language patterns.

5.5 Entity Resolution Errors

Automated entity resolution may introduce errors through incorrect alias mappings or failure to distinguish between similarly named entities.

6. References

  1. Ouyang, L., et al. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35, 27730–27744.
  2. Shah, C., & Bender, E. M. (2024). Envisioning Information Access Systems: What Makes for Good Tools and a Healthy Web? ACM Transactions on the Web.
  3. Järvelin, K., & Kekäläinen, J. (2002). Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems, 20(4), 422–446.
  4. Wilkinson, M. D., et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3, 160018.
  5. Menlo Ventures. (2025). 2025 Mid-Year LLM Market Update: Foundation Model Landscape + Economics.
  6. OpenRouter. (2025). LLM Rankings: Model Usage Statistics.
  7. Statcounter. (2025). AI Chatbot Market Share Worldwide.

This document is subject to periodic revision.
Current version available at GenRank.com/methodology