Skip to content
  • Solutions

    Solutions

    E-commerce market data & insights

    Accurate and granular insights backed by millions of data points collected every week

    Tradewinds

    Cube's strategic e-commerce market data offering for investors, internet platforms, and corporates.
  • Technology

    Technology

    Why Cube?

    We achieve best-in-class data accuracy and granularity through statistical and machine learning models

    Data Collection

    Multiple data sources ensure accurate, high-quality insights by triangulating proprietary and external data.

    Data Security

    At Cube, safeguarding data integrity and confidentiality is our foremost priority.

    Product Tagging

    We have the most comprehensive classification system in the industry across platforms and countries
  • Insights

    Insights

    Cube Pulse - Articles

    Stay updated with our latest research, insights, and perspectives on hot topics in e-commerce

    Research reports

    Free to download research reports to help you understand the e-commerce landscape better

    E-commerce Glossary

    We've put all e-commerce terminology in one place so you can find the answers you are searching for

    E-commerce Platform Take-rate tracker

    This tracker provides up-to-date insights into current e-commerce platform take-rates by country and product category, offering a clear view of the marketplace dynamics.

    E-commerce Category Tree

    Cube's coverage follows a unified, standardized category tree across countries and platforms.

    OSCX index

    A benchmark of e-commerce seller sentiment across Southeast Asia and other key Shopee markets
  • Community

    Community

    Shopper Panel

    Join our Online Shopper Panel and answer paid surveys about your habits and opinions about e-commerce

    Seller Panel

    Join the Cube Data Partnership to help us build better insights for the ecosystem
  • Company

    Company

    About us

    Learn more about our mission, credentials, and how we work.

    Our Team

    We recruit great people and help them become better. Collaborate, Innovate, and Lead with Data at Cube

    Careers

    If you love working with data, hold a high bar for quality, and enjoy working with super-smart and collaborative people, Cube is the perfect place for you.

    Contact us

    Send us a question, ask to be contacted, or book a meeting with our expert team today

    Cube in the news

    Cube is frequently quoted in news stories about e-commerce. See our latest coverage here

    Security

    Explore how Cube protects data through enterprise-grade security, strict controls, and recognized compliance standards.

    Trust Center

    Highlights of our high-level risk mitigation strategy, industry best practices, and commitment to continuous improvement.
Contact us
  • Solution
    • E-commerce market data & insights
    • Tradewinds – Strategic Market Data for Southeast Asia E-Commerce
  • Technology
    • Why Cube Asia?
    • Data Collection
    • Data Security
    • Product Tagging
  • Insights
    • Cube Pulse – Articles
    • Research Reports
    • E-Commerce Glossary
    • Shopee Take-Rate Tracker
    • E-commerce Category Tree
    • OSCX Index
  • Community
    • Shopper Panel
    • Seller Panel
  • Company
    • About us
    • Our Team
    • Career
    • Cube in the news
  • Contact us

LLMs at Cube: Google Gemini leads as of end-2025, but the race is not yet over

  • By Haris Rehman
  • January 13, 2026
Link

While we usually focus on e-commerce insights, in this piece we are pulling back the curtain on the technology that powers them. Haris, from our AI Engineering team, shares a few nuggets from the R&D we conduct to ensure our clients get the most accurate data possible.

Executive Summary

At Cube, we have been testing Large Language Models (LLMs) for nearly 2 years now, using some of the standard analytics we do for our clients as a reference to benchmark their relative performance across 3 dimensions – accuracy, cost, and time. So as we open the curtains on a new year, we thought it would be a good idea to share the results from our latest round of testing – particularly after an exciting period at the end of last year when both Google and OpenAI released their latest models.

This benchmark is based on a specific type of analytics we do, where we use “SQL Agents” to turn plain English questions into precise database code (SQL). For example, a brand manager might ask, “What was the month-over-month revenue growth for our top 3 SKUs in Q4?” The AI must translate that natural language into a complex SQL query to pull the correct numbers from the database.

We then put these models through a “stress test” of such questions, ranging from simple lookups to complex logic puzzles, to see which model wrote the most accurate code, how fast they did it, and how much it cost.

This Pulse report benchmarks leading LLMs to identify the optimal engine for a natural language-to-SQL agent as described above. While cost-effective options exist, our analysis confirms Google’s Gemini models have effectively separated themselves from the pack. Both Gemini 2.5 Pro and 3 Pro models offer top tier accuracy (~98.8%) and much lower costs than OpenAI’s equally accurate GPT 5.2.

Core Finding: Accuracy but at different costs

For the kinds of big-data analytics we do, Gemini 3 Pro and GPT 5.2 were both equally accurate, but at vastly different costs. GPT 5.2 is only about 10% more expensive than Gemini 3, but the GPT 5.2 Pro is more than 10x more expensive.

Near-perfect Accuracy

Gemini 3 Pro achieves a 98.8% average accuracy, matching its previous version (Gemini 2.5 Pro) and beating the strongest competitor (DeepSeek Terminus Reasoning) by over 2%. Crucially, GPT 5.2 (Reasoning) matched this benchmark exactly, also scoring 98.8%.

The OpenAI Comeback

For a while towards the end of last year, it seemed that Google had taken a strong lead in model performance, and OpenAI trailed significantly with GPT 5.1. However, the release of GPT 5.2 erased that deficit. It now matches Gemini 3 Pro’s accuracy step-for-step (98.8%) and has successfully bridged the “attention to detail” gap. The narrative (at least on performance alone) has shifted from “Gemini vs. the rest” to a direct face-off between two equals.

The Speed Question

Historically, a key barrier to adopting high-reasoning models (at least as an embedded feature in self-serve data dashboards) has been latency. The previous market leader, Gemini 2.5 Pro, required an agonizing average of 216 seconds (3 minutes, 36 seconds) to generate a single complex query. Competitors have done better, with DeepSeek Terminus Reasoning taking 189 seconds and GPT 5.1 (Reasoning) clocking in at 139 seconds.

Gemini 3 Pro made headlines with its 95-second response time, but GPT 5.2 has gone even faster, clocking in at an average of 76.5 seconds. This makes GPT 5.2 roughly 20% faster than Gemini 3 Pro. Both models have successfully moved “Reasoning” capabilities from experimental features to viable production tools, but OpenAI currently holds the speed crown. Nonetheless, with wait times of over 1 minute, these models remain unusable for end-user deployment of the use cases that we are testing at least.

The Competitive Landscape & Trade-offs to consider

Trade-off 1: Cost

High intelligence comes at a premium. Gemini 3 Pro costs $0.061 per query, while GPT-5.2 is slightly more expensive at $0.069 per query. While both are significantly pricier than DeepSeek (approximately $0.005 per query), they are the only viable options for mission-critical accuracy. 

Trade-off 2: The “Reasoning” Latency Tax

Despite the speed improvements in Gemini 3 Pro and OpenAI GPT 5.2, “Reasoning” models impose an inherent latency tax compared to standard models. While standard models like DeepSeek Terminus execute in roughly 33 seconds, users must still wait 1.5 minutes for Gemini 3 Pro and GPT 5.2. For many applications, this wait may be necessary: in our experience, enabling reasoning is the only way to maintain >96% accuracy on “Hard” complexity questions. Standard models drop to approximately 86% accuracy on these tasks, a risk level unacceptable for reliable reporting in our use cases.

Trade-off 3: DeepSeek as the “Almost” Contender

DeepSeek V3.1 Terminus (Reasoning) represents an impressive technical feat, achieving a respectable 96.4% accuracy. Its primary failure is speed; at 189 seconds, it is remarkably slow and lacks the fine-grained edge-case handling found in Gemini. Consequently, DeepSeek serves as a “good enough” option for strict budget constraints, whereas OpenAI GPT 5.2 and Gemini 3 Pro remain the “best” option for performance requirements.

Trade-off 4: Speed vs. Efficiency

The choice at the top tier is no longer about capability, but optimization. Gemini 3 Pro remains the most cost-efficient of the elite models ($0.061), saving ~13% on every query compared to GPT 5.2, provided you can tolerate a slightly longer wait (95s). Whereas, GPT 5.2 is the speed king. If your user experience demands the fastest possible “Reasoning” response (76s), the extra cost is a necessary premium.

Conclusion & Recommendation

The data indicates a clear hierarchy in LLMs for building agents for data analytics. Gemini 3 Pro and GPT 5.2 stand alone as the winner overall, being the only models to combine peak accuracy (98.8%) with low cost (less than US$ 0.07 per query) and tolerable latency (less than 100s).

Final Verdict:

For a production-ready agent where data integrity is paramount, the field has narrowed to two choices. GPT 5.2 takes the crown for raw speed, while Gemini 3 Pro retains the edge in cost efficiency. Both have rendered previous leaders obsolete.

Related Articles

Top three mobile commerce that is popular in Malaysian market

In the news, Lazada, Shopee, TikTok Shop

February 5, 2025

Go big or go home: Signs of consolidation in Southeast Asia e-commerce

Lazada, Shopee, TikTok Shop

January 6, 2025

Year in review: 5 highlights for Southeast Asia e-commerce in 2024, and our predictions for 2025

Shopee

December 17, 2024

A steady climb up: Charting Shopee’s growing take-rates in Southeast Asia in 2024

Become part of our e-commerce community

Join 2,000+ other leaders and experts to stay informed about the latest news, insights, and updates

  • info@cube.asia
  • Cube

Solutions

  • E-commerce market data & insights
  • Tradewinds

Community

  • Shopper Panel
  • Seller Panel

Technology

  • Why Cube?
  • Data Collection
  • Data Security
  • Product Tagging

Insights

  • Cube Pulse - Articles
  • Research reports
  • E-Commerce Glossary
  • Take-rate tracker
  • E-commerce Category Tree
  • OSCX Index

Company

  • About us
  • Our Team
  • Careers
  • Contact us
  • Cube in the news
  • Security
  • Trust Center

Copyright 2025 Cube. All Rights Reserved.

Your message has been successfully sent

We appreciate that you’ve taken the time to write us. We’ll get back to you very soon. Please come back and see us often.