AI reality check: Can ChatGPT and Google Bard produce reliable e-commerce insights?

By Simon Torring
July 17, 2023

Context: Breakthrough capabilities, but risks of misinformation

So much has been written about ChatGPT since its launch that most of us have probably wondered how AI will enhance or challenge our jobs. A recent BBC story carried this ominous quote – “Workers that don’t work with AI are going to find their skills [become] obsolete … it’s imperative to work with AI to stay employed.” Amidst all these “change or you shall be replaced” warnings, it’s easy to empathize with Steven Schwartz, a New York lawyer who relied on ChatGPT for research, only to realize later that “six of the submitted cases appear to be bogus judicial decisions with bogus quotes and bogus internal citations.”

AI systems such as ChatGPT and Google Bard no doubt possess remarkable capabilities in terms of speed, information processing, natural language understanding, and responsive communication. However, it is less clear just how good they are for research. In this post, we examine that question and use data to evaluate their performance in the specific context of the Southeast Asian e-commerce landscape.

Beyond the Surface: Analyzing ChatGPT and Bard’s e-commerce knowledge

Every day, our analysts research news websites, company reports, and government publications for any new announcements or information relating to e-commerce in Asia.

To gauge the performance of ChatGPT and Bard in research, we conducted an assessment using 50 data points related to the Southeast Asian e-commerce market, including the gross merchandise value (GMV) of different countries and categories. To ensure fairness, all the data points queried were for the year 2020, since ChatGPT’s knowledge at the time extended only to September 2021.

The results were tagged in three simple buckets:

Green – reliability; AI was able to generate answers and point to real sources where the data existed
Yellow – hallucinations*; AI either quoted a wrong data point or invented a source that doesn’t exist
Gray – honesty; AI acknowledged that it did not know the answer

* Hallucinations refer to the creation of nonexistent sources and information, where the AI fabricates facts and details instead of admitting its lack of knowledge. Within our research we saw two types of hallucinations: (i) a source was provided but the data could not be found in it, and (ii) a data point was fabricated with no source provided.

The ideal color mix would have been green and gray, since there are only two possibilities: either the data is available or it isn’t. And yet, as highlighted in the chart above, both models provided many manufactured answers.
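The tagging scheme above can be sketched in a few lines of Python. This is a minimal illustration only, not the tooling used for the assessment: the names `Bucket`, `classify`, and `bucket_shares` are hypothetical, and the ten sample responses are invented to show the arithmetic rather than taken from the 50 data points.

```python
from collections import Counter
from enum import Enum


class Bucket(Enum):
    GREEN = "reliable"        # answer given, and the data exists in the cited source
    YELLOW = "hallucination"  # wrong data point, or source missing / does not contain it
    GRAY = "honest"           # model said it did not know


def classify(answered: bool, source_given: bool, data_found_in_source: bool) -> Bucket:
    """Map one manually checked response to a bucket (hypothetical rule set)."""
    if not answered:
        return Bucket.GRAY
    if source_given and data_found_in_source:
        return Bucket.GREEN
    # Covers both hallucination types: source given but data not found,
    # and data point reported with no source at all.
    return Bucket.YELLOW


def bucket_shares(buckets: list[Bucket]) -> dict[str, float]:
    """Return the percentage share of each bucket."""
    if not buckets:
        raise ValueError("no responses to score")
    counts = Counter(buckets)
    return {b.name: 100 * counts[b] / len(buckets) for b in Bucket}


# Invented sample of ten checked responses: (answered, source_given, data_found)
sample = (
    [(True, True, True)] * 1       # 1 reliable answer
    + [(True, True, False)] * 3    # 3 with a source that lacks the data
    + [(True, False, False)] * 2   # 2 with no source at all
    + [(False, False, False)] * 4  # 4 honest refusals
)
shares = bucket_shares([classify(*row) for row in sample])
print(shares)  # {'GREEN': 10.0, 'YELLOW': 50.0, 'GRAY': 40.0}
```

Keeping both hallucination types in the same yellow bucket mirrors the three-color scheme; splitting them would only need one extra enum member.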

Truths vs Fiction: Insights reveal reliability gaps and hallucination galore

What we observed:

The findings revealed that both ChatGPT and Bard had accuracy rates below 20%, indicating significant room for improvement. The results did, however, improve between May and June, suggesting the teams behind the models are hard at work improving them.
ChatGPT’s honesty showed improvement as it refused to provide answers in 50% of the cases, up from 32% in May. This suggests that the company may be taking steps to reduce hallucinations. However, there was a noticeable negative trend in reliability, with the share of reliable (green) answers dropping from 14% to 4%.
As for Bard, we observed that the AI never indicated that it did not know an answer, hallucinating more than 80% of the time. Furthermore, in 50% of cases, Bard reported data without any source. Bard’s warning about its accuracy upon sign-up is spot on – the high volume of fabrication undermines its suitability as a research tool for now.

We’ve highlighted one particularly egregious example of hallucination below. We asked Google Bard to look through YouTube transcripts to see if there was any data available about Lazada, and it referred to a video interview with the Lazada Philippines CEO that it claimed was uploaded on 28 May 2023, had over 1,000 views, and had at least 3 comments.

Remarkably, however, all 3 of these data points, which could all be easily verified, were incorrect – the interview took place more than a year earlier (on 04 April 2022), and the video has fewer than 448 views and only 1 comment!

Decoding the trends: Our theory of factors behind AI performance challenges

In our quest to understand the factors contributing to the observed results, we identified three potential reasons:

Generative Nature of Language Models: ChatGPT and Bard, being language models, are primarily trained as generative tools and do not possess the ability to differentiate between fact and fiction. Simon Willison, a software developer, explains that large language models rely on statistical probability from their training data to merely predict the next word, which can lead to confabulation. Benj Edwards’ article on why AI models hallucinate is also a good read.
Overfitting: Overfitting is a common issue in machine learning, where a model becomes excessively tailored to the training data, making it difficult to generalize to new or unseen data. This phenomenon could contribute to the inconsistency and lack of accuracy observed in AI systems.
Lack of Contextual Understanding: AI systems currently struggle with contextual understanding, which encompasses elements like common sense, nuanced details, emotions, social dynamics, and human behavior. These limitations hinder their ability to accurately interpret and predict outcomes, particularly in areas that require a comprehensive understanding. Ted Chiang’s New Yorker piece gives a great insight into the limitations of AI.
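The first factor, next-word prediction, can be illustrated with a toy example. The sketch below is a deliberately simplified assumption: it uses a word-pair frequency table rather than a neural network, and the three training sentences are invented. It is not how ChatGPT or Bard are built, but it shows how always picking the statistically likeliest next word can produce a fluent statement that the training data never contained.

```python
from collections import Counter, defaultdict

# Invented training sentences (not real market data).
corpus = [
    "shopee gmv in indonesia was high",
    "shopee gmv in thailand was high",
    "lazada gmv in indonesia was low",
]


def train_bigrams(sentences):
    """Count, for every word, which words follow it and how often."""
    followers = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for current, nxt in zip(words, words[1:]):
            followers[current][nxt] += 1
    return followers


def generate(followers, start, max_words=10):
    """Greedily append the most frequent follower until no follower is known."""
    words = [start]
    while len(words) < max_words and words[-1] in followers:
        most_likely = followers[words[-1]].most_common(1)[0][0]
        words.append(most_likely)
    return " ".join(words)


model = train_bigrams(corpus)
answer = generate(model, "lazada")
print(answer)            # lazada gmv in indonesia was high
print(answer in corpus)  # False: fluent, but contradicts the training sentence
```

The generated sentence looks plausible because "was high" is the more frequent continuation overall, even though the only training sentence about Lazada said the opposite.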

Final word: Generative AI tools can’t replace human researchers just yet

Our research indicates that popular generative AI models suffer from extensive hallucinations, providing confident yet fabricated information. This makes them hard to rely on and forces analysts to exercise plenty of caution and skepticism when using them for e-commerce research.

Although generative AI continues to evolve rapidly, it is not yet a substitute for human analysts and researchers. Just how much longer will that continue to be the case? On that question, our guess is probably as good as yours.
