3 Ways Particle Detects and Avoids Hallucinations in News Summaries
LLMs are prone to error, so Particle’s use of AI prioritizes accuracy above all. Through multi-step verification processes that use the best available LLMs, Particle minimizes the chance of inaccurate news summaries.
Even the most advanced Large Language Models (LLMs) occasionally produce errors — or “hallucinations.” At Particle, we observe these kinds of errors occur in only about 1% of outputs, but that’s still unacceptable for news. When users depend on Particle to stay informed, even a small error rate could mean dozens of potential misinformation incidents daily.
These kinds of inaccuracies contribute to growing public skepticism about how AI will affect people’s daily lives. According to Pew Research Center, as of 2023, 52% of Americans are more concerned than excited about an AI-powered future.
Given the nature of some of these errors, the skepticism is understandable. This problem can be made worse by a few factors:
- The specific model being used (some are more prone to hallucinations than others for a given task)
- Displaying generations directly to users without any measures to verify accuracy
To address this, Particle:
- Continuously evaluates different models to identify which ones have the lowest error rates for our tasks
- Adds a multi-step verification process between LLM outputs and our users
- Maintains an audit log of all verification steps for human review
Using these 3 measures, Particle reduces the chance of a user seeing an error from about 1 in 100 to 1 in 10,000.
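As a rough illustration of how these measures compound (the figures below are illustrative assumptions, not Particle’s exact internal numbers, and they assume for simplicity that verification independently catches about 99% of the errors that reach it):

```python
# Back-of-the-envelope sketch of the error-rate reduction.
# These rates are illustrative assumptions, not Particle's exact internal figures.
raw_error_rate = 0.01          # ~1 in 100 raw LLM outputs contain an error
verification_miss_rate = 0.01  # assume verification misses ~1% of those errors

residual_error_rate = raw_error_rate * verification_miss_rate
print(f"Errors reaching users: about 1 in {round(1 / residual_error_rate):,}")
# Output: Errors reaching users: about 1 in 10,000
```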
What types of mistakes do LLMs make about the news?
We see the following types of errors in about 1% of raw LLM outputs when summarizing the news*:
- Factual Errors: Factually incorrect information, like stating “Barack Obama was the 47th President” when he was actually the 44th.
- Fabrications: Plausible but false details added to otherwise accurate stories. For example, a crime summary might mention a police reward for information when, in reality, none of the cited sources reported such a reward.
- Consensus mismatch: Sometimes, news coverage doesn’t agree on how to interpret an event. Picking one interpretation over another can be misleading. Instead, the summary should make it clear that there are multiple perspectives.
- Nuanced Errors: Subtle mischaracterizations that can distort a story’s meaning or implications. There’s a detailed example of this below.
The hallucinations people often talk about are obvious mistakes or made-up facts, but in our experience those are rare. We catch and reject all 4 types of errors, yet almost all of the errors we see are subtle, misleading details that read like reasonable interpretations but lack the evidence to back them up.
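For readers who think in code, the four categories above can be represented as a simple taxonomy. This is a minimal sketch for illustration only; the names are ours, not Particle’s internal schema:

```python
from enum import Enum

class HallucinationType(Enum):
    """Categories of errors we reject in raw LLM news summaries."""
    FACTUAL_ERROR = "factually incorrect claim (e.g., the wrong presidential number)"
    FABRICATION = "plausible but unsupported detail added to an accurate story"
    CONSENSUS_MISMATCH = "one interpretation picked where coverage disagrees"
    NUANCED_ERROR = "subtle mischaracterization that distorts a story's meaning"
```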
In a period of unprecedented polarization, it’s really important to get this right.
Particle’s 3-Part Approach to Eliminating Hallucinations
1. Selecting models with the highest accuracy
In general, when selecting an AI model, you need to balance the tradeoffs between cost, accuracy, and speed. At Particle, we predominantly use the most accurate models available for our tasks, which comes at a cost: they tend to be the most expensive. For news, accuracy is non-negotiable.
This approach also means our summaries might take minutes rather than seconds to generate — but that’s a trade-off we accept. And, since we summarize thousands (not millions or billions) of stories every day, our costs remain manageable.
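“Continuously evaluating different models” can be as simple as running each candidate over a benchmark set of stories and comparing observed error rates. Here is a simplified sketch; the `summarize` and `count_errors` callables stand in for the actual tooling and are hypothetical:

```python
from typing import Callable

def compare_models(
    candidates: list[str],
    stories: list[dict],
    summarize: Callable[[str, dict], str],     # hypothetical: (model, story) -> summary
    count_errors: Callable[[str, dict], int],  # hypothetical: (summary, story) -> error count
) -> dict[str, float]:
    """Return the observed error rate for each candidate model on a benchmark set."""
    error_rates: dict[str, float] = {}
    for model in candidates:
        errors = sum(count_errors(summarize(model, story), story) for story in stories)
        error_rates[model] = errors / len(stories)
    return error_rates

# The most accurate model wins, even if it is slower and more expensive:
# best_model = min(error_rates, key=error_rates.get)
```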
2. Multi-Step Verification and Regeneration
Our verification process has 2 key components:
Evaluation checks: These checks, commonly known as “evals”, confirm the AI satisfied all the instructions for performing the task. We won’t focus on them in this post, but evals measure things like minimizing duplicative information and adherence to our style guide.
Reality checks: These checks require that all of the claims in the summary remain faithful to the source material, even with respect to nuances, as described earlier. At every step, the AI is required to “show its work” and explain why it has made its assessments, with evidence and reasoning.
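Conceptually, a Reality Check boils down to a structured, per-claim verdict: the claim, the cited evidence, an explanation, and whether the evidence supports it. Below is a minimal sketch of such a record, with field names chosen purely for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class ClaimCheck:
    """One claim from a summary, checked against the cited source material."""
    claim: str
    source_url: str
    excerpt: str       # the passage the model cites as evidence
    explanation: str   # why the evidence does or does not support the claim
    supported: bool

@dataclass
class RealityCheckResult:
    """A summary passes only if every one of its claims is supported."""
    claims: list[ClaimCheck] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        return all(check.supported for check in self.claims)
```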
Below is an example of Particle’s Reality Check in action.
Step 1: Particle generates a headline for a given story.
Generated Headline:
Los Angeles Wildfires Highlight Tensions Over Fire Department Budget Cuts
Step 2: Particle takes the generated headline and runs a Reality Check against the sources used to generate it.
Reality Check evaluation: Fail
Reasoning:
The evidence provided from the articles supports the claims
made in the summary regarding tensions over budget cuts and their
impact on the fire department's ability to respond to wildfires.
Crowley's criticism of the budget cuts and the acknowledgment of
limited training and response capabilities due to reduced funding
are directly referenced. However, the articles also highlight that
despite these cuts, the overall budget for the fire department increased,
which complicates the narrative of a purely negative impact from the
budget reductions. Therefore, while the headline captures the essence
of the situation, it may oversimplify the complexities involved.
The Reality Check also provides citations for each claim, along with an explanation of why the claim is or isn’t supported by the evidence.
Reality Check citation for the problematic claim:
Source:
https://www.forbes.com/sites/tylerroush/2025/01/11/
did-budget-cuts-affect-emergency-response-to-la-wildfires-
what-to-know-about-funding-to-citys-fire-department/
Excerpt:
Despite the budget cuts, the city approved an additional
$53 million in pay raises for firefighters and another $58 million
for new firetrucks and other equipment in November, increasing
the LAFD's budget by more than 7% over the previous fiscal year.
Explanation:
This excerpt provides context that, despite the budget cuts,
the overall budget for the fire department increased, which counters
the narrative that budget cuts severely impacted the department's
capabilities.
Step 3: If any claim is problematic, Particle regenerates the headline, supplying the reasoning from the failed Reality Check so the model avoids making the same mistake again.
New headline:
Los Angeles Wildfires Intensify Scrutiny of Fire Department Budget and
Leadership
Step 4: Particle performs the Reality Check again and provides an explanation of why the new headline passes this time.
Reality Check evaluation: Pass
Reasoning:
The failed verification report highlighted that the previous summary
oversimplified the impact of budget cuts by not acknowledging the
increase in the overall budget for the fire department. The new output
was evaluated to see if it addressed this issue by providing a balanced
view that includes both the budget cuts and the increase in overall
budget. [..] The headline mentions scrutiny of the budget and leadership,
suggesting a more nuanced view. [..] This indicates that the new output
does not oversimplify the situation and provides a balanced perspective,
thus addressing the specific issue identified in the failed verification
report. Additionally, the new output does not repeat any unsupported
claims from the previous summary, as it includes information about the
increase in the overall budget, providing a more comprehensive view of
the situation.
The new headline passes the Reality Check. Although the difference between the new headline and the original is subtle, even these nuances matter for accurate reporting.
For brevity, we’ve shown a headline example, but in practice Reality Checks are performed on headlines, subheads, bullet point summaries, and Q&A. Every Particle story passes these checks before it reaches your feed.
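Putting the steps above together, each output goes through a generate-check-regenerate loop in which the reasoning from a failed check is fed back into the next attempt. The sketch below reuses the `RealityCheckResult` record from earlier; `generate_headline` and `reality_check` are placeholders for the underlying LLM calls, not Particle’s actual code:

```python
def verified_headline(story, generate_headline, reality_check, max_attempts=3):
    """Generate a headline and accept it only once it passes the Reality Check.

    generate_headline(story, feedback) -> str               (placeholder LLM call)
    reality_check(headline, sources) -> RealityCheckResult  (placeholder LLM call)
    """
    feedback = None
    for _ in range(max_attempts):
        headline = generate_headline(story, feedback)
        result = reality_check(headline, story["sources"])
        if result.passed:
            return headline, result  # the result doubles as an audit record
        # Feed the failure reasoning back so the next attempt avoids the same mistake.
        feedback = [c.explanation for c in result.claims if not c.supported]
    # After repeated failures, escalate to human review instead of publishing.
    return None, result
```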
Of course, the LLM used to perform Reality Checks and detect errors can also make mistakes. Who watches the watchman? Humans!
3. Transparent Error Tracking for Human Oversight
When Particle “shows its work” with Reality Checks, it creates a transparent digital paper trail, complete with citations that humans review. Some of these results are also visible in the Particle app: tap on any bullet point summary to see a selection of citations that support the claims being made.
Users can report anything that seems off or inaccurate on any story. These reports trigger immediate reviews by our engineering team and editorial staff, who provide an additional human layer of oversight. The paper trail gives our team the additional context needed to confirm or fix the errors that users spot.
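A paper-trail entry can be thought of as an append-only record that ties an output to its checks, citations, and any user reports. Here is a minimal sketch, again with hypothetical field names rather than Particle’s real data model:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class AuditLogEntry:
    """One verification event, retained for human review."""
    story_id: str
    output_type: str                  # e.g. "headline", "subhead", "bullet", "qa"
    text: str
    check_passed: bool
    citations: list[str]              # source URLs backing the claims
    reasoning: str                    # the model's explanation, kept verbatim
    user_reports: list[str] = field(default_factory=list)
    reviewed_at: Optional[datetime] = None

    def flag(self, report: str) -> None:
        """Attach a user report so engineering and editorial staff can review it."""
        self.user_reports.append(report)
```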
Companies using AI can do a lot more to reduce hallucinations, and they should
While AI gives us powerful new ways to understand the news, we must ensure its insights are reliable and trustworthy. We can’t completely eliminate errors in language models yet, but we’ve developed robust methods to detect and minimize them. The results so far are encouraging, and this is just the beginning: we’re actively working on additional measures to bring the error rate down further, with 1 in 100,000 in sight.
Download Particle today on the iOS App Store 📲
*Addressing hallucinations is different from fact-checking. Fixing hallucinations ensures the output remains true to the source material, while fact-checking confirms the content is accurate. At Particle, we use high quality, reputable news sources and human reviews to avoid misinformation. This is a top priority, and we’re dedicated to improving it further.