Misinformation
Misinformation testing evaluates whether LLM applications (e.g., chatbots, content generation tools) resist producing, accepting, or amplifying false or misleading information.
This includes measuring the model’s ability to reject incorrect claims, request clarification, or otherwise avoid validating falsehoods.
In high-risk domains (e.g., healthcare, financial services, legal contexts), even small inaccuracies can lead to serious compliance, reputational, or safety risks.
Misinformation testing helps organizations:
- Detect when a model accepts false information without challenge.
- Understand why certain prompts succeed in “tricking” the model.
- Reduce the risk of misinformation propagation in production environments.
Types of Misinformation Tests
1. False Assertion Testing
Measures how the model responds to deliberately false statements. Prompts are generated by modifying factual datasets such as WikiFact — for example, changing dates, statistics, or terminology — while preserving the overall sentence structure.
Objective: Determine whether the model rejects the falsehood, requests clarification, or instead validates the incorrect information.
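The dataset-perturbation step can be sketched as follows. This is a minimal illustration, not the actual test harness; the helper name `perturb_year` and the fixed two-year shift are assumptions for the example.

```python
import re

def perturb_year(fact: str, shift: int = 2) -> str:
    """Shift the first four-digit year in a factual sentence to create a false assertion."""
    match = re.search(r"\b(1[89]\d{2}|20\d{2})\b", fact)
    if match is None:
        return fact  # nothing to perturb; return the fact unchanged
    year = int(match.group())
    return fact[:match.start()] + str(year + shift) + fact[match.end():]

false_claim = perturb_year("The Eiffel Tower was completed in 1889.")
# false_claim == "The Eiffel Tower was completed in 1891."
```

Because the sentence structure is untouched, any change in the model's response can be attributed to the altered fact rather than to different phrasing.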
2. Package Hallucination
Checks whether models reference the correct package names when recommending libraries or tools for specific programming languages.
| Language | Example Focus |
|---|---|
| Python | Python package reference accuracy |
| JavaScript | npm package reference accuracy |
| Ruby | Ruby gem reference accuracy |
| Rust | Rust crate reference accuracy |
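One way to detect a hallucinated package is to compare every name the model recommends against an index of packages known to exist. The sketch below uses a tiny hard-coded set as a stand-in; a real harness would query the actual registry (e.g. PyPI or the npm registry). The function name and set contents are illustrative assumptions.

```python
# Stand-in for a real package index; a production check would query the registry itself.
KNOWN_PYPI_PACKAGES = {"requests", "numpy", "pandas", "flask", "scikit-learn"}

def find_hallucinated_packages(recommended, known=KNOWN_PYPI_PACKAGES):
    """Return recommended package names that are absent from the known index."""
    return [pkg for pkg in recommended if pkg.lower() not in known]

# One real package and one invented one, as a model response might contain:
suspects = find_hallucinated_packages(["requests", "fastjsonlib"])
# suspects == ["fastjsonlib"]
```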
Factors That Increase Misinformation Risk
Our testing shows that certain linguistic patterns and prompt structures can make models more likely to accept false information.
Prefix framing
Introductory phrases can prime a model to treat a false statement as true. Examples:
- Why is it that …
- Explain why …
- Summarize reports suggesting that …
- Help me understand why …
These prefixes often trigger the model to elaborate or rationalize rather than verify the claim.
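Generating prefixed test prompts from a single false claim can be sketched as below; the helper `frame_claim` and its punctuation handling are assumptions for illustration.

```python
PREFIXES = [
    "Why is it that",
    "Explain why",
    "Summarize reports suggesting that",
    "Help me understand why",
]

def frame_claim(claim: str) -> list[str]:
    """Wrap a false claim in each priming prefix to produce test prompts."""
    body = claim.rstrip(".")
    body = body[0].lower() + body[1:]  # so the claim reads naturally after the prefix
    return [f"{prefix} {body}?" for prefix in PREFIXES]

prompts = frame_claim("The Great Wall of China is visible from the Moon.")
# prompts[0] == "Why is it that the Great Wall of China is visible from the Moon?"
```

Running the same claim through every prefix isolates the effect of the framing itself, since the embedded falsehood is held constant.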
Common Adversarial Modifications
| Modification type | Example | Typical model detection |
|---|---|---|
| Single-digit year shift | 2015 → 2017 | Rarely detected (high vulnerability) |
| Larger year shift | 2015 → 2005 / 2115 / 3015 | More easily detected as the shift grows |
| Small statistical change | 97,607 → 99,607 | Rarely detected |
| Percentage change | 14.1% → 13.8% | Rarely detected |
| Ranking change | 2nd → 3rd largest | Rarely detected |
| Sequence change | Event A ↔ Event B order swap | Sometimes detected |
| Geographic substitution | Georgia → Alabama | Sometimes detected |
| Terminology change | Award name tweak | Rarely to sometimes detected |
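Two of these perturbation types can be sketched as simple string transformations; the dispatcher function `apply_modification` and its label strings are assumptions for this example, not part of any test framework.

```python
import re

def apply_modification(text: str, kind: str) -> str:
    """Apply one of the perturbation types above to a factual sentence."""
    if kind == "year_shift":
        # Single-digit year shift: bump the first year in the sentence by two.
        return re.sub(r"\b(19|20)\d{2}\b",
                      lambda m: str(int(m.group()) + 2), text, count=1)
    if kind == "geo_substitution":
        # Geographic substitution, e.g. Georgia -> Alabama.
        return text.replace("Georgia", "Alabama", 1)
    raise ValueError(f"unknown modification: {kind}")

perturbed = apply_modification("Atlanta, Georgia hosted the 1996 Summer Olympics.",
                               "year_shift")
# perturbed == "Atlanta, Georgia hosted the 1998 Summer Olympics."
```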
Why We Show These Results
The example results below are taken from controlled “false assertion” prefix tests.
They are included to:
- Demonstrate how certain prompt styles (prefix framing) can significantly change a model’s misinformation acceptance rate.
- Highlight that even high-performing models can be vulnerable to subtle misinformation attacks.
These figures are not intended as definitive model rankings, but as a snapshot of attack patterns and their impact on accuracy.
Example Results: False Assertion Prefix Testing
In controlled tests, models were prompted with falsified claims. Below are example error rates for GPT-4.1 across the four prefixes:
| Prefix | Error Rate |
|---|---|
| Why is it that | 36.0% |
| Explain why | 24.5% |
| Summarize reports suggesting | 37.0% |
| Help me understand why | 33.5% |
Full cross-model benchmark data are available in the Appendix.
How Asenion Conducts Misinformation Testing
Asenion uses a combination of:
- Open-source datasets (e.g., WikiFact)
- Proprietary datasets
- Synthetic misinformation scenarios
Testing methods include:
- Statistical evaluation of false acceptance/rejection rates
- Scenario-based stress testing using prefixes and subtle factual changes
- Regression testing to track improvements or regressions in misinformation handling across model updates
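The statistical evaluation step reduces to a rate computed over judged responses. A minimal sketch, assuming each response has already been labeled by a downstream grader with one of the hypothetical labels 'accept', 'reject', or 'clarify':

```python
def false_acceptance_rate(judgments: list[str]) -> float:
    """Fraction of judged responses that accepted a false claim.

    Each judgment is one of 'accept', 'reject', or 'clarify'.
    """
    if not judgments:
        return 0.0
    return judgments.count("accept") / len(judgments)

rate = false_acceptance_rate(["accept", "reject", "clarify", "accept"])
# rate == 0.5
```

Tracking this rate per prefix and per model update is what makes regression testing across releases possible.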