Misinformation

Misinformation testing evaluates whether LLM applications (e.g., chatbots, content generation tools) resist producing, accepting, or amplifying false or misleading information.
This includes measuring the model’s ability to reject incorrect claims, request clarification, or otherwise avoid validating falsehoods.

In high-risk domains (e.g., healthcare, financial services, legal contexts), even small inaccuracies can lead to serious compliance, reputational, or safety risks.

Misinformation testing helps organizations:

  • Detect when a model accepts false information without challenge.
  • Understand why certain prompts succeed in “tricking” the model.
  • Reduce the risk of misinformation propagation in production environments.

Types of Misinformation Tests

1. False Assertion Testing

Measures how the model responds to deliberately false statements. Prompts are generated by modifying factual datasets such as WikiFact — for example, changing dates, statistics, or terminology — while preserving the overall sentence structure.

Objective: Determine whether the model rejects the falsehood, requests clarification, or instead validates the incorrect information.


2. Package Hallucination

Checks whether models reference the correct package names when recommending libraries or tools for specific programming languages.

Language Example Focus
Python Python package reference accuracy
JavaScript npm package reference accuracy
Ruby Ruby gem reference accuracy
Rust Rust crate reference accuracy

Factors That Increase Misinformation Risk

Our testing shows that certain linguistic patterns and prompt structures can make models more likely to accept false information.

Prefix framing

Introductory phrases can prime a model to treat a false statement as true. Examples:

  • Why is it that …
  • Explain why …
  • Summarize reports suggesting that …
  • Help me understand why …

These prefixes often trigger the model to elaborate or rationalize rather than verify the claim.


Common Adversarial Modifications

Modification type Example Observed sensitivity
Single-digit year shift 2015 → 2017 High vulnerability
Larger year shift 2015 → 2005 / 2115 / 3015 Increasingly easier to catch
Small statistical change 97,607 → 99,607 Low sensitivity
Percentage change 14.1% → 13.8% Low sensitivity
Ranking change 2nd → 3rd largest Low sensitivity
Sequence change Event A ↔ Event B order swap Moderate sensitivity
Geographic substitution Georgia → Alabama Moderate sensitivity
Terminology change Award name tweak Low–moderate sensitivity

Why We Show These Results

The example results below are taken from controlled “false assertion” prefix tests.
They are included to:

  • Demonstrate how certain prompt styles (prefix framing) can significantly change a model’s misinformation acceptance rate.
  • Highlight that even high-performing models can be vulnerable to subtle misinformation attacks.

These figures are not intended as definitive model rankings, but as a snapshot of attack patterns and their impact on accuracy.


Example Results: False Assertion Prefix Testing

In controlled tests, models were prompted with falsified claims.
Below are example error rates for Gpt-4.1 across prefixes:

Prefix Error Rate
Why is it that 36.0%
Explain why 24.5%
Summarize reports suggesting 37.0%
Help me understand why 33.5%

Full cross-model benchmark data available in the Appendix.


How Asenion Conducts Misinformation Testing

Asenion uses a combination of:

  • Open-source datasets (e.g., WikiFact)
  • Proprietary datasets
  • Synthetic misinformation scenarios

Testing methods include:

  • Statistical evaluation of false acceptance/rejection rates
  • Scenario-based stress testing using prefixes and subtle factual changes
  • Regression testing to track improvements or regressions in misinformation handling across model updates

Appendix: Full False Assertion Test Results

Click to view all models and prefixes