Asenion Assurance for LLM Testing: End-user guide (Web interface)

This document is stand-alone and describes only the web interface: the overall layout, Settings (every option the Configuration page offers), how targets and Connect AI System work, each control on New Test Run, and what happens during runs, in Run History, and in the results panels. It does not cover command-line use or automation APIs.


Contents

  1. Web layout: header and main tabs
  2. Settings (Configuration): every field and button
  3. Connecting to the target system
  4. New Test Run: Simple wizard and Advanced
  5. During and after a run
  6. How to interpret test results

1. Web layout: header and main tabs

Top header (every control)

  • Theme (Light / Dark / System): Chooses the color theme for the console. The choice is remembered in your browser.
  • Refresh: Reloads the Run History list from the server (useful after a long run finishes elsewhere or after another user has started runs).
  • API Docs: Opens a new browser tab with the machine-readable OpenAPI documentation for this server (a technical reference for integrators).
  • Sign Out: Ends your signed-in session when login or single sign-on is enabled. If login is off, behavior depends on your build.

Main tabs (primary navigation)

  • New Test Run: Build and start tests (simple wizard or advanced form).
  • Run History: List past runs; open a run, rename, rerun, resume (if offered), or delete.
  • Target Providers: See which provider YAML files exist under providers/; use Connect AI System and Refresh from here too.
  • Settings: Edit values stored in .env on the server (keys, MLflow, security, optional paths, etc.).

2. Settings (Configuration): every field and button

Open Settings. The page explains that values are saved to a .env file next to the server.

Buttons on the Configuration card

  • Connect AI System: Opens the same Connect AI System drawer used on the run form (see Section 3). Use it after saving cloud credentials so the server can list models.
  • Reload: Fetches the latest values from the server and rebuilds the form (discards unsaved edits in the browser).
  • Save settings: Writes all fields to .env. A short confirmation appears. For secret fields (API keys, passwords), leave blank to keep the existing saved value; only type a new value when you intend to replace it.
  • Where are these stored? (expandable): Shows the path to the .env file on the server.

After Save, many options apply on the next test run without restarting. Login-related and some security options typically require a server restart—the UI may remind you after save.
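
For orientation, a saved .env might contain entries like these. The variable names below are illustrative assumptions (your build's exact names may differ), and secrets are shown as placeholders:

    # Illustrative .env entries; actual variable names depend on your build
    AZURE_API_VERSION=2024-02-15-preview
    AZURE_API_KEY=...
    AWS_ACCESS_KEY_ID=AKIA...
    AWS_SECRET_ACCESS_KEY=...
    MLFLOW_TRACKING_URI=http://localhost:5000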

Azure (fields)

  • Azure API Version: API version string the Azure OpenAI client should send (for example, a dated preview string).
  • Azure API Type: Usually azure for Azure OpenAI–style endpoints.
  • Azure API Key: Secret key from the Azure portal for your OpenAI resource. Leave blank when saving to keep the current key.

AWS (fields)

  • AWS Access Key ID: IAM user access key for Amazon Bedrock (and discovery). Optional if the server uses an instance role or ~/.aws/credentials (see the example after this list).
  • AWS Secret Access Key: Matching secret. Leave blank to keep the current value.
  • AWS Session Token: Optional; for temporary credentials (SSO, assumed role).
  • AWS Region: Region for Bedrock (for example us-east-1).
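
If the server relies on ~/.aws/credentials rather than keys saved here, that file uses the standard AWS shared-credentials format:

    [default]
    aws_access_key_id = AKIA...
    aws_secret_access_key = ...
    # aws_session_token = ...   (only needed for temporary credentials)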

Google Cloud (fields)

  • Google Cloud Project ID: GCP project used for Vertex AI discovery and targets.
  • Google Cloud / Vertex AI region: Region for Vertex (for example us-central1); may default if left blank.
  • Service account key file path: Path on the server to a JSON key file. Leave blank if you use pasted JSON below or default application credentials.
  • Or paste service account JSON: Paste the full JSON key contents; when set, it overrides the file path for server-side auth. Leave blank to keep the current secret.
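
The pasted value is a standard Google Cloud service-account key. Heavily truncated, it looks roughly like this (project and account names are hypothetical; real keys contain additional fields):

    {
      "type": "service_account",
      "project_id": "my-project",
      "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
      "client_email": "discovery-bot@my-project.iam.gserviceaccount.com"
    }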

MLflow (fields)

  • MLflow Tracking URI: URL of your MLflow tracking server (for example http://localhost:5000, or a URL with basic auth; see the example after this list).
  • MLflow Experiment Name: Default experiment name for logged runs.
  • MLflow Username: Used if your MLflow server requires HTTP basic auth.
  • MLflow Password: Password for MLflow auth; leave blank to keep the current value.
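
For the basic-auth case, credentials can be embedded directly in the tracking URI (hypothetical host and account shown), or left out of the URI and supplied via the username and password fields instead:

    http://mlflow_user:s3cret@mlflow.internal.example:5000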

Optional (fields)

  • Project root: Directory that contains server.py, plugins/, and providers/. Normally left empty so the server uses its install directory.
  • Data directory: Root folder for run outputs (default often .asenion). Change this if you need another disk or path.
  • Run timeout (seconds): Maximum duration before the server stops a run (default often 10800 = 3 hours). You can also set hours per run on the New Test Run form.
  • Additional provider (file path): Optional path (absolute or relative to the project root) to an extra provider YAML; that target appears in the AI System to Test list.
  • X-Frame-Options: Controls whether the console may be embedded in an iframe (SAMEORIGIN, DENY, or empty to allow embedding; your admin sets the policy). See the header examples after this list.
  • Frame ancestors (CSP): When embedding is allowed, which parent sites may frame the app (your admin configures this).
  • Cookie SameSite: lax (default) or none. Use none only when the app is embedded in an iframe on another site (typically requires HTTPS).
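
The three embedding-related fields map onto standard browser security headers. For example, to forbid embedding everywhere:

    X-Frame-Options: DENY

Or, to allow embedding only from a company portal (hypothetical URL, and an illustrative cookie name) with the session cookie adjusted for cross-site iframes:

    Content-Security-Policy: frame-ancestors https://portal.example.com
    Set-Cookie: session=...; SameSite=None; Secure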

Security (fields)

  • Enable login: When true, the web UI requires a username and password. Restart the server after changing this.
  • Admin username: Sign-in name when login is enabled.
  • Admin password: Sign-in password; leave blank to keep the current value.
  • Entra ID tenant ID: Microsoft Entra (Azure AD) tenant ID, or common for multi-tenant “Sign in with Microsoft.”
  • Entra ID application (client) ID: From the app registration in Azure.
  • Entra ID client secret: From the app registration; leave blank to keep the current value.
  • Entra ID redirect URI: Must match the URI registered in Azure; if empty, the app builds one from its own base URL.

Test generation — cloud (fields)

These may appear under a group related to remote / cloud test generation (labels depend on version):

  • Remote … generation (toggle): When on, some composite or jailbreak-style tests can be generated using a vendor cloud API. When off, generation uses your evaluator model only. Applies on the next run.
  • Disable sharing to … Cloud: When set to disallow sharing, eval configuration and results are not uploaded to a third-party cloud (recommended for sensitive data).

Evaluation tool behavior (fields)

  • Disable eval CLI telemetry: Reduces anonymous usage telemetry from the background Node evaluation tool.
  • Disable eval CLI update check: Stops the tool from checking for newer versions online.
  • Enable remote test generation: Alternative flag for allowing remote generation APIs for some strategies; the default is often off so your own model is used.
  • Eval CLI account email: Optional; some cloud features of the tool ask for an account email. Pre-fill it here to avoid prompts.

Persona (fields)

  • Multiturn language: Language used for persona and multiturn user messages (choices such as Thai, English, Japanese, etc.). Applies on the next persona run.

3. Connecting to the target system

The target is the model or API you test. Technically it is defined by YAML files in providers/ on the server.

Default file providers/target.yaml

  • id — Internal provider identifier (cloud deployment string).
  • label — Readable name shown in dropdowns.
  • config — Usually apiHost, apiKeyEnv (name of an environment variable in .env), and optionally apiKey (inline key; avoid in production).
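
Putting those keys together, a minimal target file might look like this (all values are hypothetical, and your build's schema may include more fields):

    id: azure:chat:my-deployment
    label: My Azure deployment
    config:
      apiHost: my-resource.openai.azure.com
      apiKeyEnv: AZURE_API_KEY
      # apiKey: an inline key also works but is best avoided in production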

Target Providers tab

  • Lists provider configs the server loads from providers/.
  • Connect AI System and Refresh match the Configuration page: discover models and reload the list.
  • Help text explains that removing a YAML file from providers/ (on the server) removes it from the list; discovery creates new discovered_*.yaml files.

Connect AI System (drawer)

Opens from New Test Run (next to AI System to Test), Settings, or Target Providers.

  • Title: Connect AI System.
  • Intro text: Explains that models come from configured vendors and that you can create targets for test runs.
  • Loading: Shows “Loading…” while the server queries Azure OpenAI, AWS Bedrock, and Google Cloud (Vertex) in parallel (only sources with valid credentials return lists).
  • By vendor: Models are grouped under each cloud; errors for one vendor appear under that group (for example, a missing AWS library or a wrong region).
  • Checkboxes: Select one or more models to register as targets.
  • “N selected”: Count of checked models.
  • Cancel: Closes the drawer without creating files.
  • Create target provider(s): Writes one YAML file per selected model (named like discovered_<slug>.yaml) into providers/ and refreshes the target dropdown (see the example below). Disabled until at least one model is selected.
  • Close (X): Same as Cancel.
  • Click outside the drawer: Closes the drawer (same as Cancel).

Prerequisite: Save Azure, AWS, and/or Google fields under Settings first, then open Connect AI System.
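
As an illustration, selecting a hypothetical model named gpt-4o could produce providers/discovered_gpt_4o.yaml with the same field layout as target.yaml (content invented for the example):

    id: azure:chat:gpt-4o
    label: gpt-4o (discovered)
    config:
      apiHost: my-resource.openai.azure.com
      apiKeyEnv: AZURE_API_KEY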

Advanced form: section “4. Target”

  • AI System to Test (dropdown): Chooses which provider YAML is the run’s target.
  • Connect AI System: Opens the drawer above.
  • Show Details ▼: Toggles a read-only view of the selected provider’s YAML (for verification).
  • Hint under dropdown: Reminds you that the default config often lives in providers/target.yaml.

4. New Test Run: Simple wizard and Advanced

At the top of New Test Run, choose mode:

  • Simple wizard (guided flow): Four steps: What to test → Options → How many → Start.
  • Advanced (full control): Full form with tabs, sample-size tools, adaptive mode, and all persona options.

Simple wizard — step bar (clickable)

You can click What to test, Options, How many, or Start to jump to that step (when allowed).

Step 1 — What to test

Target for wizard (dropdown at top when shown): pick which AI system this wizard run uses (same idea as AI System to Test in Advanced).

Cards (pick one):

  • OWASP Top 10 for LLM: Classic LLM security tests (injection, leaks, etc.).
  • OWASP Top 10 for Agentic AI: Tests aimed at agentic apps (ASI-style categories).
  • Bias & fairness: Fairness-focused probes.
  • Multiturn Fairness: Conversations with varied user demographics.
  • ISO/IEC 42001: AI management system–style compliance checks.
  • EU AI Act: EU regulatory–oriented checks.

Next — goes to step 2.

Step 2 — Options

Content depends on the card chosen in step 1:

  • OWASP LLM: Focus area dropdown — All areas, Basic security, Prompt injection, Data / PII leaks, Security, Prompt extraction, etc.
  • Multiturn Fairness: Vary by (demographic dimension), Conversation length (short / medium / long turns).
  • Bias & fairness: optional Category (All, Fairness, Privacy, Robustness).
  • ISO/IEC 42001: optional Category (All, Fairness, Privacy, Robustness).
  • EU AI Act: optional Category (risk classification, oversight, transparency, bias, PII, etc.).
  • Agentic (wizard): optional narrow categories (loops, tool misuse, sandbox, …).
  • OWASP Agentic: informational text that the full ASI suite runs (no narrow options).

Back / Next — navigate steps.

Step 3 — How many

Cards for run size (examples: Smoke test, Light pass, Balanced, Find vulnerabilities (recommended)). The recommended option ties to a larger, confidence-oriented sample.

Back / Next.

Step 4 — Start

Shows a summary of your choices.

Back — edit earlier steps.
Start test — submits the wizard run.


Advanced mode — full form

Mode tabs inside the form

  • Red Team: Security and compliance red-team runs (framework + category + counts + adaptive + sample calculator).
  • Multiturn Fairness: Persona-based multi-turn fairness runs with demographic variation.

Section 1 — Test type

Same choice as the mode tabs above: Red Team vs Multiturn Fairness (controls which panel below is active).

Section 2 — System context

  • Generic: Runs without tailoring to a specific product description.
  • Auto-detect: The workflow asks the target to infer a system purpose for more targeted tests.
  • Manual: Shows a text box, Describe your system; your text steers test generation toward your domain.

Section 3 — Configuration (Red Team tab)

  • Testing Framework (radio grid): OWASP Top 10 for LLM, OWASP Top 10 for Agentic AI, ISO/IEC 42001, EU AI Act.
  • Category (Optional): Dropdown; options depend on the framework (security, bias, PII, EU categories, etc.). Hint text under the box explains the current framework’s categories.
  • Tests Per Category: Number input (limits like 1–500). The expected-total block below may estimate how many tests will run given the plugins and categories.
  • Adaptive Mode (checkbox): Re-runs successful attacks across rounds until a steady state is reached; uses more time and API calls. When checked, Steady State Rounds appears (how many consecutive “good” rounds mean convergence).
  • Sample size calculator (expandable): Choose a goal (Estimate pass rate or Confidence we’ve found (nearly) all vulnerabilities), a confidence level, and a margin or max failure rate. Use recommended copies the suggested count into Tests Per Category; Recalculate refreshes the math (see the worked example after this list).
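
As a worked illustration of the calculator's two goals, here are the standard binomial formulas such a calculator typically uses (an assumption; your build may compute these differently):

    Estimate pass rate (confidence z, margin e):
      n >= z^2 * p(1 - p) / e^2
      e.g. 95% confidence (z = 1.96), worst-case p = 0.5, margin e = 0.05:
      n >= 3.8416 * 0.25 / 0.0025 ≈ 385 tests

    Confidence we've found (nearly) all vulnerabilities
    (confidence C, smallest failure rate p you care to detect):
      n >= ln(1 - C) / ln(1 - p)
      e.g. C = 0.95, p = 0.01:
      n >= ln(0.05) / ln(0.99) ≈ 299 tests per category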

Section 4 — Configuration (Multiturn Fairness tab)

  • Vary feature: Single demographic attribute that changes across tests (race, sex, age, religion, …).
  • Number of turns: Length of each conversation (1–20).
  • Tests per variant: Optional; per-variant replication count.
  • Number of Personas: Total tests; may auto-adjust when variants change.
  • Designated persona: Optional free text (for example “frustrated elderly customer”); the base persona for scenarios.
  • Parity metrics to run: Checkboxes for metrics (assistant turns, user turns, questions, sentiment, encouragement, follow-up depth, turn length ratio, pass rate). Only selected metrics appear in results.
  • Use memory-backed simulation: Enables richer, more human-like simulated user behavior.

Section 5 — Target

Same controls as described under “Connecting to the target system” above: dropdown, Connect AI System, and Show Details.

Section 6 — Run options

  • Run timeout (hours): Stops the run after this duration (range such as 1–168 hours; default often 3). Applies to both Red Team and Multiturn Fairness.

Submit

  • Start Test Run: Validates the form and starts the run. The Current Run panel appears and streams progress.

5. During and after a run

Current Run panel (appears while a run exists or is selected)

  • Show technical output (terminal icon): Toggles a more verbose technical log stream in the console area. Hover text: “Show Technical Output.”
  • Show Full Log (checkbox): When on, the console shows the full log stream; when off, a filtered/summary view may hide noisy lines.
  • Cancel: Requests cancellation of the running job (may take a moment; not all stages stop instantly).
  • Tests / Passed / Failed: Live or final counters when the server supplies them.
  • Progress bar and %: Overall progress estimate.
  • Reassurance / time messages: Short status lines during long phases (for example, test generation).
  • Console area: Scrolling log output for the run.

When the run finishes (or you open a completed run), Run details may expand:

  • Result files: Lists JSON (and other) outputs with per-file pass/fail counts when available.
  • View full details: Opens the Run Results side drawer with full stats and the file list.
  • Rerun: Starts a new run with the same configuration as this one.
  • Resume: Shown for some failed runs (for example, after a timeout); reruns only incomplete or errored tests.
  • Validate responses: Opens human validation for this run (see below).
  • Generate report: Opens or builds a report view for this run (behavior depends on your deployment).

Run History tab

  • Refresh: Reloads the list of runs.
  • Run count: Shows how many runs are listed.
  • Pagination: If many runs exist, move between pages.
  • Run row (click main area): Opens that run in Current Run (the same status view as live runs).
  • Status badge: running, completed, failed, cancelled, etc.
  • Resume (play icon, failed runs only): Same as Resume in Current Run.
  • Rerun (refresh icon): Same as Rerun.
  • Rename (pencil): Opens Rename test run; type a display name, then Save or Cancel.
  • Delete (trash): Opens the Delete test run? confirmation with Cancel or Delete (permanent).

Run Results drawer (from “View full details” or equivalent)

  • Rename (pencil in header): Same rename dialog.
  • Close (X): Closes the drawer.
  • Tests / Passed / Failed / Pass Rate: Summary for the run.
  • Result Files: List with per-file pass/fail counts when available.
  • Run Information: Metadata (framework, times, flags, etc.).
  • Validate Responses: Opens the validation UI for this run.
  • View Report: Opens the detailed report view when implemented.
  • Upload to Test Repository: Opens the upload flow (often MLflow- or platform-linked); choose or type an experiment name, then Upload or Cancel.
  • Close: Closes the drawer.

Validate AI Responses drawer

  • Intro: Explains that you mark each response appropriate/safe or inappropriate/unsafe to align results with human judgment.
  • Total / Pass / Fail / Pending: Validation progress counts.
  • Validation coverage bar: Share of tests you have reviewed.
  • Upload to Test Repository: Sends validated results upstream (enabled when your workflow allows).
  • Generate Accuracy Report: Builds an accuracy-style report from validation (enabled when enough data exists).
  • All tests / Failure Analysis: Switches the list view; Failure Analysis may show a category breakdown.
  • Human Validation filters: All, Pending, Pass, Fail for your labels.
  • Result filters: All, Pass, Fail for the automated test outcome.
  • Category tags: Filter by test category.
  • Select all visible / Clear selection: Bulk-select rows in the list.
  • Mark as Pass / Mark as Fail: Apply your judgment to the selected items.
  • Per-test rows: Open each test to mark it valid/invalid and add notes (exact layout depends on version).
  • Close: Closes validation.

6. How to interpret test results

Red-team mindset

Tests are adversarial. A failed automated test often means the probe succeeded at eliciting a problematic response. Treat failures as findings to triage, not necessarily as defects in the console itself.

Numbers in the UI

  • Passed / Failed reflect how the automated grader scored each test against expectations.
  • Pass rate is a simple ratio; your organization may set a policy threshold for reporting.
  • Adaptive or multi-round runs can produce multiple JSON files; totals may differ from a single-round mental model—use the primary or consolidated file the UI highlights.

Human validation

If you use Validate Responses, your Pass/Fail choices can override or refine purely automated scoring for audit and reporting. Upload to Test Repository pushes that view of results to your organization’s central store when configured.

Files on disk

Runs are stored under your configured data directory (often .asenion/…). Typical artifacts: JSON results, logs, and sometimes HTML reports. Administrators handle backup and retention.

When something looks wrong

  • Many failures: Check target and evaluator credentials, quotas, and content filters blocking security prompts.
  • No tests generated: Check the remote-generation vs local-evaluator settings, rate limits, and model refusals.
  • Empty or stuck run: Check the run timeout and disk space; Cancel and inspect the logs; try Resume if offered.
  • Counts don’t match: Adaptive rounds or multiple JSON merges; use the file list in Run details or Run Results.

End of end-user guide.

