Asenion Assurance for LLM Testing: End-user guide (Web interface)
This document is stand-alone and describes only the web interface: installation, Settings (every option the Configuration page offers), how targets and Connect AI System work, each control on New Test Run, and what happens during runs, in Run History, and in the results panels. It does not cover command-line use or automation APIs.
Theme
Chooses the color theme for the console. The choice is remembered in your browser.
Refresh
Reloads the Run History list from the server (useful after a long run finishes elsewhere or after another user started runs).
API Docs
Opens a new browser tab with the machine-readable OpenAPI documentation for this server (technical reference for integrators).
Sign Out
Ends your signed-in session when login or single sign-on is enabled. If login is off, behavior depends on your build.
Main tabs (primary navigation)
Tab
Purpose
New Test Run
Build and start tests (simple wizard or advanced form).
Run History
List past runs; open a run, rename, rerun, resume (if offered), or delete.
Target Providers
See which provider YAML files exist under providers/; use Connect AI System and Refresh from here too.
Settings
Edit values stored in .env on the server (keys, MLflow, security, optional paths, etc.).
2. Settings (Configuration): every field and button
Open Settings. The page explains that values are saved to a .env file next to the server.
Buttons on the Configuration card
Button
What it does
Connect AI System
Opens the same Connect AI System drawer used on the run form (see Section 4). Use it after saving cloud credentials so the server can list models.
Reload
Fetches the latest values from the server and rebuilds the form (discards unsaved edits in the browser).
Save settings
Writes all fields to .env. A short confirmation appears. Secret fields (API keys, passwords): leave blank to keep the existing saved value; only type a new value when you intend to replace it.
Where are these stored? (expandable)
Shows the path to the .env file on the server.
After Save, many options apply on the next test run without restarting. Login-related and some security options typically require a server restart—the UI may remind you after save.
Azure (fields)
Field label (as shown)
Purpose
Azure API Version
API version string the Azure OpenAI client should send (for example a dated preview string).
Azure API Type
Usually azure for Azure OpenAI–style endpoints.
Azure API Key
Secret key from the Azure portal for your OpenAI resource. Leave blank when saving to keep the current key.
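As an illustration, the three Azure fields above might be filled in as follows. The version shown is one of Azure OpenAI's dated preview strings; your resource's values will differ, and the key field is left blank to keep the stored secret:

```
Azure API Version:  2024-02-15-preview   # dated version string from Azure docs
Azure API Type:     azure                # Azure OpenAI-style endpoints
Azure API Key:      (blank)              # blank on save keeps the stored key
```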
AWS (fields)
Field label
Purpose
AWS Access Key ID
IAM user access key for Amazon Bedrock (and discovery). Optional if the server uses instance role or ~/.aws/credentials.
AWS Secret Access Key
Matching secret. Leave blank to keep current.
AWS Session Token
Optional; for temporary credentials (SSO, assumed role).
AWS Region
Region for Bedrock (for example us-east-1).
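If the server relies on the standard AWS shared-credentials file instead of the fields above, that file follows AWS's documented format (the key values below are AWS's own documentation placeholders):

```
# ~/.aws/credentials — standard AWS shared-credentials format
[default]
aws_access_key_id     = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrZXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
# aws_session_token   = ...   # only for temporary (SSO / assumed-role) credentials
```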
Google Cloud (fields)
Field label
Purpose
Google Cloud Project ID
GCP project used for Vertex AI discovery and targets.
Google Cloud / Vertex AI region
Region for Vertex (for example us-central1); may default if left blank.
Service account key file path
Path on the server to a JSON key file. Leave blank if you use pasted JSON below or default application credentials.
Or paste service account JSON
Paste the full JSON key contents; when set, it overrides the file path for server-side auth. Leave blank to keep current secret.
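Whether you point to a file or paste JSON, a service account key follows Google Cloud's standard layout; the field names below are the standard ones (values truncated, project name illustrative):

```
{
  "type": "service_account",
  "project_id": "my-gcp-project",
  "private_key_id": "…",
  "private_key": "-----BEGIN PRIVATE KEY-----\n…\n-----END PRIVATE KEY-----\n",
  "client_email": "tester@my-gcp-project.iam.gserviceaccount.com",
  "token_uri": "https://oauth2.googleapis.com/token"
}
```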
MLflow (fields)
Field label
Purpose
MLflow Tracking URI
URL of your MLflow tracking server (for example http://localhost:5000 or a URL with basic auth).
MLflow Experiment Name
Default experiment name for logged runs.
MLflow Username
If your MLflow server requires HTTP basic auth.
MLflow Password
Password for MLflow auth; leave blank to keep current.
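For example, the MLflow fields might be set as follows. Host, experiment name, and username are placeholders; MLflow also accepts basic-auth credentials embedded directly in the tracking URI:

```
MLflow Tracking URI:     http://mlflow.internal:5000
                         # or, with inline basic auth:
                         # http://user:password@mlflow.internal:5000
MLflow Experiment Name:  llm-assurance
MLflow Username:         mluser     # only if the server enforces basic auth
MLflow Password:         (blank)    # blank on save keeps the stored password
```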
Optional (fields)
Field label
Purpose
Project root
Directory that contains server.py, plugins/, providers/. Normally empty so the server uses its install directory.
Data directory
Root folder for run outputs (default often .asenion). Change if you need another disk or path.
Run timeout (seconds)
Maximum duration before the server stops a run (default often 10800 = 3 hours). You can also set hours on the New Test Run form per run.
Additional provider (file path)
Optional path (absolute or relative to project root) to an extra provider YAML; that target appears in the AI System to Test list.
X-Frame-Options
Controls whether the console may be embedded in an iframe (SAMEORIGIN, DENY, or empty to allow embedding—your admin sets policy).
Frame ancestors (CSP)
When embedding is allowed, which parent sites may frame the app (your admin configures).
Cookie SameSite
lax (default) or none. Use none only when the app is embedded in an iframe on another site (typically requires HTTPS).
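These three settings translate into standard HTTP response headers. Roughly, the two common configurations look like this (the parent origin is a placeholder):

```
# Embedding disallowed (typical default):
X-Frame-Options: SAMEORIGIN              # or DENY

# Embedding allowed from one parent site:
Content-Security-Policy: frame-ancestors https://portal.example.com
Set-Cookie: session=...; SameSite=None; Secure   # SameSite=None requires HTTPS
```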
Security (fields)
Field label
Purpose
Enable login
When true, the web UI requires username and password. Restart the server after changing.
Admin username
Sign-in name when login is enabled.
Admin password
Sign-in password; leave blank to keep current.
Entra ID tenant ID
Microsoft Entra (Azure AD) tenant or common for multi-tenant “Sign in with Microsoft.”
Entra ID application (client) ID
From the app registration in Azure.
Entra ID client secret
From app registration; leave blank to keep current.
Entra ID redirect URI
Must match the URI registered in Azure; if empty, the app builds one from its own base URL.
Test generation — cloud (fields)
These may appear under a group related to remote / cloud test generation (labels depend on version):
Field label (typical)
Purpose
Remote … generation (toggle)
When on, some composite or jailbreak-style tests can be generated using a vendor cloud API. When off, generation uses your evaluator model only. Applies on the next run.
Disable sharing to … Cloud
When set to disallow sharing, eval configuration and results are not uploaded to a third-party cloud (recommended for sensitive data).
Evaluation tool behavior (fields)
Field label (typical)
Purpose
Disable eval CLI telemetry
Reduces anonymous usage telemetry from the background Node evaluation tool.
Disable eval CLI update check
Stops the tool from checking for newer versions online.
Enable remote test generation
Alternative flag for allowing remote generation APIs for some strategies; default is often off so your own model is used.
Eval CLI account email
Optional; some cloud features of the tool ask for an account email—pre-fill here to avoid prompts.
Persona (fields)
Field label
Purpose
Multiturn language
Language used for persona and multiturn user messages (choices such as Thai, English, Japanese, etc.). Applies on the next persona run.
3. Connecting to the target system
The target is the model or API you test. Technically it is defined by YAML files in providers/ on the server.
Default file providers/target.yaml
id — Internal provider identifier (cloud deployment string).
label — Readable name shown in dropdowns.
config — Usually apiHost, apiKeyEnv (name of an environment variable in .env), and optionally apiKey (inline key; avoid in production).
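Putting those three keys together, a minimal providers/target.yaml might look like this (all values are illustrative):

```
# providers/target.yaml — illustrative values
id: my-chat-deployment            # internal provider identifier
label: Production chat model      # readable name shown in dropdowns
config:
  apiHost: api.example.com        # hypothetical endpoint host
  apiKeyEnv: TARGET_API_KEY       # name of an environment variable in .env
  # apiKey: sk-...                # inline key; avoid in production
```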
Target Providers tab
Lists provider configs the server loads from providers/.
Connect AI System and Refresh match the Configuration page: discover models and reload the list.
Help text explains that removing a YAML file from providers/ (on the server) removes it from the list; discovery creates new discovered_*.yaml files.
Connect AI System (drawer)
Opens from New Test Run (next to AI System to Test), Settings, or Target Providers.
Part
What it does
Title
Connect AI System
Intro text
Explains that models come from configured vendors and you can create targets for test runs.
Loading
Loading… while the server queries Azure OpenAI, AWS Bedrock, and Google Cloud (Vertex) in parallel (only sources with valid credentials return lists).
By vendor
Models grouped under each cloud; errors for one vendor appear under that group (for example missing AWS library or wrong region).
Checkboxes
Select one or more models to register as targets.
“N selected”
Count of checked models.
Cancel
Closes the drawer without creating files.
Create target provider(s)
Writes one YAML per selected model (names like discovered_<slug>.yaml) into providers/ and refreshes the target dropdown. Disabled until at least one model is selected.
Close (X)
Same as cancel.
Click outside the drawer
Closes the drawer (same as Cancel).
Prerequisite: Save Azure, AWS, and/or Google fields under Settings first, then open Connect AI System.
Advanced form: section “4. Target”
Control
What it does
AI System to Test (dropdown)
Chooses which provider YAML is the run’s target.
Connect AI System
Opens the drawer above.
Show Details ▼
Toggles a read-only view of the selected provider’s YAML (for verification).
Hint under dropdown
Reminds that the default config often lives in providers/target.yaml.
4. New Test Run: Simple wizard and Advanced
At the top of New Test Run, choose mode:
Mode
Audience
What you see
Simple wizard
Guided flow
Four steps: What to test → Options → How many → Start.
Advanced
Full control
Full form with tabs, sample-size tools, adaptive mode, and all persona options.
Simple wizard — step bar (clickable)
You can click What to test, Options, How many, or Start to jump to that step (when allowed).
Step 1 — What to test
Target for wizard (dropdown at top when shown): pick which AI system this wizard run uses (same idea as AI System to Test in Advanced).
OWASP Agentic: informational text noting that the full ASI suite runs (there are no narrower options to choose).
Back / Next — navigate steps.
Step 3 — How many
Cards for run size (examples: Smoke test, Light pass, Balanced, Find vulnerabilities (recommended)). The recommended option ties to a larger, confidence-oriented sample.
Back / Next.
Step 4 — Start
Shows a summary of your choices.
Back — edit earlier steps. Start test — submits the wizard run.
Persona-based multi-turn fairness runs with demographic variation.
Section 1 — Test type
Same choice as wizard tabs: Red Team vs Multiturn Fairness (controls which panel below is active).
Section 2 — System context
Control
What it does
Generic
Runs without tailoring to a specific product description.
Auto-detect
The workflow asks the target to infer a system purpose for more targeted tests.
Manual
Shows a text box: Describe your system — your text steers test generation toward your domain.
Section 3 — Configuration (Red Team tab)
Control
What it does
Testing Framework (radio grid)
OWASP Top 10 for LLM, OWASP Top 10 for Agentic AI, ISO/IEC 42001, EU AI Act.
Category (Optional)
Dropdown; options depend on framework (security, bias, PII, EU categories, etc.). Hint text under the box explains the current framework’s categories.
Tests Per Category
Number input (limits like 1–500). The expected total block below may estimate how many tests will run given plugins and categories.
Adaptive Mode (checkbox)
Re-runs successful attacks across rounds until a steady state; uses more time and API calls. When checked, Steady State Rounds appears (how many consecutive “good” rounds mean convergence).
Sample size calculator (expandable)
Choose a Goal (Estimate pass rate or Confidence we’ve found (nearly) all vulnerabilities), a confidence level, and a margin or maximum failure rate. Use recommended copies the suggested count into Tests Per Category; Recalculate refreshes the math.
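The two calculator goals map onto standard sample-size formulas; a minimal sketch in Python (the formulas and z-values below are textbook statistics, not taken from the product's own implementation):

```python
import math

# z-scores for common confidence levels
Z = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}

def estimate_pass_rate_n(confidence=0.95, margin=0.05, p=0.5):
    """Tests needed to estimate a pass rate to within +/- margin.
    p = 0.5 is the worst case when the true rate is unknown."""
    z = Z[confidence]
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

def find_vulnerabilities_n(confidence=0.95, max_failure_rate=0.05):
    """Tests needed so that, if every test passes, the true failure
    rate is below max_failure_rate at the given confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - max_failure_rate))

print(estimate_pass_rate_n())    # 385
print(find_vulnerabilities_n())  # 59
```

This explains why the confidence-oriented goal suggests larger counts as the acceptable failure rate shrinks: halving the tolerated failure rate roughly doubles the required tests.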
Single demographic attribute that changes across tests (race, sex, age, religion, …).
Number of turns
Length of each conversation (1–20).
Tests per variant
Optional; per-variant replication count.
Number of Personas
Total tests; may auto-adjust when variants change.
Designated persona
Optional free text (for example “frustrated elderly customer”); base persona for scenarios.
Parity metrics to run
Checkboxes for metrics (assistant turns, user turns, questions, sentiment, encouragement, follow-up depth, turn length ratio, pass rate). Only selected metrics appear in results.
Use memory-backed simulation
Richer, more human-like simulated user behavior when enabled.
Section 5 — Target
Same as Section 4 above: dropdown, Connect AI System, Show Details.
Section 6 — Run options
Control
What it does
Run timeout (hours)
Stops the run after this duration (range such as 1–168 hours; default often 3). Applies to both Red Team and Multiturn Fairness.
Submit
Control
What it does
Start Test Run
Validates the form and starts the run. The Current Run panel appears and streams progress.
5. During and after a run
Current Run panel (appears while a run exists or is selected)
Control
What it does
Show technical output (terminal icon)
Toggles a more verbose technical log stream in the console area. Hover text: “Show Technical Output.”
Show Full Log (checkbox)
When on, the console shows the full log stream; when off, a filtered/summary view may hide noisy lines.
Cancel
Requests cancellation of the running job (may take a moment; not all stages stop instantly).
Tests / Passed / Failed
Live or final counters when the server supplies them.
Progress bar and %
Overall progress estimate.
Reassurance / time messages
Short status lines during long phases (for example test generation).
Console area
Scrolling log output for the run.
When the run finishes (or you open a completed run), Run details may expand:
Control
What it does
Result files
Lists JSON (and other) outputs with per-file pass/fail counts when available.
View full details
Opens the Run Results side drawer with full stats and file list.
Rerun
Starts a new run with the same configuration as this one.
Resume
Shown for some failed runs (for example timeout): reruns only incomplete or errored tests.
Validate responses
Opens human validation for this run (see below).
Generate report
Opens or builds a report view for this run (behavior depends on your deployment).
Run History tab
Control
What it does
Refresh
Reloads the list of runs.
Run count
Shows how many runs are listed.
Pagination
If many runs exist, move between pages.
Run row (click main area)
Opens that run in Current Run (same status view as live runs).
Status badge
running, completed, failed, cancelled, etc.
Resume (play icon, failed only)
Same as Resume in Current Run.
Rerun (refresh icon)
Same as Rerun.
Rename (pencil)
Opens Rename test run: type a display name, Save or Cancel.
Delete (trash)
Opens Delete test run? — Cancel or Delete (permanent).
Run Results drawer (from “View full details” or equivalent)
Control
What it does
Rename (pencil in header)
Same rename dialog.
Close (X)
Closes the drawer.
Tests / Passed / Failed / Pass Rate
Summary for the run.
Result Files
List with per-file pass/fail when available.
Run Information
Metadata (framework, times, flags, etc.).
Validate Responses
Opens validation UI for this run.
View Report
Opens detailed report view when implemented.
Upload to Test Repository
Opens upload flow (often MLflow- or platform-linked); choose or type an experiment name, then Upload or Cancel.
Close
Closes the drawer.
Validate AI Responses drawer
Area
What it does
Intro
Explains that you mark each response appropriate/safe or inappropriate/unsafe to align results with human judgment.
Total / Pass / Fail / Pending
Validation progress counts.
Validation coverage bar
Share of tests you have reviewed.
Upload to Test Repository
Sends validated results upstream (enabled when your workflow allows).
Generate Accuracy Report
Builds an accuracy-style report from validation (enabled when enough data exists).
All tests / Failure Analysis
Switches list view; Failure Analysis may show category breakdown.
Human Validation filters
All, Pending, Pass, Fail for your labels.
Result filters
All, Pass, Fail for automated test outcome.
Category tags
Filter by test category.
Select all visible / Clear selection
Bulk select rows in the list.
Mark as Pass / Mark as Fail
Apply your judgment to selected items.
Per-test rows
Open each test to mark valid/invalid and add notes (exact layout depends on version).
Close
Closes validation.
6. How to interpret test results
Red-team mindset
Tests are adversarial. A failed automated test often means the probe succeeded at eliciting a problematic response. Treat failures as findings to triage, not necessarily as defects in the console itself.
Numbers in the UI
Passed / Failed reflect how the automated grader scored each test against expectations.
Pass rate is a simple ratio; your organization may set a policy threshold for reporting.
Adaptive or multi-round runs can produce multiple JSON files; totals may differ from a single-round mental model—use the primary or consolidated file the UI highlights.
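When a run produces several result files, the run totals are simply the per-file counts summed; a minimal sketch (the passed/failed key names are hypothetical illustrations, so check your run's actual JSON schema before relying on them):

```python
def combine_counts(file_summaries):
    """Aggregate per-file pass/fail counts into run-level totals.
    Each summary is assumed to carry 'passed' and 'failed' counts
    (hypothetical key names, for illustration only)."""
    passed = sum(s.get("passed", 0) for s in file_summaries)
    failed = sum(s.get("failed", 0) for s in file_summaries)
    total = passed + failed
    return {
        "passed": passed,
        "failed": failed,
        "pass_rate": passed / total if total else 0.0,
    }

# Two result files from an adaptive run:
print(combine_counts([{"passed": 18, "failed": 2},
                      {"passed": 9, "failed": 1}]))
# {'passed': 27, 'failed': 3, 'pass_rate': 0.9}
```

If a test was retried across adaptive rounds, it may appear in more than one file, so a naive sum can exceed the unique test count; prefer the consolidated file the UI highlights when the numbers disagree.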
Human validation
If you use Validate Responses, your Pass/Fail choices can override or refine purely automated scoring for audit and reporting. Upload to Test Repository pushes that view of results to your organization’s central store when configured.
Files on disk
Runs are stored under your configured data directory (often .asenion/…). Typical artifacts: JSON results, logs, and sometimes HTML reports. Administrators handle backup and retention.
When something looks wrong
Symptom
What to check
Many failures
Target and evaluator credentials, quotas, content filters blocking security prompts.
No tests generated
Settings for remote generation vs local evaluator; rate limits; model refusals.
Empty or stuck run
Run timeout; disk space; Cancel and inspect logs; try Resume if offered.
Counts don’t match
Adaptive rounds or multiple JSON merges—use the file list in Run details or Run Results.