
🛡️ Pegasi Shield



A lightweight safety and reliability layer for large‑language‑model (LLM) applications.


Overview

Pegasi Shield sits between your application and any LLM (OpenAI, Claude, local models, etc.).
It inspects every prompt and response, blocks or edits unsafe content, and logs decisions for auditing—all with minimal latency and no data egress.

🔬 Research: FRED

Pegasi Shield’s hallucination module is powered by FRED — Financial Retrieval‑Enhanced Detection & Editing. The method was peer‑reviewed and accepted to the ICML 2025 Workshop. Code, evaluation harness and demo notebooks are in fred/.

Open ICML Streamlit Demo

Read the paper on OpenReview


🔧 Key capabilities

  • Prompt security: detects and blocks prompt injections, role hijacking, and system‑override attempts.
  • Output sanitisation: removes personal data, hate speech, defamation, and other policy violations.
  • Hallucination controls: scores and rewrites ungrounded text using a 4B‑parameter model, with performance on par with o3.
  • Observability: emits structured traces and metrics (OpenTelemetry) for dashboards and alerts.
  • Deployment: pure‑Python middleware, Docker image, or Helm chart for Kubernetes / VPC installs.

⚡ Quick start

*Coming July 18th*

pip install pegasi-shield

from pegasi_shield import Shield
from openai import OpenAI

client = OpenAI()
shield = Shield()                       # uses default policy

messages = [{"role": "user", "content": "Tell me about OpenAI o3"}]

# Shield inspects the prompt, runs the wrapped call, then inspects the response
response = shield.chat_completion(
    lambda: client.chat.completions.create(model="gpt-4.1-mini", messages=messages)
)

print(response.choices[0].message.content)

Shield.chat_completion accepts a callable that runs your normal LLM request. Shield returns the same response object—or raises ShieldError if the call is blocked.
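
For example, a blocked call can be handled with an ordinary try/except. This is a minimal sketch: it reuses client and messages from the quick start above, and it assumes ShieldError is importable from pegasi_shield.

from pegasi_shield import Shield, ShieldError   # ShieldError import path is an assumption

shield = Shield()

try:
    response = shield.chat_completion(
        lambda: client.chat.completions.create(model="gpt-4.1-mini", messages=messages)
    )
    print(response.choices[0].message.content)
except ShieldError as exc:
    # The prompt or response violated policy, so Shield blocked the call.
    print(f"Blocked by Shield: {exc}")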


📚 How it works

  1. Prompt firewall — lightweight rules (regex, AST, ML) followed by an optional LLM check.

  2. LLM request — forwards the original or patched prompt to your provider.

  3. Output pipeline
    • heuristics → vector similarity checks → policy LLM
    • optional “Hallucination Lens” rewrite if the factuality score is below threshold

  4. Trace — JSON event with the allow/block/edit decision and risk scores (see the illustrative event after this list).
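
As a rough illustration of step 4 only, a trace event might carry fields like the ones below; every field name here is hypothetical, not Shield’s documented schema.

# Hypothetical trace event (illustrative field names only)
trace_event = {
    "timestamp": "2025-07-18T12:00:00Z",
    "stage": "output_pipeline",          # which stage produced the decision
    "decision": "edit",                  # allow / block / edit
    "risk_scores": {
        "prompt_injection": 0.02,
        "pii": 0.00,
        "factuality": 0.41,
    },
}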

All stages are configurable via YAML or Python.
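
As a sketch of the Python side only, configuration could look like passing a policy mapping when constructing Shield; the constructor argument and every key below are hypothetical, so check the docs for the real option names.

from pegasi_shield import Shield

# Hypothetical policy mapping: key names are illustrative, not documented options.
policy = {
    "prompt_firewall": {"llm_check": True},        # stage 1: rules plus an optional LLM check
    "output_pipeline": {
        "hallucination_lens": True,                # rewrite ungrounded text
        "factuality_threshold": 0.7,               # rewrite when the factuality score falls below this
    },
}

shield = Shield(policy)   # assumed signature; Shield() with no arguments uses the default policy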


Roadmap

  • v0.5 launch (July 18th)
  • LiveKit Agent Tutorial
  • LangGraph Agent Tutorial
  • Fine‑grained policy language
  • Streaming output inspection
  • JavaScript/TypeScript SDK

Contributing

Issues and pull requests are welcome. See CONTRIBUTING.md for details.


License

Apache 2.0

