Securing GenAI Beyond the Model: 10 LLM Attacks and the Case for Governance and Recovery

Ali Salman

7 hours ago

Why This Matters Now

Enterprises are moving beyond chatbots into LLM-powered assistants that can:

Retrieve information from internal repositories (RAG).
Summarize sensitive content.
Create tickets and run workflows.
And most importantly: Take actions through tool integrations (e.g., email, ITSM, IAM, cloud APIs, DevOps pipelines).

That’s where risk changes dramatically.

Traditional application security focuses on code paths and APIs. LLM applications, however, add a second, less predictable layer: Natural-language instructions that can be manipulated, sometimes directly by a user, and sometimes indirectly through content the system retrieves (e.g., documents, web pages, tickets, PDFs, wiki pages).

In practice, GenAI security is no longer “just model safety.” It becomes the intersection of:

Data governance, or what AI can access and return, where platforms like Securiti AI operate (e.g., discovery, classification, entitlements, policy enforcement).
Cyber resilience, or how quickly you can recover when something goes wrong, where platforms like Veeam operate (e.g., immutable backups, clean recovery, rollback, restoration confidence).

This blog breaks down 10 common LLM attacks you should plan for and ends with a final, copy/paste enterprise checklist your teams can use as a release gate.

The New LLM Attack Surface in One Picture

Most LLM incidents aren’t the result of the model becoming “evil”. They’re usually caused by one or more of these realities:

The model treats untrusted text (from users or retrieved docs) as instructions.
The app passes too much context (including sensitive data) into prompts.
Tools and agents are given excessive permissions.
The system treats LLM output as trusted and executes it.
The organization lacks rollback and clean recovery when the AI pipeline or knowledge base is tampered with.

10 Common LLM Attacks and Their Enterprise Impact and Mitigations

1) Prompt Injection (Direct and Indirect)

What it is: An attacker manipulates the model’s behavior using crafted instructions.

Direct injection: The attacker types malicious instructions into the chat.
Indirect injection: Malicious instructions are hidden inside content your system retrieves (e.g., a wiki page, PDF, ticket comment) and then fed to the model via RAG.

Enterprise Impact

Unauthorized disclosure (“summarize the confidential folder”).
Unauthorized actions (“create a user”, “reset MFA”, “send this email”).
Policy bypass (“ignore restrictions and proceed”).

Mitigations

Treat all user input and retrieved content as untrusted.
Keep a strict separation between system instructions and retrieved content in your orchestration logic.
Use tool allowlists and argument validation so the model never gets raw power.
Apply least privilege access at retrieval and tool layers Securiti AI style entitlements mindset.

2) Sensitive Information Disclosure (Data Leakage)

What it is: The model outputs data it shouldn’t because it was in the prompt/context, retrieved without proper authorization, present in logs, or accessible via tools.

Enterprise Impact

PII/PHI/PCI exposure
Secrets leakage (API keys, tokens, credentials)
Confidential IP loss (roadmaps, pricing, M&A material)

Mitigations

Discover and classify sensitive data before connecting it to AI (data governance first).
Enforce retrieval permissions based on identity and entitlement, not “whoever asked the bot”.
Implement output filtering/redaction for regulated classes.
Remove secrets from prompts and agent instructions; use vault-based runtime retrieval instead.

3) Model / Tool / Dependency Supply Chain Attacks

What it is: A compromise enters through third-party components: Models, libraries, plugins, prompt templates, connectors, or datasets.

Enterprise Impact

Backdoored behavior that’s triggered by certain phrases
Compromised connector exfiltrates data
Integrity loss that’s difficult to detect

Mitigations

Maintain an inventory of AI components and versions (e.g., models, embeddings, tools, connectors).
Vet third-party plugins/connectors and enforce scoped permissions.
Version and protect prompt templates/configuration as production assets.
Ensure you can rollback quickly if a dependency is found compromised.

4) Data Poisoning (Training, Fine-Tuning, RAG Corpus)

What it is: Attackers manipulate the data the system learns from or retrieves. This includes training sets, fine-tuning data, feedback loops, or RAG content.

Enterprise Impact

Persistent misinformation like “always recommend insecure steps”
Stealthy policy bypass that’s planted into “trusted” content
Long-lived integrity compromise across many users

Mitigations

Apply provenance controls, including who added/changed content, when, and from where.
Separate high-trust and low-trust sources in RAG ingestion.
Require approvals for updates to authoritative corpora.
Keep immutable versions of corpora and indexes so you can revert to known-good.

5) Improper Output Handling (LLM Output as an Injection Vector)

What it is: Downstream systems treat LLM output as trusted and render it as HTML, execute it as code, or use it to construct database queries or API calls without validation.

Enterprise Impact

XSS or UI injection
Command injection / SQL injection style outcomes
Tool misuse via crafted arguments

Mitigations

Treat LLM output like user input: Validate, escape, sanitize.
Use structured outputs or schemas and server-side verification.
Never execute “generated code” without sandboxing and approval gates.

6) Excessive Agency via Over-Permissioned Agents

What it is: Agents are given broad permissions and autonomy, which turn a single injection or misfire into real-world impact.

Enterprise Impact

Automated destructive changes like deletions or misconfigurations
Rapid propagation into systems like email, IAM, ITSM, cloud
High-speed data exfiltration

Mitigations

Default to bounded autonomy, or approvals for high-risk actions.
Introduce step-up authentication for sensitive operations.
Limit scope and blast radius using short-lived tokens and least-privilege tool permissions.
Log and audit tool calls and decisions and treat agents like privileged identities.

7) System Prompt Leakage (Instruction and Policy Exposure)

What it is: Attackers coax the model to reveal system prompts, tool instructions, or hidden policy logic to make future attacks easier.

Enterprise Impact

Guardrail bypass becomes easier since attackers learn “how it thinks”
Exposure of internal workflows and endpoints
Leakage of proprietary prompt engineering and business logic

Mitigations

Keep secrets out of prompts.
Store sensitive instructions or configurations as protected assets with RBAC and change control.
Reduce prompt verbosity and avoid embedding internal credentials or endpoints in text.
Monitor for repeated attempts to extract system instructions.

8) RAG / Vector Store Weaknesses (Embedding and Retrieval Attacks)

What it is: Attacks targeting the embedding pipeline, vector database, and retrieval logic, especially when multi-tenant or poorly segmented.

Enterprise Impact

Cross-tenant leakage
Indirect prompt injection through retrieved chunks
Persistence of malicious content in the index

Mitigations

Secure the vector store like a production database (via network controls, authentication, and encryption).
Enforce tenant isolation and document-level entitlements at retrieval time.
Build fast purge/reindex procedures.
Maintain versioned backups of vector indexes and ingestion configurations.

9) Misinformation (Hallucinations and Manipulated Grounding)

What it is: The model produces plausible but incorrect answers, or is grounded on manipulated or poisoned sources.

Enterprise Impact

Wrong remediation steps cause outages
Compliance/legal risk from fabricated citations
Poor decisions made with unwarranted confidence

Mitigations

Require grounding in approved sources for high-impact workflows.
Display citations/snippets and source metadata where possible.
Put humans in the loop for security actions, legal advice, financial commitments, or customer promises.
Ensure your authoritative runbooks are governed and protected.

10) Unbounded Consumption (Cost and Resource Exhaustion)

What it is: Attackers or poorly controlled workflows drive runaway token usage, tool calls, retrieval loops, or expensive model routes, which cause outages or unexpected cost.

Enterprise Impact

DoS-like degradation
Budget overruns
Cascading failures as downstream APIs throttle

Mitigations

Rate limits, token caps, concurrency limits, and tool-call quotas.
Circuit breakers and loop detection for agents.
Per-tenant budgets and alerts enforce model routing policies.

The Enterprise Approach: Govern What the AI Can Access and Recover What it Breaks

Most GenAI programs focus heavily on “safe prompts.” That’s necessary, but not sufficient.

A production-ready posture pairs:

Data governance and entitlement enforcement: Discover, classify, limit access, and monitor usage.
Resilience and recovery readiness: Immutable backups, clean restore points, and tested recovery of the systems and data your AI depends on.

With LLM applications, the realistic goal is not zero incidents. It’s:

Minimum blast radius
Rapid detection
Fast rollback and clean recovery
Auditability and proof of control

Final 10-Step LLM Security and Resilience Checklist

Use this as a release gate for any LLM app, RAG assistant, or agentic workflow.

Inventory every AI component end-to-end
- Apps, models, prompts, agent workflows, tools/actions, connectors, data sources, vector databases
- Assign an owner and approver for each system
Discover, classify, and label sensitive data before AI touches it
- Identify PII/PHI/PCI, secrets, regulated data, and critical IP
- Define AI-allowed vs AI-blocked categories
Enforce least-privilege retrieval and least-privilege actions
- Retrieval must respect identity and document entitlements
- Scope tool permissions with short-lived credentials and allowlists
Harden against direct and indirect prompt injection
- Treat all input and retrieved content as untrusted
- Add policy gates before retrieval and before response delivery
Constrain tool use and agent autonomy
- Allowlist tools/endpoints and validate arguments
- Require approvals for high-risk actions like delete, pay, publish, and provision
Treat LLM outputs as untrusted
- Validate/escape before execution or rendering
- Use structured outputs (schemas) and server-side verification
Secure the RAG and vector layer like production
- Control ingestion with provenance and change tracking
- Enforce tenant isolation, encryption, and auditing
Log, monitor, and alert on AI-specific signals
- Prompts (redacted), retrieval hits, tool calls, policy decisions
- Detect anomalies like injection patterns, tool-call spikes, unusual retrieval volume
Operational kill switches and incident response playbooks
- Ability to disable tools, block sources, pause ingestion, and isolate the agent
- Key rotation procedures for every connector the AI can access
Resilience: immutable backups and clean recovery
- Back up source data, configs, prompts, indexes, and pipelines
- Test restores regularly and maintain known-good baselines for rollback

Closing Thoughts

LLMs are becoming a new “interface” to enterprise systems, one that operates in natural language, often across sensitive data, and increasingly with permission to act. The teams that succeed in production are the ones that treat AI security as an enterprise discipline: Govern access, constrain actions, verify outputs, and ensure recovery.

If you can do those four consistently, you can scale GenAI safely, without slowing the business.