Arming Loki with jArvIs: How AI Is Powering Real-World Intrusions

Written by Roshan Bhandari | Dec 5, 2025 9:20:28 AM

What happened?

Anthropic reports that a state-sponsored group (GTG-1002, assessed as China-nexus) used an AI agent to run most of a live cyber-espionage campaign. Roughly 30 entities were targeted, with several confirmed intrusions. The AI executed the bulk of the operation—reconnaissance, vulnerability discovery, exploitation, lateral movement, credential use, data access, and exfiltration—while humans stepped in only at a few approval checkpoints. Anthropic estimates the agent handled about 80–90% of the tactical work and operated at thousands of requests, often multiple per second, a pace humans can’t match.

The attackers built an autonomous framework around Claude Code and Model Context Protocol (MCP) servers. The orchestrator broke the intrusion into many small, routine-looking tasks (scan, validate, query, summarize), sequenced them across phases, and kept state between steps. Importantly, the toolkit leaned on commodity, open-source security tools; the novelty was in orchestration and scale, not custom malware. This let the actor parallelize activity across targets and hide intent inside lots of normal-seeming requests.

How did the operation unfold?

The operation unfolded in six stages.

First, human operators chose a target (a company or a government agency), set goals, and stood up an attack framework designed to run most steps on its own. They wired in an AI agent to plan and execute tasks with minimal oversight. Next came safety bypass and task shaping. Because well-trained models resist harmful use, the operators “jailbroke” the model by framing the work as legitimate security testing and slicing the objective into many small, harmless-looking tasks. In simple terms, a jailbreak is any method that makes a language model ignore its safety rules. Attackers can do this with prompt tricks on hosted models, such as role-play, obfuscation, or multi-step wrappers, or by running uncensored or local models where safety filters are removed. In both cases, the model stops saying no and starts doing the work: planning steps, writing code, calling tools, and summarizing results.

With those guardrails out of the way, the framework directed the AI to perform rapid reconnaissance. It mapped systems, services, portals, and data stores, and flagged high-value databases, then reported back with a concise summary. From there, the AI researched vulnerabilities, drafted or adapted exploits, tested candidates, and attempted initial access, checking in with humans at key points. Once a foothold existed, it harvested and tested credentials, identified high-privilege accounts, pivoted across systems, staged large volumes of sensitive data, ranked that data by intelligence value, and quietly exfiltrated it. Backdoors were planted to maintain access. In the final stage, the AI auto-documented everything: stolen credentials, systems analyzed, access paths, timelines, and outputs. These tidy artifacts made the next wave of operations faster and easier to automate.

Attack Chain

Why does this need attention?

When attackers leverage AI, they compress time. One short instruction turns into thousands of tiny actions that run in parallel: scanning, banner checks, CVE lookups, credential tests, and staging. A small crew can keep multiple intrusions moving around the clock, try many ideas at once, and pivot in seconds. The cost to operate goes down, the volume of activity goes up, and patch or containment windows feel much shorter for defenders.

The signal also changes shape. Each step looks harmless on its own, because the model breaks work into small, ordinary requests that rely on common tools. The risk only appears when you connect the steps and watch the tempo: bursts of reconnaissance followed by banner to CVE checks, then evenly spaced authentication probes, then callback “validation,” and finally staging and neat exfiltration. If you only chase single events or static IOCs, you will miss the bigger pattern. Defense has to focus on sequence and timing across DNS and proxy logs, EDR, IAM, cloud audit trails, CI/CD, and database logs, supported by fresher enrichment that maps to behaviors (for example ATT&CK techniques), not only domains and hashes.

These “Agentic Criminals,” or jailbreak AI agents, also add resilience on the attacker’s side. The system keeps notes, remembers context, retries with new tools, and adapts when something is blocked. Jailbreak prompts or “defensive testing” role-play can hide intent from basic filters. That means defenders need controls that slow or break this loop: rate limits, approval gates for risky actions, short-lived tokens with quick revocation, and deception such as honey credentials or canary buckets that turn the AI’s overconfidence into high-signal alerts. Monitoring model usage internally, for example unusual prompt rates, very long prompts, or repeated role claims, becomes part of normal security telemetry.

Recent research from Straiker describes Villager, an AI-native penetration testing framework released by a Chinese group called Cyberspike. On paper it looks like a red-team tool, but in practice it behaves like an AI-powered successor to Cobalt Strike. It connects Kali Linux toolsets with an AI model (DeepSeek) and uses Model Context Protocol (MCP) to automate full attack chains. The tool is published on PyPI and has already passed ten thousand downloads in its first two months, which means almost anyone can pull it into their environment with a single pip install.

All of this will fuel an “offense as a service” market. We should expect plug and play task graphs, rented playbooks, and affiliates who do not need deep expertise to run serious operations. On the defense side, teams will look for platforms that make pattern detection straightforward: clean normalization across sources, identity and asset stitching, behavior-first rules, fresh threat intelligence with source weighting, and one-click containment. The practical goal remains simple: spot the sequence early, raise the cost of the next step, and cut the loop before the data moves.

How attackers misuse AI ?

Prompt injection and RAG hijacking
Attackers hide instructions inside web pages, PDFs, tickets, notes, or wiki articles. When an LLM or RAG system reads that content, it gets tricked into following the attacker’s script, ignoring its original instructions, leaking data, or calling tools it should not.
Training data poisoning
Instead of attacking the model directly, attackers quietly poison what it learns from. They push fake or biased content into public repos, forums, wikis, or even logs; later, the model or detector trains on this polluted data and starts making wrong decisions that benefit the attacker.
Automating classic recon and exploitation
A simple order like “get in” turns into many small tasks: discover assets, fingerprint services, map versions to CVEs, try logins, and move or stage data. These steps can run in parallel, all day, with quick retries. The workflow still looks like MITRE ATT&CK, for example, reconnaissance, initial access, discovery, lateral movement, and exfiltration, but executed faster and more consistently with AI assistance.
Social engineering at scale
AI writes highly convincing phishing emails and chats, copies someone’s writing style, and can help script deepfake audio or video. This makes it easier to trick employees, reset accounts, or exhaust MFA with believable stories. OWASP notes that prompt injection and misuse can directly lead to social engineering and data exfiltration in downstream systems.
Malware and exploit assistance
Models can be misused to draft or adapt exploit code, create droppers and loaders, rewrite payloads so they slip past basic AV or EDR rules, and neatly document what worked. The creativity and speed come from AI; the underlying behaviors still map to familiar ATT&CK tactics such as execution, persistence, and defense evasion, which are also described in MITRE ATLAS.
Evasion and alert tuning
Attackers treat the defender’s environment as something they can test. They use AI to generate many variations of payloads and timings, observe which ones trigger alerts, then adjust until the activity stays below thresholds. MITRE ATLAS describes this type of AI-assisted probing and evasion against both traditional and AI-enabled defenses.
Identity and fraud operations
AI helps create realistic fake identities, fill in forms, and run long “support” style conversations that keep victims engaged. It can script entire fraud flows and spam campaigns without getting tired or inconsistent, which makes social engineering and financial fraud more scalable.
Model and key abuse
On the backend, attackers go after the plumbing: they steal API keys, abuse overly powerful tools that are connected to the model, or try to extract and copy the model weights themselves. OWASP’s LLM Top 10 highlights risks such as sensitive information disclosure, model theft, and supply chain weaknesses. Once an attacker has your keys or your model, they can impersonate your service or run a less restricted version offline.

Similar Real World AI-powered Intrusions

Attackers are already moving from “AI-assisted” to truly AI-powered malware. ESET’s research on the PromptLock ransomware shows how this works in practice. Instead of shipping a fixed set of scripts, the malware bundles a local language model and feeds it hard-coded prompts. The AI then generates custom Lua scripts on the victim machine in real time to search for files, exfiltrate data, and encrypt selected content. Because each run produces slightly different code and logic, it is harder for traditional signature or heuristic-based tools to recognise and block it. PromptLock is an early proof that ransomware can outsource parts of its own decision-making and tooling to an embedded model, rather than relying only on static binaries.

Google’s Threat Intelligence work shows the same pattern at a broader scale. State-backed groups and cybercriminals are using generative AI and platforms like Gemini to accelerate every stage of the attack life cycle: reconnaissance, phishing content, malware and script generation, infrastructure management, and even post-compromise analysis. In some documented cases, threat actors experiment with AI “agents” and automation frameworks that can adjust mid-operation, chain tools together, and dynamically change behaviour based on results. In other words, AI is becoming both a tool inside the malware and a remote assistant around it, making attacks cheaper to run, quicker to iterate, and harder for defenders to track using old, static indicators.

Threat	Observed	Public Disclosure	Attribution
LameHug	July 2025	July 2025	APT28 (Russia)
Gemini Prompt Injection	June 2025	Aug 2025	Independent Researchers
Claude CVEs	July 2025	Aug 2025	CVE-2025-54794/5
GPT-5 Jailbreak	Aug 2025	Aug 2025	Unknown
Charon Ransomware	Aug 2025	Aug 2025	Earth Baxia-style

Securing from these types of attacks

Modern AI platforms no longer rely on a single “safety switch” to stop misuse. They build layers of protection around the model, the data it can see, the tools it can call, and the accounts behind it. The goal is the same as in OWASP LLM Top 10 and MITRE ATLAS: reduce the chance that an attack works, and minimise the impact if it does. Strong programmes also move quickly, spotting new abuse patterns, shipping fixes, and updating developers so the same prompt, data poisoning trick, or tool abuse does not keep working.
• Layered safety
Safer pre-training and fine-tuning, continuous red-teaming, and input/output classifiers that catch risky prompts and responses before they reach tools or sensitive data. This is in line with OWASP’s focus on prompt injection and model misuse, and with ATLAS guidance on testing AI systems like any other exposed component.
• Clean context (“context hygiene”)
Allow-list what the model can retrieve, redact secrets, scope memory, and sanitise HTML or Markdown so hidden instructions are stripped out. OWASP LLM Top 10 calls out prompt injection and insecure output handling; ATLAS also highlights that poisoned or hostile content in your own data sources can drive bad model behaviour if you do not control what the model reads.
• Controlled tool use with least privilege
Only approve the tools that are really needed, give each one minimal permissions, and use dry-run or simulation modes wherever possible. For high-risk actions such as code execution, cloud changes, or CI/CD operations, put a human approval step in the middle. This directly reduces the impact of both prompt injection and model compromise, which OWASP and ATLAS both list as key risks.
• Sandboxed execution environments
When models generate or run code, do it in isolated sandboxes with timeouts, resource limits, and restricted network access. That way, even if the model is tricked into producing something harmful, the blast radius is contained.
• Rate limits and anomaly detection
Limit how fast and how much a single user, API key, or app can call the model. Watch for strange patterns such as very long prompts, sudden spikes in volume, or wide fan-out across tools. These are the kinds of “abuse patterns” ATLAS describes when adversaries use AI to probe and iterate.
• Account and platform controls
Treat API keys and service accounts like high-value credentials: use strong issuance, tight scoping, rotation, usage caps, and fast takedowns for abuse. OWASP’s LLM Top 10 points to supply-chain and access risks; if an attacker cannot steal or overuse keys, they have a much harder time abusing the model at scale.
• Risk-based routing
Score each request based on content, source, and context. If it looks risky, route it to stricter policies or more restricted models. This lets you handle most normal traffic smoothly, while still putting guardrails around suspicious usage.
• Auditability and provenance
Log prompts, outputs, and tool actions in a privacy-aware way so you can investigate incidents and tune defences. Where possible, prefer signed or provenance-tagged content to help defend against deepfakes and synthetic fraud, which both OWASP and ATLAS recognise as growing abuse areas.
• Rapid response loop
Treat jailbreaks, prompt injection chains, and data-poisoning tricks like vulnerabilities. Hunt for them, add new detection rules, patch filters and policies, and update developer guidance. Both OWASP and MITRE stress that threat models for AI must be living documents, not one-off exercises.
• Secure defaults for the ecosystem
Provide SDKs, templates, and policy examples that ship with safe defaults so downstream apps inherit protections instead of bolting them on later. This lines up with OWASP’s push for secure-by-default patterns, and ATLAS’s view that many AI failures come from how systems are integrated rather than the core model itself.

How can we help?

At Logpoint, we are continuously enhancing our detection and response ecosystem to keep pace with the evolving threat landscape. This includes actively refining our detection rules, tuning alert logic to reduce noise, and expanding our library of automated response playbooks. Our focus is not only on spotting known threats but also on detecting behavioral anomalies and weak signals that point to early-stage intrusions, including stealthy, low-noise campaigns such as advanced persistent threats (APTs).

While the delivery and orchestration of attacks are changing, often powered by AI or automation, the fundamental techniques in the attack chain remain familiar: scanning, exploitation, credential abuse, lateral movement, and data theft. What has changed is the pace and precision. Threat actors now break operations into smaller, faster tasks that run at machine speed, which makes traditional detections struggle to keep up. This is where smarter, behavior-aware defense becomes critical.

With our SIEM plus NDR coalesced package, customers gain consolidated visibility across both log and network data. This integration allows you to correlate events, detect stealthy attacker behavior, and identify patterns that may look harmless in isolation but clearly malicious when viewed in sequence. Whether it is credential stuffing, beaconing traffic, or suspicious command execution, Logpoint provides real-time analytics, enriched threat intelligence, and automated response workflows that are designed to keep up with both human-driven and AI-driven intrusions.

Conclusion

Judgment Day in security will not come from angry robots(Terminator); it will come from friendly AI's doing exactly what the wrong person asks. In this blog, we explored how threat actors are evolving using language models not just for support, but as full-fledged operators executing reconnaissance, exploitation, credential abuse, and data theft with minimal human oversight. By breaking down a real-world AI-assisted intrusion, we showed how the attack flow mirrors traditional kill chains, just faster and harder to detect. These developments don’t change the fundamentals of cybersecurity, they raise the stakes. At Logpoint, we recognize this shift and continue to build and refine our SIEM and NDR solutions to uncover even the most subtle signs of compromise. The tools may be changing, but the mission remains: detect early, respond fast, and stay resilient.

View full post