Two Kinds of “The Operator Cannot See Your Prompt”

A map of private inference in 2026

Darkbloom launched this week, and the response on Hacker News — 470 points, hundreds of comments — is a clean signal: developers want cheaper inference, and they want it without a hyperscaler reading their prompts. The technical pitch is striking. Idle Apple Silicon Macs serve inference. Requests are end-to-end encrypted. Debuggers are denied at the kernel level. An operator with full physical custody of the machine cannot read what flows through it.

That last claim is the one worth pausing on. Because “the operator cannot see your prompt” is true of Darkbloom in roughly the same sense that it is true of Apple’s Private Cloud Compute, NVIDIA Confidential Computing, and also — in a completely different way — true of systems built on fully homomorphic encryption, multi-party computation, and zero-knowledge proofs.

These are not the same technology. They are not even the same category of technology. They defend against different adversaries, rely on different trust anchors, and fail in different ways. Treating them interchangeably is how buyers end up deploying the wrong tool for their actual threat model.

This post is a map. It is not an argument that one approach is correct and another is wrong. Both are real. Both are shipping. Both have uses the other cannot cover. The goal is to give you the distinctions you need to read a “private AI” claim — yours or someone else’s — without getting fooled.

“Operator cannot see the prompt” has two meanings

The two meanings are worth stating plainly before anything else.

The TEE-based meaning. The data is decrypted inside a hardware-isolated execution environment — an Apple Secure Enclave, an Intel TDX trust domain, an AMD SEV-SNP guest, an NVIDIA H100 or Blackwell GPU running in confidential-compute mode. Inside that environment, the data is in the clear. Computation happens on plaintext, at full hardware speed. What prevents the operator from seeing it is a combination of hardware isolation, memory encryption at the bus level, cryptographic attestation of the software stack, and policy choices like disabling debuggers and logging. The guarantee is: an attacker who controls everything except the silicon root of trust cannot observe the data.
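The attestation piece of that guarantee reduces, schematically, to measured boot plus an allow-list check: hash the software image, compare the reported measurement against published expected values. A toy Python sketch of that check (the image bytes and naming are illustrative, not any vendor's actual attestation format):

```python
import hashlib

def measure(image: bytes) -> str:
    # A "measurement" is simply a cryptographic hash of the booted image.
    return hashlib.sha256(image).hexdigest()

# Publisher side: release a software image and publish its expected measurement.
release_image = b"hardened-os build 4.2 (illustrative bytes)"
allow_list = {measure(release_image)}

def verify_attestation(reported: str) -> bool:
    # Client side: accept the node only if the attested measurement
    # matches a published, audited software image.
    return reported in allow_list

assert verify_attestation(measure(release_image))
assert not verify_attestation(measure(b"tampered image"))
```

Real attestation chains add a hardware-rooted signature over the measurement and a certificate chain back to the vendor, but the trust question is the same: who audits what goes on the allow-list.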

The cryptographic meaning. The data is never decrypted at all during computation. It remains ciphertext end-to-end. The server performs arithmetic on encrypted values and returns encrypted results. Only the key holder can read the output. What prevents the operator from seeing the data is mathematics — specifically, the hardness of lattice problems underlying schemes like CKKS, BFV, and TFHE. The guarantee is: an attacker who controls everything, including the silicon, cannot observe the data, because no component of the system ever holds it in plaintext.
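To make “arithmetic on encrypted values” concrete, here is a toy Paillier example. Paillier is additively homomorphic rather than fully homomorphic, and the primes below are insecurely small, but it shows the contract exactly: the server adds two numbers it can never read, and only the key holder can decrypt the sum.

```python
import math, random

# Tiny Paillier keypair (toy primes; illustration only, not secure).
p, q = 10_007, 10_009
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)
g = n + 1
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)  # inverse of L(g^lam mod n^2)

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return ((pow(c, lam, n2) - 1) // n * mu) % n

def add(c1, c2):
    # Server side: multiplying ciphertexts adds the underlying plaintexts.
    return (c1 * c2) % n2

ciphertext_sum = add(encrypt(17), encrypt(25))
assert decrypt(ciphertext_sum) == 42
```

The server that evaluates `add` sees only ciphertexts; the homomorphic property is a consequence of the scheme's algebra, not of any isolation boundary.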

These are different guarantees with different costs. The first is fast and practical for realistic workloads but requires you to trust the hardware vendor. The second is slow and narrow but requires you to trust no one in particular.

The TEE family, examined

Start with Darkbloom’s concrete design, because it is a good representative of the current state of TEE-based AI privacy. The provider process runs in-process with the inference engine — no subprocess, no local server, no IPC. PT_DENY_ATTACH blocks debuggers at the kernel level. Memory-reading APIs are denied. The coordinator encrypts each request with the provider’s X25519 key before forwarding; only the hardened provider process decrypts. Attestation data is publicly verifiable. The trust anchor is the Apple Secure Enclave.

Apple’s own Private Cloud Compute is a close cousin of this architecture, deployed at hyperscaler scale. PCC uses custom Apple Silicon servers, a hardened OS, and cryptographic attestation of every software image running in the data center. Requests are routed through an anonymizing relay so that Apple cannot link a request to a user. Crucially, and this is explicit in Apple’s threat model, PCC does not encrypt data during runtime on the node. Data is decrypted inside the trusted environment and processed in the clear. What PCC provides is a hardened path to that environment and a cryptographic guarantee about what code will run once the data arrives.

NVIDIA’s Confidential Computing on H100 and Blackwell GPUs extends the same pattern to GPU workloads. The GPU has an on-die hardware root of trust, encrypted memory, and an encrypted bounce buffer between CPU and GPU. In confidential-compute mode, data stays encrypted on the bus and in GPU memory until it is inside the TEE boundary. Blackwell adds TEE-I/O, which extends the protected path over NVLink, so multi-GPU workloads can stay confidential across the interconnect. Published benchmarks put Blackwell’s confidential mode at nearly the same throughput as unencrypted — a dramatically different cost curve than the FHE world.

What all three share is the trust model. You are trusting:

  1. That the hardware vendor designed the root of trust correctly.
  2. That the hardware vendor did not insert a backdoor, whether deliberately or under government compulsion.
  3. That the attestation chain has no exploitable flaw between the hardware measurement and the code running inside.
  4. That side channels — timing, power, electromagnetic, speculative-execution — do not leak enough information to defeat the isolation.
  5. That the supply chain delivered the actual chip the vendor designed, without tampering.

These are not trivial assumptions. They are routinely challenged by academic research, including recent in-depth analyses of NVIDIA’s GPU confidential-computing architecture. But for most commercial threat models — “don’t let the cloud provider’s engineers read my prompts,” “don’t let a compromised host OS steal my model weights” — TEEs are a perfectly reasonable answer, and they run at production speeds.

The cryptographic family, examined

FHE, MPC, and ZKP are not one technology but three closely related ones, each with different primitives and different trade-offs. They share a structural property: the adversary is assumed to have unrestricted access to the system, and the security guarantee follows from mathematics rather than from hardware.

Fully homomorphic encryption allows arbitrary arithmetic on ciphertext. Modern schemes — CKKS for approximate arithmetic, BFV/BGV for exact integer arithmetic, TFHE for boolean circuits — encode a vector of plaintext values into a ring-element ciphertext and support ciphertext addition and multiplication with noise that grows with circuit depth. Bootstrapping refreshes the noise but is expensive. The security reduction is to the Ring Learning With Errors problem, which is believed hard against both classical and quantum adversaries.
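The noise mechanics can be illustrated with a bare-bones LWE sketch (toy parameters, no security claims): a bit is encoded at q/2, a small error term rides along, and homomorphic addition adds both the messages and the errors, which is why deep circuits eventually need bootstrapping to refresh the noise.

```python
import random

q, n = 2**16, 32                             # toy modulus and secret dimension
s = [random.randrange(q) for _ in range(n)]  # secret key

def encrypt(bit, noise=4):
    a = [random.randrange(q) for _ in range(n)]
    e = random.randint(-noise, noise)        # small error term
    b = (sum(ai * si for ai, si in zip(a, s)) + e + bit * (q // 2)) % q
    return (a, b)

def decrypt(ct):
    a, b = ct
    m = (b - sum(ai * si for ai, si in zip(a, s))) % q
    # Round to the nearest multiple of q/2: near q/2 means 1, near 0 means 0.
    return 1 if q // 4 < m < 3 * q // 4 else 0

def add(ct1, ct2):
    # Homomorphic XOR: componentwise addition; the error terms also add,
    # so noise grows with every operation.
    a = [(x + y) % q for x, y in zip(ct1[0], ct2[0])]
    return (a, (ct1[1] + ct2[1]) % q)

assert decrypt(add(encrypt(0), encrypt(1))) == 1   # 0 XOR 1
assert decrypt(add(encrypt(1), encrypt(1))) == 0   # 1 XOR 1
```

Once the accumulated error approaches q/4, decryption starts rounding to the wrong message; production schemes manage exactly this budget, and bootstrapping is the (expensive) reset button.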

Multi-party computation splits data across several parties such that no single party sees the plaintext; computation proceeds through interaction between the parties, and the result is correct as long as some threshold of parties remains honest. Threshold FHE is the natural fusion: the decryption key itself is secret-shared across parties, so no single party can decrypt at all.
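A minimal additive-sharing sketch makes the “no single party sees the plaintext” property concrete (toy modulus, three parties, no network layer; real MPC protocols add communication and malicious-security machinery on top):

```python
import random

P = 2**61 - 1  # prime modulus for share arithmetic

def share(x, parties=3):
    # Additive sharing: any subset smaller than all parties learns nothing,
    # because the first parties-1 shares are uniformly random.
    shares = [random.randrange(P) for _ in range(parties - 1)]
    shares.append((x - sum(shares)) % P)
    return shares

def reconstruct(shares):
    return sum(shares) % P

# Two secrets, shared across three parties.
a_sh, b_sh = share(120), share(80)
# Each party adds its own shares locally: no communication, no plaintext.
sum_sh = [(a + b) % P for a, b in zip(a_sh, b_sh)]
assert reconstruct(sum_sh) == 200
```

Addition is free in this scheme; multiplication is where the interaction (and most of the protocol complexity) comes in.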

Zero-knowledge proofs let a prover convince a verifier that a statement is true without revealing anything beyond the fact of its truth. For private inference, this matters because you often want not just the answer but a proof that the answer was computed correctly from the encrypted input.
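The classic small instance is the Schnorr protocol, made non-interactive with the Fiat–Shamir heuristic. The sketch below (toy group size, not production parameters) proves knowledge of a discrete log x without revealing it; the verifier's check succeeds only if the prover really knew x:

```python
import hashlib, random

p = 998_244_353  # prime modulus (toy size)
g = 3            # group generator

x = random.randrange(1, p - 1)  # prover's secret
y = pow(g, x, p)                # public statement: y = g^x mod p

# Prover: commit to randomness, derive the challenge by hashing (Fiat-Shamir),
# then respond. The response s leaks nothing about x because r is random.
r = random.randrange(1, p - 1)
t = pow(g, r, p)
c = int.from_bytes(hashlib.sha256(f"{g},{y},{t}".encode()).digest(), "big") % (p - 1)
s = (r + c * x) % (p - 1)

# Verifier: checks g^s == t * y^c without ever learning x.
assert pow(g, s, p) == (t * pow(y, c, p)) % p
```

Proof systems for inference correctness (zkML) are enormously more elaborate, but the shape is the same: a check that binds the output to the claimed computation without exposing the witness.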

The honest story about FHE performance in 2026 is that it is improving fast and is still very slow compared to plaintext. Recent surveys put FHE overhead at roughly 10^5× slower than cleartext for realistic deep learning. GPU-accelerated CKKS implementations have brought CNN inference on CIFAR-10 down from thousands of seconds to a few seconds per image. For LLMs, the state of the art is something like GPT-2 small with LoRA, reporting on the order of 1.6 seconds per token under carefully engineered parameter choices. Recent ICLR work on FHE-based transformer inference reports single-digit hours per prefill for small models. This is not a technology you plug into your Claude replacement and expect interactive chat.

Where FHE genuinely shines is in computations with modest arithmetic depth applied to data from mutually distrusting parties. Private set intersection. Encrypted database queries. Summing encrypted supplier-level CO₂ emissions across a supply chain so that aggregate Scope 3 reporting becomes possible without any supplier revealing raw data to any other. Matching encrypted medical records across hospitals — an organ-transplant problem, for instance — where Threshold FHE removes the question of “who holds the decryption key” by ensuring nobody does. These are not LLM inference workloads. They are workloads where the privacy requirement is structural, where the parties have legal or competitive reasons to distrust one another, and where latency budgets are measured in minutes or hours rather than milliseconds.
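For the pure-summation case, the structural property can even be had without full FHE: pairwise random masks that cancel in the aggregate (the secure-aggregation idea) let each party submit a masked value that reveals nothing on its own. A toy sketch, assuming suppliers can agree on pairwise random values in advance (the setup and values are illustrative):

```python
import random

Q = 2**32  # arithmetic modulus for the masked values

def pairwise_masks(n_parties, seed=0):
    rng = random.Random(seed)
    # masks[i][j] == -masks[j][i] mod Q, so all masks cancel in the total.
    masks = [[0] * n_parties for _ in range(n_parties)]
    for i in range(n_parties):
        for j in range(i + 1, n_parties):
            r = rng.randrange(Q)
            masks[i][j], masks[j][i] = r, (-r) % Q
    return masks

emissions = [1200, 450, 310]  # each supplier's private per-site value
masks = pairwise_masks(len(emissions))

# Each supplier submits only its masked value; no raw number leaves the site.
submitted = [(x + sum(masks[i])) % Q for i, x in enumerate(emissions)]
total = sum(submitted) % Q
assert total == 1960  # the aggregate, with no individual value disclosed
```

Threshold FHE earns its complexity when the computation is more than a sum, or when dropout and collusion resistance matter; the point of the sketch is only that the privacy requirement here is structural, not performance-bound.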

A threat model comparison

Laying the two families side by side against concrete adversaries clarifies where each fits.

A cloud operator’s curious engineer. TEEs defeat this attacker decisively — data is encrypted in transit, attested on arrival, processed only by audited code. FHE defeats this attacker too, but you paid 10^5× in compute to defeat an adversary a TEE would have beaten for free. The engineer is the TEE’s home turf.

A malicious host OS or hypervisor. TEEs handle this — that is precisely what confidential VMs and confidential containers are designed for. FHE handles it trivially, because the host never sees plaintext at all. Either works.

A sophisticated physical attacker with a bus analyzer and a DRAM cooling attack. TEEs mitigate this at considerable effort — Apple explicitly includes this attacker in PCC’s threat model; NVIDIA’s published threat model for Hopper and Blackwell confidential mode addresses PCIe bus probing with in-line encryption. Whether the mitigation is sufficient depends on the specific attack and the specific hardware generation. FHE is indifferent to this attacker by construction. The bus carries only ciphertext.

The hardware vendor itself, or a state actor compelling the hardware vendor. TEEs cannot defend against this — the root of trust is the vendor. This is not a flaw in the technology; it is a definition. FHE defends against this, because no hardware component is assumed trustworthy. If your threat model includes “what if Apple or NVIDIA is compromised,” only the cryptographic family applies.

A future adversary with a cryptographically relevant quantum computer, decrypting harvested ciphertext from 2026. This is the “harvest now, decrypt later” concern. Most TEE-based systems today use ECC-based key exchange — X25519 is the common choice, including in Darkbloom — which is quantum-broken. The data-in-use guarantee is unaffected by quantum attacks because the data is never encrypted during computation, but the transport layer is. FHE based on lattice assumptions (RLWE) is believed post-quantum secure for the data itself. Mature designs in either family are beginning to adopt ML-KEM for key exchange; check the specific system.

Side channels — timing, cache, power, EM. Both families are vulnerable, in different ways. TEEs have a substantial published literature of side-channel breaks; mitigating them is an ongoing effort. FHE implementations have their own side-channel issues, particularly around bootstrapping and noise management. Neither is a silver bullet.

No single technology dominates across this table. That is the whole point.

Which one should you use

The honest guidance is that these are complementary, not competing, for most realistic deployments.

For interactive LLM inference at scale, TEEs are the only practical answer in 2026. The cost curve simply does not support FHE-based chat completion. Darkbloom, Apple PCC, NVIDIA confidential GPU instances, Azure confidential containers with H100 — this is where the industry is, and it is a reasonable place to be for most commercial privacy requirements.

For fixed-depth arithmetic over encrypted data from mutually distrusting sources, the cryptographic family is frequently the correct choice and sometimes the only choice. Aggregating Scope 3 emissions across a supply chain where each supplier’s raw data is competitively sensitive. Matching medical records across hospitals where legal constraints forbid any party from holding plaintext from another party. Financial settlement calculations where the regulator, the counterparties, and the platform operator are all potential adversaries to each other. Voting systems where verifiability and ballot secrecy must hold simultaneously. These are not LLM problems. They are problems where the structure of distrust is irreducible, and trying to solve them with a TEE means picking which participant gets to be the trusted party — which is often the problem you were hired to eliminate.

For systems where the threat model genuinely includes the hardware vendor, only the cryptographic family is responsive. This is a smaller market than the previous two, but it exists, and it is where certain strands of post-quantum cryptographic infrastructure are being built.

A useful heuristic: if you can name the party who holds the decryption key, you are in TEE territory and should probably just use a good TEE. If the honest answer is “nobody holds the key, and that is the point,” you are in cryptographic territory and should not try to reduce it to a hardware problem.

The 2026 picture

What I see when I look at this landscape is not a competition but a division of labor that is still being worked out in public.

TEE-based private inference is having a commercial moment. Darkbloom’s Apple-Silicon-on-idle-Macs architecture, Apple’s data-center PCC deployment, and NVIDIA’s Blackwell confidential GPUs are all maturing at the same time, and they collectively make “private by construction” a realistic default for AI workloads rather than a research curiosity. The remaining questions are not technical so much as governance-shaped: how is attestation verified, who audits the hardened OS images, how are side-channel disclosures handled, how does the supply chain prove itself.

Cryptographic private computation is having a different kind of moment. GPU-accelerated CKKS has crossed the threshold where small CNN inference is genuinely practical. Threshold FHE is being deployed in real multi-party workflows. Zero-knowledge systems are standardizing. The workloads being unlocked are not “chat with a model” — they are the structural-distrust workloads that TEEs cannot cleanly serve, and there are more of those than the LLM-centric discourse usually admits.

The mistake to avoid in either direction is conflation. If you read “operator cannot see your prompt” and do not ask which guarantee is being offered, you will eventually end up with the wrong one. If you read “privacy-preserving AI” and do not ask whether the trust root is silicon or mathematics, you cannot evaluate whether the claim matches your threat model.

Both of these families are real technologies solving real problems. The point is to know which one is in front of you when somebody says the word “private.”


The author works on cryptographic infrastructure for supply-chain and healthcare applications, including post-quantum key management (hyde), GPU-accelerated CKKS (plat), and multi-party organ-matching (Niobi).
