P/D disaggregated serving


and scaling at hyperscalers.

Tag

ml

published
June 16, 2025
modified
June 17, 2025
reading time
1 min read (28 words)
source
llms.txt


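The idea is for an inference engine to run prefill and decode on separate nodes, so that each pool, and the ratio between them, can scale independently. Prefill is mostly compute-bound and decode is mostly memory-bandwidth-bound, so the two phases rarely want the same amount of hardware. Think of DeepSeek R1.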
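To make "scale independently" concrete, here is a rough back-of-the-envelope sketch of sizing the two pools. Everything in it is an assumption for illustration: the `Workload` fields, the `pool_sizes` helper, and the per-node throughput numbers are hypothetical and do not come from any particular engine or from DeepSeek's deployment.

```python
import math
from dataclasses import dataclass


@dataclass
class Workload:
    requests_per_s: float
    prompt_tokens: int   # average tokens processed during prefill
    output_tokens: int   # average tokens generated during decode


def pool_sizes(w: Workload, prefill_tok_per_s: float, decode_tok_per_s: float):
    """Return (prefill_nodes, decode_nodes) for a workload.

    Throughputs are hypothetical per-node token rates. Each pool is sized
    only from the token demand of its own phase.
    """
    prefill_nodes = math.ceil(w.requests_per_s * w.prompt_tokens / prefill_tok_per_s)
    decode_nodes = math.ceil(w.requests_per_s * w.output_tokens / decode_tok_per_s)
    return prefill_nodes, decode_nodes


# Two made-up traffic shapes at the same request rate.
chat = Workload(requests_per_s=50, prompt_tokens=2000, output_tokens=500)
summarize = Workload(requests_per_s=50, prompt_tokens=16000, output_tokens=200)

print(pool_sizes(chat, 50_000, 5_000))
print(pool_sizes(summarize, 50_000, 5_000))
```

With these made-up numbers the chat workload wants more decode nodes than prefill nodes, while the long-prompt summarization workload wants the opposite. A colocated deployment would have to provision for both phases on every node, whereas a disaggregated one can shift the P:D ratio as the traffic mix changes. The sketch ignores KV-cache transfer between pools, batching effects, and latency targets, all of which matter in practice.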

See also: distributed inference for LLMs

Prefill/Decode

