Akihiko Komada a.k.a 駒田明彦 (@aki1770)

How it Works

Attention Residuals (AttnRes) replace the fixed accumulation used in standard PreNorm residual connections with a dynamic, attention-based mechanism. Each layer l has a learned pseudo-query vector; a softmax attention driven by this query lets the layer selectively aggregate earlier representations based on the current input, rather than treating all prior layers equally.

Variants

• Full AttnRes – every layer attends to all previous layer outputs.
  Memory: O(Ld) (L = number of layers).
• Block AttnRes – layers are grouped into N blocks (e.g., ~8 blocks).
  • Within a block: standard residual accumulation.
  • Across blocks: attention is applied only to block-level summaries plus any partial sum from the current incomplete block.
  Memory: O(Nd).

Both variants are drop-in replacements that keep the two-phase transformer computation (attention → MLP) and typically use RMSNorm for stability.

Why It Matters

Uniform residuals in deep PreNorm transformers cause:
• Gradient dilution – earlier layers receive weaker updates.
• Uncontrolled hidden-state growth – magnitudes explode with depth.

AttnRes introduces learned, input-dependent depth selection, which:
• Keeps output norms bounded.
• Distributes gradients uniformly across layers.
• Improves training dynamics and scaling efficiency.

Empirical Gains

On a 48 B-parameter Kimi Linear MoE model (3 B activated, 1.4 T tokens):