
Transformer Math 101

  • Article
  • Apr 18, 2023
  • #Math
blog.eleuther.ai

A lot of basic, important information about transformer language models can be computed quite simply. Unfortunately, the equations for this are not widely known in the NLP community. The purpose of this document is to collect these equations along with related knowledge about where they come from and why they matter.

Note: This post is primarily concerned with training costs, which are dominated by VRAM considerations. For an analogous discussion of inference costs with a focus on latency, check out this excellent blog post by Kipply.
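The post's premise — that key figures like training VRAM follow from simple arithmetic — can be illustrated with a back-of-the-envelope sketch. This is a minimal example, not code from the post: it assumes the standard accounting for mixed-precision Adam training (fp16 weights and gradients plus fp32 master weights and two fp32 Adam moments, 16 bytes per parameter, excluding activations), and the function name is mine.

```python
# Rough training-state memory for a transformer under mixed-precision Adam.
# Assumed breakdown (standard accounting, excluding activation memory):
#   2 B fp16 weights + 2 B fp16 gradients
#   + 4 B fp32 master weights + 8 B fp32 Adam moments (m and v)
#   = 16 bytes per parameter
BYTES_PER_PARAM = 2 + 2 + 4 + 8

def training_vram_gib(n_params: float) -> float:
    """Approximate VRAM (GiB) to hold model + optimizer state."""
    return n_params * BYTES_PER_PARAM / 2**30

# e.g. a 7B-parameter model needs on the order of 100 GiB of state
# alone, which is why sharding or offload is used in practice.
print(round(training_vram_gib(7e9)))  # ~104
```

Activations, gradient accumulation buffers, and framework overhead come on top of this, which is what the full post works through.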

Mentions
Jean de Nyandwi @Jeande_d · Apr 20, 2023
  • Post
  • From Twitter
Transformer Math 101 An excellent blog post about basic math related to computation and memory usage for transformers. Nicely explained!!