Andrej Karpathy @karpathy · Jul 18, 2022
                
Great post on the technical challenges of training a 176B Transformer Language Model. ~10 years ago you'd train neural nets on your CPU workstation with Matlab. Now you need a compute cluster and very careful orchestration of its GPU memory w.r.t. both limits and access patterns.
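
For a sense of scale, here is a back-of-the-envelope sketch of the memory a 176B-parameter model needs just for weights, gradients, and Adam optimizer state in mixed precision. The per-parameter byte counts and the 80 GB A100 comparison are standard assumptions for illustration, not figures taken from the linked post.

# Rough memory arithmetic for a 176B-parameter model trained with Adam
# in mixed precision. Byte counts per parameter are common assumptions
# (bf16 weights/grads, fp32 master weights and Adam moments), not figures
# from the linked post.
n_params = 176e9

bytes_per_param = {
    "bf16 weights": 2,
    "bf16 gradients": 2,
    "fp32 master weights": 4,
    "fp32 Adam momentum": 4,
    "fp32 Adam variance": 4,
}

state_bytes = n_params * sum(bytes_per_param.values())
print(f"weights + grads + optimizer state: {state_bytes / 1e12:.2f} TB")  # ~2.82 TB

a100_bytes = 80e9  # one 80 GB A100, for scale
print(f"A100s needed just to hold that state: {state_bytes / a100_bytes:.0f}")  # ~35
# Activations, communication buffers, and fragmentation come on top, so the
# state has to be sharded across the cluster (tensor/pipeline parallelism,
# ZeRO-style optimizer sharding) -- the careful orchestration mentioned above.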