
From Deep to Long Learning?

  • Article
  • Mar 28, 2023
  • #ArtificialIntelligence #MachineLearning #ChatGPT
Dan Fu
@realDanFu
(Author)
Michael Poli
@MichaelPoli6
(Author)
Read on hazyresearch.stanford.edu
For the last two years, a line of work in our lab has been to increase sequence length. We thought longer sequences would enable a new era of machine learning foundation models: they could learn from longer contexts, multiple media sources, complex demonstrations, and more, all data ready and waiting to be learned from in the world! It’s been amazing to see the progress there. As an aside, we’re happy to have played a role through the introduction of FlashAttention (code, blog, paper) by Tri Dao and Dan Fu from our lab, who showed that sequence lengths of 32k are possible; such lengths are now widely available in this era of foundation models (and we’ve heard that OpenAI, Microsoft, NVIDIA, and others use FlashAttention for their models too. Awesome!).
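
As a rough illustration of what 32k-token attention looks like in practice (not code from the post), the sketch below runs causal attention over a long sequence with PyTorch's scaled_dot_product_attention, which can dispatch to a fused FlashAttention-style kernel on supported GPUs. The batch size, head count, and head dimension are illustrative assumptions, and a CUDA GPU with enough memory is assumed.

```python
import torch
import torch.nn.functional as F

# Illustrative (assumed) shapes: batch of 1, 8 heads, 32k tokens, head dim 64.
batch, heads, seq_len, head_dim = 1, 8, 32_768, 64

q = torch.randn(batch, heads, seq_len, head_dim, device="cuda", dtype=torch.float16)
k = torch.randn(batch, heads, seq_len, head_dim, device="cuda", dtype=torch.float16)
v = torch.randn(batch, heads, seq_len, head_dim, device="cuda", dtype=torch.float16)

# scaled_dot_product_attention may use a fused FlashAttention-style kernel,
# avoiding materializing the full 32k x 32k attention score matrix.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 8, 32768, 64])
```
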

Mentions
Matt Clifford @matthewclifford · Apr 9, 2023
  • Post
  • From Twitter
Fascinating work: “These models hold the promise to have context lengths of millions… or maybe even a billion!” Also shows huge power of new primitives in ML: even if there were *no* more research, remixing of existing ideas would take us a long way…