This blog post grew out of a thoughtful discussion with my roommate, sparked by the recent surge in popularity of ChatGPT. Because ChatGPT is built on the Transformer architecture, the attention mechanism has once again been thrust into the spotlight. Much has been written about attention, and its query-key-value (“q-k-v”) formulation is widely treated as settled, commonly traced back to the paper “Attention Is All You Need”. But this prompts the question: where did it actually originate? That question is the subject of this post.