The Emergence of LLMs¶

Author: Hetul Patel | Published on: 6 Dec, 2023

timeline
        title Brief History of NLP
        section 1967
          MIT’s Eliza (First Chatbot) : 👌🏼 Groundbreaking human-computer interaction : 👎🏼 Limited contextual understanding
        section 1986
          Recurrent Neural Networks (RNNs) : 👌🏼 Memory for sequences : 👎🏼 Vanishing gradient for long sentences
        section 1997
          Long Short-Term Memory (LSTM) : 👌🏼 Selective ability to memorize or forget, retained long-term dependencies : 👎🏼 Complexity due to 3 different gates
        section 2014
          Gated Recurrent Units (GRUs) : 👌🏼 Simplified gating, efficient using reset and update gates : 👎🏼 Limited contextual understanding for long sequences
          Attention Mechanism: 👌🏼 Dynamic sequence processing, better context retention, offered fresh perspective : 👎🏼 Increased computational complexity
        section 2017
          Transformer Architecture : 👌🏼 Parallel sequence processing through multi-head attention : 👎🏼 High computational demand. Due to their size and complexity

timeline
        title Building Upon The Transformers
        section 2018
          OpenAI’s GPT-1, Google's BERT Model : 👌🏼 Bert - bidirectional encoder only <br>  GPT - unidirectional, decoder only : 👎🏼 Requires task specific fine-tuning
        title Building Upon The Transformers
        section 2019
          OpenAI's GPT-2, Google’s T5 : 👌🏼 Multi task solving, massive amount of compressed knowledge e.g. GPT-2 (40B data), T5 (7TB data) : 👎🏼 Model size, training complexity
        section 2020
          OpenAI's GPT-3 : 👌🏼 Unprecedented versatility, Few shot learning : 👎🏼 Enormous computational requirements, ethical concerns
        section 2022
          OpenAI's InstructGPT : 👌🏼 Learn from human feedback during training to follow human instructions better : 👎🏼 Tailored for instructions oriented tasks. Not suitable for natural, dynamic conversation
          ChatGPT : 👌🏼 Sibling of InstructGPT, optimized for conversations : 👎🏼 Works only with textual data, prone to hallucination, limited knowledge of world upto 2022
        section 2023
          GPT-4 : 👌🏼 Handles both text and image, human level on various benchmarks, allows integration of external tools such as web-browsing and code-interpreter : 👎🏼 Lacks other modalities