
The Emergence of LLMs

Author: Hetul Patel | Published on: 6 Dec, 2023


```mermaid
timeline
        title Brief History of NLP
        section 1966
          MIT’s ELIZA (first chatbot) : 👌🏼 Groundbreaking human-computer interaction : 👎🏼 Limited contextual understanding
        section 1986
          Recurrent Neural Networks (RNNs) : 👌🏼 Memory for sequences : 👎🏼 Vanishing gradients on long sequences
        section 1997
          Long Short-Term Memory (LSTM) : 👌🏼 Selectively memorizes or forgets, retaining long-term dependencies : 👎🏼 Added complexity from three gates (input, forget, output)
        section 2014
          Gated Recurrent Units (GRUs) : πŸ‘ŒπŸΌ Simplified gating, efficient using reset and update gates : πŸ‘ŽπŸΌ Limited contextual understanding for long sequences
          Attention Mechanism : 👌🏼 Dynamic sequence processing, better context retention, offered a fresh perspective : 👎🏼 Increased computational complexity
        section 2017
          Transformer Architecture : 👌🏼 Parallel sequence processing through multi-head attention : 👎🏼 High computational demand due to model size and complexity
```
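
The attention mechanism that closes this first timeline is compact enough to state in code. Below is a minimal sketch of scaled dot-product attention, the core operation inside the Transformer's multi-head attention; the toy shapes and the function name `scaled_dot_product_attention` are illustrative assumptions, not taken from any particular library.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    # Similarity of every query with every key, scaled so the
    # softmax does not saturate as d_k grows.
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over the key dimension gives the attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output is a weighted average of the value vectors.
    return weights @ V

# Toy example: 3 query positions attending over 4 key/value positions.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 8)
```

Multi-head attention runs several such operations in parallel over learned projections of Q, K, and V and concatenates the results. Because every position attends to every other position in one matrix multiply, the whole sequence is processed in parallel rather than step by step as in an RNN, which is exactly the trade the timeline describes: better parallelism and context at a higher computational cost.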

```mermaid
timeline
        title Building Upon The Transformers
        section 2018
          OpenAI’s GPT-1, Google's BERT : 👌🏼 BERT - bidirectional, encoder only <br>  GPT - unidirectional, decoder only : 👎🏼 Requires task-specific fine-tuning
        section 2019
          OpenAI's GPT-2, Google’s T5 : 👌🏼 Multi-task solving, massive amounts of compressed knowledge, e.g. GPT-2 (40 GB of text), T5 (7 TB of text) : 👎🏼 Model size, training complexity
        section 2020
          OpenAI's GPT-3 : 👌🏼 Unprecedented versatility, few-shot learning : 👎🏼 Enormous computational requirements, ethical concerns
        section 2022
          OpenAI's InstructGPT : 👌🏼 Learns from human feedback (RLHF) during training to follow instructions better : 👎🏼 Tailored to instruction-oriented tasks, not suited to natural, dynamic conversation
          ChatGPT : 👌🏼 Sibling of InstructGPT, optimized for conversation : 👎🏼 Works only with text, prone to hallucination, knowledge of the world limited to its pre-2022 training data
        section 2023
          GPT-4 : 👌🏼 Handles both text and images, human-level performance on various benchmarks, allows integration of external tools such as web browsing and a code interpreter : 👎🏼 Still lacks other modalities such as audio and video
```
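
To make the "few-shot learning" entry from 2020 concrete: a few-shot prompt shows the model a handful of worked examples in plain text, and the model is expected to continue the pattern with no gradient updates at all. Here is a minimal sketch of how such a prompt is assembled; the translation task echoes the example from the GPT-3 paper, while the helper name `build_few_shot_prompt` and the exact formatting are illustrative assumptions.

```python
# A minimal sketch of few-shot prompting, as popularized by GPT-3:
# the task "training" happens entirely inside the prompt, with no
# fine-tuning. Helper name and formatting are illustrative assumptions.

def build_few_shot_prompt(examples, query):
    """Format worked examples followed by the unanswered query."""
    lines = ["Translate English to French:"]
    for english, french in examples:
        lines.append(f"English: {english}\nFrench: {french}")
    # Leave the final answer blank for the model to complete.
    lines.append(f"English: {query}\nFrench:")
    return "\n\n".join(lines)

examples = [
    ("cheese", "fromage"),
    ("good morning", "bonjour"),
]
prompt = build_few_shot_prompt(examples, "sea otter")
print(prompt)
# The assembled prompt is sent to the model, which continues the
# pattern and emits the translation as its next tokens.
```

Swapping the examples swaps the task, which is what made GPT-3 feel so versatile: no task-specific fine-tuning of the kind GPT-1 and BERT required, just a different prompt.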