The Emergence of LLMs
Author: Hetul Patel | Published on: 6 Dec, 2023
timeline
title Brief History of NLP
section 1967
MIT's ELIZA (First Chatbot) : 👍🏼 Groundbreaking human-computer interaction : 👎🏼 Limited contextual understanding
section 1986
Recurrent Neural Networks (RNNs) : 👍🏼 Memory for sequences : 👎🏼 Vanishing gradients on long sentences
section 1997
Long Short-Term Memory (LSTM) : 👍🏼 Selectively memorizes or forgets, retains long-term dependencies : 👎🏼 Added complexity from three different gates
section 2014
Gated Recurrent Units (GRUs) : 👍🏼 Simplified, efficient gating with only reset and update gates : 👎🏼 Limited contextual understanding for long sequences
Attention Mechanism : 👍🏼 Dynamic sequence processing, better context retention, a fresh approach to sequence modeling : 👎🏼 Increased computational complexity
section 2017
Transformer Architecture : 👍🏼 Parallel sequence processing through multi-head attention : 👎🏼 High computational demand due to model size and complexity
timeline
title Building Upon The Transformers
section 2018
OpenAI's GPT-1, Google's BERT : 👍🏼 BERT - bidirectional, encoder only <br> GPT - unidirectional, decoder only : 👎🏼 Requires task-specific fine-tuning
section 2019
OpenAI's GPT-2, Google's T5 : 👍🏼 Multi-task solving, massive amount of compressed knowledge, e.g. GPT-2 (about 40 GB of text), T5 (about 750 GB of text) : 👎🏼 Model size, training complexity
section 2020
OpenAI's GPT-3 : 👍🏼 Unprecedented versatility, few-shot learning : 👎🏼 Enormous computational requirements, ethical concerns
section 2022
OpenAI's InstructGPT : 👍🏼 Learns from human feedback during training to follow instructions better : 👎🏼 Tailored to instruction-oriented tasks, not suited to natural, dynamic conversation
ChatGPT : 👍🏼 Sibling of InstructGPT, optimized for conversations : 👎🏼 Works only with textual data, prone to hallucination, knowledge of the world limited to 2021 and earlier
section 2023
GPT-4 : 👍🏼 Handles both text and images, human-level performance on various benchmarks, integrates external tools such as web browsing and a code interpreter : 👎🏼 Lacks other modalities