Wednesday, November 15, 2023
Generative AI and LLM - A primer
1. Background and Evolution of Large Language Model (LLM)
An LLM is a generative model that is trained on large sets of unstructured data and can generate large volumes of textual output.
1.1 A timeline of LLM evolution:
The development of LLMs is not new; they went through a gradual evolution starting around 2002-03:
- 2003 - Bag of Words: machine learning applied to Natural Language Processing (NLP)
- 2008 - TF-IDF: multi-task learning
- 2013 - Co-occurrence Matrix: word embeddings
- 2013-14 - Word2Vec/GloVe: neural-network word embeddings
- 2014 - Sequence-to-Sequence (seq2seq) learning
- 2015 - Attention mechanisms
- 2017 - Transformer models
- 2018-19 - ELMo/BERT/XLNet: pre-trained models
- Nov 2022 - OpenAI's GPT-3.5 (ChatGPT)
- Dec 2022 - Google's MedPaLM
- Feb 2023 - Amazon's Multimodal-CoT
- Feb 2023 - Meta's LLaMA
- Feb 2023 - Microsoft's Kosmos-1
- Mar 2023 - Salesforce's Einstein GPT
- Mar 2023 - OpenAI's GPT-4
- Mar 2023 - Google's Bard
- Mar 2023 - Bloomberg's BloombergGPT
- Apr 2023 - Amazon's Bedrock
The transformer architecture dispenses with recurrence entirely and instead relies solely on a self-attention (or intra-attention) mechanism.
In terms of computational complexity, self-attention layers are faster than recurrent layers when the sequence length n is smaller than the representation dimensionality d …
– Advanced Deep Learning with Python, 2019.
The encoder-decoder architecture has been extensively applied to sequence-to-sequence (seq2seq) tasks for language processing. Examples of such tasks within the domain of language processing include machine translation and image captioning.
The earliest use of attention was as part of an RNN-based encoder-decoder framework to encode long input sentences [Bahdanau et al. 2015]. Consequently, attention has been most widely used with this architecture.
Encoder-decoder models are typically used for natural language processing tasks that involve understanding input sequences and generating output sequences, often with different lengths and structures. They are particularly good at tasks where there is a complex mapping between the input and output sequences and where it is crucial to capture the relationships between the elements in both sequences. Some common use cases for encoder-decoder models include text translation and summarization.
Some notable examples of these newer encoder-decoder models include T5 and BART.
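To make the shape of an encoder-decoder model concrete, here is a minimal PyTorch sketch (an illustration only, not any production architecture): a GRU encoder compresses the source sequence into a context vector, and a GRU decoder generates a target sequence of a different length from it.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal encoder-decoder: the encoder compresses the input sequence into a
    context vector; the decoder generates the output sequence from that context."""
    def __init__(self, src_vocab, tgt_vocab, emb=32, hidden=64):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.GRU(emb, hidden, batch_first=True)
        self.decoder = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        _, context = self.encoder(self.src_emb(src_ids))       # context = final hidden state
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), context)
        return self.out(dec_out)                                # logits over the target vocabulary

model = Seq2Seq(src_vocab=1000, tgt_vocab=1200)
src = torch.randint(0, 1000, (2, 9))    # batch of 2 source sentences, 9 tokens each
tgt = torch.randint(0, 1200, (2, 7))    # batch of 2 target sentences, 7 tokens each
logits = model(src, tgt)
print(logits.shape)                     # torch.Size([2, 7, 1200])
```

Note how the input and output lengths differ (9 vs 7 tokens); in a real translation or summarization model the decoder would also attend over all of the encoder's hidden states rather than relying on a single context vector.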
In the context of LLMs, attention is a mechanism that allows the model to selectively focus on different parts of the input text. It helps the model attend to the most relevant parts of the input and generate more accurate predictions.
Attention improves the model's ability to understand the context of the input text and to generate more coherent and relevant output. Attention mechanisms in LLMs, particularly the self-attention mechanism used in transformers, allow the model to weigh the importance of different words or phrases in a given context.
There are two types of attention mechanisms in LLMs: self-attention and cross-attention.
Self-attention is used to weigh the importance of different words or phrases within the same input text.
Cross-attention is used to weigh the importance of words or phrases across two different input sequences, for example between a decoder's partial output and the encoder's input.
Attention in LLMs is quantified by the attention weights assigned to each word or phrase in the input text. These weights are computed with a softmax function, which normalizes them so that they sum to 1.
Here are a couple of examples of how attention is used in LLMs:
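As a concrete (if simplified) illustration, the NumPy sketch below computes scaled dot-product attention weights with a softmax, first as self-attention over a single sequence and then as cross-attention between two different sequences. Real transformers additionally apply learned query/key/value projection matrices and use multiple attention heads.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability, then normalise so each row sums to 1.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(queries, keys, values):
    # Scaled dot-product attention: similarity scores -> softmax weights -> weighted sum of values.
    d_k = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)
    return weights @ values, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))   # 5 tokens of one sequence, 16-dimensional embeddings
y = rng.normal(size=(7, 16))   # 7 tokens of a second sequence

# Self-attention: queries, keys and values all come from the same sequence.
self_out, self_weights = attention(x, x, x)

# Cross-attention: queries come from one sequence, keys and values from another.
cross_out, cross_weights = attention(x, y, y)

print(self_weights.shape, self_weights.sum(axis=-1))    # (5, 5), every row sums to 1
print(cross_weights.shape, cross_weights.sum(axis=-1))  # (5, 7), every row sums to 1
```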
LLM stands for Large Language Model: a machine learning model trained on large amounts of data to generate text. LLMs are used in various natural language processing tasks such as language translation, text summarization, and question answering.
Parallelism is used to train the model faster by distributing the workload across multiple processors or GPUs. There are two types of parallelism: data parallelism and model parallelism.
Data parallelism involves splitting the data into smaller batches and processing them in parallel across multiple processors or GPUs, with each device holding a full copy of the model. This technique is useful for speeding up training when the dataset is large.
Model parallelism involves splitting the model into smaller parts and processing them in parallel across multiple processors or GPUs. This technique is useful when the model is too large to fit into a single processor or GPU memory.
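The two strategies can be sketched with a toy NumPy example (a conceptual sketch only; real training frameworks distribute these shards across physical GPUs): data parallelism partitions the batch while every worker keeps the full weights, whereas model parallelism partitions the weights while every worker sees the full batch.

```python
import numpy as np

rng = np.random.default_rng(0)
batch = rng.normal(size=(8, 512))       # 8 examples, 512 features
weights = rng.normal(size=(512, 1024))  # one large layer standing in for "the model"

# Data parallelism: every worker holds the FULL weights, but only a slice of the batch.
shards = np.array_split(batch, 4)                       # 4 workers, 2 examples each
data_parallel_out = np.concatenate([s @ weights for s in shards])

# Model parallelism: every worker holds only a SLICE of the weights, but the full batch.
weight_slices = np.array_split(weights, 4, axis=1)      # split the output dimension across 4 workers
model_parallel_out = np.concatenate([batch @ w for w in weight_slices], axis=1)

# Both strategies reproduce the single-device result; they differ only in what gets partitioned.
reference = batch @ weights
print(np.allclose(data_parallel_out, reference))   # True
print(np.allclose(model_parallel_out, reference))  # True
```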
Scalability refers to the ability to train the model on larger datasets or with more complex architectures. It can be measured in terms of speedup and efficiency.
Speedup is the ratio of the time taken to complete a task on a single processor or GPU to the time taken to complete the same task on multiple processors or GPUs. A higher speedup indicates better scalability.
Efficiency is the ratio of the speedup to the number of processors or GPUs used. A higher efficiency indicates better scalability.
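These two metrics are simple ratios; a small Python sketch with hypothetical timings (the numbers below are made up for illustration) shows the calculation.

```python
def speedup(single_device_time, multi_device_time):
    # Speedup: how many times faster the parallel run is than the single-device run.
    return single_device_time / multi_device_time

def efficiency(single_device_time, multi_device_time, num_devices):
    # Efficiency: speedup per device; 1.0 would mean perfect linear scaling.
    return speedup(single_device_time, multi_device_time) / num_devices

# Hypothetical timings: 100 hours on 1 GPU vs 30 hours on 4 GPUs.
print(speedup(100, 30))        # ~3.33x speedup
print(efficiency(100, 30, 4))  # ~0.83, i.e. 83% efficiency
```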
Here are a couple of examples of LLMs:
GPT-3: It is a state-of-the-art LLM developed by OpenAI that has 175 billion parameters. It is used for various natural language processing tasks such as language translation, text summarization, and question answering.
BERT: It is another popular LLM developed by Google that has 340 million parameters. It is used for various natural language processing tasks such as sentiment analysis, named entity recognition, and question answering.
In the table above, what we are saying is that LLaMA is a good open-source model for text output, and codellama-7b-instruct, with 7 billion parameters, can be used for writing code.
- Integration with other apps
- Virtual Assistants
- Developer Co-Pilot
- Custom Applications
- Content Generation: Lead generation / Onboarding / Customer Management / Delinquency & Foreclosures
- Workflow management
- Client Experience and Interaction
- Security Compliance
- Workflow Optimization
In the context of LLMs, SFT stands for Supervised Fine-Tuning. It is a technique used to fine-tune a pre-trained LLM on a specific task by providing it with labeled examples.
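A minimal sketch of SFT, assuming the Hugging Face transformers library and GPT-2 as a small stand-in base model (the two instruction-response pairs below are invented for illustration; real SFT uses large curated datasets and typically masks the prompt tokens out of the loss):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Tiny labelled dataset of (instruction, desired response) pairs -- invented for illustration.
examples = [
    ("Summarize: The cat sat on the mat.", "A cat sat on a mat."),
    ("Translate to French: Good morning.", "Bonjour."),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for epoch in range(3):
    for prompt, response in examples:
        text = prompt + "\n" + response + tokenizer.eos_token
        batch = tokenizer(text, return_tensors="pt")
        # For causal LMs, passing labels equal to input_ids yields the next-token prediction loss.
        outputs = model(**batch, labels=batch["input_ids"])
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```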
RLHF stands for Reinforcement Learning from Human Feedback. It is a method used to train LLMs to align their output with human intentions and values. RLHF involves teaching an LLM to understand human preferences by assigning scores to different responses from the base model. The goal is to use the preference model to alter the behavior of the base model in response to a prompt.
Here are some examples of how RLHF is used in LLMs:
ChatGPT: It is a state-of-the-art LLM developed by OpenAI that uses RLHF to learn human preferences and provide a more controlled user experience.
NVIDIA SteerLM: A technique developed by NVIDIA, positioned as an alternative to classic RLHF, that aligns LLMs to human preferences and lets users customize model behaviour during inference.
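The preference-model step at the heart of RLHF can be sketched as follows (a toy PyTorch example with random stand-in embeddings; in practice the reward model is itself a language model, and its scores are then used, typically via PPO, to update the base LLM):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Maps a response representation to a scalar score reflecting human preference."""
    def __init__(self, dim=128):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, response_embedding):
        return self.score(response_embedding).squeeze(-1)

reward_model = RewardModel()
optimizer = torch.optim.AdamW(reward_model.parameters(), lr=1e-4)

# Random stand-ins for embeddings of response pairs where humans preferred the first one.
chosen = torch.randn(16, 128)
rejected = torch.randn(16, 128)

# Pairwise ranking loss: push the preferred response's score above the rejected one's.
loss = -F.logsigmoid(reward_model(chosen) - reward_model(rejected)).mean()
loss.backward()
optimizer.step()
print(loss.item())
```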
A graph is a versatile data structure that lends itself well to the way data is organized in many real-world scenarios. We can think of an image as a graph, where each pixel is a node, directly connected to its neighboring pixels …
– Advanced Deep Learning with Python, 2019.
In the encoder-decoder attention-based architectures, the set of vectors that encode the input sequence can be considered external memory, to which the encoder writes and from which the decoder reads. However, a limitation arises because the encoder can only write to this memory, and the decoder can only read.
Memory-Augmented Neural Networks (MANNs) are recent algorithms that aim to address this limitation.
The Neural Turing Machine (NTM) is one type of MANN. It consists of a neural network controller that takes an input to produce an output and performs read and write operations to memory. Examples of applications for MANNs include question answering and chatbots, where an external memory stores a large database of sequences (or facts) that the neural network taps into.
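The content-based read operation of such a memory can be sketched in a few lines of NumPy (heavily simplified from the NTM paper, which also learns the query key and has write heads): the controller's key is compared against every memory row, the similarities are softmaxed into read weights, and the read vector is a weighted sum of memory rows.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def content_read(memory, key, beta=5.0):
    # Compare the key against every memory row with cosine similarity, sharpen with beta,
    # softmax into read weights, and return the weighted read vector.
    norms = np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8
    similarity = (memory @ key) / norms
    weights = softmax(beta * similarity)
    return weights @ memory, weights

rng = np.random.default_rng(0)
memory = rng.normal(size=(10, 8))              # 10 memory slots ("facts") of width 8
key = memory[3] + 0.05 * rng.normal(size=8)    # a noisy query resembling slot 3

read_vector, weights = content_read(memory, key)
print(weights.argmax())    # expected: 3 -- the read focuses on the matching slot
```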