Monday, November 27, 2023

About OSWAS

OSWAS project in Odisha info:

- Name and description: OSWAS stands for Odisha Secretariat Workflow Automation System. It is the state government's most prestigious IT initiative, meant to automate functions at all levels of the administrative hierarchy of government for smooth functioning and timely delivery of government services¹.

- Status and progress: The new version of the OSWAS application, with improved features, was put on soft launch in July 2018 to familiarize users with it. The new system was targeted to go live in September 2018 after the necessary customization. As part of the 5T initiative of the Government of Odisha, OSWAS is being extended to all Directorates/HoDs for their integration with the respective Administrative Departments, in order to speed up decision making².

- Stakeholders and partners: The stakeholders and partners involved in implementing OSWAS are the government departments and agencies, OCAC, the IT department, vendors, consultants, and beneficiaries¹².

- Impact and outcomes: The impact and outcomes of the implementation of OSWAS are:

    - Increase efficiency and effectiveness of the processes
    - Increase employee productivity
    - Efficient management of data
    - Better communication and co-ordination and advancement towards knowledge-led governance
    - Provision of latest IT infrastructure, connectivity, state-of-the-art data centre and security facilities
    - 24x7 secure access, digital signatures, online publishing of notices and circulars, SMS and email notifications, and a real-time executive dashboard
    - In-built Odia plugin for noting and drafting
    - Mobile and cloud ready with web responsive design
    - Robust disaster recovery link with OCAC, State Data Center and National Data Center
    - Principal applications like correspondence management, file management, file processing, record room, knowledge bank, internal messaging, dashboard, notice board, MIS reports, audit management and an advanced search engine
    - Common applications like Assembly questions, RTI, online telephone directory, vehicle management, tours management, leave management¹².

- Budget and resources: Not enough info available.

Source: Conversation with Bing, 11/27/2023
(1) OSWAS | The Odisha Computer Application Centre. https://www.ocac.in/en/services/schemes/oswas.
(2) Odisha Secretariat Workflow Automation System (OSWAS) put on soft .... https://orissadiary.com/odisha-secretariat-workflow-automation-system-oswas-put-soft-launch-making-users-familiar/.
(3) Odisha State Wide Network (OSWAN). https://oswan.gov.in/default.asp?GL=1.
(4) Orissa State Wide Network (OSWAN). https://www.oswan.gov.in/AboutOswan.asp?GL=2.

Top priority of Higher Education Department in Odisha in terms of technology

HED PRIORITIES 

- Improving quality and equity of selected institutions and enhancing governance of the higher education system
- Investing in infrastructure and facilities, such as libraries, labs, computers, and internet
- Implementing e-governance initiatives, such as online modules for UC submission, laptop distribution, pensioners' portal, etc.
- Promoting online and blended learning, and providing access to digital resources and platforms

Source: Conversation with Bing, 11/27/2023
(1) Odisha Higher Education Program for Excellence and Equity. https://projects.worldbank.org/en/projects-operations/project-detail/P160331.
(2) How Odisha's university paves the way for higher education expansion. https://www.indiatoday.in/education-today/news/story/how-odishas-university-paves-the-way-for-higher-education-expansion-2457148-2023-11-02.
(3) Home | Higher Education Department. https://dhe.odisha.gov.in/.
(4) Department of Higher Education (Odisha) - Wikipedia. https://en.wikipedia.org/wiki/Department_of_Higher_Education_%28Odisha%29.


Rashtriya Uchchatar Shiksha Abhiyan

RUSA stands for Rashtriya Uchchatar Shiksha Abhiyan, which is a central government scheme to improve the access, equity, and quality of higher education in India. It aims to support the state governments in planning and developing their higher education systems through grants and reforms. Odisha joined RUSA in 2013 and has received funds for infrastructure and facilities, e-governance initiatives, online and blended learning, and quality enhancement¹²³.

Source: Conversation with Bing, 11/27/2023
(1) Odisha – RUSA. http://rusa.nic.in/odisha/.
(2) About RUSA | Higher Education Department - Odisha. https://dhe.odisha.gov.in/Schemes-and-Scholarship/RUSA/About-RUSA.
(3) Overview – RUSA. https://rusa.nic.in/odisha/overview/.


Wednesday, November 15, 2023

Generative AI and LLM - A primer

1. Background and Evolution of Large Language Model (LLM)

An LLM is a generative model that can take in large sets of unstructured data and generate large volumes of textual output.

1.1 A timeline of LLM evolution:-

The development of LLMs is not new; they went through a gradual evolution starting around 2003:

  • 2003 -           Bag of words: ML for natural language processing (NLP)
  • 2008 -           TF-IDF: multi-task learning
  • 2013 -           Co-occurrence matrix: word embeddings
  • 2013 -           Word2Vec/GloVe: NLP neural nets
  • 2014 -           Seq2Seq learning
  • 2015 -           Attention mechanisms
  • 2017 -           Transformer models
And then came the explosion of development on LLMs:
  • 2018-19 -    ELMo/BERT/XLNet: pre-trained models
  • Nov 2022 -   OpenAI's GPT-3.5 (ChatGPT)
  • Dec 2022 -   Google's Med-PaLM
  • Feb 2023 -   Amazon's Multimodal-CoT
  • Feb 2023 -   Meta's LLaMA
  • Feb 2023 -   Microsoft's Kosmos-1
  • Mar 2023 -   Salesforce's Einstein GPT
  • Mar 2023 -   OpenAI's GPT-4
  • Mar 2023 -   Google's Bard
  • Mar 2023 -   Bloomberg's LLM (BloombergGPT)
  • Apr 2023 -   Amazon's Bedrock
1.2 Genesis of the Transformer Model (Ref. research paper: Google's "Attention Is All You Need", 2017)
[Vaswani, Ashish, et al. "Attention is all you need." Advances in neural information processing systems 30 (2017)]

Before 2017, deep learning models for language used Recurrent Neural Networks (RNNs). These were not easy to scale: the architecture was linear and sequential, computing one output and passing it into the next input. Google brought in transformer blocks, which model sequences non-sequentially. For example, where a sentence would previously be processed word by word, the transformer uses attention to build relationships to the other words in the input sequence as a block. This makes the computation parallel and much faster to scale, a revolution in architecture. Input volumes increased manifold from GPT-1 to GPT-2 and now GPT-3 and GPT-4, where the corpus of data used to train the models kept growing into billions of examples. Transformers brought in the key revolution to LLMs: while the original transformer still implements an encoder-decoder architecture, it does not rely on recurrent neural networks.
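A toy contrast of the two styles, in NumPy with random toy weights (an assumed minimal setup, not a real model): the RNN loop must run step by step because each hidden state depends on the previous one, while attention relates all tokens in one parallelizable matrix product.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 5, 8                        # 5 tokens, hidden size 8 (toy sizes)
    X = rng.normal(size=(n, d))        # toy input token embeddings
    W = rng.normal(size=(d, d)) * 0.1  # toy recurrent weight matrix

    # RNN: inherently sequential; step t cannot start before step t-1 finishes.
    h = np.zeros(d)
    for t in range(n):
        h = np.tanh(X[t] + h @ W)      # each state depends on the previous one

    # Attention: all pairwise token relationships in one matrix product,
    # so every position can be computed in parallel.
    scores = X @ X.T / np.sqrt(d)                                          # (n, n)
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # row softmax
    attn_out = weights @ X                                                 # (n, d)
    print(h.shape, attn_out.shape)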

The transformer architecture dispenses of any recurrence and instead relies solely on a self-attention (or intra-attention) mechanism. 

In terms of computational complexity, self-attention layers are faster than recurrent layers when the sequence length n is smaller than the representation dimensionality d …

– Advanced Deep Learning with Python, 2019.

Transformers can capture global/long-range dependencies between input and output, support parallel processing, require minimal inductive biases (prior knowledge), scale to large sequences and datasets, and allow domain-agnostic processing of multiple modalities (text, images, speech) using similar processing blocks.


1.3 Three basic sorts of LLMs (as per the "Attention Is All You Need" paper*):-

The encoder-decoder architecture has been extensively applied to sequence-to-sequence (seq2seq) tasks for language processing. Examples of such tasks within the domain of language processing include machine translation and image captioning.  

The earliest use of attention was as part of RNN based encoder-decoder framework to encode long input sentences [Bahdanau et al. 2015]. Consequently, attention has been most widely used with this architecture.

– An Attentive Survey of Attention Models, 2021.

> Encoder-Only
> Decoder-only
> Encoder-Decoder.

Let's see what these are:



        1.3.1 Encoder Only:  Ex: BERT (understands content; does not generate it)
       Compacts/encodes a set of inputs into one particular output, like the sentiment or topic of a discussion

                            - Popularized via successful architectures like BERT*
                            - Very good for predictive use on unstructured data

Encoder-only models are still very useful for training predictive models based on text embeddings, as opposed to generating text.
*[Over the years, various encoder-only architectures have been developed based on the encoder module of the original transformer model outlined above. Notable examples include BERT ("Pre-training of Deep Bidirectional Transformers for Language Understanding", 2018) and RoBERTa ("A Robustly Optimized BERT Pretraining Approach", 2019).
BERT (Bidirectional Encoder Representations from Transformers) is an encoder-only architecture based on the Transformer's encoder module. The BERT model is pretrained on a large text corpus using masked language modeling and next-sentence prediction tasks.]

        1.3.2 Decoder Only -  Ex: GPT/OpenAI architecture: generates new content (in the same language), token by token
                             - Popularised via the original GPT models
                             - Driving the Gen AI market buzz

Decoder-only models are used for generative tasks, including Q&A.
The GPT (Generative Pre-trained Transformer) series are decoder-only models pretrained on large-scale unsupervised text data and fine-tuned for specific tasks such as text classification, sentiment analysis, question answering, and summarization. The GPT models, including GPT-2, GPT-3 ("Language Models are Few-Shot Learners", 2020), and the more recent GPT-4, have shown remarkable performance on various benchmarks and are currently the most popular architecture for natural language processing.


        1.3.3 Encoder-Decoder : Compacts an input into a different output. Ex : French-to-English translation
                            - The original transformer architecture from the paper
                            - Translation tasks, cross-attention

Encoder-decoder models are typically used for natural language processing tasks that involve understanding input sequences and generating output sequences, often with different lengths and structures. They are particularly good at tasks where there is a complex mapping between the input and output sequences and where it is crucial to capture the relationships between the elements in both sequences. Some common use cases for encoder-decoder models include text translation and summarization.


Some notable examples of these newer encoder-decoder models include BART and T5.


*Ref: 
https://magazine.sebastianraschka.com/p/understanding-encoder-and-decoder


1.5 LLMs are based on three types of building blocks

        1.5.1 Attention:  

In the context of LLMs, attention is defined as a mechanism that allows the model to selectively focus on different parts of the input text. This mechanism helps the model attend to the input text's most relevant parts and generate more accurate predictions.

The use of attention in LLMs is to improve the model’s ability to understand the context of the input text and generate more coherent and relevant output. Attention mechanisms in LLMs, particularly the self-attention mechanism used in transformers, allow the model to weigh the importance of different words or phrases in a given context.

There are two types of attention mechanisms in LLMs: self-attention and cross-attention.

    Self-attention is used to weigh the importance of different words or phrases within the same input text.

    Cross-attention is used to weigh the importance of different words or phrases between two different input texts.

The measurement of attention in LLMs is done by calculating the attention weights assigned to each word or phrase in the input text. These weights are calculated using a softmax function, which normalizes the weights and ensures that they sum to 1.
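As a minimal sketch of the mechanics just described, assuming nothing beyond NumPy and toy random weights (illustrative, not any particular model's implementation): scaled dot-product self-attention computes softmax-normalized weights, and each row of weights sums to 1.

    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))  # stable softmax
        return e / e.sum(axis=axis, keepdims=True)

    def self_attention(X, Wq, Wk, Wv):
        # X: (n, d) token embeddings; Wq/Wk/Wv: learned projection matrices.
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        weights = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))  # (n, n) attention weights
        return weights @ V, weights

    rng = np.random.default_rng(0)
    n, d = 4, 8                                    # 4 tokens, embedding dim 8
    X = rng.normal(size=(n, d))
    Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
    out, w = self_attention(X, Wq, Wk, Wv)
    print(w.sum(axis=-1))                          # each row sums to 1.0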

Here are a couple of examples of how attention is used in LLMs:

  1. In machine translation, attention is used to align the source and target sentences and generate more accurate translations 
  2. In question answering, attention is used to identify the most relevant parts of the input text that can help answer the question 


        1.5.2 Parallelism and Scalability: 

LLM stands for Large Language Model: a machine learning model trained on large amounts of data to generate text, used in natural language processing tasks such as language translation, text summarization, and question answering. Training at this scale is what makes parallelism and scalability essential.

Parallelism is used to train the model faster by distributing the workload across multiple processors or GPUs. There are two types of parallelism: data parallelism and model parallelism.

  • Data parallelism involves splitting the data into smaller batches and processing them in parallel across multiple processors or GPUs, each of which holds a full copy of the model. This technique is useful when the dataset is too large to process on a single device in a reasonable time.

  • Model parallelism involves splitting the model into smaller parts and processing them in parallel across multiple processors or GPUs. This technique is useful when the model is too large to fit into a single processor's or GPU's memory. (See the toy sketch after this list.)
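A toy sketch of the distinction using plain NumPy (the "devices" here are only conceptual; real systems use distributed training frameworks):

    import numpy as np

    batch = np.arange(32.0).reshape(32, 1)             # a batch of 32 training examples
    layer1, layer2 = np.ones((1, 4)), np.ones((4, 1))  # a toy two-layer "model"

    # Data parallelism: each of 4 "devices" holds the full model and 1/4 of the batch.
    shards = np.array_split(batch, 4)
    outputs = [(shard @ layer1) @ layer2 for shard in shards]  # could run concurrently

    # Model parallelism: each "device" holds one layer; activations flow between them.
    hidden = batch @ layer1   # device 0 computes layer 1
    final = hidden @ layer2   # device 1 computes layer 2
    print(np.vstack(outputs).shape, final.shape)       # (32, 1) either way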

Scalability refers to the ability to train the model on larger datasets or with more complex architectures. Scalability can be measured in terms of speedup and efficiency.

  • Speedup is the ratio of the time taken to complete a task on a single processor or GPU to the time taken to complete the same task on multiple processors or GPUs. A higher speedup indicates better scalability.

  • Efficiency is the ratio of the speedup to the number of processors or GPUs used. A higher efficiency indicates better scalability.
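As an illustrative (made-up) calculation: if a training run takes 100 hours on one GPU and 15 hours on eight GPUs, the speedup is 100/15 ≈ 6.7 and the efficiency is 6.7/8 ≈ 0.83, i.e., about 83% of perfect linear scaling.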

Here are a couple of examples of LLMs:

  1. GPT-3: It is a state-of-the-art LLM developed by OpenAI that has 175 billion parameters. It is used for various natural language processing tasks such as language translation, text summarization, and question answering.

  2. BERT: It is another popular LLM developed by Google that has 340 million parameters. It is used for various natural language processing tasks such as sentiment analysis, named entity recognition, and question answering.


         1.5.3 Sequence Modeling:  
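In brief, sequence modeling is the task of predicting the next element of a sequence given the previous ones. An LLM is trained as a sequence model over text: given the tokens so far, it outputs a probability for each possible next token, and text is generated by applying this step repeatedly. A toy sketch of that autoregressive loop follows; `next_token_logits` is a hypothetical random stand-in, not a real model.

    import numpy as np

    rng = np.random.default_rng(0)
    vocab = ["the", "cat", "sat", "on", "mat", "."]

    def next_token_logits(context):
        # Stand-in for a trained LLM: one score per vocabulary entry.
        # A real model would compute these from `context` with a transformer.
        return rng.normal(size=len(vocab))

    tokens = ["the", "cat"]
    for _ in range(4):
        logits = next_token_logits(tokens)
        probs = np.exp(logits) / np.exp(logits).sum()  # softmax over the vocabulary
        tokens.append(vocab[int(np.argmax(probs))])    # greedy: take the likeliest token
    print(" ".join(tokens))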


2. GenAI - Popular ones being used in 2023-24

[Table of popular GenAI models and their primary uses; not reproduced here.]

You can play around with the GenAI use cases in one place, with examples; try playground.katonic.ai.

In the table above, what we are saying is that LLaMA is a good open-source model for text output, and codellama-7b-instruct is a 7-billion-parameter model that can be used for writing code.

Prompts are the inputs given to an LLM to elicit the desired output. You have to craft the prompt inputs carefully for the best output; this is an engineering discipline called Prompt Engineering.
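For illustration, here is a hypothetical few-shot prompt template (plain Python string building, not tied to any vendor's API); the worked examples inside the prompt steer the model toward the desired output format:

    # A hypothetical few-shot prompt: the examples demonstrate the desired output.
    examples = [
        ("The service was slow and the staff was rude.", "negative"),
        ("Loved the food, will come again!", "positive"),
    ]
    review = "The ambience was nice but the food was cold."
    prompt = "Classify the sentiment of each review as positive or negative.\n\n"
    for text, label in examples:
        prompt += f"Review: {text}\nSentiment: {label}\n\n"
    prompt += f"Review: {review}\nSentiment:"
    print(prompt)  # this string would be sent to an LLM completion endpoint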

3. What does Generative AI do?

Traditional AI used to take time in development, iterations, deployment, consumption, data training, etc.
A Gen AI solution can be stood up in a matter of weeks.

Gen AI can basically perform one of the following functions, with some examples:-

3.1. Summarization: Regulatory guidelines/ risk reports/ UW/claims/policy/ corporate functions
3.2. Reference & Co-Pilot: Extract key information, e.g. information extraction/ risk analysis/ sentiment mining/ fraud/ event detection/ web mining/ CX/CJ insights
3.3. Expansion: Automated mails/ descriptions/ qualitative reports/ synthetic data/ advisory (B2B/B2C)
3.4. Transformation: Change language structure, e.g. translation/ code writing/ data format change/ tone change/ AI-driven BI

4. Ways to use LLM APIs
  • Integrate to other apps
  • Virtual Assistants
  • Developer Co-Pilot
  • Custom Applications
5.  Example use cases from Insurance:-
  • Content Generation: Lead generation/ Onboarding/ Customer Management/ Delinquency & Foreclosures
  • Workflow management: 
  • Client Experience and Interaction
  • Security Compliance
  • Workflow Optimization
6. There are 2 broad patterns into which use cases fall:-

6.1. Retrieval Augmented Generation (RAG): retrieve and answer in context (in-context learning).
Related options: agent-based architecture, and fine-tuning for optimization.

Question/Task > LLM > (Indexing) Indexed Query > Vector Store > Give Context > Contextual Prompt Creation > LLM > Output > Output Parsing > Answer

In RAG, the LLM is a fixed, out-of-the-box (OOB), domain-agnostic model such as GPT-3/4. You give it context along with the question.
Context can be given via zero-shot or few-shot learning (e.g., examples of the desired output).
There is no data leakage problem, since your data stays outside the model's weights.
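A minimal sketch of the retrieval step, assuming a hypothetical embed() helper in place of a real embedding model (the documents and similarity values are toy stand-ins, so the "retrieval" here is only structural):

    import numpy as np

    def embed(text: str) -> np.ndarray:
        # Hypothetical stand-in: a real system would call an embedding model here.
        rng = np.random.default_rng(abs(hash(text)) % (2**32))
        v = rng.normal(size=64)
        return v / np.linalg.norm(v)

    documents = [
        "Underwriting guideline: rental income counts at 75% toward qualification.",
        "Claims over a threshold require a second-level review.",
        "Leave applications are filed through the leave management module.",
    ]
    index = np.stack([embed(d) for d in documents])  # the "vector store"

    question = "How much rental income counts toward qualification?"
    scores = index @ embed(question)                 # cosine similarity (unit vectors)
    context = documents[int(np.argmax(scores))]      # retrieve best-matching document

    prompt = f"Context: {context}\n\nQuestion: {question}\nAnswer using only the context."
    # `prompt` would now be sent to the fixed, out-of-the-box LLM.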

If you pass data directly into the LLM instead (fine-tuning: given this context, here are examples of the right answer, and the model aligns to them), it can give direct answers, but this has a security challenge. Fine-tuning increases accuracy and alignment, but the data can become stale, as it is not always live. It is therefore recommended to go with RAG, which sits outside the LLM, instead of fine-tuning the model on your own data.

So in the Underwriting example:-
    a) Agent assistance: a smart bot that enables an agent to answer different customer/prospect questions by tapping into existing contractual docs, guidelines, benefits, calculations, and research & insights, e.g. knowledge management, a chatbot, ...
    b) Underwriting/risk co-pilots for mortgage: the co-pilot helps the underwriter go through several steps to assess risk, summarize the qualifying income by reviewing and identifying various sources of income, do appraisals, and contextualize credit and assets. An LLM can curate much better than analytic AI: it can automatically analyze live data, write output, and generally help the underwriter.

6.2. Multi-Hop/ Multi-stage Problem Solving :  
    Insight Agents: conversational business intelligence, data quality assurance, analytics and insights co-pilot, decision support agent.

Multi-hop problem solving is different from RAG. Some large LLMs have logic and reasoning built in, so you can build agents for, say, conversational BI rather than a purely textual conversational bot. Multi-hop can build insights, cross-match data, run chains of thought, and so on. ReAct (Reason and Act) is a sequence of steps an agent works through to accomplish a task before the output is given. More complex reasoning can also be tried. You may need graphical analysis, for example a concentration of claims that points to one garage, or relationships between claims. Knowledge graphs between data sets can throw up unique intelligence from simple textual question prompts, and can help in decision support systems.
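A schematic sketch of a ReAct-style loop; llm() and run_tool() are scripted stand-ins for a real model and tool layer, and the claims numbers are invented purely for illustration:

    # Scripted stand-in for an LLM: canned Thought/Action steps for a toy task.
    SCRIPT = iter([
        "search(claims grouped by garage)",
        "FINISH: Garage G-17 accounts for most claims; flag it for review.",
    ])

    def llm(transcript: str) -> str:
        return next(SCRIPT)  # a real agent would call a reasoning-capable LLM here

    def run_tool(action: str, arg: str) -> str:
        # Hypothetical tool layer: could be SQL, vector search, a calculator, ...
        return "garage G-17: 42 claims; all other garages: fewer than 5 each"

    def react(task: str, max_steps: int = 5) -> str:
        """Reason-and-Act loop: Thought -> Action -> Observation, until FINISH."""
        transcript = f"Task: {task}\n"
        for _ in range(max_steps):
            step = llm(transcript)
            if step.startswith("FINISH:"):
                return step.removeprefix("FINISH:").strip()
            action, _, arg = step.partition("(")        # e.g. search(claims ...)
            observation = run_tool(action, arg.rstrip(")"))
            transcript += f"Thought: {step}\nObservation: {observation}\n"
        return "Max steps reached.\n" + transcript

    print(react("Find claim concentration that may indicate fraud"))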


7. Some concepts to know here:-
- May require fine-tuning or a rework of the RAG setup: changing the model to fit your data. Here, you use your own data sets to train the model. Private fine-tuning is available from GPT-4.

        -  SFT and RLHF

In the context of LLMs, SFT stands for Supervised Fine-Tuning. It is a technique used to fine-tune a pre-trained LLM on a specific task by providing it with labeled examples.

RLHF stands for Reinforcement Learning from Human Feedback. It is a method used to train LLMs to align their output with human intentions and values. RLHF involves teaching an LLM to understand human preferences by assigning scores to different responses from the base model. The goal is to use the preference model to alter the behavior of the base model in response to a prompt.
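As a toy illustration of the scoring idea (the reward() function below is a made-up proxy, nothing like a real reward model, which is a neural network trained on human rankings): responses are scored, and a Bradley-Terry style sigmoid of the score difference gives the modeled probability that a human prefers one response over the other.

    import math

    def reward(response: str) -> float:
        # Made-up proxy reward: prefer shorter answers that say "please".
        return -len(response) / 100 + (1.0 if "please" in response.lower() else 0.0)

    a = "Please find the requested summary attached."
    b = "Here is an extremely long and rambling answer that never gets to the point."
    # Probability (under the preference model) that a human prefers `a` over `b`.
    p_a_over_b = 1 / (1 + math.exp(-(reward(a) - reward(b))))
    print(f"P(a preferred over b) = {p_a_over_b:.2f}")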

Here are some examples of how RLHF is used in LLMs:

  1. ChatGPT: It is a state-of-the-art LLM developed by OpenAI that uses RLHF to learn human preferences and provide a more controlled user experience.

  2. NVIDIA SteerLM: It is a technique developed by NVIDIA that, as an alternative to classic RLHF, lets users customize LLM behaviour during inference.


- Foundational Model shift            

- If training is not done properly, an LLM can hallucinate or miss context.

8. SUMMARY of ARCHITECTURES
For completeness, let me mention that these are the main attention-based architectures:-
8.1. Encoder-Decoder (RNN based)
8.2. Transformer Model (non-RNN)
8.3. Graph Neural Networks (GNN)
8.4. Memory Augmented Neural Networks

A graph is a versatile data structure that lends itself well to the way data is organized in many real-world scenarios. We can think of an image as a graph, where each pixel is a node, directly connected to its neighboring pixels …

– Advanced Deep Learning with Python, 2019.

Of particular interest are the Graph Attention Networks (GAT) that employ a self-attention mechanism within a graph convolutional network (GCN), where the latter updates the state vectors by performing a convolution over the nodes of the graph.

In the encoder-decoder attention-based architectures, the set of vectors that encode the input sequence can be considered external memory, to which the encoder writes and from which the decoder reads. However, a limitation arises because the encoder can only write to this memory, and the decoder can only read.

Memory-Augmented Neural Networks (MANNs) are recent algorithms that aim to address this limitation. 

The Neural Turing Machine (NTM) is one type of MANN. It consists of a neural network controller that takes an input to produce an output and performs read and write operations to memory. Examples of applications for MANNs include question-answering and chat bots, where an external memory stores a large database of sequences (or facts) that the neural network taps into. 


9. Policy and Principles around GenAI:-
    Guardrails, policy enforcement, and governance regarding inputs to GenAI are an open subject. There could be LLM tools or technologies that enforce these policy guardrails to ensure security, privacy, etc.