
Retrieval-Augmented Generative AI with a Fine-Tuned Mistral-7B Model and LlamaIndex

  • Writer: Kyuta Yasuda
  • Jan 24
  • 2 min read

Overview

This project combines Retrieval-Augmented Generation (RAG) with a fine-tuned generative AI model to create a personalized conversational assistant. It dynamically retrieves context from uploaded documents and generates accurate, context-aware responses. The assistant, named KyeGPT, serves as a virtual representation of me, answering questions about my professional background, skills, and experiences.


Key Features

  1. Semantic Search for Contextual Retrieval:

    • Utilizes a dense vector embedding model (BAAI/bge-small-en-v1.5) for semantic search, ensuring contextually relevant documents are retrieved based on the meaning of queries rather than keyword matching.

  2. Fine-Tuned Language Model:

    • Fine-tunes the Mistral-7B-Instruct-v0.2-GPTQ model using PEFT (Parameter-Efficient Fine-Tuning) to generate responses tailored to my resume, cover letter, and project experiences.

  3. Dynamic Prompt Engineering:

    • Integrates retrieved content into prompts, ensuring the generated responses are enriched with accurate and contextually relevant information.

  4. Deployment-Ready Packaging:

    • Saves the fine-tuned model and tokenizer for deployment, enabling the conversational assistant to be hosted on platforms such as AWS EC2.
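The semantic search in feature 1 boils down to comparing dense embedding vectors by cosine similarity and keeping the closest matches. Here is a minimal, self-contained sketch of that ranking step in plain Python; the 3-dimensional vectors are toy stand-ins for the 384-dimensional embeddings a model like BAAI/bge-small-en-v1.5 would actually produce.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two dense embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=3):
    """Return the indices of the k document vectors closest in meaning."""
    scores = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [i for i, _ in scores[:k]]

# Toy stand-ins for real embedding vectors:
docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
query = [1.0, 0.05, 0.0]
print(top_k(query, docs, k=2))  # → [0, 1]: the two closest documents
```

Because similarity is computed on embedding vectors rather than on words, a query and a chunk can match even when they share no keywords, which is the point of feature 1.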



How It Works

  1. Document Processing:

    • Upload documents (e.g., resumes, cover letters) to be indexed.

    • Chunk the documents into smaller sections for efficient retrieval.

  2. Retrieval-Augmented Generation:

    • Retrieve up to the top 3 most relevant document chunks, keeping only those that clear a similarity threshold.

    • Pass the retrieved content to the fine-tuned model as additional context for query-based text generation.

  3. Response Generation:

    • Generate accurate and concise answers to queries using the integrated retrieval-and-generation pipeline.
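The three steps above can be sketched end to end in a few small functions. This is a simplified illustration, not the project's actual LlamaIndex code: the chunk size, overlap, and threshold values are assumptions, and the similarity scores are passed in precomputed rather than produced by a real embedding model.

```python
def chunk_text(text, chunk_size=200, overlap=20):
    """Step 1: split a document into overlapping chunks for indexing."""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
    return chunks

def retrieve(scores, threshold=0.5, k=3):
    """Step 2: keep up to k chunk indices whose similarity clears the threshold."""
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return [i for i, s in ranked if s >= threshold][:k]

def build_prompt(query, retrieved_chunks):
    """Step 2/3: fold the retrieved chunks into the prompt as grounding context."""
    context = "\n---\n".join(retrieved_chunks)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# Toy run: chunk a document, rank chunks by (precomputed) similarity, build the prompt.
chunks = chunk_text("resume text " * 50)
picked = retrieve([0.9, 0.3, 0.7, 0.6])          # → [0, 2, 3]
prompt = build_prompt("Why should I hire you?", [chunks[i] for i in picked])
```

The resulting `prompt` string is what gets handed to the fine-tuned model in step 3, so the generation is conditioned on the retrieved evidence rather than on the query alone.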


Technical Stack

  • Language Model: Mistral-7B-Instruct-v0.2-GPTQ

  • Embedding Model: BAAI/bge-small-en-v1.5

  • Libraries:

    • LlamaIndex: For document indexing and retrieval.

    • Transformers: For model inference and tokenization.

    • PEFT: For fine-tuning the language model efficiently.

  • Tools and Frameworks:

    • Python, Flask (for API), and AWS (for deployment).


Innovations

  1. Seamless Integration of RAG with Generative AI:

    • Combines the strengths of retrieval systems with generative models to enhance response accuracy and contextual relevance.

  2. Parameter-Efficient Fine-Tuning:

    • Optimized training using PEFT to ensure computational efficiency while achieving high-quality output.
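To make the efficiency claim in point 2 concrete: the most common PEFT method, LoRA, freezes each weight matrix W and trains only a low-rank update B @ A. The post does not state the exact PEFT configuration used, so the rank and layer size below are illustrative assumptions (4096 is the hidden size of Mistral-7B's attention projections).

```python
def lora_param_counts(d_in, d_out, r):
    """Trainable parameters: full fine-tune of one weight matrix vs. its LoRA update.

    LoRA freezes W (d_out x d_in) and trains two low-rank factors,
    B (d_out x r) and A (r x d_in), so the trainable delta is B @ A.
    """
    full = d_out * d_in
    lora = d_out * r + r * d_in
    return full, lora

# One 4096x4096 projection with an assumed rank of r=8:
full, lora = lora_param_counts(4096, 4096, r=8)
print(full, lora, full // lora)  # → 16777216 65536 256
```

A 256x reduction per matrix is why the model can be adapted on modest hardware while the frozen GPTQ-quantized base weights stay untouched.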


Sample Query and Response

  • Query: "Why should I hire you?"

  • Generated Response: "As a highly motivated individual with expertise in machine learning, data analytics, and AI, I bring a unique combination of technical skills, problem-solving ability, and teamwork. My journey from being a disciplined baseball player to excelling in data science has equipped me with resilience and adaptability to tackle complex challenges effectively."


Future Improvements

  • Advanced Retrieval Techniques:

    • Explore graph-based approaches or hybrid retrieval models for better contextual understanding.

  • User Interaction:

    • Develop a web-based GUI for easier interaction with the assistant.

  • Scalability:

    • Deploy the system using containerization (e.g., Docker) for efficient scaling across multiple environments.
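As a starting point for that containerization step, a Dockerfile for the Flask API could look like the sketch below. The file names (`app.py`, `requirements.txt`) and port are hypothetical; a GPU-backed image would additionally need CUDA-enabled base layers.

```dockerfile
# Hypothetical image for the KyeGPT Flask API (file names are assumptions)
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 5000
CMD ["python", "app.py"]
```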
