Retrieval-Augmented Generative AI with a Fine-Tuned Mistral-7B Model and LlamaIndex
- Kyuta Yasuda
- Jan 24
Overview
This project combines Retrieval-Augmented Generation (RAG) with a fine-tuned generative AI model to create a personalized conversational assistant. It dynamically retrieves context from uploaded documents and generates accurate, context-aware responses. The assistant, named KyeGPT, serves as a virtual representation of me, answering questions about my professional background, skills, and experiences.
Key Features
Semantic Search for Contextual Retrieval:
Utilizes a dense vector embedding model (BAAI/bge-small-en-v1.5) for semantic search, ensuring contextually relevant documents are retrieved based on the meaning of queries rather than keyword matching.
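As a quick illustration of what "meaning over keywords" buys, here is a minimal sketch that loads the same embedding model through sentence-transformers (the project itself wires it through LlamaIndex; see the sketches under How It Works):

```python
# Minimal semantic-similarity sketch with BAAI/bge-small-en-v1.5.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("BAAI/bge-small-en-v1.5")

query = "What programming languages do you know?"
docs = [
    "Proficient in Python, SQL, and deep learning frameworks.",
    "Played competitive baseball for ten years.",
]

# Dense embeddings capture meaning, so the first document scores higher
# even though it shares no keywords with the query.
q_emb = model.encode(query, normalize_embeddings=True)
d_emb = model.encode(docs, normalize_embeddings=True)
print(util.cos_sim(q_emb, d_emb))
```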
Fine-Tuned Language Model:
Fine-tuned the Mistral-7B-Instruct-v0.2-GPTQ model using PEFT (Parameter-Efficient Fine-Tuning) to generate responses tailored to my resume, cover letter, and project experience.
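A minimal LoRA setup sketch with PEFT follows; the hub ID and hyperparameters are illustrative rather than the project's exact values, and loading GPTQ weights assumes the optimum/auto-gptq backend is installed:

```python
# Illustrative PEFT/LoRA setup for the quantized Mistral model.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "TheBloke/Mistral-7B-Instruct-v0.2-GPTQ"  # assumed hub ID
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

model = prepare_model_for_kbit_training(model)  # ready quantized weights for training
lora_config = LoraConfig(
    r=8,                                  # illustrative rank
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # adapt attention projections only
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a tiny fraction of weights train
```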
Dynamic Prompt Engineering:
Integrates retrieved content into prompts, ensuring the generated responses are enriched with accurate and contextually relevant information.
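For example, a hypothetical prompt builder (the template wording is assumed, not quoted from the project) that folds retrieved chunks into Mistral's [INST] format:

```python
# Hypothetical prompt template: retrieved chunks become grounding context.
def build_prompt(query: str, retrieved_chunks: list[str]) -> str:
    context = "\n\n".join(retrieved_chunks)
    return (
        "[INST] You are KyeGPT, a virtual assistant representing Kyuta Yasuda.\n"
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query} [/INST]"
    )
```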
Deployment-Ready Packaging:
Saves the fine-tuned model and tokenizer for deployment, enabling the conversational assistant to be hosted on platforms such as AWS EC2.
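Packaging comes down to a few Transformers/PEFT calls; the directory names here are illustrative:

```python
# Reload the trained adapter and save everything needed for serving.
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

model = AutoPeftModelForCausalLM.from_pretrained("kyegpt-adapter")  # illustrative path
tokenizer = AutoTokenizer.from_pretrained("kyegpt-adapter")

model.save_pretrained("kyegpt-deploy")      # adapter weights + config
tokenizer.save_pretrained("kyegpt-deploy")  # tokenizer files alongside
```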
How It Works
Document Processing:
Upload documents (e.g., resumes, cover letters) to be indexed.
Chunk the documents into smaller sections for efficient retrieval.
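With LlamaIndex (0.10+ style imports), these two steps might look like the following sketch; the input directory and chunk sizes are illustrative:

```python
# Load and chunk documents for retrieval.
from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter

documents = SimpleDirectoryReader("docs/").load_data()  # resume, cover letter, ...
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=64)  # illustrative sizes
nodes = splitter.get_nodes_from_documents(documents)
print(f"{len(documents)} documents -> {len(nodes)} chunks")
```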
Retrieval-Augmented Generation:
Retrieve the top 3 most relevant document chunks, filtering them by a similarity threshold.
Pass the retrieved content to the fine-tuned model as additional context for query-based text generation.
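Continuing the chunking sketch above, retrieval with a top-3 cap and a similarity cutoff might look like this (the 0.7 threshold is illustrative):

```python
# Build the index with the bge embedding model, then retrieve and filter.
from llama_index.core import VectorStoreIndex
from llama_index.core.postprocessor import SimilarityPostprocessor
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
index = VectorStoreIndex(nodes, embed_model=embed_model)

retriever = index.as_retriever(similarity_top_k=3)  # top-3 chunks
results = retriever.retrieve("Why should I hire you?")
results = SimilarityPostprocessor(similarity_cutoff=0.7).postprocess_nodes(results)
context_chunks = [r.node.get_content() for r in results]  # feed these to the LLM
```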
Response Generation:
Generate accurate, concise answers to queries using the combined retrieval and generation pipeline.
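Putting the sketches together (reusing build_prompt, context_chunks, model, and tokenizer from above), generation reduces to a single call:

```python
import torch

prompt = build_prompt("Why should I hire you?", context_chunks)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, not the echoed prompt.
answer = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                          skip_special_tokens=True)
print(answer)
```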
Technical Stack
Language Model: Mistral-7B-Instruct-v0.2-GPTQ
Embedding Model: BAAI/bge-small-en-v1.5
Libraries:
LlamaIndex: For document indexing and retrieval.
Transformers: For model inference and tokenization.
PEFT: For fine-tuning the language model efficiently.
Tools and Frameworks:
Python, Flask (for API), and AWS (for deployment).
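A hypothetical Flask endpoint wrapping the pipeline (the /ask route and the answer_query helper are illustrative names, not from the project):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/ask", methods=["POST"])
def ask():
    query = request.get_json()["query"]
    answer = answer_query(query)  # retrieval + generation pipeline above
    return jsonify({"answer": answer})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```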
Innovations
Seamless Integration of RAG with Generative AI:
Combines the strengths of retrieval systems with generative models to enhance response accuracy and contextual relevance.
Parameter-Efficient Fine-Tuning:
Optimized training using PEFT to ensure computational efficiency while achieving high-quality output.
Sample Query and Response
Query: "Why should I hire you?"
Generated Response: "As a highly motivated individual with expertise in machine learning, data analytics, and AI, I bring a unique combination of technical skills, problem-solving ability, and teamwork. My journey from being a disciplined baseball player to excelling in data science has equipped me with resilience and adaptability to tackle complex challenges effectively."
Future Improvements
Advanced Retrieval Techniques:
Explore graph-based approaches or hybrid retrieval models for better contextual understanding.
User Interaction:
Develop a web-based GUI for easier interaction with the assistant.
Scalability:
Deploy the system using containerization (e.g., Docker) for efficient scaling across multiple environments.