
RAG workflow project

  • Ken Munson
  • Nov 15
  • 4 min read

Updated: Nov 17


Here’s a summary (with a little bit of detail) of the full Load → Chunk → Embed → Save to FAISS → Reload → Query workflow I built.


I probably don't have to say that I used ChatGPT 5.1 for help, especially with the Python coding, as well as some deep troubleshooting of this code.


Because I had to spend a lot of time troubleshooting Chroma (an open-source vector database for AI applications) on Windows 11, this project took longer than it should have. If you want to recreate this project (which I highly recommend), and you are going against the grain and doing the development on a Windows machine, I highly recommend staying away from Chroma as your vector store. Trust me on this.


All things considered, this first part of the project, detailed here, probably took 30 hours. It would have been 20 without the Chroma drama. With what I have written below, and the repo I will eventually put on GitHub, you could probably get it done in 5, depending on your OS, API setup, and other issues.



The project:



RAG Pipeline Overview

RAG stands for Retrieval Augmented Generation: giving a language model the super specific information you want it to be "informed" about (information it wouldn't normally have access to, like a specific company policy).



Load → Chunk → Embed (Vertex AI) → Save to FAISS → Reload → Query



Below is a conceptual summary plus the important implementation details I actually used in this project.



---



 1. Load Documents



Purpose:


Bring raw files (PDF, TXT, etc.) into memory so they can later be split into chunks and embedded.



* I created a `data/` directory and placed the input PDF there.


* The Python ingestion script (`ingest.pipeline`) scans that directory.


It uses **LangChain document loaders** (a short loading sketch appears below):



    * `PyPDFLoader` → extracts text page by page.


    * `TextLoader` → reads raw text files.



Output:


A list of raw `Document` objects, each containing:



* `page_content`


* `metadata` (filename, page number, etc.)
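

Here is a minimal sketch of that loading step (the `load_documents` helper name and the standalone layout are mine for illustration; the real code lives in `ingest.pipeline`):

```python
from pathlib import Path

from langchain_community.document_loaders import PyPDFLoader, TextLoader

def load_documents(data_dir: str = "data"):
    """Load every PDF and TXT file in data_dir as LangChain Document objects."""
    docs = []
    for path in sorted(Path(data_dir).iterdir()):
        if path.suffix.lower() == ".pdf":
            docs.extend(PyPDFLoader(str(path)).load())  # one Document per page
        elif path.suffix.lower() == ".txt":
            docs.extend(TextLoader(str(path)).load())
    return docs

docs = load_documents()
print(f"Loaded {len(docs)} raw docs.")
```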



My log output example:



```
Loaded 5 raw docs.
```



(The 5 docs came from splitting the PDF into 5 pages.)



---



2. Chunk the Documents



Purpose:


Break long documents into small, searchable pieces.


LLMs and embedding models work much better on short, coherent chunks.



How I did it:


I used LangChain’s `RecursiveCharacterTextSplitter`:



```python
from langchain_text_splitters import RecursiveCharacterTextSplitter  # older versions: langchain.text_splitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,     # max characters per chunk
    chunk_overlap=200,   # characters shared between neighboring chunks
)
chunks = splitter.split_documents(docs)  # Documents in, smaller chunk Documents out
```



This produces overlapping chunks so no important sentence gets cut in half.



Output example:



```
Created 16 chunks.
```



So: 5 pages (in the PDF I uploaded) → 16 good, LLM-friendly chunks.



---



3. Embed Each Chunk (Vertex AI Embeddings)



Purpose:


Convert text into numerical vectors so FAISS can index them.

Embedding a chunk using Vertex AI embeddings involves converting a segment of text (a "chunk") into a dense, numerical representation called a vector. This vector, also known as an embedding, captures the semantic meaning of the text and enables efficient retrieval and comparison of related information.



How I did it:



* I used Google's Vertex AI through `langchain_google_vertexai`.


* Model: **text-embedding-004**


* Configured with my Project ID and region (`us-central1`).



```python
from langchain_google_vertexai import VertexAIEmbeddings

emb = VertexAIEmbeddings(
    model_name="text-embedding-004",
    project=settings.project_id,
    location=settings.location,
)
```



Each chunk becomes a high-dimensional vector (768 dimensions for text-embedding-004).



These vectors represent the semantic meaning of the text.
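

If you want to inspect the raw vectors yourself, the standard LangChain embeddings interface exposes `embed_documents`; this step is optional, since FAISS will call the embedder for you in the next section:

```python
# Optional inspection step; FAISS.from_documents() runs the embedder internally.
vectors = emb.embed_documents([chunk.page_content for chunk in chunks])
print(len(vectors), len(vectors[0]))  # e.g. 16 chunks, 768 dimensions each
```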



---



4. Store Vectors in a Persisted FAISS Index



Purpose:


Save embeddings locally so you can reload them later and run similarity search.



I switched away from Chroma (because of Windows segfaults + dependency conflicts) and moved to FAISS (flat index) persisted to disk. This was the breakthrough.  I spent 10 hours trying to get this dumb Chroma vector store to work!



How I did it:



```python
from langchain_community.vectorstores import FAISS

# Embed every chunk and build a flat FAISS index in one call.
vs = FAISS.from_documents(chunks, emb)

# Persist the index (and its docstore) to disk.
vs.save_local(settings.faiss_dir)
```



This writes:



```
vectorstore/
  index.faiss    ← the FAISS index (the raw vectors)
  index.pkl      ← the docstore (documents + metadata)
```



Ingest output example:



```
Done. Vector store persisted at: vectorstore (approx. vectors=16)
```
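

Putting sections 1-4 together, the ingest entry point is conceptually just this (a sketch, not the literal `ingest.pipeline`, reusing the `load_documents`, `splitter`, and `emb` pieces shown above):

```python
def run_ingest() -> None:
    docs = load_documents()                  # 1. Load
    chunks = splitter.split_documents(docs)  # 2. Chunk
    vs = FAISS.from_documents(chunks, emb)   # 3 + 4. Embed and index
    vs.save_local(settings.faiss_dir)        # persist to vectorstore/
    print(f"Done. Vector store persisted at: {settings.faiss_dir} (approx. vectors={len(chunks)})")
```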



---


5. Reload the Index Later



To query the vector store, I reload it from disk:



```python
# Reload with the SAME embedding model used at ingest time.
# Note: newer LangChain versions also require allow_dangerous_deserialization=True,
# because index.pkl is deserialized with pickle.
vs = FAISS.load_local(settings.faiss_dir, emb)
```



Because FAISS stores raw vectors, you must reload using the same embedding model.



---



6. Query: Similarity Search → LLM Answering



Purpose:


Take a user question, find the most relevant chunks, and pass them into the LLM.



I implemented two modes:



Similarity Search



Returns the nearest chunks based on vector distance:



```python
# Returns a list of (Document, relevance_score) pairs.
docs = vs.similarity_search_with_relevance_scores(query, k=5)
```



Max Marginal Relevance (MMR)



Returns diverse chunks to reduce redundancy:



```python
# fetch_k candidates are retrieved first, then k diverse results are kept.
docs = vs.max_marginal_relevance_search(query, k=5, fetch_k=10)
```



Answering with the LLM (Gemini 2.0 Flash-001)



Then:



1. Take the retrieved chunks


2. Format them as context


3. Pass both the context + user question to a Gemini model



This happens in `app/chain.py`.
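

The real `app/chain.py` has more structure, but a minimal sketch of those three steps could look like this (the `answer` function name and the prompt wording are my illustrations; `settings`, the FAISS store `vs`, and the model name come from the project):

```python
from langchain_google_vertexai import ChatVertexAI

llm = ChatVertexAI(
    model_name="gemini-2.0-flash-001",
    project=settings.project_id,
    location=settings.location,
)

def answer(question: str, k: int = 5) -> str:
    # 1. Retrieve the most relevant chunks from the FAISS store.
    docs = vs.similarity_search(question, k=k)

    # 2. Format them as context.
    context = "\n\n".join(d.page_content for d in docs)

    # 3. Pass context + question to Gemini.
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    return llm.invoke(prompt).content
```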



---



7. CLI Tool to Ask Questions



The script:



```bash
python -m scripts.query_cli "What does the document say about tracking ML experiments?"
```



Steps:



1. Query is received


2. FAISS retrieves top chunks


3. Gemini 2.0 Flash-001 is invoked


4. You see the final answer
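

For reference, a stripped-down sketch of what such a CLI entry point could look like (not the exact `scripts/query_cli.py`; it assumes the `answer` helper from the chain sketch above):

```python
import sys

from app.chain import answer  # the retrieve-and-answer helper sketched earlier

def main() -> None:
    if len(sys.argv) < 2:
        print('Usage: python -m scripts.query_cli "your question"')
        raise SystemExit(1)

    question = " ".join(sys.argv[1:])
    print(answer(question))

if __name__ == "__main__":
    main()
```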



---



Why This Works (Conceptually)



RAG = Retrieval Augmented Generation



Instead of asking the LLM to "remember" the document, the pipeline does the following:



1. Load


2. Chunk


3. Embed


4. Store in FAISS


5. Retrieve relevant chunks at query time


6. Give retrieved chunks to the LLM


7. LLM answers with grounded citations



This helps ensure:



* Answers grounded in your actual document


* Far fewer hallucinations


* Cost efficiency (embeddings are cheap, retrieval is local)



---




This is a fully functioning local RAG pipeline backed by FAISS, Vertex AI embeddings, and Gemini 2.0: the same kinds of components used in many production RAG systems.


 
 
 
