Architecture IDEA
A reusable RAG platform serving an HR and an electrical-engineering-rules assistant in production.
A production RAG platform for IDEA, currently running two internal assistants. Guillermo is the company’s sole AI engineer and mentors one engineer building a LangGraph agent on the platform.
Problem
IDEA needed two assistants for different audiences, one for HR questions and one for electrical-engineering rules, each with its own documents, retrieval needs, and definition of a good answer. Built separately, they would have meant two ingestion pipelines, two retrieval stacks, and two ways to measure quality. The task was not one chatbot but a platform that could stand up new assistants without rebuilding the same machinery each time.
Constraints
- Two distinct assistants must run from one shared codebase.
- New assistants must be addable without rewriting ingestion or retrieval.
- Answer quality must be measured, not assumed, before each release.
- Every retrieval and generation step must be observable in production.
- A second engineer must be able to contribute and own a component.
Approach
As IDEA’s only AI engineer, I designed, built, and now lead its production RAG platform on Docker with Azure CI/CD, PostgreSQL for application data, and Qdrant as the vector store. The architecture turns on three reusable parts.
First, an endpoint factory generates each assistant's API surface from shared building blocks. Without it, every new assistant starts as a copy of the last one and drifts from it.
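The following is a minimal sketch of the factory idea. It assumes the shared building blocks can be reduced to one retrieval function and one generation function. `AssistantConfig`, `make_endpoints`, and the route names are hypothetical. A plain dict of handlers stands in for whatever web framework the platform actually uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict


# Hypothetical per-assistant settings; the platform's real config fields are not documented here.
@dataclass(frozen=True)
class AssistantConfig:
    name: str           # used as the URL prefix, e.g. "hr"
    collection: str     # vector-store collection this assistant searches
    system_prompt: str


Retriever = Callable[[str, str], list[str]]        # (collection, query) -> passages
Generator = Callable[[str, str, list[str]], str]   # (system_prompt, query, passages) -> answer
Handler = Callable[[dict], dict]


def make_endpoints(cfg: AssistantConfig, retrieve: Retriever, generate: Generator) -> Dict[str, Handler]:
    """Build the route table for one assistant from the shared building blocks."""

    def ask(payload: dict) -> dict:
        query = payload["query"]
        passages = retrieve(cfg.collection, query)
        answer = generate(cfg.system_prompt, query, passages)
        return {"assistant": cfg.name, "answer": answer, "sources": passages}

    def health(_payload: dict) -> dict:
        return {"assistant": cfg.name, "status": "ok"}

    return {f"/{cfg.name}/ask": ask, f"/{cfg.name}/health": health}


# Usage with fake building blocks (illustrative only).
def fake_retrieve(collection: str, query: str) -> list[str]:
    return [f"{collection}: passage about {query}"]


def fake_generate(prompt: str, query: str, passages: list[str]) -> str:
    return f"{prompt} | {query} | {len(passages)} source(s)"


routes: Dict[str, Handler] = {}
for cfg in (AssistantConfig("hr", "hr_docs", "HR"), AssistantConfig("rules", "elec_rules", "Rules")):
    routes.update(make_endpoints(cfg, fake_retrieve, fake_generate))

print(sorted(routes))  # ['/hr/ask', '/hr/health', '/rules/ask', '/rules/health']
```

Each assistant is only an `AssistantConfig` passed through the same factory, so a fix to `ask` reaches every assistant at once.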
Second, reusable ingestion and retrieval libraries let both assistants draw on one tested pipeline rather than two parallel ones. Shared code is what keeps a platform a platform.
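The shared-library idea can be illustrated with a stdlib-only sketch in which both assistants use one chunking function and one index class, unchanged. The word-window chunker and the term-overlap `InMemoryIndex` are illustrative stand-ins, not the platform's code. In production the index role belongs to Qdrant with embedding search, which this sketch does not attempt.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str
    text: str


def chunk_text(doc_id: str, text: str, size: int = 40, overlap: int = 10) -> list[Chunk]:
    """Split text into windows of `size` words, each overlapping the previous window by `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, max(len(words) - overlap, 1), step):
        window = words[start:start + size]
        if window:  # skip the empty window produced by empty input
            chunks.append(Chunk(doc_id, " ".join(window)))
    return chunks


class InMemoryIndex:
    """Toy stand-in for the vector store: ranks chunks by the number of shared lowercase terms."""

    def __init__(self) -> None:
        self._chunks: list[Chunk] = []

    def add(self, chunks: list[Chunk]) -> None:
        self._chunks.extend(chunks)

    def search(self, query: str, k: int = 3) -> list[Chunk]:
        terms = set(query.lower().split())
        scored = [(len(terms & set(c.text.lower().split())), c) for c in self._chunks]
        scored = [(s, c) for s, c in scored if s > 0]
        scored.sort(key=lambda sc: sc[0], reverse=True)  # stable: ties keep insertion order
        return [c for _, c in scored[:k]]


# Two assistants, one library: only the documents differ.
hr = InMemoryIndex()
hr.add(chunk_text("leave-policy", "Employees accrue annual leave monthly and may carry over five days"))
rules = InMemoryIndex()
rules.add(chunk_text("wiring-rules", "Circuit breakers must be rated for the cable cross section"))

print(hr.search("annual leave")[0].doc_id)  # leave-policy
```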
Third, an integrated RAGAS evaluation module scores retrieval and answer quality as a build step instead of by hand. I added Langfuse observability across the stack, so every retrieval, ranking, and generation step appears in an inspectable trace once an assistant is live.
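The sketch below shows a RAGAS quality gate as a CI step. It assumes an earlier job has already run the RAGAS evaluation and reduced it to one average score per metric. `faithfulness`, `answer_relevancy`, and `context_precision` are metric names RAGAS provides. The threshold values and the `failing_metrics` and `gate` helpers are hypothetical placeholders for whatever the platform actually uses.

```python
import sys

# Hypothetical thresholds; the platform's real values are not published here.
THRESHOLDS = {"faithfulness": 0.85, "answer_relevancy": 0.80, "context_precision": 0.75}


def failing_metrics(scores: dict[str, float], thresholds: dict[str, float]) -> list[str]:
    """Return a message for every metric that is missing from `scores` or below its threshold."""
    failures = []
    for metric, minimum in thresholds.items():
        value = scores.get(metric)
        if value is None:
            failures.append(f"{metric}: missing")
        elif value < minimum:
            failures.append(f"{metric}: {value:.2f} < {minimum:.2f}")
    return failures


def gate(scores: dict[str, float]) -> int:
    """Exit code for CI: 0 lets the build pass, 1 blocks the release."""
    failures = failing_metrics(scores, THRESHOLDS)
    for line in failures:
        print(f"FAIL {line}", file=sys.stderr)
    return 1 if failures else 0


# Made-up scores standing in for the averaged output of a RAGAS run.
example = {"faithfulness": 0.91, "answer_relevancy": 0.78, "context_precision": 0.80}
print(failing_metrics(example, THRESHOLDS))  # ['answer_relevancy: 0.78 < 0.80']
```

A CI job would finish with `sys.exit(gate(scores))`, so any metric below its threshold fails the build instead of reaching production.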
Beyond the platform itself, I mentor one engineer building a LangGraph agent against it and define the contract their component must meet.
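One way to write such a contract down is a `typing.Protocol` plus a small conformance check the component must pass before review. The `PlatformComponent` interface, `AgentResult`, `EchoAgent`, and `check_contract` below are hypothetical illustrations, not the actual contract for the LangGraph agent.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass(frozen=True)
class AgentResult:
    answer: str
    sources: list[str]  # ids of the passages the answer relies on


@runtime_checkable
class PlatformComponent(Protocol):
    """Hypothetical contract a component must satisfy before it plugs into the platform."""

    name: str

    def run(self, query: str, passages: list[str]) -> AgentResult: ...


class EchoAgent:
    """Trivial stand-in for the LangGraph agent; not the real component."""

    name = "echo"

    def run(self, query: str, passages: list[str]) -> AgentResult:
        return AgentResult(answer=f"{query} ({len(passages)} passages)", sources=passages[:1])


def check_contract(component: object) -> None:
    """Raise if `component` lacks the contract's surface or breaks its basic guarantees."""
    if not isinstance(component, PlatformComponent):
        raise TypeError(f"{component!r} does not implement PlatformComponent")
    result = component.run("probe", ["p1", "p2"])
    if not isinstance(result, AgentResult):
        raise TypeError("run() must return an AgentResult")
    if not set(result.sources) <= {"p1", "p2"}:
        raise ValueError("sources must come from the passages supplied")


check_contract(EchoAgent())  # passes silently
```

Structural typing leaves the agent free to use LangGraph internally. The platform only checks the surface it calls.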
PLATFORM
Claude did
Drafted boilerplate for the endpoint factory and the reusable library interfaces, proposed first-pass retrieval code, and generated scaffolding for the RAGAS evaluation harness from a written brief.
Guillermo did
Owned the platform architecture, decided the split between shared libraries and per-assistant code, set the RAGAS thresholds the platform measures against, and reviewed every contribution before it merged, including those from the engineer he mentors.
One exchange
The boundary between reusable and assistant-specific code is the whole design. I held the line that ingestion and retrieval stay in shared libraries and only assistant-specific config diverges. Otherwise the second assistant forks the pipeline and the platform stops being one thing.
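The boundary can be enforced in code by making a config object the only thing an assistant supplies. This sketch assumes illustrative fields (`collection`, `chunk_size`, `top_k`, `system_prompt`) that may not match the platform's real settings. The hypothetical `load_settings` helper rejects any key outside that surface, so an assistant cannot quietly swap in its own pipeline.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AssistantSettings:
    """Everything an assistant may customise; the field names are illustrative assumptions."""

    name: str
    collection: str
    chunk_size: int = 400
    top_k: int = 5
    system_prompt: str = ""


def load_settings(raw: dict) -> AssistantSettings:
    """Build settings, rejecting keys outside the allowed config surface instead of ignoring them."""
    allowed = {f.name for f in fields(AssistantSettings)}
    unknown = set(raw) - allowed
    if unknown:
        raise ValueError(f"not configurable per assistant: {sorted(unknown)}")
    return AssistantSettings(**raw)


# Both assistants differ only in config; the ingestion and retrieval code they feed is shared.
ASSISTANTS = [
    load_settings({"name": "hr", "collection": "hr_policies", "top_k": 4}),
    load_settings({"name": "electrical-rules", "collection": "electrical_rules", "chunk_size": 250}),
]
```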
Stack
- Models · Orchestration: LangGraph agent (in progress) · endpoint factory
- Data: PostgreSQL · Qdrant
- Evaluation · Observability: integrated RAGAS module · Langfuse
- Infra: Docker · Azure CI/CD
Outcome
- Two assistants live in production: an HR assistant and an electrical-engineering-rules assistant.
- Both running on one shared platform: endpoint factory, reusable ingestion and retrieval libraries, integrated RAGAS evaluation.
- Production metrics reported as strong: [PLACEHOLDER — Guillermo to confirm: specific figures, e.g. RAGAS scores or HR/engineering query volume per week].
- A second engineer is building a LangGraph agent on the platform under my mentoring.
Lessons
- A platform is the boundary between shared and specific code — defend it on day one.
- Build evaluation into the pipeline; quality you do not measure is quality you only hope for.
- Observability is not optional once an assistant is answering real users.
Next
- Ship the LangGraph agent the mentored engineer is building against the platform.
- Replace the metrics placeholder with measured RAGAS and usage figures.