A Retrieval-Augmented Generation (RAG)–based AI learning assistant for a large engineering-focused EdTech platform.
Cybermind Works created a Retrieval-Augmented Generation (RAG)–based AI learning assistant for Skill-Lync, a large engineering-focused EdTech platform. The goal was to develop a secure and scalable system that allows students to search through and understand thousands of hours of recorded courses, workshops, project materials, and platform information, while safeguarding paid content and maintaining academic integrity.
This solution was designed as an AI chatbot featuring structured content ingestion, intelligent search and retrieval, access-controlled responses, result reranking, monitoring, and ongoing evaluation.
Skill-Lync is a premium EdTech provider offering:
As the platform scaled, students faced difficulty navigating large volumes of content, while mentors and support teams handled repetitive questions. The client wanted to introduce AI in a way that augmented learning, reduced support load, and maintained strict control over paid content.
Cybermind Works created a production-grade Agentic RAG system that serves as a learning assistant rather than a typical chatbot. The system supports:
High-level view of the RAG-based learning assistant system

The platform is designed as a reliable learning intelligence system. The architecture prioritizes scalability, predictable costs, academic integrity, and learner trust, while remaining simple enough for dependable large-scale operation.
The system consists of six tightly integrated subsystems, each addressing a core challenge in deploying AI for large-scale learning platforms:

The ingestion pipeline is fully asynchronous and job-driven, allowing content updates to scale independently of learner traffic. This separation is essential because courses, workshops, and learning materials are frequently updated.
Ingestion is orchestrated using Node.js services, with each ingestion task pushed to Amazon SQS. This enables:
For videos:
Specialized open-source libraries are used to process documents like PDFs and PPTs:
Each content item is tracked using `updated_at`, `last_ingested_at`, and a content hash. During every synchronization cycle, timestamps and hashes are compared, and only new or modified content is sent to the ingestion queue. This keeps ingestion cost proportional to what actually changed rather than to the size of the full catalog.
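For illustration, a minimal sketch of one such synchronization pass, assuming a Node.js worker using the AWS SDK v3 and pg (the table name, columns, and queue payload below are illustrative rather than the production schema):

```typescript
import { createHash } from "node:crypto";
import { Pool } from "pg";
import { SQSClient, SendMessageCommand } from "@aws-sdk/client-sqs";

const db = new Pool({ connectionString: process.env.DATABASE_URL });
const sqs = new SQSClient({ region: process.env.AWS_REGION });

// Hash the raw content so unchanged files are skipped even if timestamps move.
const hashContent = (raw: Buffer | string) =>
  createHash("sha256").update(raw).digest("hex");

export async function syncContentItem(item: {
  id: string;
  type: "video" | "pdf" | "ppt";
  updatedAt: Date;
  raw: Buffer;
}) {
  const { rows } = await db.query(
    "SELECT last_ingested_at, content_hash FROM content_items WHERE id = $1",
    [item.id],
  );
  const previous = rows[0];
  const hash = hashContent(item.raw);

  // Skip items whose timestamp and hash are unchanged since the last ingestion.
  if (
    previous &&
    previous.content_hash === hash &&
    previous.last_ingested_at >= item.updatedAt
  ) {
    return;
  }

  // Enqueue an ingestion job; workers consume it independently of learner traffic
  // and update last_ingested_at / content_hash once processing completes.
  await sqs.send(
    new SendMessageCommand({
      QueueUrl: process.env.INGESTION_QUEUE_URL,
      MessageBody: JSON.stringify({ contentId: item.id, type: item.type, hash }),
    }),
  );
}
```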
Content is chunked using a recursive character-based splitter, producing chunks of 800–1200 words. This method was chosen over semantic chunking due to scale and cost considerations, while still delivering stable retrieval performance. Each chunk is enriched with metadata such as course ID, lesson, workshop, timestamps, or page numbers. Embeddings are generated and stored in PostgreSQL using pgvector, enabling incremental re-indexing driven by timestamps and content hashes.
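A minimal sketch of the chunk-and-embed step is shown below; the splitter library, embedding model, and table layout are assumptions for illustration, since the write-up does not name them:

```typescript
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import OpenAI from "openai";
import { Pool } from "pg";

const db = new Pool({ connectionString: process.env.DATABASE_URL });
const openai = new OpenAI();

// Chunk sizes here are configured in characters; the 800-1,200 word target from
// the pipeline roughly corresponds to a few thousand characters per chunk.
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 6000,
  chunkOverlap: 400,
});

export async function indexLessonTranscript(lesson: {
  courseId: string;
  lessonId: string;
  transcript: string;
}) {
  const chunks = await splitter.splitText(lesson.transcript);

  for (const [i, text] of chunks.entries()) {
    const embedding = await openai.embeddings.create({
      model: "text-embedding-3-small",
      input: text,
    });

    // Store the chunk, its metadata, and its embedding in a pgvector column.
    await db.query(
      `INSERT INTO chunks (course_id, lesson_id, chunk_index, content, embedding)
       VALUES ($1, $2, $3, $4, $5::vector)`,
      [
        lesson.courseId,
        lesson.lessonId,
        i,
        text,
        JSON.stringify(embedding.data[0].embedding),
      ],
    );
  }
}
```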

At query time, the system performs hybrid retrieval, combining semantic understanding with exact matching to mirror real learner behavior.

| Tool Name | Purpose | Key Responsibilities |
|---|---|---|
| Knowledge Search Tool: `search_knowledge(query: string, max_results: number)` | Retrieves authoritative, learner-accessible knowledge for answering queries | |
| Support Escalation Tool: `schedule_support_session(topic: string, priority: enum)` | Escalates queries that cannot be safely or accurately answered by the system | |
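For illustration only, the two tools could be exposed to the agent as function-calling definitions along these lines (the exact schemas and agent framework used in production are not specified here):

```typescript
// Hypothetical tool schemas in a function-calling style; names mirror the table above.
const tools = [
  {
    type: "function" as const,
    function: {
      name: "search_knowledge",
      description:
        "Retrieve authoritative, learner-accessible knowledge for answering a query.",
      parameters: {
        type: "object",
        properties: {
          query: { type: "string" },
          max_results: { type: "number" },
        },
        required: ["query"],
      },
    },
  },
  {
    type: "function" as const,
    function: {
      name: "schedule_support_session",
      description:
        "Escalate a query that cannot be safely or accurately answered by the system.",
      parameters: {
        type: "object",
        properties: {
          topic: { type: "string" },
          priority: { type: "string", enum: ["low", "medium", "high"] },
        },
        required: ["topic", "priority"],
      },
    },
  },
];
```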
The platform uses a single-agent Agentic RAG architecture, avoiding linear prompt chains or complex multi-agent setups. This design keeps behavior predictable and production-ready while still enabling agent-style reasoning and tool usage.
A single RAG Agent manages the full reasoning loop per learner query, including:
For each query, the agent determines whether it relates to:
Learner identity and entitlements are resolved at the API layer, allowing downstream tools to naturally enforce access control. Before generating any response, the agent verifies that sufficient and relevant evidence exists. Speculative, ungrounded, or policy-violating responses are explicitly blocked.
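A simplified sketch of that per-query loop is shown below; the dependency interfaces, evidence threshold, and fallback message are illustrative assumptions:

```typescript
interface Learner { id: string; enrolledCourseIds: string[] }
interface Chunk { content: string; score: number; citation: string }

// Injected dependencies standing in for the real retrieval, reranking,
// generation, and escalation services.
interface Deps {
  searchKnowledge(query: string, learner: Learner): Promise<Chunk[]>;
  rerank(query: string, chunks: Chunk[]): Promise<Chunk[]>;
  generateAnswer(query: string, evidence: Chunk[]): Promise<string>;
  scheduleSupportSession(topic: string): Promise<void>;
}

const MIN_EVIDENCE_SCORE = 0.5; // illustrative grounding threshold

export async function handleLearnerQuery(query: string, learner: Learner, deps: Deps) {
  // Entitlements are resolved at the API layer; retrieval only ever sees
  // content the learner is allowed to access.
  const retrieved = await deps.searchKnowledge(query, learner);
  const evidence = (await deps.rerank(query, retrieved)).slice(0, 12);

  // Block speculative answers: require sufficiently relevant evidence first.
  if (evidence.length === 0 || evidence[0].score < MIN_EVIDENCE_SCORE) {
    await deps.scheduleSupportSession(query);
    return "I couldn't find enough course material to answer this reliably, so I've flagged it for a mentor.";
  }

  return deps.generateAnswer(query, evidence);
}
```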

In large learning platforms, simply "finding related content" is not enough. Early-stage AI systems often overwhelm users by passing too much loosely related material to the language model, which can result in fragmented answers, missed details, or confident-sounding but incorrect responses.
To ensure learners receive precise, trustworthy, and syllabus-aligned explanations, the system must carefully select only the most relevant evidence before generating an answer. To address this, Skill-Lync's AI assistant introduces an explicit reranking and grounding layer that acts as a quality gate between content retrieval and answer generation.
Retrieved results are passed through a cross-encoder reranker, implemented as a standalone Python service. It evaluates query–chunk relevance and selects the most useful context, typically the top 10–12 chunks.
The LLM receives only this reranked context when generating responses. All answers include explicit citations, such as timestamps or page numbers. If adequate evidence is unavailable, the system either declines to answer or requests clarification. This approach significantly reduces inaccuracies and builds learner trust.
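As an illustration, the reranked evidence might be assembled into the generation prompt roughly as follows, with each chunk carrying its citation so the model can point to timestamps or page numbers (prompt wording and field names are assumptions):

```typescript
interface RankedChunk {
  content: string;
  courseTitle: string;
  // Either a video timestamp (e.g. "12:40") or a document page number.
  citation: string;
}

// Build the grounded context block passed to the LLM; nothing outside these
// chunks is available to the model at generation time.
export function buildGroundedPrompt(question: string, evidence: RankedChunk[]): string {
  const context = evidence
    .map(
      (chunk, i) =>
        `[${i + 1}] (${chunk.courseTitle}, ${chunk.citation})\n${chunk.content}`,
    )
    .join("\n\n");

  return [
    "Answer the learner's question using ONLY the numbered context below.",
    "Cite the sources you use as [n] with their timestamp or page number.",
    "If the context does not contain the answer, say so instead of guessing.",
    "",
    `Context:\n${context}`,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```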

Running an AI system at scale requires visibility into how and why each decision is made. Without clear observability, problems such as inaccurate responses, rising costs, or degraded performance become hard to identify and risky to fix. The platform is therefore instrumented for end-to-end observability.
The platform is fully instrumented using Langfuse, capturing signals such as:
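A minimal sketch of how a query trace could be recorded, assuming the Langfuse JavaScript SDK; exact method and field names may differ from the production integration:

```typescript
import { Langfuse } from "langfuse";

const langfuse = new Langfuse(); // keys and host are read from environment variables

// Hypothetical helpers standing in for the real retrieval and generation services.
type Retrieve = (q: string) => Promise<string[]>;
type Generate = (q: string, chunks: string[]) => Promise<string>;

export async function tracedQuery(
  userId: string,
  question: string,
  retrieve: Retrieve,
  generate: Generate,
) {
  const trace = langfuse.trace({ name: "learner-query", userId, input: question });

  // Retrieval as a span so per-stage latency and result counts are visible.
  const retrieval = trace.span({ name: "hybrid-retrieval", input: question });
  const chunks = await retrieve(question);
  retrieval.end({ output: { chunkCount: chunks.length } });

  // The LLM call as a generation so model usage and cost can be tracked.
  const generation = trace.generation({ name: "answer", input: question });
  const answer = await generate(question, chunks);
  generation.end({ output: answer });

  trace.update({ output: answer });
  await langfuse.flushAsync();
  return answer;
}
```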

AI systems lose accuracy as curricula evolve and content expands. Without systematic testing, even minor changes to prompts, retrieval logic, or models can subtly degrade answer quality or introduce inconsistencies across courses. We created a dedicated, ongoing evaluation framework to ensure the learning assistant stays dependable, predictable, and aligned with instructor intent.
Curated question sets are maintained across courses, workshops, and projects, and are continuously expanded as the platform evolves. Each change is evaluated using a combination of automated metrics and LLM-based evaluation, measuring:
Learner feedback plays a direct role in quality improvement. Flagged responses and recurring failure patterns are converted into new test cases and incorporated into the evaluation suite.
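A sketch of one pass over such a curated set might look like the following, combining a simple citation assertion with an LLM-as-judge grounding score (data shapes and metrics here are illustrative, not the production evaluation suite):

```typescript
interface TestCase {
  question: string;
  courseId: string;
  mustCite: string[]; // citations the answer is expected to reference
}

interface AssistantAnswer {
  text: string;
  citations: string[];
}

// Injected callers for the assistant under test and an LLM-based judge.
type AskAssistant = (q: string, courseId: string) => Promise<AssistantAnswer>;
type JudgeGrounding = (q: string, answer: string) => Promise<number>; // 0..1 score

export async function runEvaluation(
  cases: TestCase[],
  ask: AskAssistant,
  judge: JudgeGrounding,
) {
  const results = [];
  for (const testCase of cases) {
    const answer = await ask(testCase.question, testCase.courseId);

    // Automated metric: did the answer cite the expected sources?
    const citationHit = testCase.mustCite.every((c) => answer.citations.includes(c));

    // LLM-based metric: is the answer grounded in course material?
    const groundingScore = await judge(testCase.question, answer.text);

    results.push({ question: testCase.question, citationHit, groundingScore });
  }
  return results;
}
```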
This section outlines the practical engineering challenges faced while building and operating the Skill-Lync AI chatbot in production.
Skill-Lync regularly updates course videos, workshop recordings, PDFs, and project documents. A naive approach of re-ingesting all content during every update quickly became impractical due to:
We implemented incremental ingestion using `updated_at` timestamps and content hashing:
While semantic chunking initially appeared attractive, it was quite expensive:
We adopted a recursive character-based text splitter with chunk sizes of 800–1,200 words:
Early implementations using only vector-based semantic search resulted in:
We implemented hybrid retrieval:
A learner asked: "What are the different types of joins explained in Everything about Database – 2.0, and when should each be used?"
Relevant information exists clearly within Everything about Database – 2.0, where joins are explained using:
along with usage scenarios
However, with pure semantic search, the system often retrieved:
Because vector search prioritizes semantic similarity, chunks that explicitly listed and compared specific JOIN types were sometimes ranked lower than broader conceptual text.
As a result:
Solution:
We implemented hybrid retrieval combining:
- Semantic search (pgvector), to capture conceptual explanations around database relationships
- Lexical search (PostgreSQL Full-Text Search), to guarantee retrieval of exact syllabus terms such as the specific JOIN types named in the course

Results from both systems were merged and deduplicated before being handed to the reranking stage.
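A condensed sketch of what such a hybrid query could look like against a pgvector-backed chunks table; the schema, limits, and merge strategy shown are assumptions, not the tuned production values:

```typescript
import { Pool } from "pg";

const db = new Pool({ connectionString: process.env.DATABASE_URL });

// Run both retrievers and merge their candidates.
export async function hybridSearch(queryText: string, queryEmbedding: number[]) {
  const embedding = JSON.stringify(queryEmbedding);

  // Semantic leg: nearest neighbours by cosine distance in pgvector.
  const semantic = await db.query(
    `SELECT id, content, 1 - (embedding <=> $1::vector) AS score
       FROM chunks
      ORDER BY embedding <=> $1::vector
      LIMIT 20`,
    [embedding],
  );

  // Lexical leg: PostgreSQL full-text search for exact syllabus terms.
  const lexical = await db.query(
    `SELECT id, content,
            ts_rank(to_tsvector('english', content), plainto_tsquery('english', $1)) AS score
       FROM chunks
      WHERE to_tsvector('english', content) @@ plainto_tsquery('english', $1)
      ORDER BY score DESC
      LIMIT 20`,
    [queryText],
  );

  // Merge and deduplicate by chunk id, keeping the higher score for overlaps.
  // (Scores from the two systems are not directly comparable; the downstream
  // reranker does the final ordering.)
  const merged = new Map<string, { id: string; content: string; score: number }>();
  for (const row of [...semantic.rows, ...lexical.rows]) {
    const existing = merged.get(row.id);
    if (!existing || row.score > existing.score) merged.set(row.id, row);
  }
  return [...merged.values()];
}
```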
Even with hybrid retrieval, the top-K results often contained partially relevant or noisy chunks, especially for longer or multi-part questions.
We introduced a cross-encoder reranking service (Python-based):
A learner asked: "How is an end-to-end e-commerce application structured in the program?"
Relevant information exists across multiple officially listed components:
However, initial retrieval often returned:
Passing all of these raw chunks to the LLM resulted in:
Solution:
We introduced a cross-encoder reranking service (Python-based) that operates after hybrid retrieval:
Only these refined chunks are passed to the LLM.
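Because the reranker runs as a standalone Python service, the Node.js side only needs a thin client; the endpoint path, payload shape, and top-K cutoff below are assumptions:

```typescript
interface Candidate { id: string; content: string }
interface Reranked extends Candidate { rerankScore: number }

// Call the standalone cross-encoder service and keep only the strongest chunks.
export async function rerankChunks(
  query: string,
  candidates: Candidate[],
  topK = 12,
): Promise<Reranked[]> {
  const response = await fetch(`${process.env.RERANKER_URL}/rerank`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query, documents: candidates.map((c) => c.content) }),
  });
  const { scores } = (await response.json()) as { scores: number[] };

  return candidates
    .map((candidate, i) => ({ ...candidate, rerankScore: scores[i] }))
    .sort((a, b) => b.rerankScore - a.rerankScore)
    .slice(0, topK);
}
```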
Skill-Lync operates under a strict paid-access model. Any leakage of content from unenrolled courses or workshops was unacceptable.
We enforced metadata-based access control at retrieval time, not during generation:
A learner enrolled only in The Complete Front-End Development – 2.0 asked: "Show me the Java Spring Microservices configuration used in the insurance policy project."
Without strict content gating:
This would inadvertently expose advanced backend design patterns or microservices architecture that the learner has not yet covered in their current modules.
Solution:
We enforce metadata-based access control at retrieval time:
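For illustration, the gate can be as simple as constraining every retrieval query to the entitlements resolved at the API layer; the table and column names below are illustrative:

```typescript
import { Pool } from "pg";

const db = new Pool({ connectionString: process.env.DATABASE_URL });

// Retrieval never sees chunks outside the learner's enrolled courses, so
// unentitled content cannot leak into the prompt regardless of the question.
export async function searchWithinEntitlements(
  queryEmbedding: number[],
  enrolledCourseIds: string[],
  limit = 20,
) {
  const embedding = JSON.stringify(queryEmbedding);
  const { rows } = await db.query(
    `SELECT id, course_id, content, 1 - (embedding <=> $1::vector) AS score
       FROM chunks
      WHERE course_id = ANY($2)
      ORDER BY embedding <=> $1::vector
      LIMIT $3`,
    [embedding, enrolledCourseIds, limit],
  );
  return rows;
}
```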
LLMs tend to generate confident answers even when supporting evidence is weak or missing, which is especially dangerous in an educational context.
We enforced a retrieval-first, grounding-controlled generation strategy:
A student asked: "Show the exact Spring Boot configuration used in the course to secure REST APIs using JWT authentication."
If the system failed to retrieve:
A naive LLM could:
This creates confusion for learners when their implementation differs from:
Solution:
We enforce a retrieval-first, grounding-controlled generation strategy:
If the required configuration or code is not present in retrieved content, the system:
Without strong observability, it is impossible to understand why answers fail, where costs spike, or how quality degrades over time.
We integrated Langfuse for full-stack observability:
Retrieval quality is highly sensitive to hyperparameters such as chunk size, hybrid retrieval weights, top-K limits, and reranker thresholds. Small tuning changes can silently degrade answer relevance, grounding quality, or citation accuracy across existing courses and workshops.
We introduced a continuously maintained Curated Test Case Set derived from real learner queries:
End-to-end query processing sequence diagram

Entity relationship diagram showing the core data structures

This engagement showcases Cybermind Works' ability to deliver production-ready AI systems built for real-world use and constraints. The platform functions as a core learning tool, combining agentic orchestration, hybrid retrieval, strict grounding, and end-to-end observability to provide accurate, citation-supported answers while preserving academic integrity and protecting premium content.
Designed for continuous evolution, the system safely handles large content volumes, frequent updates, and evolving curricula without compromising reliability or learner experience.