Challenge
HR policies, SOPs, and benefits documents pile up across PDFs and Word files. When a staff member or manager has a question, the answer exists — somewhere — but finding it means digging through folders, and the same questions reach HR again and again. Institutional knowledge sits in documents no one can search.
What we built
An AI retrieval system that works over the organization's own documents:
- An ingestion pipeline that parses PDFs and Word files
- Semantic chunking that keeps each passage's context intact
- Vector embeddings for meaning-based search, not just keyword matching
- A search interface that returns answers and cites the document they came from
It runs in Python with FastAPI and a local vector store, so sensitive HR documents stay in the organization's control rather than being handed to a third-party service.
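As a rough illustration of how those pieces could fit together, here is a minimal sketch using only the Python standard library. It is not the project's actual code. The names `Chunk`, `chunk_document`, `embed`, and `LocalIndex` are hypothetical, the sample documents are made up, and file parsing is assumed to have already produced plain text. The `embed` function is a toy word-count stand-in that matches only on shared words; a real deployment would swap in a learned embedding model to get meaning-based search, and a proper local vector store in place of the in-memory list.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str   # document the passage came from, kept so answers can cite it
    heading: str  # nearest heading, carried along so the passage keeps its context
    text: str


def chunk_document(source: str, text: str) -> list[Chunk]:
    """Split on blank lines; a short block without end punctuation counts as a heading."""
    chunks, heading = [], ""
    for block in re.split(r"\n\s*\n", text.strip()):
        block = " ".join(block.split())
        if not block:
            continue
        if len(block) < 60 and not block.endswith((".", "?", "!", ":")):
            heading = block
        else:
            chunks.append(Chunk(source, heading, block))
    return chunks


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model (assumption): lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class LocalIndex:
    """In-memory stand-in for a local vector store (assumption)."""

    def __init__(self) -> None:
        self.entries: list[tuple[Counter, Chunk]] = []

    def add(self, chunks: list[Chunk]) -> None:
        # Embed the heading together with the passage so context informs the match.
        for c in chunks:
            self.entries.append((embed(f"{c.heading} {c.text}"), c))

    def search(self, query: str, k: int = 1) -> list[tuple[float, Chunk]]:
        """Return the k highest-scoring chunks as (score, chunk) pairs."""
        q = embed(query)
        scored = [(cosine(q, vec), chunk) for vec, chunk in self.entries]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


# Made-up sample documents, for illustration only.
index = LocalIndex()
index.add(chunk_document("leave_policy.pdf", "Parental leave\n\nEmployees may take up to 12 weeks of parental leave."))
index.add(chunk_document("expenses.docx", "Travel expenses\n\nSubmit receipts within 30 days of travel."))

score, hit = index.search("How many weeks of parental leave do I get?")[0]
print(f"{hit.text} (source: {hit.source})")
```

Because every chunk carries its `source`, the search result can cite the document it came from without a second lookup, and nothing in the sketch leaves the local process.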
Outcomes
- Instant, sourced answers instead of digging through folders.
- Fewer repetitive questions landing on HR.
- Institutional knowledge that stays findable as documents and staff change.
- A private, self-hosted approach that keeps sensitive HR data in-house.