A RAG-based Classification Tool for Automating and Standardising Competency Mapping in Public Service

Conference: Symposium on Data Science and Statistics (SDSS) 2026
04/29/2026: 1:15 PM - 2:45 PM CDT
Lightning 

Description

1 Introduction

The Singapore public service, the largest employer in Singapore, manages a diverse workforce (e.g., policy, medical, and military) and uses a comprehensive competency framework to ensure that officers are a good fit for their roles. This framework comprises approximately 500 functional competencies with defined proficiency levels, serving as a common language for strategic HR planning. However, the scale and diversity of roles make manual mapping of jobs to this framework a cumbersome, time-consuming, and subjective exercise. This subjectivity leads to inconsistent data across agencies, hindering sector-wide talent analysis and strategic insights.

2 Methods
To address these challenges, we developed an automated classification tool. The system uses a Retrieval-Augmented Generation (RAG) architecture: we vectorized the competency framework using an embedding model (nomic-embed-text-v1.5) and stored the resulting vectors in a Faiss index for high-speed retrieval. This retrieval system is integrated with a large language model (LLM), which analyzes a given job title and description. The tool first retrieves the most relevant competencies and then prompts the LLM to recommend the appropriate competency and proficiency level.
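
The retrieve-then-prompt flow described above can be illustrated with a minimal, self-contained sketch. It is not the authors' implementation: a toy bag-of-words vector stands in for nomic-embed-text-v1.5, a brute-force cosine ranking stands in for the Faiss index, and the LLM call itself is left out. The three competencies, the proficiency scale, and all function names are hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical three-entry framework; the real one has roughly 500 competencies.
FRAMEWORK = [
    {"name": "Data Analytics",
     "description": "analyse data with statistics and build dashboards"},
    {"name": "Policy Development",
     "description": "draft and review public policy proposals"},
    {"name": "Clinical Care",
     "description": "assess patients and deliver medical treatment"},
]

# Hypothetical proficiency scale, used only to fill the prompt.
LEVELS = ["Basic", "Intermediate", "Advanced"]


def embed(text):
    """Toy bag-of-words vector, standing in for a real embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Index step: embed every competency once, up front (a vector index
# would hold these embeddings in a real system).
INDEX = [(embed(c["name"] + " " + c["description"]), c) for c in FRAMEWORK]


def retrieve(job_title, job_description, k=2):
    """Return the k competencies most similar to the job text."""
    query = embed(job_title + " " + job_description)
    ranked = sorted(INDEX, key=lambda item: cosine(query, item[0]), reverse=True)
    return [competency for _, competency in ranked[:k]]


def build_prompt(job_title, job_description, candidates):
    """Assemble the text that would be sent to the LLM."""
    lines = [f"- {c['name']}: {c['description']}" for c in candidates]
    return (
        f"Job title: {job_title}\n"
        f"Job description: {job_description}\n"
        "Candidate competencies:\n" + "\n".join(lines) + "\n"
        f"Choose the best-fitting competency and one proficiency level from {LEVELS}."
    )


title = "Data Analyst"
description = "Analyse data and build dashboards for statistics reporting"
candidates = retrieve(title, description)
prompt = build_prompt(title, description, candidates)
print(prompt)  # this string would be passed to the LLM
```

Restricting the prompt to a handful of retrieved candidates, rather than the whole framework, keeps the LLM's choice within a short, relevant list, which is the usual motivation for a RAG design.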

3 Data/Results
The model, iteratively developed and validated with HR leaders, achieved 90% accuracy when measured against human expert validation. The tool has reduced the time required for manual tagging and assessment by 30%. More significantly, it produces uniform, standardized data, enabling robust, consolidated insights for strategic workforce planning across the entire public sector.

Keywords

Large Language Models (LLM)

Retrieval-Augmented Generation (RAG)

Automated Classification 

Presenting Author

Lian Ping Ler, Ministry of Manpower

First Author

Zhihan Chen

Tracks

AI and LLM Applications
Symposium on Data Science and Statistics (SDSS) 2026