A domain-specific language model optimized for finance, healthcare, and legal applications
Shakti-250M is a Small Language Model (SLM) designed to deliver efficient, targeted performance across domain-specific applications in finance, healthcare, and legal services. With 250 million parameters, it offers a balanced trade-off between computational efficiency and task-oriented language capability, making it suitable for deployment in resource-constrained environments such as mobile devices, IoT systems, and edge computing platforms. Built on the Shakti-2.5B framework and optimized for smaller devices, Shakti-250M is well suited to enterprises that need accurate domain-specific language capabilities without heavy compute requirements.
- Fine-tuned on specialized datasets for accurate financial forecasting, medical question answering, and legal summarization.
- Designed for smartphones, tablets, and IoT devices with real-time inference capabilities.
- Handles multi-turn dialogues with context-aware follow-ups (see the usage sketch after this list).
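As a rough illustration of multi-turn use, here is a minimal sketch using the Hugging Face transformers API. The model identifier is a placeholder, not a confirmed checkpoint name, and the sketch assumes the released tokenizer ships a chat template:

```python
# Minimal multi-turn sketch with Hugging Face transformers.
# NOTE: "SandLogic/Shakti-250M" is a hypothetical identifier, and this assumes
# the tokenizer defines a chat template; adjust both to the actual release.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SandLogic/Shakti-250M"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Keep the full dialogue so each turn is generated with prior context.
history = [{"role": "user", "content": "What does a high LDL reading indicate?"}]
prompt = tokenizer.apply_chat_template(history, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=128)
reply = tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)

# Context-aware follow-up: append the reply, then ask the next question.
history += [{"role": "assistant", "content": reply},
            {"role": "user", "content": "How is it usually treated?"}]
```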
Shakti-250M has 250 million parameters across 16 transformer layers, offering a solid balance between performance and efficiency. It uses a model dimension of 1024 and a feed-forward network (FFN) dimension of 4096 to handle complex language tasks.
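These published figures pin down the core shape of the network. A back-of-envelope sketch of the configuration and parameter count follows; the head count and vocabulary size are illustrative assumptions, not published values:

```python
from dataclasses import dataclass

@dataclass
class ShaktiConfig:
    # Published figures for Shakti-250M
    n_layers: int = 16        # transformer layers
    d_model: int = 1024       # model (hidden) dimension
    d_ffn: int = 4096         # feed-forward network dimension
    # Illustrative assumptions, not published values
    n_heads: int = 16         # attention heads (assumed; gives 64-dim heads)
    vocab_size: int = 32_000  # tokenizer vocabulary size (assumed)

def approx_params(cfg: ShaktiConfig) -> int:
    """Rough count: tied embeddings plus per-layer attention and FFN weights."""
    embed = cfg.vocab_size * cfg.d_model  # shared input/output embedding (assumed tied)
    attn = 4 * cfg.d_model * cfg.d_model  # Q, K, V, and output projections
    ffn = 2 * cfg.d_model * cfg.d_ffn     # up- and down-projections
    return embed + cfg.n_layers * (attn + ffn)

print(f"~{approx_params(ShaktiConfig()) / 1e6:.0f}M parameters")  # ~234M
```

Under these assumptions the estimate lands near the published 250M figure; layer norms, biases, and any gated FFN variant would account for the remainder.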
Shakti-250M is trained on a structured combination of general-purpose and domain-specific datasets to ensure both broad language coverage and specialized knowledge in finance, healthcare, and legal domains. Using domain-specific datasets is crucial because general language models often struggle with terminology, structure, and contextual nuances specific to professional fields.
- Pre-training: builds foundational language understanding and introduces domain-relevant patterns.
- Supervised fine-tuning: improves performance on instruction-following and structured domain tasks.
- Preference alignment: aligns model outputs with preferred user responses in domain-specific tasks.
Shakti-250M was trained in multiple stages, progressing from general and domain-specific understanding to specialized instruction following and human preference alignment.
Pre-training was conducted on large-scale general corpora and domain-specific texts (finance, legal) using a next-token prediction objective. A standard Transformer-based architecture with rotary positional embeddings (RoPE) and mixed-precision training (FP16 and bfloat16) was used to capture both general language patterns and specialized terminology.
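For reference, here is a minimal sketch of the RoPE transform in PyTorch. This is the generic GPT-NeoX-style half-split formulation, shown for illustration only; the exact variant used in Shakti-250M is not specified:

```python
import torch

def rope(x: torch.Tensor, base: float = 10_000.0) -> torch.Tensor:
    """Apply rotary positional embeddings to a (batch, seq, heads, head_dim) tensor.

    Each half-split channel pair is rotated by a position-dependent angle, so
    relative position is encoded directly in the query-key dot products.
    """
    _, t, _, d = x.shape
    half = d // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)  # (half,)
    angles = torch.arange(t, dtype=torch.float32)[:, None] * freqs     # (t, half)
    cos = angles.cos()[None, :, None, :]  # broadcast over batch and heads
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```

Because the rotation angle depends only on token position, the dot product between rotated queries and keys depends on their relative offset, which is what encodes positional relationships.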
Supervised fine-tuning focused on domain-specific instruction datasets in the healthcare, finance, and legal sectors. Tasks included medical Q&A, legal summarization, patient-doctor dialogues, and finance-related discussions to improve performance on structured domain tasks.
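As a concrete picture of what such an instruction record might look like, here is a small sketch. The Alpaca-style template and field names are assumptions for illustration; the actual prompt format used for Shakti-250M has not been published:

```python
# Illustrative only: template and field names are assumed, not Shakti's
# published fine-tuning format.
example = {
    "instruction": "Summarize the key obligations in the following contract clause.",
    "input": "The Lessee shall maintain the premises in good repair ...",
    "output": "The lessee must keep the property in good repair ...",
}

def format_example(ex: dict) -> str:
    """Flatten an instruction record into a single training string."""
    return (
        f"### Instruction:\n{ex['instruction']}\n\n"
        f"### Input:\n{ex['input']}\n\n"
        f"### Response:\n{ex['output']}"
    )

print(format_example(example))
```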
Preference alignment used preference-labeled datasets across the healthcare, legal, and finance domains to align the model with preferred user responses. This stage replaced RLHF to reduce computational cost and ensure high-quality, real-time outputs suitable for deployment on edge devices.
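Training directly on preference pairs in place of RLHF matches the shape of Direct Preference Optimization (DPO). Assuming a DPO-style objective (the source does not name the exact method), a minimal sketch of the loss is:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO-style loss over summed log-probs of chosen/rejected responses.

    The policy is pushed to prefer the chosen response more strongly than a
    frozen reference model does, without training a separate reward model.
    """
    logits = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    return -F.logsigmoid(logits).mean()

# Toy example: scalar log-probs for a single preference pair
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.0]))
print(f"{loss.item():.3f}")  # ~0.598
```

Skipping the reward model and PPO loop of RLHF is what makes this stage cheap enough to target edge-oriented training budgets.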
Shakti-250M remains competitive with larger models such as Boomer-1B and Llama 3.2 1B on several benchmarks while being much smaller and more efficient. Despite using fewer training tokens, it handles a wide range of NLP tasks effectively, owing to its clean, high-quality training data and optimized learning process.
These results suggest that careful training and well-curated datasets can matter as much as raw parameter count. Shakti-250M's lightweight design makes it well suited to real-world applications on mobile devices, IoT systems, and other resource-limited environments.
| Benchmark | Shakti-250M | Boomer-1B | Boomer-634M | Qwen2.5-0.5B | SmolLM-360M | Llama 3.2 1B |
|---|---|---|---|---|---|---|
| MMLU | 28.98 | 25.92 | 25.23 | 47.5 | 34.4 | 32.2 |
| BIG-Bench Hard | 13.75 | 28.65 | 21.11 | 20.3 | 24.4 | 30.93 |
| IFEval | 12.83 | 23.81 | 22.22 | 27.9 | 19.8 | 59.5 |
| HellaSwag | 29.96 | 31.66 | 34.08 | 52.1 | 51.8 | 41.2 |
| ANLI | 33.40 | 32.57 | 27.5 | 26.85 | - | 22.56 |
| PIQA | 63.22 | 60.78 | 62.57 | 72.50 | 71.6 | 80.64 |
| OpenBookQA | 16.60 | 22.56 | 35.76 | 30.73 | 37.2 | 37 |
| TruthfulQA (MC2) | 20.69 | 25.69 | 27.57 | 40.2 | - | 30.7 |
| WinoGrande | 52.97 | 45.79 | 51.07 | 56.3 | 52.8 | 60 |
| ARC Challenge | 41.20 | 40.78 | 62.57 | 35.6 | 50.1 | 32.8 |
| SQuAD | 23.25 | 67 | 57.5 | 52.94 | - | 49.2 |
| TriviaQA | 1.68 | 25.25 | 2.73 | 12.5 | 9.1 | 25.69 |
| GSM8K | 2.3 | 1.5 | 0.91 | 41.6 | - | 44.4 |
| MATH | 21.71 | - | 23.38 | 19.5 | - | - |
Shakti-250M performs competitively in the healthcare and finance domains, making it a versatile model for domain-specific applications. In healthcare, it outperforms much larger models such as Phi-1.5 (1.3B), Gemma-2B, and OPT-2.7B on MedQA and MedMCQA, and shows solid capabilities in understanding and applying clinical knowledge. Its compact size and efficiency make it a strong candidate for edge and IoT deployment in both healthcare and finance applications.
| Benchmark | Shakti-250M | Phi-1.5 (1.3B) | Gemma-2B | OPT-2.7B |
|---|---|---|---|---|
| MedQA | 41.25 | 31.11 | 29.22 | 27.1 |
| MedMCQA | 34.87 | 34.31 | 30.22 | 25.63 |
| PubMedQA | 58.21 | 67.8 | 66.4 | 60.8 |
| MMLU Professional Medicine | 28.4 | 29.04 | 18.01 | 16.54 |
| MMLU Medical Genetics | 31.42 | 42 | 28 | 23 |
| MMLU College Medicine | 30.45 | 37.57 | 31.21 | 24.86 |
| MMLU College Biology | 31.25 | 34.03 | 33.33 | 20.14 |
| MMLU Clinical Knowledge | 36.78 | 46.04 | 35.47 | 23.02 |
| MMLU Anatomy | 39.42 | 39.26 | 37.04 | 32.59 |
| PatronusAI finance-bench-test | 32.2 | - | - | - |
| jan-hq finance-benchmark mcq | 23.1 | - | - | - |
Shakti-250M is a compact, efficient language model designed for domain-focused applications in the finance, healthcare, and legal sectors. With just 250 million parameters, it balances performance and resource usage, making it ideal for mobile, IoT, and edge devices. Despite its smaller size, Shakti-250M delivers strong results on domain-specific benchmarks, in several cases outperforming much larger models. Its curated fine-tuning datasets and optimization techniques help it handle complex tasks such as legal summarization, financial forecasting, and medical Q&A with high accuracy. Overall, Shakti-250M shows that careful design and focused training can enable small models to excel in real-world, domain-specific applications.