A domain-specific language model optimized for finance, healthcare, and legal applications
Shakti-250M is a Small Language Model (SLM) designed to deliver efficient, targeted performance across domain-specific applications in finance, healthcare, and legal services. With 250 million parameters, it offers a balanced trade-off between computational efficiency and task-oriented language capability, making it suitable for deployment in resource-constrained environments such as mobile devices, IoT systems, and edge computing platforms. Built on the Shakti-2.5B framework and optimized for smaller devices, Shakti-250M is well suited to enterprises that need accurate domain-specific language capabilities without heavy compute requirements.
- Fine-tuned on specialized datasets for accurate financial forecasting, medical question answering, and legal summarization.
- Designed for smartphones, tablets, and IoT devices with real-time inference capabilities.
- Handles multi-turn dialogues with context-aware follow-ups (see the usage sketch after this list).
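As a rough illustration of multi-turn use, here is a minimal sketch using the Hugging Face transformers API. The model identifier is a placeholder, not a confirmed checkpoint name, and the sketch assumes the released tokenizer ships a chat template:

```python
# Minimal multi-turn sketch with Hugging Face transformers.
# NOTE: "SandLogic/Shakti-250M" is a hypothetical identifier, and this assumes
# the tokenizer defines a chat template; adjust both to the actual release.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SandLogic/Shakti-250M"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Keep the full dialogue so each turn is generated with prior context.
history = [{"role": "user", "content": "What does a high LDL reading indicate?"}]
prompt = tokenizer.apply_chat_template(history, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=128)
reply = tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)

# Context-aware follow-up: append the reply, then ask the next question.
history += [{"role": "assistant", "content": reply},
            {"role": "user", "content": "How is it usually treated?"}]
```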
Shakti-250M has 250 million parameters across 16 transformer layers, offering a solid balance between performance and efficiency. It uses a model dimension of 1024 and a feed-forward network (FFN) dimension of 4096 to handle complex language tasks.
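These published figures pin down the core shape of the network. A back-of-envelope sketch of the configuration and parameter count follows; the head count and vocabulary size are illustrative assumptions, not published values:

```python
from dataclasses import dataclass

@dataclass
class ShaktiConfig:
    # Published figures for Shakti-250M
    n_layers: int = 16        # transformer layers
    d_model: int = 1024       # model (hidden) dimension
    d_ffn: int = 4096         # feed-forward network dimension
    # Illustrative assumptions, not published values
    n_heads: int = 16         # attention heads (assumed; gives 64-dim heads)
    vocab_size: int = 32_000  # tokenizer vocabulary size (assumed)

def approx_params(cfg: ShaktiConfig) -> int:
    """Rough count: tied embeddings plus per-layer attention and FFN weights."""
    embed = cfg.vocab_size * cfg.d_model  # shared input/output embedding (assumed tied)
    attn = 4 * cfg.d_model * cfg.d_model  # Q, K, V, and output projections
    ffn = 2 * cfg.d_model * cfg.d_ffn     # up- and down-projections
    return embed + cfg.n_layers * (attn + ffn)

print(f"~{approx_params(ShaktiConfig()) / 1e6:.0f}M parameters")  # ~234M
```

Under these assumptions the estimate lands near the published 250M figure; layer norms, biases, and any gated FFN variant would account for the remainder.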
Shakti-250M is trained on a structured combination of general-purpose and domain-specific datasets to ensure both broad language coverage and specialized knowledge in finance, healthcare, and legal domains. Using domain-specific datasets is crucial because general language models often struggle with terminology, structure, and contextual nuances specific to professional fields.
- Pre-training: builds foundational language understanding and introduces domain-relevant patterns.
- Supervised fine-tuning: improves performance on instruction-following and structured domain tasks.
- Preference alignment: aligns model outputs with preferred user responses in domain-specific tasks.
Shakti-250M was trained in multiple stages, progressing from general and domain-specific understanding to specialized instruction following and human preference alignment.
Pre-training was conducted on large-scale general corpora and domain-specific texts (finance, legal) using a next-token prediction objective. A standard Transformer-based architecture with rotary positional embeddings (RoPE) and mixed-precision training (FP16 and bfloat16) was used to capture both general language patterns and specialized terminology.
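For reference, here is a minimal sketch of the RoPE transform in PyTorch. This is the generic GPT-NeoX-style half-split formulation, shown for illustration only; the exact variant used in Shakti-250M is not specified:

```python
import torch

def rope(x: torch.Tensor, base: float = 10_000.0) -> torch.Tensor:
    """Apply rotary positional embeddings to a (batch, seq, heads, head_dim) tensor.

    Each half-split channel pair is rotated by a position-dependent angle, so
    relative position is encoded directly in the query-key dot products.
    """
    _, t, _, d = x.shape
    half = d // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)  # (half,)
    angles = torch.arange(t, dtype=torch.float32)[:, None] * freqs     # (t, half)
    cos = angles.cos()[None, :, None, :]  # broadcast over batch and heads
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```

Because the rotation angle depends only on token position, the dot product between rotated queries and keys depends on their relative offset, which is what encodes positional relationships.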
Supervised fine-tuning focused on domain-specific instruction datasets in the healthcare, finance, and legal sectors. Tasks included medical Q&A, legal summarization, patient-doctor dialogues, and finance-related discussions to improve performance on structured domain tasks.
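As a concrete picture of what such an instruction record might look like, here is a small sketch. The Alpaca-style template and field names are assumptions for illustration; the actual prompt format used for Shakti-250M has not been published:

```python
# Illustrative only: template and field names are assumed, not Shakti's
# published fine-tuning format.
example = {
    "instruction": "Summarize the key obligations in the following contract clause.",
    "input": "The Lessee shall maintain the premises in good repair ...",
    "output": "The lessee must keep the property in good repair ...",
}

def format_example(ex: dict) -> str:
    """Flatten an instruction record into a single training string."""
    return (
        f"### Instruction:\n{ex['instruction']}\n\n"
        f"### Input:\n{ex['input']}\n\n"
        f"### Response:\n{ex['output']}"
    )

print(format_example(example))
```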
Preference alignment used preference-labeled datasets across the healthcare, legal, and finance domains to align the model with preferred user responses. This stage replaced RLHF to reduce computational cost and ensure high-quality, real-time outputs suitable for deployment on edge devices.
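Training directly on preference pairs in place of RLHF matches the shape of Direct Preference Optimization (DPO). Assuming a DPO-style objective (the source does not name the exact method), a minimal sketch of the loss is:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO-style loss over summed log-probs of chosen/rejected responses.

    The policy is pushed to prefer the chosen response more strongly than a
    frozen reference model does, without training a separate reward model.
    """
    logits = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    return -F.logsigmoid(logits).mean()

# Toy example: scalar log-probs for a single preference pair
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.0]))
print(f"{loss.item():.3f}")  # ~0.598
```

Skipping the reward model and PPO loop of RLHF is what makes this stage cheap enough to target edge-oriented training budgets.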
Shakti-250M remains competitive with larger models such as Boomer-1B and Llama 3.2 1B on several benchmarks while being much smaller and more efficient. Despite using fewer training tokens, it handles a wide range of NLP tasks effectively, owing to its clean, high-quality training data and optimized learning process.
These results suggest that careful training and well-curated datasets can matter as much as raw parameter count. Shakti-250M's lightweight design makes it well suited to real-world applications on mobile devices, IoT systems, and other resource-limited environments.
| Benchmark | Shakti-250M | Boomer-1B | Boomer-634M | Qwen2.5-0.5B | SmolLM-360M | Llama 3.2 1B |
|---|---|---|---|---|---|---|
| MMLU | 28.98 | 25.92 | 25.23 | 47.5 | 34.4 | 32.2 |
| BIG-Bench Hard | 13.75 | 28.65 | 21.11 | 20.3 | 24.4 | 30.93 |
| IFEval | 12.83 | 23.81 | 22.22 | 27.9 | 19.8 | 59.5 |
| HellaSwag | 29.96 | 31.66 | 34.08 | 52.1 | 51.8 | 41.2 |
| ANLI | 33.40 | 32.57 | 27.5 | 26.85 | - | 22.56 |
| PIQA | 63.22 | 60.78 | 62.57 | 72.50 | 71.6 | 80.64 |
| OpenBookQA | 16.60 | 22.56 | 35.76 | 30.73 | 37.2 | 37 |
| TruthfulQA (MC2) | 20.69 | 25.69 | 27.57 | 40.2 | - | 30.7 |
| WinoGrande | 52.97 | 45.79 | 51.07 | 56.3 | 52.8 | 60 |
| ARC Challenge | 41.20 | 40.78 | 62.57 | 35.6 | 50.1 | 32.8 |
| SQuAD | 23.25 | 67 | 57.5 | 52.94 | - | 49.2 |
| TriviaQA | 1.68 | 25.25 | 2.73 | 12.5 | 9.1 | 25.69 |
| GSM8K | 2.3 | 1.5 | 0.91 | 41.6 | - | 44.4 |
| MATH | 21.71 | - | 23.38 | 19.5 | - | - |
Shakti-250M performs competitively in the healthcare and finance domains, making it a versatile model for domain-specific applications. In healthcare, it outperforms much larger models such as Phi-1.5 (1.3B), Gemma-2B, and OPT-2.7B on MedQA and MedMCQA, and shows solid capabilities in understanding and applying clinical knowledge. Its compact size and efficiency make it a strong candidate for edge and IoT deployment in both healthcare and finance applications.
| Benchmark | Shakti-250M | Phi-1.5 (1.3B) | Gemma-2B | OPT-2.7B |
|---|---|---|---|---|
| MedQA | 41.25 | 31.11 | 29.22 | 27.1 |
| MedMCQA | 34.87 | 34.31 | 30.22 | 25.63 |
| PubMedQA | 58.21 | 67.8 | 66.4 | 60.8 |
| MMLU Professional Medicine | 28.4 | 29.04 | 18.01 | 16.54 |
| MMLU Medical Genetics | 31.42 | 42 | 28 | 23 |
| MMLU College Medicine | 30.45 | 37.57 | 31.21 | 24.86 |
| MMLU College Biology | 31.25 | 34.03 | 33.33 | 20.14 |
| MMLU Clinical Knowledge | 36.78 | 46.04 | 35.47 | 23.02 |
| MMLU Anatomy | 39.42 | 39.26 | 37.04 | 32.59 |
| PatronusAI finance-bench-test | 32.2 | - | - | - |
| jan-hq finance-benchmark mcq | 23.1 | - | - | - |
Shakti-250M is a compact, efficient language model designed for domain-focused applications in the finance, healthcare, and legal sectors. With just 250 million parameters, it balances performance and resource usage, making it ideal for mobile, IoT, and edge devices. Despite its smaller size, Shakti-250M delivers strong results on domain-specific benchmarks, in several cases outperforming much larger models. Its curated fine-tuning datasets and optimization techniques help it handle complex tasks such as legal summarization, financial forecasting, and medical Q&A with high accuracy. Overall, Shakti-250M shows that careful design and focused training can enable small models to excel in real-world, domain-specific applications.