The largest model in the Shakti SLM series, optimized for conversations, summarization, and edge deployment
Shakti-500M is the largest model in the Shakti SLM series, with 500 million parameters. It is designed for conversational tasks, long-document summarization, and question answering. The model targets both cloud and edge deployment, including mobile and low-power devices. It uses quantization-aware training so that the same model can run in lower-precision formats (such as int8 or int4) with little loss in accuracy.
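Quantization-aware training generally works by simulating low-precision rounding during the forward pass so that the weights learn to tolerate it. The following is a minimal sketch of that "fake quantization" step in plain Python. The function name `fake_quantize` and the symmetric per-tensor scheme are illustrative assumptions, not the Shakti training code.

```python
def fake_quantize(weights, num_bits):
    """Round weights to a signed num_bits integer grid, then map back to floats.

    Symmetric per-tensor scheme (an assumption made for illustration).
    """
    qmax = 2 ** (num_bits - 1) - 1           # 127 for int8, 7 for int4
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)                 # all-zero tensor: nothing to quantize
    scale = max_abs / qmax                   # float value of one integer step
    out = []
    for w in weights:
        q = round(w / scale)                 # nearest integer level
        q = max(-qmax, min(qmax, q))         # clamp to the representable range
        out.append(q * scale)                # dequantize back to float
    return out


# Hypothetical weights, used only to compare the rounding error of the two formats.
weights = [0.42, -1.30, 0.07, 0.88]
for bits in (8, 4):
    approx = fake_quantize(weights, bits)
    worst = max(abs(a - b) for a, b in zip(weights, approx))
    print(f"int{bits}: max abs error = {worst:.4f}")
```

The rounding error is bounded by half a quantization step, so the int4 grid is much coarser than the int8 grid for the same weights.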
The Shakti models use a multilingual tokenizer trained on a broad range of languages, which lets them parse and represent input in Indian languages (e.g., Hindi, Kannada, Telugu, Tamil) and global languages (e.g., Spanish, French, German, English). The models can also be fine-tuned for specific domains and languages in multi-regional deployments.
Optimized on instruction datasets like Cosmopedia and Magma-Pro, the models handle multi-turn dialogue, instructional tasks, summarization, classification, and QA efficiently.
The model uses block sparse attention and a sliding-window cache to process sequences of up to 4,096 tokens, supporting document QA and long-thread chat scenarios.
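To illustrate the sliding-window idea, here is a minimal sketch of a causal sliding-window attention mask in plain Python. The helper name `sliding_window_mask` and the window size are hypothetical; the source does not state the window size or block layout used by Shakti, and this sketch does not implement block sparsity.

```python
def sliding_window_mask(seq_len, window):
    """Return mask[i][j] == True when query position i may attend to key position j.

    Causal: j <= i. Windowed: only the most recent `window` positions are visible.
    """
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Toy sizes so the pattern is easy to read; "x" marks an allowed query/key pair.
mask = sliding_window_mask(seq_len=6, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Each row allows at most `window` positions, so the per-token attention cost and the number of cached keys and values are bounded by the window size rather than by the full sequence length.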
Benchmarked on HumanEval and instruction-tuned on coding datasets, the model supports function generation, code completion, and syntax correction.
Shakti-500M utilizes a 24-layer decoder-only Transformer with a hidden size of 2,048 and 16 attention heads. It incorporates the following enhancements:
The Shakti-500M model is pre-trained on diverse corpora to develop general language understanding and knowledge across domains. Supervised fine-tuning (SFT) then adapts the model for instruction-based applications, improving its problem-solving, conversational, and coding capabilities. Reinforcement learning from human feedback (RLHF) further refines responses for contextual relevance and accuracy.
Approximately 2T tokens drawn from diverse and high-quality sources:
Focused on instruction-following and task-specific capabilities:
UltraFeedback with binary preference annotations for model output ranking to improve response quality and alignment with human preferences.
Shakti-500M was trained in multiple stages, progressing from general language understanding to specialized instruction following and human preference alignment.
Pre-training was conducted on 2T tokens using a next-token prediction objective, with mixed-precision training (FP16 + bfloat16). Block-sparse attention and RoPE were integrated during this phase to establish foundational language understanding across diverse domains.
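The next-token prediction objective can be sketched as an average cross-entropy in which the label at each position is the token that follows it. This is a minimal, framework-free illustration; the function name `next_token_loss` and the toy logits are assumptions, and it does not model mixed precision or batching.

```python
import math


def next_token_loss(logits, token_ids):
    """Average cross-entropy of predicting token t+1 from position t.

    logits[t] is a list of vocabulary scores produced at position t.
    """
    if len(token_ids) < 2:
        raise ValueError("need at least two tokens to form a prediction target")
    total = 0.0
    count = 0
    for t in range(len(token_ids) - 1):
        scores = logits[t]
        target = token_ids[t + 1]                  # label is the next token
        m = max(scores)                            # subtract max for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[target]            # -log softmax(scores)[target]
        count += 1
    return total / count


# Toy example: vocabulary of 3 tokens, sequence of 3 tokens.
token_ids = [0, 2, 1]
logits = [
    [0.1, 0.2, 3.0],   # position 0 should predict token 2
    [0.0, 2.5, 0.3],   # position 1 should predict token 1
    [1.0, 1.0, 1.0],   # last position has no next token, so it is ignored
]
print(f"loss = {next_token_loss(logits, token_ids):.4f}")
```

With uniform logits the loss equals the log of the vocabulary size, which is a useful sanity check for an untrained model.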
Supervised fine-tuning focused on task-specific instruction sets. Tasks span summarization, question answering, and conversational AI, improving the model's ability to follow instructions and generate contextually appropriate responses.
The RLHF stage used UltraFeedback-based preference modeling, with reward model scoring and PPO fine-tuning, to improve response helpfulness and relevance and to align outputs with human expectations and preferences.
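Reward models for RLHF are commonly trained on binary preference pairs with a Bradley-Terry style pairwise loss, which pushes the score of the preferred response above the score of the rejected one. The sketch below shows that loss in plain Python. It is a common formulation and an assumption here, since the source does not specify the reward-model objective; the function name and the reward values are hypothetical.

```python
import math


def pairwise_preference_loss(reward_chosen, reward_rejected):
    """-log sigmoid(r_chosen - r_rejected), written in a numerically stable form."""
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(x)) == log(1 + exp(-x)); branch so exp() never sees a large positive input
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical reward scores for (preferred, rejected) responses.
pairs = [(2.0, 0.5), (0.1, 0.1), (-1.0, 1.5)]
for chosen, rejected in pairs:
    loss = pairwise_preference_loss(chosen, rejected)
    print(f"chosen={chosen:+.1f} rejected={rejected:+.1f} loss={loss:.4f}")
```

The loss is log 2 when the two rewards are equal, shrinks toward zero as the preferred response is scored higher, and grows when the ranking is wrong. A PPO step would then optimize the policy against the trained reward model; that step is not shown here.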
Shakti-500M is competitive with both similar-sized and larger models. In the table below it posts the highest score on 8 of the 13 benchmarks, including HellaSwag, SQuAD, GSM8K, and MATH, while Qwen2.5-0.5B leads on MMLU, MedMCQA, and TriviaQA, and Llama 3.2 1B leads on IFEval and PIQA. The model was trained on a curated, balanced dataset, which helps it make the most of its compact design.
While larger models may have an edge in certain areas, Shakti-500M offers a great balance between performance and efficiency, making it ideal for use in low-resource environments like mobile devices, IoT systems, and edge computing.
| Benchmark | Shakti-500M | Boomer-1B | Boomer-634M | Qwen2.5-0.5B | Llama 3.2 1B |
|---|---|---|---|---|---|
| MMLU | 38.90 | 25.92 | 25.23 | 47.5 | 32.2 |
| BigBenchHard | 33.1 | 28.65 | 21.11 | 20.3 | 30.93 |
| IFEval | 36.62 | 23.81 | 22.22 | 27.9 | 59.5 |
| HellaSwag | 68.56 | 31.66 | 34.08 | 52.1 | 41.2 |
| ANLI | 40.70 | 32.57 | 27.5 | 26.85 | 22.56 |
| PIQA | 74.59 | 60.78 | 62.57 | 72.50 | 80.64 |
| MedMCQA | 32.61 | 17.56 | 37.50 | 42.5 | 37.57 |
| OpenbookQA | 39.80 | 22.56 | 35.76 | 30.73 | 37 |
| WinoGrande | 60.67 | 45.79 | 51.07 | 56.3 | 60 |
| SQuAD | 71.40 | 67 | 57.5 | 52.94 | 49.2 |
| TriviaQA | 31.11 | 1.5 | 0.91 | 41.6 | 4.44 |
| GSM8K | 9.2 | 1.67 | 1 | 1.69 | - |
| MATH | 31.97 | - | 23.38 | 19.5 | - |
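The comparison in the table can also be summarized programmatically. The sketch below transcribes the scores from the table (unreported entries become `None`) and reports which model has the highest reported score on each benchmark. The helper name `leaders` is illustrative, and the sketch treats all scores as directly comparable, which assumes identical evaluation settings across models.

```python
MODELS = ["Shakti-500M", "Boomer-1B", "Boomer-634M", "Qwen2.5-0.5B", "Llama 3.2 1B"]

# Scores transcribed from the table above; None marks an unreported result ("-").
SCORES = {
    "MMLU":         [38.90, 25.92, 25.23, 47.5, 32.2],
    "BigBenchHard": [33.1, 28.65, 21.11, 20.3, 30.93],
    "IFEval":       [36.62, 23.81, 22.22, 27.9, 59.5],
    "HellaSwag":    [68.56, 31.66, 34.08, 52.1, 41.2],
    "ANLI":         [40.70, 32.57, 27.5, 26.85, 22.56],
    "PIQA":         [74.59, 60.78, 62.57, 72.50, 80.64],
    "MedMCQA":      [32.61, 17.56, 37.50, 42.5, 37.57],
    "OpenbookQA":   [39.80, 22.56, 35.76, 30.73, 37],
    "WinoGrande":   [60.67, 45.79, 51.07, 56.3, 60],
    "SQuAD":        [71.40, 67, 57.5, 52.94, 49.2],
    "TriviaQA":     [31.11, 1.5, 0.91, 41.6, 4.44],
    "GSM8K":        [9.2, 1.67, 1, 1.69, None],
    "MATH":         [31.97, None, 23.38, 19.5, None],
}


def leaders(scores, models):
    """Map each benchmark to the model with the highest reported score."""
    best = {}
    for bench, row in scores.items():
        reported = [(s, m) for s, m in zip(row, models) if s is not None]
        best[bench] = max(reported, key=lambda pair: pair[0])[1]
    return best


best = leaders(SCORES, MODELS)
wins = [bench for bench, model in best.items() if model == "Shakti-500M"]
print(f"Shakti-500M leads on {len(wins)} of {len(SCORES)} benchmarks:")
print(", ".join(wins))
```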
Shakti-500M stands out as a well-rounded, efficient small language model optimized for real-world use. With strong multilingual support, instruction following, and long-context handling, it excels in tasks like conversations, summarization, question answering, and code generation. Its architecture combines techniques such as block sparse attention, RoPE, and quantization-aware training, enabling smooth performance across cloud and edge devices, including mobile and low-power environments. Backed by diverse pretraining data and refined through SFT and RLHF, Shakti-500M delivers reliable and context-aware outputs. Although smaller than some of the models it is compared against, it competes well against both peer and larger models in benchmark evaluations, offering a solid balance of accuracy, efficiency, and deployment flexibility for multilingual and domain-specific applications.