A compact language model built for smart devices and edge deployment
Shakti-100M is a compact language model built for smart devices like mobile phones, IoT systems, and other edge devices. Unlike large models that need powerful servers and internet access, Shakti-100M runs directly on your device. It offers fast responses, strong privacy, and low power usage, making it ideal for offline, real-time applications. With only 100 million parameters, it brings natural language capabilities to low-resource environments, enabling smarter user experiences without cloud dependency.
- Optimized for everyday NLP tasks such as text generation, completion, summarization, and basic question answering.
- Tailored for ultra-low-power devices and offline systems with minimal memory and compute requirements.
- Enables fully on-device processing to ensure data privacy in personal assistants, healthcare apps, and private chat interfaces (see the inference sketch below).
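As an illustration of that on-device flow, here is a minimal loading-and-generation sketch using the Hugging Face `transformers` library. The repository id is a placeholder rather than a confirmed release path, and loading in half precision is an assumption chosen to keep the memory footprint small on constrained hardware.

```python
# Minimal on-device inference sketch with Hugging Face Transformers.
# "SandLogicTechnologies/Shakti-100M" is a placeholder repository id;
# substitute the actual checkpoint location before running.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SandLogicTechnologies/Shakti-100M"  # placeholder id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision keeps memory usage low on edge hardware
)
model.eval()

prompt = "Summarize: Edge devices run models locally, without a cloud connection."
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```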
Shakti-100M includes 100 million parameters and 10 layers, offering a solid balance between performance and efficiency. It uses a model dimension of 640 and a feed-forward network (FFN) dimension of 2560 to handle complex language tasks.
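To make those dimensions concrete, the sketch below shows a single feed-forward block at the quoted sizes. The GELU activation and the absence of gating are assumptions for illustration; the released architecture may differ.

```python
# Sketch of one feed-forward block at the quoted dimensions
# (d_model = 640, d_ffn = 2560). The GELU activation is an assumption.
import torch
import torch.nn as nn

class FeedForward(nn.Module):
    def __init__(self, d_model: int = 640, d_ffn: int = 2560):
        super().__init__()
        self.up = nn.Linear(d_model, d_ffn)    # expand 640 -> 2560
        self.act = nn.GELU()
        self.down = nn.Linear(d_ffn, d_model)  # project 2560 -> 640

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(self.act(self.up(x)))

x = torch.randn(1, 16, 640)    # (batch, sequence, d_model)
print(FeedForward()(x).shape)  # torch.Size([1, 16, 640])
```

Each token's 640-dimensional representation is expanded to 2560 dimensions and projected back, which is where much of the per-layer capacity comes from.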
Shakti-100M was trained using a diverse set of lightweight and instruction-focused datasets, carefully selected to ensure strong generalization, efficient task execution, and alignment with user intent—while maintaining a compact footprint suitable for edge deployment.
- Utilized general-purpose corpora such as Common Crawl to establish foundational language understanding across diverse topics and linguistic styles.
- Employed instruction-tuned datasets to enhance performance on everyday tasks like summarization, dialogue, and instruction following.
- Used preference-labeled data from UltraFeedback Binarized to align outputs with user expectations in a computationally efficient manner, ensuring responsiveness on low-resource and edge devices (see the data-loading sketch after this list).
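The sketch below shows how preference pairs could be read from the public UltraFeedback Binarized release on the Hugging Face Hub. The split and field names follow that public dataset (`HuggingFaceH4/ultrafeedback_binarized`) and are not confirmed to match the exact data preparation used for Shakti-100M.

```python
# Sketch of loading preference pairs from the public UltraFeedback Binarized
# dataset. Split and field names follow that public release and may not match
# the exact preparation used for Shakti-100M.
from datasets import load_dataset

prefs = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")
example = prefs[0]
print(example["prompt"][:80])                   # instruction shown to annotators
print(example["chosen"][-1]["content"][:80])    # preferred response (last turn)
print(example["rejected"][-1]["content"][:80])  # dispreferred response (last turn)
```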
Shakti-100M was trained in multiple stages, progressing from general language understanding to specialized instruction following and human preference alignment.
- Pre-training: trained on diverse general-purpose corpora using a next-token prediction objective with a compact Transformer architecture, Rotary Positional Embeddings (RoPE), and mixed-precision training (FP16 + bfloat16) to build foundational language understanding optimized for low-resource settings (a single training step is sketched after this list).
- Supervised fine-tuning: fine-tuned on instruction-based datasets focused on everyday tasks such as summarization, simple Q&A, and dialogue to improve task adherence and conversational accuracy.
- Preference alignment: aligned using human preference-labeled outputs from lightweight tasks to enhance relevance and clarity while maintaining computational efficiency suitable for real-time inference on mobile and embedded devices.
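As a rough illustration of the pre-training objective, the sketch below shows a single next-token-prediction step under bfloat16 autocast. The model interface, optimizer, and batch handling are placeholders, not the actual training code.

```python
# Illustrative next-token-prediction step with bfloat16 mixed precision.
# The model is assumed to map token ids to logits of shape (batch, seq, vocab).
import torch
import torch.nn.functional as F

def train_step(model, optimizer, input_ids):
    # Shift by one position: the model predicts token t+1 from tokens <= t.
    inputs, targets = input_ids[:, :-1], input_ids[:, 1:]
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        logits = model(inputs)                       # (batch, seq, vocab)
        loss = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),     # flatten to (batch*seq, vocab)
            targets.reshape(-1),
        )
    loss.backward()
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
    return loss.item()
```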
The Shakti-100M model demonstrates strong benchmark performance despite being significantly smaller than many competing models. It consistently matches or outperforms larger models in key evaluations, highlighting the effectiveness of its optimized training process on a carefully curated dataset.
A critical factor in Shakti-100M's success is the balanced size of its pre-training dataset. Its 1T-token pre-training corpus balances data scale against the model's small capacity, delivering strong performance across diverse tasks and underscoring the importance of strategic data selection and curation.
| Benchmark | Shakti-100M | Boomer-634M | SmolLM-135M | SmolLM-360M | AMD-Llama-135M |
|---|---|---|---|---|---|
| MMLU | 25.96 | 25.91 | 30.2 | 34.4 | 23.02 |
| BIG-Bench Hard | 30.12 | 21.11 | 23 | 24.4 | 18.71 |
| IFEval | 24.3 | 22.22 | 15.9 | 19.8 | 22 |
| HellaSwag | 51.34 | 39.24 | 41.2 | 51.8 | 30.48 |
| ANLI | 21.34 | 27.5 | - | - | 30.73 |
| PIQA | 69.2 | 62.57 | 68.4 | 71.6 | 64.20 |
| OpenBookQA | 37.9 | 35.76 | 34 | 37.2 | 30.73 |
| TruthfulQA (MC2) | 29.2 | 27.57 | - | - | 22.56 |
| WinoGrande | 61.3 | 50.67 | 51.3 | 52.8 | 50.12 |
| ARC Easy | 45.8 | 62.57 | 42.4 | 50.1 | 43.64 |
| SQuAD | 31.5 | 57.5 | - | - | 25 |
| MedQA | 28.3 | 14 | 11.02 | 12.36 | 15.57 |
| GPQA | 14.9 | 12.1 | 9.89 | 11 | 12.4 |
| BoolQ | 29.4 | 22.9 | 17.3 | 21.3 | 23.54 |
| SocialQA | 23.34 | 14.5 | 16.9 | 19 | 19.1 |
| CommonsenseQA | 35.8 | 29 | 32.7 | 35.3 | 22.56 |
| TriviaQA | 15.3 | 2.73 | 4.3 | 9.1 | 7.54 |
| GSM8K | 9.2 | 1.67 | 1 | 1.69 | - |
| MATH | 13.9 | 23.38 | 14 | 19 | 20.64 |
| HumanEval | 7.8 | - | - | - | 5.1 |
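For context, scores like those in the table above are commonly produced with EleutherAI's lm-evaluation-harness. The sketch below shows one way a subset of these tasks could be run through its Python API; the model path is a placeholder, and the few-shot settings and prompting details of the original evaluation are not specified here, so exact numbers may not reproduce.

```python
# Hypothetical evaluation sketch using lm-evaluation-harness (v0.4+).
# The model path is a placeholder; swap in the actual checkpoint location.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=SandLogicTechnologies/Shakti-100M,dtype=float16",  # placeholder
    tasks=["hellaswag", "winogrande", "piqa", "arc_easy"],
    batch_size=8,
)
for task, metrics in results["results"].items():
    print(task, metrics)
```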
Shakti-100M is a compact language model designed for efficient performance on edge devices. With only 100 million parameters, it supports common language tasks like summarization and question answering while running entirely on-device. The model uses a lightweight architecture with features like block-sparse attention and rotary positional embeddings to balance speed and accuracy. Through a structured training process including pre-training, supervised fine-tuning, and preference alignment, Shakti-100M delivers reliable results in low-resource environments. Its strong performance across benchmarks shows that small models, when trained effectively, can meet real-world needs without relying on cloud infrastructure.