A compact yet powerful 2.5 billion parameter Small Language Model optimized for edge AI

Introduction

Shakti-2.5B is a compact yet powerful 2.5 billion parameter Small Language Model (SLM) developed by SandLogic Technologies. Tailored for edge AI and low-resource environments, it is optimized to run efficiently on smartphones, wearables, and IoT devices without compromising on natural language understanding or reasoning capabilities. What makes Shakti stand out is its ability to deliver high performance with low latency, while supporting vernacular Indian languages and domain-specific tasks. Whether it's conversational AI, healthcare, finance, or multilingual customer service, Shakti-2.5B is purpose-built for real-world AI deployment on devices with limited computational resources.

Model Capabilities

Multilingual Support

Trained on diverse language data, with strong performance in Hindi, Kannada, Telugu, and other low-resource languages.

Domain Adaptability

Fine-tuned for industry-specific applications, including healthcare, finance, and customer service.

Real-Time Responsiveness

Supports low-latency inference with quantized deployment on CPUs, GPUs, and Apple M-series chips.

Efficient Inference

Supports Sliding Window Attention and KV Caching for seamless long-context processing.

Context Awareness

Handles sequences up to 4096 tokens, ideal for summarization, document Q&A, and instruction following.
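
As a quick illustration of the capabilities above, the snippet below loads a quantized GGUF build with llama-cpp-python and runs a short completion within the 4096-token context window. The model filename is a placeholder assumption, not an official release artifact.

```python
# Minimal usage sketch, assuming a quantized GGUF build of Shakti-2.5B is
# available locally. llama-cpp-python runs such builds on CPUs, GPUs, and
# Apple Silicon.
from llama_cpp import Llama

llm = Llama(
    model_path="shakti-2.5b.Q4_K_M.gguf",  # placeholder path, not an official artifact
    n_ctx=4096,                            # matches the 4096-token context window
    n_gpu_layers=-1,                       # offload all layers when a GPU is present
)
response = llm("Summarize this support ticket in two sentences: ...", max_tokens=128)
print(response["choices"][0]["text"])
```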

Architecture

Shakti-2.5B is a transformer-based language model with 16 layers and 2.5 billion parameters. Each layer uses a hidden size of 4096 and 32 attention heads, of which 8 serve as key-value heads, reducing the memory needed for attention during inference.

Key Architectural Features:

  • Variable Grouped Query Attention (VGQA): Faster processing with reduced memory usage
  • Rotary Positional Embeddings (RoPE): Better understanding of long text sequences
  • SwiGLU Activations: More stable training and better learning
  • Sliding Window Attention: Efficient handling of long conversations
  • Key-Value Caching: Improved inference speed
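
The sketch below illustrates the grouped-query attention layout implied by the architecture description above (32 query heads sharing 8 key-value heads at hidden size 4096). It is a simplified re-implementation for intuition, not Shakti's actual code, and it omits RoPE and the sliding window for brevity.

```python
# Grouped-query attention sketch: 32 query heads, 8 shared key/value heads.
import torch
import torch.nn.functional as F

hidden_size, n_q_heads, n_kv_heads = 4096, 32, 8
head_dim = hidden_size // n_q_heads          # 128
groups = n_q_heads // n_kv_heads             # each KV head serves 4 query heads

batch, seq = 1, 16
q = torch.randn(batch, n_q_heads, seq, head_dim)
k = torch.randn(batch, n_kv_heads, seq, head_dim)
v = torch.randn(batch, n_kv_heads, seq, head_dim)

# Expand the 8 KV heads so each group of 4 query heads shares one KV head;
# only the 8 original KV heads ever need to be cached.
k = k.repeat_interleave(groups, dim=1)       # (batch, 32, seq, head_dim)
v = v.repeat_interleave(groups, dim=1)

scores = (q @ k.transpose(-2, -1)) / head_dim ** 0.5
causal_mask = torch.triu(torch.full((seq, seq), float("-inf")), diagonal=1)
attn = F.softmax(scores + causal_mask, dim=-1)
out = (attn @ v).transpose(1, 2).reshape(batch, seq, hidden_size)
print(out.shape)                             # torch.Size([1, 16, 4096])
```

Because every group of four query heads reads the same key-value pair, the KV cache stores 8 heads instead of 32, which is where the memory savings during long-context inference come from.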

Dataset Details

Shakti was trained on 2.8 trillion tokens sourced from both global and India-focused corpora:

Training Data Sources

  • Common Crawl (CCNet) – Large-scale filtered English web data
  • C4 – Cleaned and language-identified web pages
  • Wikipedia – Structured general knowledge base
  • Sangraha – Custom dataset for Hindi, Kannada, Telugu
  • CulturaX – Culturally diverse and multilingual corpus

All data underwent preprocessing to remove noise, irrelevant content, and duplicates, ensuring high-quality learning signals.
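
A minimal sketch of the kind of cleaning described above, assuming exact-hash deduplication plus simple length and noise heuristics; the thresholds are illustrative, not the actual Shakti pipeline.

```python
# Illustrative corpus cleaning: drop short or markup-heavy pages, remove exact duplicates.
import hashlib

def clean_corpus(documents, min_words=20, max_symbol_ratio=0.3):
    seen = set()
    for doc in documents:
        text = doc.strip()
        if len(text.split()) < min_words:                    # drop near-empty pages
            continue
        symbols = sum(not c.isalnum() and not c.isspace() for c in text)
        if symbols / max(len(text), 1) > max_symbol_ratio:   # drop markup-heavy noise
            continue
        digest = hashlib.sha256(text.lower().encode("utf-8")).hexdigest()
        if digest in seen:                                   # exact duplicate removal
            continue
        seen.add(digest)
        yield text
```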

Training Details

Shakti-2.5B was trained using a three-phase strategy to build a model that is linguistically competent, instruction-following, and ethically aligned.

Phase 1: Pretraining

Shakti-2.5B learns core language understanding through massive volumes of high-quality unlabelled text.

Training Details:

  • Training Corpus: ~2.8 trillion tokens
  • Learning Rate: 2.0 × 10⁻⁴
  • Max Sequence Length: 4096 tokens
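
One common way to realize the 4096-token sequence length during pretraining is to concatenate tokenized documents and slice the stream into fixed-length chunks. The sketch below shows that packing step, assuming documents arrive as lists of token ids with a known end-of-sequence id; it is illustrative, not the actual training code.

```python
# Pack tokenized documents into fixed 4096-token pretraining sequences.
MAX_SEQ_LEN = 4096

def pack_sequences(token_stream, eos_id, max_len=MAX_SEQ_LEN):
    """Concatenate tokenized documents and cut them into max_len chunks."""
    buffer = []
    for doc_tokens in token_stream:          # each item: list of token ids
        buffer.extend(doc_tokens + [eos_id])
        while len(buffer) >= max_len:
            yield buffer[:max_len]
            buffer = buffer[max_len:]
```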

Key Sources:

  • Common Crawl (CCNet)
  • C4, Wikipedia
  • CulturaX, Sangraha

Phase 2: Supervised Fine-Tuning (SFT)

Fine-tuned on instruction-following and task-oriented datasets to enhance real-world performance.

Configuration:

  • Learning Rate: 2.0 × 10⁻⁵
  • Sequence Length: 4096 tokens
  • Cosine decay learning-rate scheduler (see the sketch below)
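
The cosine decay schedule can be sketched as follows, starting from the 2.0 × 10⁻⁵ peak learning rate listed above; the warmup length and total step count are illustrative assumptions.

```python
# Cosine decay with linear warmup, peaking at the SFT learning rate of 2.0e-5.
import math

def cosine_lr(step, total_steps, peak_lr=2.0e-5, warmup=100, min_lr=0.0):
    if step < warmup:                                   # linear warmup
        return peak_lr * step / warmup
    progress = (step - warmup) / max(total_steps - warmup, 1)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))
```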

Datasets:

  • UltraChat 200K
  • Cosmopedia V2

Phase 3: Direct Preference Optimization (DPO)

An alignment stage that uses ranked human feedback to steer responses toward user preferences.

  • Learning Rate: 5.0 × 10⁻⁷
  • Prompt Length: 1024 tokens
  • Human preference alignment
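
For reference, the standard DPO objective this phase relies on can be written compactly as below. The beta value and the toy log-probabilities are placeholders, since the exact setup beyond the listed hyperparameters is not specified here.

```python
# Standard DPO loss: push the policy to prefer the chosen response over the
# rejected one relative to a frozen reference model.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """All inputs are tensors of summed log-probabilities, one per example."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    # Loss shrinks as the policy favors the chosen response more strongly
    # than the reference model does, relative to the rejected response.
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

# Toy usage with made-up log-probabilities
loss = dpo_loss(torch.tensor([-120.0]), torch.tensor([-140.0]),
                torch.tensor([-125.0]), torch.tensor([-138.0]))
print(loss.item())
```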

Benchmark Results and Comparison

Shakti-2.5B was tested on several standard language understanding tasks, showing competitive performance despite having fewer parameters than larger models.

| Category | Benchmark | Shakti-LLM (2.5B) | Phi-3 Mini-4k | Gemma 7B | Mistral 7B | Mistral 8×7B | LLaMA 3 8B |
|---|---|---|---|---|---|---|---|
| Massive Multitask Language Understanding | MMLU (5-shot) | 71.7% | 68.8% | 63.6% | 61.7% | 70.5% | 66.5% |
| Commonsense Reasoning | BigBench Hard (0-shot) | 58.2% | 76.7% | 49.8% | 50.0% | 62.2% | 60.5% |
| QA and Reasoning | ARC-C | 67.68% | 86.3% | 78.3% | 78.6% | 87.3% | 82.8% |
| Language Understanding | HellaSwag (5-shot) | 52.4% | 76.7% | 49.8% | 58.5% | 70.4% | 71.1% |
| Reasoning | PIQA (5-shot) | 86.2% | 84.2% | 78.1% | 77.7% | 86.0% | 75.7% |
| Medical Knowledge | MedQA (2-shot) | 60.3% | 53.8% | 49.6% | 50.0% | 62.2% | 60.5% |
| Social Understanding | Social QA (5-shot) | 79.2% | 76.6% | 65.5% | 74.6% | 75.9% | 73.9% |
| Truthful QA | TruthfulQA (10-shot) | 68.4% | 65.0% | 52.1% | 53.0% | 60.1% | 63.1% |
| Factual Knowledge | BoolQ (0-shot) | 61.1% | 77.6% | 66.0% | 72.2% | 76.6% | 80.9% |
| Trivia QA | TriviaQA (5-shot) | 58.2% | 64.0% | 72.3% | 75.2% | 82.2% | 67.7% |

Key Performance Highlights

MMLU: 71.7% - Better than Mistral 7B and LLaMA 3 8B
PIQA: 86.2% - Strong common sense and everyday knowledge
Social QA: 79.2% - Excellent for chat and customer service tasks
Medical Knowledge: 60.3% - Competitive performance in specialized domains

Prompt-Based Comparative Evaluation

Shakti consistently delivers contextually accurate and detailed responses across diverse real-world scenarios, excelling in both factual tasks and creative outputs while demonstrating strong multilingual capabilities.

| Prompt | Shakti Response | Phi-3 Response |
|---|---|---|
| What word is used to classify a group or family of related living organisms? Two examples being Clytostoma from tropical America and Syneilesis from East Asia. | The word used to classify a group or family of related living organisms is "genus." Two examples mentioned are Clytostoma from tropical America and Syneilesis from East Asia. | The word used to classify a group of related living organisms is "genus." In the given examples, Clytostoma and Syneilesis are both genera of plants. Clytostoma is a genus of woody-stemmed vines from tropical America, while Syneilesis is a genus of East Asian plants. |
| Write a short poem that is hard to sing. | "In silken threads of twilight's hue / A thousand midnights confluence anew / Echoes of forgotten melodies / Whisper secrets to the wind's sighs / Moonlit shadows writhe and twist / To claim the silence as their own / As the stars conspire to atone / For moments left unfold" | "In twilight's realm, where shadows dwell / A labyrinth of whispers none can tell / Thoughts tangle, rhythms burn / Melodies that twist and turn / Verses hard to sing, yet yearn." |
| Write a poem about water (Hindi). | पानी के सपने / मेरे दिल में रहते हैं / निर्मल और नर्म / जीवन की ज्योति जलाते हैं / तुम्हारा स्पर्श, मेरा स्वागत | पानी की ओर देखो, जीवन की सारी / शांति की सरगम, हौले हौले कहे / धरती की हुमिंग, नदियों की राहें / नृत्य करती लहरें, गीतों की प्रबलता |

Model Performance Analysis

Shakti-2.5B's inference speed was tested across various hardware setups, measuring tokens generated per second while producing 512-token responses.

Virtual Machine Setup

  • AMD EPYC 7R13 processor
  • 30 GB RAM
  • NVIDIA L40s GPU
  • 4 CPU cores

Apple MacBook M3 Max

  • 36 GB RAM
  • Apple M3 Max chip

| Model | Quantized Type | Model Size | GPU (tokens/sec) | CPU (tokens/sec) | Mac (tokens/sec) |
|---|---|---|---|---|---|
| Shakti Q4_KM | Q4_KM | 1.5 GB | 331.09 | 18.93 | 128 |
| Shakti Q5_KM | Q5_KM | 1.71 GB | 305.89 | 15.90 | 110 |
| Phi-3.1-mini-4k Q5_KM | Q5_KM | 2.82 GB | 163.17 | 8.44 | 74 |
| Phi-3.1-mini-4k Q4_KM | Q4_KM | 2.39 GB | 180.4 | 10.72 | 88.21 |
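
A throughput measurement of this kind could be reproduced roughly as follows with llama-cpp-python, timing a 512-token generation from a quantized GGUF build; the filename is a placeholder, and absolute numbers depend heavily on hardware and build flags.

```python
# Measure tokens per second for a 512-token generation from a quantized build.
import time
from llama_cpp import Llama

llm = Llama(model_path="shakti-2.5b.Q4_K_M.gguf",  # placeholder path
            n_ctx=4096, n_gpu_layers=-1)

start = time.perf_counter()
out = llm("Summarize the benefits of on-device language models.", max_tokens=512)
elapsed = time.perf_counter() - start

generated = out["usage"]["completion_tokens"]
print(f"{generated / elapsed:.2f} tokens/sec")
```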

Key Observations

Shakti leads in speed: Both Q4 and Q5 quantized versions outperform Phi-3 across all platforms
Smaller but faster: Despite being smaller in size, Shakti's efficient architecture enables significantly faster token generation
Edge-ready: Results confirm Shakti is well-suited for real-time AI applications on low-power or mobile devices

Conclusion

Shakti-2.5B is a well-balanced small language model designed for real-world applications, especially on edge devices with limited resources. With its efficient architecture, multilingual support, and strong performance across benchmarks, Shakti proves that high-quality AI doesn't always require massive models. Whether it's for mobile apps, IoT systems, or industry-specific tasks in healthcare, finance, or customer service, Shakti-2.5B offers a practical, fast, and reliable solution for modern AI needs.