A compact yet powerful 2.5 billion parameter Small Language Model optimized for edge AI

Introduction

Shakti-2.5B is a compact yet powerful 2.5 billion parameter Small Language Model (SLM) developed by SandLogic Technologies. Tailored for edge AI and low-resource environments, it is optimized to run efficiently on smartphones, wearables, and IoT devices without compromising on natural language understanding or reasoning capabilities. What makes Shakti stand out is its ability to deliver high performance with low latency, while supporting vernacular Indian languages and domain-specific tasks. Whether it's conversational AI, healthcare, finance, or multilingual customer service, Shakti-2.5B is purpose-built for real-world AI deployment on devices with limited computational resources.

Model Capabilities

Multilingual Support

Trained on diverse language data, with strong performance in Hindi, Kannada, Telugu, and other low-resource languages.

Domain Adaptability

Fine-tuned for industry-specific applications, including healthcare, finance, and customer service.

Real-Time Responsiveness

Supports low-latency inference with quantized deployment on CPUs, GPUs, and Apple M-series chips.

Efficient Inference

Supports Sliding Window Attention and KV Caching for seamless long-context processing.

Context Awareness

Handles sequences up to 4096 tokens, ideal for summarization, document Q&A, and instruction following.
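
As a quick illustration of the capabilities above, the snippet below loads a quantized GGUF build with llama-cpp-python and runs a short completion within the 4096-token context window. The model filename is a placeholder assumption, not an official release artifact.

```python
# Minimal usage sketch, assuming a quantized GGUF build of Shakti-2.5B is
# available locally. llama-cpp-python runs such builds on CPUs, GPUs, and
# Apple Silicon.
from llama_cpp import Llama

llm = Llama(
    model_path="shakti-2.5b.Q4_K_M.gguf",  # placeholder path, not an official artifact
    n_ctx=4096,                            # matches the 4096-token context window
    n_gpu_layers=-1,                       # offload all layers when a GPU is present
)
response = llm("Summarize this support ticket in two sentences: ...", max_tokens=128)
print(response["choices"][0]["text"])
```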

Architecture

Shakti-2.5B is a transformer-based language model with 16 layers and 2.5 billion parameters. Each layer uses a hidden size of 4096 and 32 attention heads, of which 8 serve as key-value heads, reducing the memory needed for attention during inference.

Key Architectural Features:

  • Variable Grouped Query Attention (VGQA): Faster processing with reduced memory usage
  • Rotary Positional Embeddings (RoPE): Better understanding of long text sequences
  • SwiGLU Activations: More stable training and better learning
  • Sliding Window Attention: Efficient handling of long conversations
  • Key-Value Caching: Improved inference speed
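
The sketch below illustrates the grouped-query attention layout implied by the architecture description above (32 query heads sharing 8 key-value heads at hidden size 4096). It is a simplified re-implementation for intuition, not Shakti's actual code, and it omits RoPE and the sliding window for brevity.

```python
# Grouped-query attention sketch: 32 query heads, 8 shared key/value heads.
import torch
import torch.nn.functional as F

hidden_size, n_q_heads, n_kv_heads = 4096, 32, 8
head_dim = hidden_size // n_q_heads          # 128
groups = n_q_heads // n_kv_heads             # each KV head serves 4 query heads

batch, seq = 1, 16
q = torch.randn(batch, n_q_heads, seq, head_dim)
k = torch.randn(batch, n_kv_heads, seq, head_dim)
v = torch.randn(batch, n_kv_heads, seq, head_dim)

# Expand the 8 KV heads so each group of 4 query heads shares one KV head;
# only the 8 original KV heads ever need to be cached.
k = k.repeat_interleave(groups, dim=1)       # (batch, 32, seq, head_dim)
v = v.repeat_interleave(groups, dim=1)

scores = (q @ k.transpose(-2, -1)) / head_dim ** 0.5
causal_mask = torch.triu(torch.full((seq, seq), float("-inf")), diagonal=1)
attn = F.softmax(scores + causal_mask, dim=-1)
out = (attn @ v).transpose(1, 2).reshape(batch, seq, hidden_size)
print(out.shape)                             # torch.Size([1, 16, 4096])
```

Because every group of four query heads reads the same key-value pair, the KV cache stores 8 heads instead of 32, which is where the memory savings during long-context inference come from.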

Dataset Details

Shakti was trained on 2.8 trillion tokens sourced from both global and India-focused corpora:

Training Data Sources

  • Common Crawl (CCNet) – Large-scale filtered English web data
  • C4 – Cleaned and language-identified web pages
  • Wikipedia – Structured general knowledge base
  • Sangraha – Custom dataset for Hindi, Kannada, Telugu
  • CulturaX – Culturally diverse and multilingual corpus

All data underwent preprocessing to remove noise, irrelevant content, and duplicates, ensuring high-quality learning signals.
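
A minimal sketch of the kind of cleaning described above, assuming exact-hash deduplication plus simple length and noise heuristics; the thresholds are illustrative, not the actual Shakti pipeline.

```python
# Illustrative corpus cleaning: drop short or markup-heavy pages, remove exact duplicates.
import hashlib

def clean_corpus(documents, min_words=20, max_symbol_ratio=0.3):
    seen = set()
    for doc in documents:
        text = doc.strip()
        if len(text.split()) < min_words:                    # drop near-empty pages
            continue
        symbols = sum(not c.isalnum() and not c.isspace() for c in text)
        if symbols / max(len(text), 1) > max_symbol_ratio:   # drop markup-heavy noise
            continue
        digest = hashlib.sha256(text.lower().encode("utf-8")).hexdigest()
        if digest in seen:                                   # exact duplicate removal
            continue
        seen.add(digest)
        yield text
```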

Training Details

Shakti-2.5B was trained using a three-phase strategy to build a model that is linguistically competent, instruction-following, and ethically aligned.

Phase 1: Pretraining

Shakti-2.5B learns core language understanding through massive volumes of high-quality unlabelled text.

Training Details:

  • Training Corpus: ~2.8 trillion tokens
  • Learning Rate: 2.0 × 10⁻⁴
  • Max Sequence Length: 4096 tokens
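
One common way to realize the 4096-token sequence length during pretraining is to concatenate tokenized documents and slice the stream into fixed-length chunks. The sketch below shows that packing step, assuming documents arrive as lists of token ids with a known end-of-sequence id; it is illustrative, not the actual training code.

```python
# Pack tokenized documents into fixed 4096-token pretraining sequences.
MAX_SEQ_LEN = 4096

def pack_sequences(token_stream, eos_id, max_len=MAX_SEQ_LEN):
    """Concatenate tokenized documents and cut them into max_len chunks."""
    buffer = []
    for doc_tokens in token_stream:          # each item: list of token ids
        buffer.extend(doc_tokens + [eos_id])
        while len(buffer) >= max_len:
            yield buffer[:max_len]
            buffer = buffer[max_len:]
```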

Key Sources:

  • Common Crawl (CCNet)
  • C4, Wikipedia
  • CulturaX, Sangraha

Phase 2: Supervised Fine-Tuning (SFT)

Fine-tuned on instruction-following and task-oriented datasets to enhance real-world performance.

Configuration:

  • Learning Rate: 2.0 × 10⁻⁵
  • Sequence Length: 4096 tokens
  • Cosine decay learning-rate scheduler (see the sketch below)
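
The cosine decay schedule can be sketched as follows, starting from the 2.0 × 10⁻⁵ peak learning rate listed above; the warmup length and total step count are illustrative assumptions.

```python
# Cosine decay with linear warmup, peaking at the SFT learning rate of 2.0e-5.
import math

def cosine_lr(step, total_steps, peak_lr=2.0e-5, warmup=100, min_lr=0.0):
    if step < warmup:                                   # linear warmup
        return peak_lr * step / warmup
    progress = (step - warmup) / max(total_steps - warmup, 1)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))
```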

Datasets:

  • UltraChat 200K
  • Cosmopedia V2

Phase 3: Direct Preference Optimization (DPO)

An alignment stage that uses ranked human feedback to steer responses toward user preferences.

  • Learning Rate: 5.0 × 10⁻⁷
  • Prompt Length: 1024 tokens
  • Human preference alignment
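
For reference, the standard DPO objective this phase relies on can be written compactly as below. The beta value and the toy log-probabilities are placeholders, since the exact setup beyond the listed hyperparameters is not specified here.

```python
# Standard DPO loss: push the policy to prefer the chosen response over the
# rejected one relative to a frozen reference model.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """All inputs are tensors of summed log-probabilities, one per example."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    # Loss shrinks as the policy favors the chosen response more strongly
    # than the reference model does, relative to the rejected response.
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

# Toy usage with made-up log-probabilities
loss = dpo_loss(torch.tensor([-120.0]), torch.tensor([-140.0]),
                torch.tensor([-125.0]), torch.tensor([-138.0]))
print(loss.item())
```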

Benchmark Results and Comparison

Shakti-2.5B was tested on several standard language understanding tasks, showing competitive performance despite having fewer parameters than larger models.

| Category | Benchmark | Shakti-LLM (2.5B) | Phi-3 Mini-4k | Gemma 7B | Mistral 7B | Mistral 8×7B | LLaMA 3 8B |
|---|---|---|---|---|---|---|---|
| Massive Multitask Language Understanding | MMLU (5-shot) | 71.7% | 68.8% | 63.6% | 61.7% | 70.5% | 66.5% |
| Commonsense Reasoning | BigBench Hard (0-shot) | 58.2% | 76.7% | 49.8% | 50.0% | 62.2% | 60.5% |
| QA and Reasoning | ARC-C | 67.68% | 86.3% | 78.3% | 78.6% | 87.3% | 82.8% |
| Language Understanding | HellaSwag (5-shot) | 52.4% | 76.7% | 49.8% | 58.5% | 70.4% | 71.1% |
| Reasoning | PIQA (5-shot) | 86.2% | 84.2% | 78.1% | 77.7% | 86.0% | 75.7% |
| Medical Knowledge | MedQA (2-shot) | 60.3% | 53.8% | 49.6% | 50.0% | 62.2% | 60.5% |
| Social Understanding | Social QA (5-shot) | 79.2% | 76.6% | 65.5% | 74.6% | 75.9% | 73.9% |
| Truthful QA | TruthfulQA (10-shot) | 68.4% | 65.0% | 52.1% | 53.0% | 60.1% | 63.1% |
| Factual Knowledge | BoolQ (0-shot) | 61.1% | 77.6% | 66.0% | 72.2% | 76.6% | 80.9% |
| Trivia QA | TriviaQA (5-shot) | 58.2% | 64.0% | 72.3% | 75.2% | 82.2% | 67.7% |

Key Performance Highlights

MMLU: 71.7% - Better than Mistral 7B and LLaMA 3 8B
PIQA: 86.2% - Strong common sense and everyday knowledge
Social QA: 79.2% - Excellent for chat and customer service tasks
Medical Knowledge: 60.3% - Competitive performance in specialized domains

Prompt-Based Comparative Evaluation

Shakti consistently delivers contextually accurate and detailed responses across diverse real-world scenarios, excelling in both factual tasks and creative outputs while demonstrating strong multilingual capabilities.

| Prompt | Shakti Response | Phi-3 Response |
|---|---|---|
| What word is used to classify a group or family of related living organisms? Two examples being Clytostoma from tropical America and Syneilesis from East Asia. | The word used to classify a group or family of related living organisms is "genus." Two examples mentioned are Clytostoma from tropical America and Syneilesis from East Asia. | The word used to classify a group of related living organisms is "genus." In the given examples, Clytostoma and Syneilesis are both genera of plants. Clytostoma is a genus of woody-stemmed vines from tropical America, while Syneilesis is a genus of East Asian plants. |
| Write a short poem that is hard to sing. | "In silken threads of twilight's hue / A thousand midnights confluence anew / Echoes of forgotten melodies / Whisper secrets to the wind's sighs / Moonlit shadows writhe and twist / To claim the silence as their own / As the stars conspire to atone / For moments left unfold" | "In twilight's realm, where shadows dwell / A labyrinth of whispers none can tell / Thoughts tangle, rhythms burn / Melodies that twist and turn / Verses hard to sing, yet yearn." |
| Write a poem about water (Hindi). | पानी के सपने / मेरे दिल में रहते हैं / निर्मल और नर्म / जीवन की ज्योति जलाते हैं / तुम्हारा स्पर्श, मेरा स्वागत | पानी की ओर देखो, जीवन की सारी / शांति की सरगम, हौले हौले कहे / धरती की हुमिंग, नदियों की राहें / नृत्य करती लहरें, गीतों की प्रबलता |

Model Performance Analysis

Shakti-2.5B's inference speed was tested across various hardware setups, measuring tokens generated per second while producing 512-token responses.

Virtual Machine Setup

  • AMD EPYC 7R13 processor
  • 30 GB RAM
  • NVIDIA L40s GPU
  • 4 CPU cores

Apple MacBook M3 Max

  • 36 GB RAM
  • Apple M3 Max chip

| Model | Quantized Type | Model Size | GPU (tokens/sec) | CPU (tokens/sec) | Mac (tokens/sec) |
|---|---|---|---|---|---|
| Shakti Q4_KM | Q4_KM | 1.5 GB | 331.09 | 18.93 | 128 |
| Shakti Q5_KM | Q5_KM | 1.71 GB | 305.89 | 15.90 | 110 |
| Phi-3.1-mini-4k Q5_KM | Q5_KM | 2.82 GB | 163.17 | 8.44 | 74 |
| Phi-3.1-mini-4k Q4_KM | Q4_KM | 2.39 GB | 180.4 | 10.72 | 88.21 |
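
A throughput measurement of this kind could be reproduced roughly as follows with llama-cpp-python, timing a 512-token generation from a quantized GGUF build; the filename is a placeholder, and absolute numbers depend heavily on hardware and build flags.

```python
# Measure tokens per second for a 512-token generation from a quantized build.
import time
from llama_cpp import Llama

llm = Llama(model_path="shakti-2.5b.Q4_K_M.gguf",  # placeholder path
            n_ctx=4096, n_gpu_layers=-1)

start = time.perf_counter()
out = llm("Summarize the benefits of on-device language models.", max_tokens=512)
elapsed = time.perf_counter() - start

generated = out["usage"]["completion_tokens"]
print(f"{generated / elapsed:.2f} tokens/sec")
```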

Key Observations

Shakti leads in speed: Both Q4 and Q5 quantized versions outperform Phi-3 across all platforms
Smaller but faster: Despite being smaller in size, Shakti's efficient architecture enables significantly faster token generation
Edge-ready: Results confirm Shakti is well-suited for real-time AI applications on low-power or mobile devices

Conclusion

Shakti-2.5B is a well-balanced small language model designed for real-world applications, especially on edge devices with limited resources. With its efficient architecture, multilingual support, and strong performance across benchmarks, Shakti proves that high-quality AI doesn't always require massive models. Whether it's for mobile apps, IoT systems, or industry-specific tasks in healthcare, finance, or customer service, Shakti-2.5B offers a practical, fast, and reliable solution for modern AI needs.