publications

Sparsity and Superposition in Mixture of Experts
NeurIPS '25 Mech Interp Workshop, 2025

network sparsity, monosemanticity and expert specialization in MoEs

MoE Lens - An Expert Is All You Need

understanding experts in MoEs through routing analysis and the logit lens

blogs

how neural networks think at scale

features, neurons, polysemanticity and superposition

mathematical intuition for transformers

embeddings, attention and transformer architecture math

notes on neural nets

backpropagation, regularization, cross-entropy loss and more