publications

Sparsity and Superposition in Mixture of Experts
NeurIPS '25 Mech Interp Workshop
network sparsity, monosemanticity and expert specialization in MoEs

MoE Lens - An Expert Is All You Need
ICLR '25 (SLLM)
understanding experts in MoEs through routing analysis and the logit lens
blogs

features, neurons, polysemanticity and superposition

embeddings, attention and the math of the transformer architecture

backpropagation, regularization, cross-entropy loss and more