marmik

publications

Sparsity and Superposition in Mixture of Experts
2025 · NeurIPS '25 Mech Interp Workshop

network sparsity, monosemanticity and expert specialization in MoEs

MoE Lens - An Expert Is All You Need
2025 · ICLR '25 (SLLM)

understanding experts in MoEs through routing analysis and the logit lens

Sparse Crosscoders for diffing MoEs and Dense models
2025 · arXiv

crosscoders, shared features & model diffing

blogs

how neural networks think at scale
2025

features, neurons, polysemanticity and superposition

mathematical intuition for transformers
2024

embeddings, attention and transformer architecture math

notes on neural nets
2023

backpropagation, regularization, cross-entropy loss and more