marmik

publications

Sparsity and Superposition in Mixture of Experts
2025 · NeurIPS '25 Mech Interp Workshop

network sparsity, monosemanticity and expert specialization in MoEs

MoE Lens - An Expert Is All You Need
2025 · ICLR '25 (SLLM)

understanding experts in MoEs through routing analysis and the logit lens

Sparse Crosscoders for diffing MoEs and Dense models
2025 · arXiv

crosscoders, shared features & model diffing

blogs

how neural networks think at scale
2025

features, neurons, polysemanticity and superposition

mathematical intuition for transformers
2024

embeddings, attention and transformer architecture math

notes on neural nets
2023

backpropagation, regularization, cross-entropy loss and more