Neil Houlsby

About

New (Feb 2025): I am a Member of Technical Staff at Anthropic and lead of our new Zurich site. We work on the research and development of safe and aligned frontier language models, with a focus on multimodal capabilities.

Previously, I was a Senior Staff Research Scientist and manager at Google DeepMind and Google Brain, where I led a team working on neural architectures, vision, and language models. I have been fortunate to work on many interesting topics, including Transformers [ViT] and other neural architectures [Mixer], parameter-efficient finetuning [Adapters], scaling [ViT-22B, V-MoE], and vision-language models [Gemini, PaLI A, B, C].

I received my PhD from the Cambridge Computational and Biological Learning lab, supervised by Prof. Zoubin Ghahramani and Prof. Máté Lengyel, where I studied statistical machine learning and cognitive science.

Selected Publications

Vision Transformers

MLP Mixer

PaLI Vision-Language Model

Parameter-efficient Adapter Layers

Big Transfer

Bayesian Active Learning with Disagreement

All Publications

1. Frozen Feature Augmentation for Few-Shot Image Classification

Authors: Andreas Bär, Neil Houlsby, Mostafa Dehghani, Manoj Kumar

First appeared: 2024-03-15

Venue: Computer Vision and Pattern Recognition (CVPR), 2024

2. Gemini: A Family of Highly Capable Multimodal Models

3. Scaling Laws for Sparsely-Connected Foundation Models

4. From Sparse to Soft Mixtures of Experts

5. Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution

6. Scaling Open-Vocabulary Object Detection

7. Image Captioners Are Scalable Vision Learners Too

8. PaLI-X: On Scaling up a Multilingual Vision and Language Model

9. Scaling Vision Transformers to 22 Billion Parameters

10. Dual PatchNorm

11. Adaptive Computation with Elastic Input Sequence

12. Massively Scaling Heteroscedastic Classifiers

13. CLIPPO: Image-and-Language Understanding from Pixels Only

14. Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints

15. Location-Aware Self-Supervised Transformers for Semantic Segmentation

16. Transcending Scaling Laws with 0.1% Extra Compute

17. PaLI: A Jointly-Scaled Multilingual Language-Image Model

18. Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts

19. UViM: A Unified Modeling Approach for Vision with Learned Guiding Codes

20. Robust and Efficient Medical Imaging with Self-Supervision

21. Simple Open-Vocabulary Object Detection with Vision Transformers

22. Unifying Language Learning Paradigms

23. Do better ImageNet classifiers assess perceptual similarity better?

24. Learning to Merge Tokens in Vision Transformers

25. Sparse MoEs meet Efficient Ensembles

26. The Benchmark Lottery

27. Revisiting the Calibration of Modern Neural Networks

28. Scaling Vision with Sparse Mixture of Experts

29. Scaling Vision Transformers

30. SI-Score: An image dataset for fine-grained analysis of robustness to object location, rotation and size

31. MLP-Mixer: An all-MLP Architecture for Vision

32. Comparing Transfer and Meta Learning Approaches on a Unified Few-Shot Classification Benchmark

33. Supervised Transfer Learning at Scale for Medical Imaging

34. Underspecification Presents Challenges for Credibility in Modern Machine Learning

35. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

36. Deep Ensembles for Low-Data Transfer Learning

37. Representation Learning From Videos In-the-Wild: An Object-Centric Approach

38. Training General Representations for Remote Sensing Using In-Domain Knowledge

39. Scalable Transfer Learning with Expert Models

40. On Robustness and Transferability of Convolutional Neural Networks

41. Automatic Shortcut Removal for Self-Supervised Representation Learning

42. Big Transfer (BiT): General Visual Representation Learning

43. Self-Supervised Learning of Video-Induced Visual Invariances

44. A Large-scale Study of Representation Learning with the Visual Task Adaptation Benchmark

45. Parameter-Efficient Transfer Learning for NLP

46. Neural Architecture Search Over a Graph Search Space

47. Self-Supervised GANs via Auxiliary Rotation Loss

48. On Self Modulation for Generative Adversarial Networks

49. Transfer Learning with Neural AutoML

50. Ask the Right Questions: Active Question Reformulation with Reinforcement Learning

51. A Filtering Approach to Stochastic Variational Inference

52. Efficient Bayesian Active Learning and Matrix Modelling

53. Cold-start Active Learning with Robust Ordinal Matrix Factorization

54. Probabilistic Matrix Factorization with Non-random Missing Data

55. Stochastic Inference for Scalable Probabilistic Modeling of Binary Matrices

56. Statistical Fitting of Undrained Strength Data

57. Cognitive Tomography Reveals Complex Task-Independent Mental Representations

58. A Scalable Gibbs Sampler for Probabilistic Entity Linking

59. Active learning for Interactive Visualization

60. Experimental Adaptive Bayesian Tomography

61. Collaborative Gaussian Processes for Preference Learning

62. Adaptive Bayesian Quantum Tomography

63. Bayesian Active Learning for Classification and Preference Learning