Arxiv Dives - How Mistral 7B works
What is Mistral 7B?
Mistral 7B is an open-weights large language model by Mistral.ai that was built for
Mamba: Linear-Time Sequence Modeling with Selective State Spaces - Arxiv Dives
What is Mamba 🐍?
At its core, Mamba is a recurrent neural network architecture that outperforms Transformers with faster
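Mamba's recurrence comes from a selective state space model (the "Selective State Spaces" in the post title). Here is a minimal sketch, built on several simplifying assumptions: a single input channel, a diagonal state matrix `A`, scalar toy weights in place of Mamba's learned projections, and the simplified discretization B̄ ≈ ΔB. It shows where the linear-time claim comes from, because each time step updates a fixed-size hidden state exactly once. The names `selective_scan`, `w_delta`, `b_delta`, `w_B` and `w_C` are hypothetical and are not part of any Mamba library. The real model computes this recurrence with a hardware-aware parallel scan, not a Python loop.

```python
import math
from typing import List


def softplus(z: float) -> float:
    # Numerically stable softplus; keeps the step size delta positive.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(
    xs: List[float],
    A: List[float],
    w_delta: float,
    b_delta: float,
    w_B: List[float],
    w_C: List[float],
) -> List[float]:
    """Sequential selective SSM over one input channel (hypothetical toy parameters).

    A holds the diagonal of the state matrix (negative values, so the state decays).
    delta, B and C are recomputed from each input x_t, and this input dependence is
    the "selective" part. One pass over xs, so cost is O(len(xs) * len(A)).
    """
    n_state = len(A)
    h = [0.0] * n_state  # fixed-size hidden state carried across time steps
    ys: List[float] = []
    for x in xs:
        delta = softplus(w_delta * x + b_delta)  # input-dependent step size
        B = [wb * x for wb in w_B]               # input-dependent input projection
        C = [wc * x for wc in w_C]               # input-dependent output projection
        for n in range(n_state):
            a_bar = math.exp(delta * A[n])  # zero-order-hold discretization of A
            b_bar = delta * B[n]            # simplified (Euler) discretization of B
            h[n] = a_bar * h[n] + b_bar * x
        ys.append(sum(c * hn for c, hn in zip(C, h)))
    return ys
```

Unlike attention, the state `h` does not grow with sequence length. This fixed size is the trade-off that makes the per-token cost constant.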
Practical ML Dive - How to customize a Vision Transformer on your own data
Welcome to Practical ML Dives, a spin-off series of Arxiv Dives.
In Arxiv Dives, we cover state of the
Arxiv Dives - Zero-shot Image Classification with CLIP
CLIP explores the efficacy of learning image representations from scratch with 400 million image-text pairs, showcasing zero-shot transfer capabilities across
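The CLIP teaser mentions zero-shot transfer. This works by comparing one image embedding against text embeddings of label prompts such as "a photo of a dog". The sketch below assumes the embeddings were already produced by CLIP's image and text encoders. The toy 3-D vectors are made up for illustration, and `zero_shot_classify` is a hypothetical helper, not part of any CLIP library. The helper normalizes both sides, computes cosine similarities scaled by a logit scale, and applies a softmax over the candidate labels. The value 100.0 for the scale is an assumption; CLIP learns this temperature during training.

```python
import math
from typing import Dict, List


def l2_normalize(v: List[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def zero_shot_classify(
    image_emb: List[float],
    text_embs: Dict[str, List[float]],
    logit_scale: float = 100.0,  # assumed value; CLIP learns this temperature
) -> Dict[str, float]:
    """Return a probability per label prompt via scaled cosine similarity + softmax."""
    img = l2_normalize(image_emb)
    logits = {
        label: logit_scale * sum(a * b for a, b in zip(img, l2_normalize(emb)))
        for label, emb in text_embs.items()
    }
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits.values())
    exps = {label: math.exp(z - m) for label, z in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Toy embeddings standing in for real encoder outputs.
probs = zero_shot_classify(
    image_emb=[0.9, 0.1, 0.0],
    text_embs={
        "a photo of a dog": [1.0, 0.0, 0.0],
        "a photo of a cat": [0.0, 1.0, 0.0],
    },
)
print(max(probs, key=probs.get))  # a photo of a dog
```

No classifier is trained here. The set of classes is just the set of text prompts, so adding a new label means adding a new prompt.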
Arxiv Dives - Vision Transformers (ViT)
With all of the hype around Transformers for natural language processing and text, the authors of this paper beg the
Arxiv Dives - A Mathematical Framework for Transformer Circuits - Part 2
Every Friday at Oxen.ai we host a paper club called "Arxiv Dives" to make us smarter Oxen
Arxiv Dives - A Mathematical Framework for Transformer Circuits - Part 1
Every Friday at Oxen.ai we host a paper club called "Arxiv Dives" to make us smarter Oxen
Arxiv Dive Manifesto
Every Friday the team at Oxen.ai gets together and goes over research papers, blog posts, or books that help
Arxiv Dives - Attention Is All You Need
Every Friday at Oxen.ai we host a paper club called "Arxiv Dives" to make us smarter Oxen
Arxiv Dives - How LoRA fine-tuning works
Every Friday at Oxen.ai we host a paper club called "Arxiv Dives" to make us smarter Oxen