Arxiv Dives

Arxiv Dives

Each week we dive deep into a topic in machine learning, data management, or general artificial intelligence research. These are notes from a live reading group we do every Friday. Captured for future reference. Join the community here: https://lu.ma/oxen
Apr
15
How RWKV-7 Goose Works 🪿 + Notes from the Author

How RWKV-7 Goose Works 🪿 + Notes from the Author

In this special Arxiv Dive, we're joined by Eugene Cheah - author, lead in RWKV org, CEO of
17 min read
Mar
25
How Phi-4 Cracked Small Multimodality

How Phi-4 Cracked Small Multimodality

Phi-4 extends the existing Phi model’s capabilities by adding vision and audio all in the same model. This means
8 min read
Feb
11
Why GRPO is Important and How it Works

Why GRPO is Important and How it Works

Last week on Arxiv Dives we dug into research behind DeepSeek-R1, and uncovered that one of the techniques they use
12 min read
Feb
04
How DeepSeek R1, GRPO, and Previous DeepSeek Models Work

How DeepSeek R1, GRPO, and Previous DeepSeek Models Work

In January 2025, DeepSeek took a shot directly at OpenAI by releasing a suite of models that “Rival OpenAI’s
15 min read
Jan
29
No Hype DeepSeek-R1 Reading List

No Hype DeepSeek-R1 Reading List

DeepSeek-R1 is a big step forward in the open model ecosystem for AI with their latest model competing with OpenAI&
27 min read
Jan
21
arXiv Dive: RAGAS - Retrieval Augmented Generation Assessment

arXiv Dive: RAGAS - Retrieval Augmented Generation Assessment

RAGAS is an evaluation framework for Retrieval Augmented Generation (RAG). A paper released by Exploding Gradients, AMPLYFI, and CardiffNLP. RAGAS
13 min read
Dec
23
OpenCoder: The OPEN Cookbook For Top-Tier Code LLMs

OpenCoder: The OPEN Cookbook For Top-Tier Code LLMs

Welcome to the last arXiv Dive of 2024! Every other week we have been diving into interesting research papers in
14 min read
Dec
09
LLaVA-CoT: Let Vision Language Models Reason Step-By-Step

LLaVA-CoT: Let Vision Language Models Reason Step-By-Step

When it comes to large language models, it is still the early innings. Many of them still hallucinate, fail to
12 min read
Nov
18
How Upcycling MoEs Beat Dense LLMs

How Upcycling MoEs Beat Dense LLMs

In this Arxiv Dive, Nvidia researcher, Ethan He, presents his co-authored work Upcycling LLMs in Mixture of Experts (MoE). He
1 min read
Nov
11
Thinking LLMs: General Instruction Following with Thought Generation

Thinking LLMs: General Instruction Following with Thought Generation

The release of OpenAI-O1 has motivated a lot of people to think deeply about…thoughts 💭. Thinking before you speak is
14 min read