ArXiv Dives: How ReFT works
ArXiv Dives is a series of live meetups that take place on Fridays with the Oxen.ai community. We believe…
How to Train Diffusion for Text from Scratch
This is part two of a series on Diffusion for Text with Score Entropy Discrete Diffusion (SEDD) models. Today we…
ArXiv Dives: Text Diffusion with SEDD
Diffusion models have been popular for computer vision tasks. Recently, models such as Sora show how you can apply Diffusion…
ArXiv Dives: The Era of 1-bit LLMs, All Large Language Models are in 1.58 Bits
This paper presents BitNet b1.58, where every weight in a Transformer can be represented by a ternary value in {-1, 0, 1}…
How to train Mistral 7B as a "Self-Rewarding Language Model"
About a month ago we went over the "Self-Rewarding Language Models" paper by the team at Meta AI…
Practical ML Dive - Building RAG from Open Source Pt 1
RAG was introduced by the Facebook AI Research (FAIR) team in May of 2020 as an end-to-end way to include…
Practical ML Dive - How to train Mamba for Question Answering
What is Mamba 🐍?
There is a lot of hype about Mamba being a fast alternative to the Transformer architecture. The…