Practical ML

Apr 29
How to Train Diffusion for Text from Scratch

This is part two of a series on Diffusion for Text with Score Entropy Discrete Diffusion (SEDD) models. Today we
16 min read
Apr 15
ArXiv Dives: Text Diffusion with SEDD

Diffusion models have been popular for computer vision tasks. Recently, models such as Sora show how you can apply Diffusion
11 min read
Apr 08
ArXiv Dives: The Era of 1-bit LLMs, All Large Language Models are in 1.58 Bits

This paper presents BitNet b1.58 where every weight in a Transformer can be represented as a {-1, 0, 1}
9 min read
Mar 20
How to train Mistral 7B as a "Self-Rewarding Language Model"

About a month ago we went over the "Self-Rewarding Language Models" paper by the team at Meta AI
17 min read
Jan 06
Practical ML Dive - Building RAG from Open Source Pt 1

RAG was introduced by the Facebook AI Research (FAIR) team in May of 2020 as an end-to-end way to include
14 min read
Dec 20
Practical ML Dive - How to train Mamba for Question Answering

What is Mamba 🐍? There is a lot of hype about Mamba being a fast alternative to the Transformer architecture. The
22 min read