![Data Version Control 101 with Oxen](/content/images/size/w750/2024/04/Screenshot-2024-04-02-at-4.51.18-PM.png)
Latest
Jul 25
![Fine-tuning Llama 3 in 14 minutes using ReFT](/content/images/size/w750/2024/07/Screenshot-2024-07-25-at-9.36.55-AM.png)
Fine-tuning Llama 3 in 14 minutes using ReFT
If you have been fine-tuning models recently, you have most likely used LoRA. While LoRA has been the dominant PEFT
8 min read
Jul 21
![ArXiv Dives: How ReFT works](/content/images/size/w750/2024/07/ReFT.jpg)
ArXiv Dives: How ReFT works
ArXiv Dives is a series of live meetups that take place on Fridays with the Oxen.ai community. We believe
10 min read
Jun 26
![ArXiv Dives: 💃 Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling](/content/images/size/w750/2024/06/Screenshot-2024-06-25-at-2.58.40-AM.png)
ArXiv Dives: 💃 Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
Modeling sequences with infinite context length is one of the dreams of Large Language Models. Some LLMs such as Transformers
4 min read
Jun 04
![ArXiv Dives: Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet](/content/images/size/w750/2024/06/Screenshot-2024-06-03-at-10.56.20-PM.png)
ArXiv Dives: Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
The ability to interpret and steer large language models is an important topic as they become more and more a
9 min read
May 29
![ArXiv Dives: Efficient DiT Fine-Tuning with PixART for Text to Image Generation](/content/images/size/w750/2024/05/Screenshot-2024-05-22-at-11.39.57-AM.png)
ArXiv Dives: Efficient DiT Fine-Tuning with PixART for Text to Image Generation
Diffusion Transformers have been gaining a lot of steam since OpenAI's demo of Sora back in February. The
8 min read
May 17
![ArXiv Dives: Evaluating LLMs for Code Completion with HumanEval](/content/images/size/w750/2024/05/DALL-E-2024-05-16-09.56.11---A-llama-sitting-at-a-desk--typing-on-a-laptop-computer--surrounded-by-code-on-multiple-screens.-The-llama-is-wearing-glasses-and-looks-focused.-The-en.webp)
ArXiv Dives: Evaluating LLMs for Code Completion with HumanEval
Large Language Models have shown very good ability to generalize within a distribution, and frontier models have shown incredible flexibility
15 min read
Apr 29
![How to Train Diffusion for Text from Scratch](/content/images/size/w750/2024/04/IMG_1485-1.jpeg)
How to Train Diffusion for Text from Scratch
This is part two of a series on Diffusion for Text with Score Entropy Discrete Diffusion (SEDD) models. Today we
16 min read
Apr 15
![ArXiv Dives: Text Diffusion with SEDD](/content/images/size/w750/2024/04/TextDiffusion.jpg)
ArXiv Dives: Text Diffusion with SEDD
Diffusion models have been popular for computer vision tasks. Recently, models such as Sora have shown how you can apply Diffusion
11 min read
Apr 08
![ArXiv Dives: The Era of 1-bit LLMs, All Large Language Models are in 1.58 Bits](/content/images/size/w750/2024/04/Screenshot-2024-04-08-at-1.13.31-PM.png)
ArXiv Dives: The Era of 1-bit LLMs, All Large Language Models are in 1.58 Bits
This paper presents BitNet b1.58 where every weight in a Transformer can be represented as a {-1, 0, 1}
9 min read
Apr 01
![ArXiv Dives: Evolutionary Optimization of Model Merging Recipes](/content/images/size/w750/2024/04/Screenshot-2024-04-02-at-1.37.08-PM.png)
ArXiv Dives: Evolutionary Optimization of Model Merging Recipes
Today, we’re diving into a fun paper by the team at Sakana.ai called “Evolutionary Optimization of Model Merging
10 min read