Minish Blog
All Minish news and tutorials
Tokenlearn 2.0 Release
May 31, 2025
We’ve released a new version of Tokenlearn! It contains usability improvements, fixes some bugs, and has a new learning algorithm under the hood that improves performance. Read on to see what it does and how you can use it.
New Improvements to Model2Vec Distillation
February 5, 2025
We’ve made a lot of improvements to Model2Vec since it came out, many of which target the baseline performance of our distillation process. In this post, we walk through each change and explain why it matters for making your models smaller and faster.
ModernBERT Support (and Why It Doesn’t Work)
January 29, 2025
Our newest release, 0.3.8, is here! It’s a small release ahead of a bigger one we’ll be releasing next week. See the release notes for the full list of changes, and read on for details about ModernBERT compatibility (spoiler: it’s trickier than you’d think).
semhash: Deduplication and Dataset Multitool
January 12, 2025
We’re super excited to announce the release of semhash, our semantic deduplication and dataset multitool (other features coming soon).
POTION: Bag of Tricks Leads to Better Models
October 29, 2024
This blog post describes Tokenlearn, a method for pre-training Model2Vec models.
Model2Vec Introduction Blogpost
October 14, 2024
This post was first published on the Hugging Face blog. We’re also posting it here for archival purposes.