Model2Vec Size Improvements
In this blogpost, we showcase the various size reduction techniques we implemented in Model2Vec, and how they can be combined to create tiny models (~6MB) with minimal performance loss.
In this blogpost, we compare Model2Vec and fastText. We show that Model2Vec is faster, smaller, and more performant.
We’ve released a new version of Tokenlearn! It contains usability improvements, fixes some bugs, and has a new learning algorithm under the hood that improves performance. Read on to see what it does and how you can use it.
We’ve made a lot of improvements to Model2Vec since it came out, many of which target the baseline performance of our distillation process. In this post, we walk through each change and explain why it matters for making your models smaller and faster.
Our newest shiny release is here! 0.3.8! This is a small release leading up to a big one we’ll be releasing next week. See here for the release notes, and read on for details about ModernBERT compatibility (spoiler: it’s trickier than you’d think).
We’re super excited to announce the release of semhash, our semantic deduplication and dataset multitool (other features coming soon).
This blogpost describes Tokenlearn, a method for pre-training Model2Vec models.
This blog was first posted on the Hugging Face blog. We’re also posting it here for archival purposes.