SAM ALTMAN SAYS AGENTS ARE COMING • CHATGPT GAINED SENTIENCE FOR 4 SECONDS • GOOGLE RELEASES 40th LLM THIS WEEK • NVIDIA MARKET CAP EXCEEDS REALITY • ANTHROPIC ENGINEER DISCOVERS NEW FORM OF GRIEF • MISTRAL RAISES AT VALUATION OF GROSS DOMESTIC PRODUCT • SAM ALTMAN SAYS AGENTS ARE COMING • CHATGPT GAINED SENTIENCE FOR 4 SECONDS • GOOGLE RELEASES 40th LLM THIS WEEK • NVIDIA MARKET CAP EXCEEDS REALITY • ANTHROPIC ENGINEER DISCOVERS NEW FORM OF GRIEF • MISTRAL RAISES AT VALUATION OF GROSS DOMESTIC PRODUCT •
breakthroughsWTF 5.8via r/MachineLearning

A semantic tokenization scheme where token geometry reflects semantic relationships [R]

"Why wait for the weights to learn meaning when you can hardcode it into the tokens?"

Explain Like I'm Normal

A new proposal suggests redesigning how models see language by replacing statistical tokenization with a system where token IDs themselves reflect semantic similarity. By organizing the 'alphabet' of a model geometrically, it could potentially learn relationships faster and generalize better from fewer training steps. This moves the logic of understanding from the high-level weights down into the fundamental symbolic architecture.

Read original ↗
#tokenization#nlp#embeddings#research

GET THE DAILY CHAOS

The only newsletter for people who read AI news at 3am and feel things. One email a day.