Bytes and nibbles

By Samuel Matsuo Harris

The ML Reading List #2

New spins on old ideas

Published: Sat, 29 Jun 2024

Introduction

It's now been a month since I last published something and I'm still in the middle of making STICI-note part two (read part one here). So I thought screw it: time to unilaterally dump my favourite articles, papers, and video from the last two months on you. No need to thank me (I know you won't).

My Recommendations

Over the past couple of months, I've come across quite a few fascinating papers, articles, video essays, and open-source projects. A common theme among the most interesting ones was innovative takes on old ideas. Here are my favourite things that I read/watched:

  • KAN: Kolmogorov-Arnold Networks - This is a very interesting idea. If KANs can be sufficiently optimised and continue to outperform MLPs in further tests, they could be an innovation as significant for the field of machine learning as transformers. Not only do they promise better performance than MLPs for some problems, they also promise smaller models (for the same performance) and far better explainability.
  • From Local to Global: A Graph RAG Approach to Query-Focused Summarization - Using graphs to enhance RAG knowledge bases is a very novel idea, and I can definitely see the benefit of using this when applying RAG to real knowledge bases. Only time will tell whether this idea takes off or not.
  • A Survey on Retrieval-Augmented Text Generation for Large Language Models - I’ve recently been researching how to apply RAG to help me implement my STICI-note project. This survey gives a great broad overview of the current frontier of RAG research as well as a very insightful framework for all of the aspects of implementing RAG. (If RAG is new to you, there’s a toy sketch of the basic idea just after this list.)
  • New Discovery: LLMs have a Performance Phase - This video (part 1 of a 3-part series) gives a great introduction to a concept that is completely counterintuitive to anyone who has trained a neural network before: grokking (not to be confused with xAI’s Grok), the phenomenon of neural networks generalising after being trained far beyond the point of overfitting their training set. I was incredibly sceptical about this at first as it goes against everything I know about overfitting, but there is a paper about it in NeurIPS, so maybe the researchers are onto something. (There’s a toy version of the classic grokking experiment at the end of this post.)
  • Curse of Dimensionality: An Intuitive Exploration - I hate to repeat the title, but “intuitive” is undeniably the best word to describe this explanation of the curse of dimensionality. Salih Salih does a very good job of explaining what the curse of dimensionality is and how it affects our ML models. (See the little distance-concentration experiment after this list for a taste.)
  • Good tokenizers is all you need - This was very eye-opening. I never quite realised the importance of tokenisers until I read this article. Even if you don’t plan to create your own LLMs from scratch, this is well worth the read as it explains a lot of strange behaviours in LLMs. (The short tokeniser demo after this list shows the kind of quirks involved.)
  • 2024 GitHub Accelerator: Meet the 11 projects shaping open source AI - So it turns out that GitHub has an accelerator program where they sponsor and give advice to a selection of open-source projects on their platform. This year, the theme was “Advancing AI”, so I just had to take a look. Out of the 11 chosen projects, the following three really stood out to me as projects that I am likely to use:
    • LLMWare - A framework for building enterprise RAG applications.
    • Giskard - A framework for evaluating LLMs and ML models.
    • Marimo - An alternative to Jupyter notebook that seeks to improve on the much-loved tool.
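If you’re new to RAG, the core retrieve-then-generate loop is simpler than the jargon suggests. Here’s a toy sketch (my own illustration, not code from the survey) that stands in a crude bag-of-words vector for a real embedding model: the query is compared against the documents by cosine similarity and the best match is pasted into the prompt that would go to the LLM.

```python
# A toy retrieve-then-read loop: the "R" in RAG reduced to cosine similarity
# over bag-of-words vectors. A real system would swap build_vector() for a
# learned embedding model and send the final prompt to an LLM.
from collections import Counter
import math
import re

documents = [
    "KANs replace fixed activations with learnable univariate functions.",
    "Grokking is delayed generalisation long after the training set is memorised.",
    "Tokenisers split raw text into the integer IDs a language model actually sees.",
]

def build_vector(text: str) -> Counter:
    """Crude stand-in for an embedding: lower-cased word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    q = build_vector(query)
    ranked = sorted(documents, key=lambda d: cosine(q, build_vector(d)), reverse=True)
    return ranked[:k]

context = retrieve("what is grokking?")
prompt = f"Answer using this context:\n{context[0]}\n\nQuestion: what is grokking?"
print(prompt)  # this prompt would then be sent to the generator LLM
```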
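The curse of dimensionality also becomes a lot less abstract after a few lines of NumPy (again my own toy, not taken from the article). As the number of dimensions grows, the nearest and farthest points from a random query end up almost the same distance away, which is why nearest-neighbour intuitions quietly stop working in high dimensions.

```python
# Distance concentration: as dimensionality grows, the nearest and farthest
# points from a query become almost equally far away - one concrete face of
# the curse of dimensionality.
import numpy as np

rng = np.random.default_rng(0)
n_points = 1000

for dim in (2, 10, 100, 1000):
    points = rng.random((n_points, dim))
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)
    print(f"dim={dim:4d}  min/max distance ratio = {dists.min() / dists.max():.3f}")
# The ratio creeps towards 1, so "nearest neighbour" carries less and less information.
```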
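And here’s the kind of tokeniser quirk that article is talking about. This little demo assumes the tiktoken package and its cl100k_base encoding (other tokenisers will split differently, but the lesson is the same): the model never sees characters, only token IDs, and superficially similar strings can tokenise very differently.

```python
# Tokenisation quirks make some LLM behaviour less mysterious. Requires the
# tiktoken package (pip install tiktoken).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by recent OpenAI models

for text in ["hello world", "Hello World", " hello", "supercalifragilistic", "1234567890"]:
    ids = enc.encode(text)
    pieces = [enc.decode([i]) for i in ids]
    print(f"{text!r:>25} -> {pieces}")
# Leading spaces, capitalisation, and digit grouping all change the token
# sequence, which is one reason LLMs struggle with spelling and arithmetic.
```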

Grokking: the most counterintuitive phenomenon I have ever seen in machine learning (colourised). Source: [Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets](https://arxiv.org/abs/2201.02177).
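For the curious, below is a rough toy version of the kind of experiment behind that figure, loosely based on my reading of the paper: learn (a + b) mod p from half of all possible pairs, keep training long after the training split has been memorised, and watch the held-out accuracy. The hyperparameters are illustrative guesses rather than the paper’s settings, so whether and when the test accuracy jumps will vary between runs; weight decay is the ingredient usually credited with making grokking appear at all.

```python
# Toy grokking setup, loosely following Power et al. (2022): memorise half of
# all (a + b) mod p pairs, then keep training and log held-out accuracy.
import torch
import torch.nn as nn

p = 97
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))  # all (a, b) pairs
labels = (pairs[:, 0] + pairs[:, 1]) % p

perm = torch.randperm(len(pairs))
split = len(pairs) // 2
train_idx, test_idx = perm[:split], perm[split:]

model = nn.Sequential(
    nn.Embedding(p, 128),          # shared embedding for both operands
    nn.Flatten(),                  # (a_embed, b_embed) -> one 256-dim vector
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, p),
)
# Heavy weight decay is what the grokking literature points to as the key knob.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

def accuracy(idx):
    with torch.no_grad():
        return (model(pairs[idx]).argmax(dim=1) == labels[idx]).float().mean().item()

for step in range(20_000):
    opt.zero_grad()
    loss = loss_fn(model(pairs[train_idx]), labels[train_idx])
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        print(f"step {step:6d}  train acc {accuracy(train_idx):.2f}  test acc {accuracy(test_idx):.2f}")
```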