Researchers have developed a new way to compress the memory used by AI models to increase their accuracy in complex tasks or help save significant amounts of energy.
Paired with Whisper for quick voice to text transcription, we can transcribe text, ship the transcription to our local LLM, and then get a response back. With gpt-oss-120b, I manage to get about 20 ...