It's probably based on Q-learning, which has been around for 30+ years, and I'm guessing the star is a nod to A*, since that's a search/optimization algorithm of some kind.
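For anyone unfamiliar with Q-learning: it learns a table of action values Q(s, a) by repeatedly nudging each entry toward the observed reward plus the discounted value of the best next action. Here's a minimal sketch on a made-up toy problem (a 5-cell corridor where reaching the right end pays reward 1); everything in it is illustrative, not anything to do with whatever Q* actually is.

```python
import random

random.seed(0)

N_STATES = 5          # states 0..4; state 4 is the goal
ACTIONS = [-1, +1]    # move left or right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.3

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    # Walls clamp movement; only the goal state gives reward.
    nxt = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

for _ in range(500):
    s = 0
    while s != N_STATES - 1:
        # Epsilon-greedy: mostly exploit the current Q table, sometimes explore.
        if random.random() < EPSILON:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r = step(s, a)
        # Core Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        best_next = max(Q[(s2, a2)] for a2 in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s2

# The learned greedy policy should point right (+1) from every non-terminal state.
policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)}
```

Nothing deep here, just the textbook tabular update; the speculation is that Q* combines something value-based like this with search, A*-style.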
When a society goes through a significant or traumatic cultural event, symbols and words can be forced into new meanings.
The swastika, historically a symbol representing well-being and prosperity, now cannot be seen without being associated with hatred and fascism.
I think it's relevant, in a Western country, to consider that "Q" can now be interpreted differently than it was before by a significant number of people, and a large company should have something like that on its radar, especially from a marketing/branding perspective.
Ever since last week’s dramatic events at OpenAI, the rumor mill has been in overdrive about why the company’s chief scientific officer, Ilya Sutskever, and its board decided to oust CEO Sam Altman.
Reuters and The Information both report that researchers had come up with a new way to make powerful AI systems and had created a new model, called Q* (pronounced Q star), that was able to perform grade-school-level math.
According to the people who spoke to Reuters, some at OpenAI believe this could be a milestone in the company’s quest to build artificial general intelligence, a much-hyped concept referring to an AI system that is smarter than humans.
A machine that is able to reason about mathematics, could, in theory, be able to learn to do other tasks that build on existing information, such as writing computer code or drawing conclusions from a news article.
OpenAI has, for example, developed dedicated tools that can solve challenging problems posed in competitions for top math students in high school, but these systems outperform humans only occasionally.
Just last year, tech folks were saying the same things about Google DeepMind’s Gato, a “generalist” AI model that can play Atari video games, caption images, chat, and stack blocks with a real robot arm.
I'm curious about auto-regressive token prediction vs planning.
The article just very briefly mentions "planning" and then never explains what it is.
As someone who's not in this area, what's the definition/mechanism of "planning" here?
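Not an expert either, but the usual contrast is this: auto-regressive decoding picks the single best-looking next token at each step and commits, while "planning" means searching over whole future sequences before committing. Here's a deliberately tiny toy (scores and tokens are entirely made up, not a real model API) showing how the greedy one-step choice can be myopic:

```python
import itertools

# Hypothetical per-step scores for 2-token sequences. Token "a" looks best
# at step one, but only "b" leads to a high-scoring continuation.
STEP_SCORES = {
    (): {"a": 0.9, "b": 0.6},
    ("a",): {"x": 0.1, "y": 0.2},
    ("b",): {"x": 0.8, "y": 0.7},
}

def greedy():
    # Auto-regressive style: commit to the locally best token at each step.
    seq, total = (), 0.0
    for _ in range(2):
        tok, score = max(STEP_SCORES[seq].items(), key=lambda kv: kv[1])
        seq, total = seq + (tok,), total + score
    return seq, total

def plan():
    # Planning style: score every complete sequence, then pick the best one.
    def score(seq):
        return STEP_SCORES[()][seq[0]] + STEP_SCORES[(seq[0],)][seq[1]]
    best = max(itertools.product("ab", "xy"), key=score)
    return best, score(best)

# greedy() commits to "a" and gets stuck with weak continuations;
# plan() accepts the weaker first token "b" for a better total.
```

Real systems obviously can't enumerate every sequence, so "planning" in practice means tree search, lookahead with a value function, and similar tricks; but the myopia-vs-lookahead distinction is the core of it.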
It's extremely hard to separate out the actual technical terms from the hyperventilating booster lingo in this space. Unfortunately a lot of it is because there's overlap. "Hallucination" is a well-defined technical term now, but it is also a booster term because it implies a consciousness that doesn't exist.