2y ago

Over just a few months, ChatGPT went from correctly answering a simple math problem 98% of the time to just 2%, study finds. Researchers found wild fluctuations—called drift—in the technology’s abi...

ChatGPT can get worse over time, Stanford study finds | Fortune

Technology @lemmy.ml

cyu @sh.itjust.works

fortune.com/2023/07/19/chatgpt-accuracy-stanford-study/amp/

149 comments

- You wildly overestimate the competency of management and the capital owners they answer to.
  I guarantee a significant % of entities will grow dependent on AI well before it’s dependable. The profit motive will be too high (source: the frequent failure that is outsourcing).
  
  This is spot on. Source: 10+ years at F500 companies.
  Senior management and/or board members read one article in Forbes, or some other "business" publication, and think that they know everything they need to know about an emerging technology. Risk management is either a ☑ exercise or extremely limited in scope, usually only including threats that have already been observed and addressed in the past.
  Not enough people understand the limitations of this kind of tech. They contextualize it in the same frame as outsourcing, because as long as the output mostly looks correct, the decision makers can push the blame for any issues down to the middle managers and below.
  Gonna be a wild time!
  
  Definitely not my experience at F100, they are cautious as fuck about everything. Definitely having the right discussions and exploring all sorts of technology, but risk management remains a huge calculation in making these kinds of decisions.
  
  I think we'll see a very large filtering out of companies who do this.
  
  We've already seen people firing tech support staff and switching to "AI".
- I don't understand why anyone even considers that. It's a toy. A novelty, a thing you mess with when you're bored and want to see how Hank Hill would explain the plot of Full Metal Alchemist, not something you would entrust anything significant to.
  
  These models are black boxes right now, but presumably we could open them up and look inside to see each and every function the model is running to produce the output. If we are then able to see what it is actually doing and fix things up so we can mathematically verify that what it does is correct, I think we would be able to use it for mission-critical applications. I think a more advanced LLM like this would be great for automatically managing systems and for doing science+math research.
  But yeah. For right now these things are mainly just toys for SUSSY roleplays, basic customer service, and generating boilerplate code. A verifiable LLM is still probably 2-4 years away.
  
  The problem is that if you open it up, you just get trillions of numbers. We know what each function does: it takes a set of numbers between -1 and 1 that other nodes passed it, adds them up, checks whether the sum is above or below a set threshold, and passes one number to the next nodes if it's above and another if it's below. Some nodes toss in a bit of random variance to shake things up. The black box part is that there are trillions of these numbers and they have no meaning individually.
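
For anyone who wants to see the node described in the last comment as code, here is a rough sketch in Python. It follows the comment's simplified description only (sum the inputs, compare to a threshold, pass on one of two numbers, optionally add some random variance). The names `threshold_node` and `layer`, the default output values of 1.0 and -1.0, and the `noise` parameter are all made up for illustration. Real networks also multiply each input by a learned weight, which the description leaves out, so treat this as a toy, not as how any particular model is implemented.

```python
import random


def threshold_node(inputs, threshold=0.0, high=1.0, low=-1.0, noise=0.0, rng=None):
    """Toy node: add up the inputs and compare the sum to a threshold.

    `high`, `low`, and `noise` are illustrative assumptions, not values
    taken from any real model.
    """
    total = sum(inputs)
    if noise:
        # Optional "bit of random variance" mentioned in the comment.
        rng = rng or random
        total += rng.uniform(-noise, noise)
    # Pass one number on if the sum is above the threshold, another if not.
    return high if total > threshold else low


def layer(inputs, thresholds):
    """Feed the same inputs to several nodes, one per threshold."""
    return [threshold_node(inputs, t) for t in thresholds]
```

Each function here is trivial to read on its own, which is the commenter's point: the difficulty is not in any single node but in the sheer number of them and in the fact that no individual value means anything by itself.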