Why are there AI boxes popping up everywhere? They are useless. How many times do we need to repeat that LLMs are trained to give convincing answers but not correct ones? I've gained nothing from asking this glorified e-waste something, then pulling out my phone to verify it.
Plenty of free apps get monetized just fine. They just have to offer something people want to use that they can slather ads all over. The AI doo-dads haven't shown they're useful. I'm guessing the dedicated hardware strategy got them more upfront funding from stupid venture capital than an app would have, but they still haven't answered why anybody should buy these. Just postponing the inevitable.
They have pushed AI so hard in the last couple of years they have convinced many that we are 1 year away from Terminator travelling back in time to prevent the apocalypse
Because money, both from tech hungry but not very savvy consumers, and the inevitable advertisers that will pay for the opportunity for their names to be ejected from these boxes as part of a perfectly natural conversation.
I don't necessarily disagree. You can certainly use LLMs and achieve something in less time than without it. Numerous people here are speaking about coding and while I had no success with them, it can work with more popular languages. The thing is, these people use LLMs as a tool in their process. They verify the results (or the compiler does it for them). That's not what this product is. It's a standalone device which you talk to. It's supposed to replace pulling out your phone to answer a question.
I haven't seen much of them here, but I use other media too. E.g., not long ago there was a lot of coverage about the "Humane AI Pin", which was utter garbage and even more expensive.
I just started diving into the space from a localized point yesterday. And I can say that there are definitely problems with garbage spewing, but some of these models are getting really really good at really specific things.
A biomedical model I saw seemed lauded for its consistency in pulling relevant data from medical notes for the sake of patient care instructions, important risk factors, fall risk level etc.
So although I agree they're still giving well phrased garbage for big general cases (and GPT4 seems to be much more 'savvy'), the specific use cases are getting much better and I'm stoked to see how that continues.
I think it's a delayed development reaction to Amazon Alexa from 4 years ago. Alexa came out, voice assistants were everywhere. Someone wanted to cash in on the hype but consumer product development takes a really long time.
So the product is finally finished (mobile Alexa) and they label it AI to hype it, as well as to make it work without the hard work of parsing Wikipedia for good answers.
Alexa is a fundamentally different architecture from the LLMs of today. There is no way that anyone with even a basic understanding of modern computing would say something like this.
The best convincing answer is the correct one. The correlation of AI answers with correct answers is fairly high. Numerous tests show that. The models have also improved significantly (especially the paid versions) since their introduction just 2 years ago.
Of course it does not mean that it can be trusted as much as Wikipedia, but it is probably a better source than Facebook.
"Fairly high" is still useless (and doesn't actually quantify anything, depending on context both 1% and 99% could be 'fairly high'). As long as these models just hallucinate things, I need to double-check. Which is what I would have done without one of these things anyway.
Hallucinations are largely dealt with if you use agents. It won't be long until it gets packaged well enough that anyone can just use it. For now, it takes a little bit of effort to get a decent setup.
Also if you want a computer that you don't have to double check, you literally are expecting software to embody the concept of God. This is fucking stupid.
It's all about context. Asking a bunch of 4 year olds questions about trigonometry, 1% of answers being correct would be fairly high. 'Fairly high' basically only means 'as high as expected' or 'higher than expected'.
Also if you want a computer that you don’t have to double check, you literally are expecting software to embody the concept of God. This is fucking stupid.
Hence, it is useless. If I cannot expect it to be more or less always correct, I can skip using it and just look stuff up myself.
Obviously the only contexts that would apply here are ones where you expect a correct answer. Why would we be evaluating a software that claims to be helpful against 4 year old asked to do calculus? I have to question your ability to reason for insinuating this.
So confirmed. God or nothing. Why don't you go back to quills? Computers cannot read your mind and write this message automatically, hence they are useless
I don't expect a correct answer because I've used these models quite a lot last year. At least half the answers were hallucinated. And it's still a common complaint about this product as well if you look at actual reviews (e.g., pretty sure Marques Brownlee mentions it).
Something seems to fly above your head: quality is not optional and it's good engineering practice to seek reliable methods of doing our work. As a mature software person, you look for tools that give less room for failure and want to leave as little as possible for humans to fuck up, because you know they're not reliable, despite being unavoidable. That's the logic behind automated testing, Rust's borrow checker, static typing...
If you've done code review, you know it's not very efficient at catching bugs. It's not efficient because you don't pay as much attention to details when you're not actually writing the code. With LLMs, you have to do code review to ensure you meet quality standards, because of the hallucinations, just like you've got to test your work before committing it.
I understand the actual software engineers that care about delivering working code and would rather write it in order to be more confident in the quality of the output.
"AI" is a really dumb term for what we're all using currently. General LLMs are not intelligent, it's assigning priorities to tokens (words) in a database, based on what tokens were provided before it, to compare and guess the next most logical word and phrase, really really fast. Informed guesses, sure, but there's not enough parameters to consider all the factors required to identify a rhyme.
That said, honestly I'm struggling to come up with 2 rhyming L words? Lol even rhymebrain is failing me. I'm curious what you went with.
That might be because of how it works under the hood and how it tokenized words, characters, and sentences. It may not have anything telling it that a specific word starts with a specific letter. It might only have the whole word. That's my guess.
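A rough sketch of that guess, for illustration only. The vocabulary and IDs below are made up and do not correspond to any real model's tokenizer (real ones usually split text into sub-word pieces), but the point carries over: what the model receives is a list of integers, and nothing in an integer says which letter the word started with.

```python
# Toy illustration only: this vocabulary and its IDs are hypothetical and
# do not match any real model's tokenizer.
TOY_VOCAB = {"lemon": 0, "lime": 1, "the": 2, "likes": 3, "llama": 4}


def encode(text: str) -> list:
    """Map each whitespace-separated word to its integer ID."""
    return [TOY_VOCAB[word] for word in text.lower().split()]


ids = encode("The llama likes lime")
print(ids)  # [2, 4, 3, 1]
```

Here `4` and `1` both stand for words starting with "l", but that information only exists in the lookup table, not in the IDs the model is fed.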
I’ve asked GPT4 to write specific Python programs, and more often than not it does a good job. And if the program is incorrect I can tell it about the error and it will often manage to fix it for me.
You have every right not to, but the "useless" word comes out a lot when talking about LLMs and code, and we're not all arguing in bad faith. The reliability problem is still a strong factor in why people don't use this more, and, even if you buy into the hype, it's probably a good idea to temper your expectations and try to walk a mile in the other person's shoes. You might get to use LLMs and learn a thing or two.
I only "believe the hype" because a good developer friend of mine suggested I try copilot so I did and was impressed. It's an amazing technical achievement that helps me get my job done. It's useful every single day I use it. Does it do my job for me? No of fucking course not, I'm not a moron who expected that to begin with. It speeds up small portions of tasks and if I don't understand or agree with its solution, it's insanely easy not to use it.
People online mad about something new is all this is. There are valid concerns about this kind of tech, but I rarely see that. Ignorance on the topic prevails. Anyone calling ai "useless" in a blanket statement is necessarily ignorant and doesn't really deserve my time except to catch a quick insult for being the ignorant fool they have revealed themselves to be.
I'm glad that you're finding this useful. When I say it's useless, I speak in my name only.
I'm not afraid to try it out, and I actually did, and, while I was impressed by the quality of the English it spits out, I was disappointed with the actual substance of the answers, which makes this completely unusable for me in my day to day life. I keep trying it every now and then, but it's not a service I would pay for in its current state.
Thing is, I'm not the only one. This is the opinion of the majority of people I work with, senior or junior. I'm willing to give it some time to mature, but I'm unconvinced at the moment.
You would need to be pulling some trickery on Microsoft to get access to copilot for more than a single 30 day trial so I'm skeptical you've actually used it. Sounds like you're using other products which may be much worse. It also sounds like you work in a conservative shop. Good luck with that
I have not tried Copilot, no. I'm not giving any tool money, personal info and access to my code when it can't reliably answer a question like: "does removing from a std::vector invalidate iterators?" (not a prompt I tried on LLMs but close enough).
That shit's just dangerous, for obvious reasons. Especially when you consider the catastrophic impact these kinds of errors can have.
There needs to be a fundamental shift to something that detects and fixes the garbage, which just isn't there ATM.
I never said I was an expert on Copilot, I've consistently said LLMs are not where they should be in terms of reliability, which is also true of Copilot.
Edit: oh and sorry for not being willing to waste my time trying out every new piece of tech on the block when all they're doing is rehashing unsound ideas 🤷‍♂️
I just used ChatGPT to write a 500-line Python application that syncs IP addresses from asset management tools to our vulnerability management stack. This took about 4 hours using AutoGen Studio. The code just passed QA and is moving into production next week.
Intermediate? Nah, junior. They're cheaper after all.
But senior devs do a lot more than output code. Sometimes - like Bill Atkinson's famous -2000 line change to Quickdraw - their jobs involve a lot of complex logic and very little actual code output.
It's a shortcut for experience, but you lose a lot of the tools you get with experience. If I were early in my career I'd be very hesitant to rely on it, as it's a fragile ecosystem right now that might disappear, in the same way that you want to avoid tying your skills to a single company's product. In my workflow it slows me down because the answers I get are often average or wrong; it's never "I'd never thought of doing it that way!" levels of amazing.
You used the right tool for the job, saved you from hours of work. General AI is still a very long ways off and people expecting the current models to behave like one are foolish.
Are they useless? For writing code, no. For most other tasks, yes, or worse, as they will be confidently wrong about what you ask them.
I think the reason they're useful for writing code is that there's a third party - the parser or compiler - that checks their work. I've used LLMs to write code as well, and it didn't always get me something that worked but I was easily able to catch the error.
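As a minimal sketch of that "third party" idea, assuming the generated code is Python: the standard library's `ast.parse` will reject anything that is not syntactically valid. The helper name `syntax_error_of` and the sample snippets are made up for illustration.

```python
import ast
from typing import Optional


def syntax_error_of(source: str) -> Optional[str]:
    """Return a short description of the first syntax error, or None if the code parses."""
    try:
        ast.parse(source)
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"
    return None


good = "def add(a, b):\n    return a + b\n"
broken = "def add(a, b)\n    return a + b\n"  # missing colon after the signature
wrong = "def add(a, b):\n    return a - b\n"  # parses fine, but subtracts

print(syntax_error_of(good))    # None
print(syntax_error_of(broken))  # some "line 1: ..." message
print(syntax_error_of(wrong))   # None: the parser cannot see the logic bug
```

The last case is the catch: a parser or compiler only rules out code that is malformed, so the output still needs tests or a human review to confirm it does what was asked.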
This is my experience with LLMs: I have gotten them to write me code that can at best be used as a scaffold. I personally do not find much use for them, as you functionally have to proofread everything they do. All it does is change the workload from a creative process to a review process.
I don't agree. Just a couple of days ago I went to write a function to do something sort of confusing to think about. By the name of the function, copilot suggested the entire contents of the function and it worked fine. I consider this removing a bit of drudgery from my day, as this function was a small part of the problem I needed to solve. It actually allowed me to stay more focused on the bigger picture, which I consider the creative part. If I were a painter and my brush suddenly did certain techniques better, I'd feel more able to be creative, not less.
I would argue that there just isn't much gain in terms of speed of delivery, because you have to proofread the output - not doing it is irresponsible and unprofessional.
I don't tend to spend much time on a single function, but I can remember a time recently where I spent two hours writing a single function. I had to mentally run all cases to check that it worked, but I would have had to do it with LLM output anyway. And I feel like reviewing code is just much harder to do right than to write it right.
In my case, LLMs might have saved some time, but training the complexity muscle has value in itself. It's pretty formative and there are certain things I would do differently now after going through this. Most notably, in that case: fix my data format upfront to avoid edge cases altogether and save myself some hard thinking.
I do see the value proposition of IDEs generating things like constructors, and sometimes use such features, but reviewing the output is mentally exhausting, and it's necessary because even non-LLM generated code sometimes comes out broken. Even assuming that it worked 100% of the time, I'm still not convinced it amounts to much time saved at the end of the day.
You say it's magical but never post proof. That's all I need to think it's shit. No need to debate about it for hours. Come back when you entice us with something instead of the billion REST APIs that are useless but seem to give a hard on to all the AI bros out there.
First off, this is not the kind of code I write on my end, and I don't think I'm the only one not writing scripts all day. There's a need for scripts at times in my line of work but I spend more of my time thinking about data structures, domain modelling and code architecture, and I have to think about performance as well. Might explain my bad experience with LLMs in the past.
I have actually written similar scripts in comparable amounts of time (a day for a working proof of concept that could have gone to production as-is) without LLMs. My use case was to parse JSON crash reports from a provider (undisclosable due to NDAs) and serialize them to my company's binary format. A significant portion of that time was spent on deciding what I cared about and what JSON fields I should ignore. I could have used ChatGPT to find the command line flags for my Docker container but it didn't exist back then, and Google helped me just fine.
Assuming you had to guide the LLM throughout the process, this is not something that sounds very appealing to me. I'd rather spend time improving on my programming skills than waste that time teaching the machine stuff, even for marginal improvements in terms of speed of delivery (assuming there would be some, which I just am not convinced is the case).
On another note...
There's no need for snark, just detailing your experience with the tool serves your point better than antagonizing your audience. Your post is not enough to convince me this is useful (because the answers I've gotten from ChatGPT have been unhelpful 80% of the time), but it was enough to get me to look into AutoGen Studio which I didn't know about!
I don't think LLMs are useless, but I do think little SoC boxes running a single application that will vaguely improve your life with loosely defined AI features are useless.
In one of those weird return None combination. Also I don’t get why it insists on using try catch all the time. Last but not least, it should have been one script only with sub commands using argparse, that way you could refactor most of the code.
Also weird license, overly complicated code, not handling HTTPS properly, passwords in ENV variables, not handling errors, a strange retry mechanism (copy pasted I guess).
It’s like a bad hack written in a hurry, or something a junior would write. Something that should never be used in production. My other gripe is that OP didn’t learn anything and wasted his time. Next time he’ll do that again and won’t improve. It’s good if he’s doing that alone, but in a company I would have to fix all this and it’s really annoying.
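For anyone wondering what the "one script with sub commands using argparse" suggestion looks like, here is a minimal sketch. The reviewed script isn't shown in this thread, so the subcommand names (`fetch`, `push`), the options and the `sync` program name are all hypothetical; only the `argparse` calls themselves are standard library.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="sync")
    # Option shared by every subcommand (hypothetical).
    parser.add_argument("--dry-run", action="store_true")
    sub = parser.add_subparsers(dest="command", required=True)

    # Hypothetical subcommand: read addresses from a source system.
    fetch = sub.add_parser("fetch", help="read addresses from the source")
    fetch.add_argument("--source", required=True)

    # Hypothetical subcommand: write addresses to a target system.
    push = sub.add_parser("push", help="write addresses to the target")
    push.add_argument("--target", required=True)
    return parser


def main(argv=None) -> int:
    """Parse the arguments and dispatch to the selected subcommand."""
    args = build_parser().parse_args(argv)
    if args.command == "fetch":
        print(f"fetching from {args.source} (dry run: {args.dry_run})")
    elif args.command == "push":
        print(f"pushing to {args.target} (dry run: {args.dry_run})")
    return 0
```

Calling `main(["--dry-run", "fetch", "--source", "cmdb"])` runs the `fetch` branch. Shared pieces such as argument handling, logging or retries then live in one place instead of being copy-pasted across several scripts.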
There's no sense trying to explain to people like this. Their eyes glaze over when they hear AutoGen, agents, CrewAI, RAG, Opus... To them, generative AI is nothing more than the free version of ChatGPT from a year ago; they've not kept up with the advancements, so they argue from a point in the distant past. The future will be hitting them upside the head soon enough and they will be the ones complaining that nobody told them what was coming.
Thing is, if you want to sell the tech, it has to work, and what most people have seen by now is not really convincing (hence the copious amount of downvotes you've received).
You guys sound like fucking cryptobros, which will totally replace fiat currency next year. Trust me bro.
Downvotes by a few uneducated people mean nothing. The tools are already there. You are free to use them and think about this for yourself. I'm not even talking about what will be here in the future. There is some really great stuff right now. Even if doing some very simple setup is too daunting for you, you can just watch people on youtube doing it to see what is available. People in this thread have literally already told you what to type into your search box.
In the early 90s, people exactly like you would go on and on about how stupid the computerbros were for thinking anyone would ever use this new stupid "internet" thing. You do you; it is totally fine if you think that, because a handful of uneducated, vocal people on the internet agree with you that technology has mysteriously frozen for the first time in history, you must all be right.
If everybody in society "votes" that kind of stuff "down", the hype will eventually die down and, once the dust has settled, we'll see what this is really useful for. Right now, it can't even do fucking chatbots right (see the Air Canada debacle with their AI chatbot).
Not every invention is as significant as the Internet. There are things like crypto, which is the butt of every joke in the tech community, and people peddling that shit are mocked by everyone.
I honestly don't buy that we're on the edge of a new revolution, or that LLMs are close to true AGI. Techbros have been pushing a lot of shit that is not in alignment with regular folks' needs for the past 10 years, and have maintained tech alive artificially without interest from the general population because of venture capital.
However, in the case of LLMs, the tech is interesting and is already delivering modest value. I'll keep an eye on it because I see a modest future for it, but it just might not be as culturally significant as you think it may be.
With all that said, one thing I will definitely not do is spend any time setting up things locally, or running a LLM on my machine or pay any money. I don't think this gives a competitive edge to any software engineer yet, and I'm not interested in becoming an early adopter of the tech given the mediocre results I've seen so far.
They aren't trying to have a conversation, they're trying to convince themselves that the things they don't understand are bad and they're making the right choice by not using it. They'll be the boomers that needed millennials to send emails for them. Been through that so I just pretend I don't understand AI. I feel bad for the zoomers and genas that will be running AI and futilely trying to explain how easy it is. It's been a solid 150 years of extremely rapid invention and innovation of disruptive technology. But THIS is the one that actually won't be disruptive.
I'm not trying to convince myself of anything. I was very happy to try LLM tools for myself. They just proved to be completely useless. And there's a limit to what I'm going to do to try out things that just don't seem to work at all. Paying a ton of money to a company to use disproportionate amounts of energy for uncertain results is not one of them.
Some people have misplaced confidence with generated code because it gets them places they wouldn't be able to reach without the crutches. But if you do things right and review the output of those tools (assuming it worked more often), then the value proposition is much less appealing... Reviewing code is very hard and mentally exhausting.
And look, we don't all do CRUD apps or scripts all day.
Tell me about when you used Llama 3 with AutoGen locally, and how in the world you managed to pay a large company to use disproportionate amounts of energy for it. You clearly have no idea what is going on at the edge of this tech. You think that because you made an OpenAI account you now know everything that's going on. You sound like an AOL user in the 90s who thinks the internet has no real use.
I don't care about the edge of that tech. I'm not interested in investing any time making it work. This is your problem. I need a product I can use as a consumer. Which doesn't exist, and may never exist because the core of the tech alone is unsound.
You guys make grandiloquent claims that this will automate software engineering and be everywhere more generally. Show us proof. What we've seen so far is ChatGPT (lol), Air Canada's failures to create working AI chatbots (lol), a creepy plushie and now this shitty device. Skepticism is rationalism in this case.
Maybe this will change one day? IDK. All I've been saying is that it's not ready yet from what I've seen (prove me wrong with concrete examples in the software engineering domain) and given that it tends to invent stuff that just doesn't exist, it's unreliable. If it succeeds, LLMs will be part of a whole delivering value.
You guys sound like Jehovah's Witnesses. Get a hold of yourselves if you want to be taken seriously. All I see here is hyperbole from tech bros without any proof.
You're just saying that you will only taste free garbage wine, and nobody can convince you that expensive wine could ever taste good. That's fine, you'll just be surprised when the good wine gets cheap enough for you to afford or free. Your unwillingness to taste it has nothing to do with what already exists. In this case, it's especially naive since you could just go watch videos of people using actually good wine.
Show me proof or shut up. It's that simple. This is not a subjective matter like wine tasting. There needs to be objective and tangible proof it works.
There are endless examples if you just search the things we've been mentioning. Here is a video that just came out today about a new project for making front ends called OpenUI.