Generative Artificial Intelligence
- Zuck's new Llama is a beast
cross-posted from: https://lemmy.world/post/17926715
> y2u.be/aVvkUuskmLY
>
> Llama 3.1 (405B) seems 👍. It and Claude 3.5 Sonnet are my go-to large language models. I use chat.lmsys.org. OpenAI may be scrambling now to release ChatGPT 5?
- Marques Brownlee's latest vid is kinda unneeded
y2u.be AI the Product vs AI the Feature
The new Siri vs the Rabbit R1 and Humane pin
Rabbit R1: https://youtu.be/ddTV12hErTc?si=tLR_GSXyRFtpgpJb
Humane AI pin: https://youtu.be/TitZV6k8zfA?si=vI4mZMhN...
cross-posted from: https://lemmy.world/post/16792709
> I'm an avid Marques fan, but for me, he didn't have to make that vid. It was just a set of comparisons. No new info. No interesting discussion. Instead he should've just shared that Wired podcast episode on his X.
>
> I wonder if Apple is making their own large language model (LLM) and whether it'll be released this year or next. Or are they still mulling over the cost-benefit analysis? If they think an Apple LLM won't earn that much profit, they may not make one.
- Quantized model issues
Hey, so first off, this is my first time dabbling with LLMs, and most of the information I found myself by rummaging through GitHub repos.
I have a fairly modest set-up: an older gaming laptop with an RTX 3060 video card with 6 GB VRAM. I run inside WSL2.
I have had some success running FastChat with the Vicuna 7B model, but it's extremely slow, at roughly 1 word every 2-3 seconds of output, with --load-8bit, lest I get a CUDA OOM error. It starts faster, at 1-2 words per second, but slows to a crawl later on (I suspect it's because it also uses a bit of the 'Shared video RAM', according to the task manager).
So I heard about quantization, which is supposed to compress models at the cost of some accuracy. I tried ready-quantized models (compatible with the FastChat implementation) from huggingface.co, but I ran into an issue: whenever I ask something, the output is repeated over and over. Say I type 'hello'; I get 200 'Hello!' in response.
I also tried quantizing a model myself with exllamav2 (using some .parquet wikitext files, also from huggingface.co, for calibration) and then running it through FastChat, but the problem persists: endlessly repeated output. The actual generation does run faster, though, so at least that part is going well.
Any ideas on what I'm doing wrong?
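For reference, a rough back-of-the-envelope estimate of why a 7B model is tight on this card. This is only a sketch: it counts weight storage alone (activations, KV cache and CUDA overhead come on top), it assumes a nominal 7e9 parameters, and the helper name `weight_memory_gib` is made up for illustration.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 2**30


VRAM_GIB = 6.0   # RTX 3060 laptop GPU mentioned in the post
N_PARAMS = 7e9   # nominal parameter count of a "7B" model (assumption)

for bits in (16, 8, 4):
    need = weight_memory_gib(N_PARAMS, bits)
    verdict = "fits" if need < VRAM_GIB else "does not fit"
    print(f"{bits:>2}-bit: {need:5.2f} GiB of weights -> {verdict} in {VRAM_GIB:.0f} GiB")
```

Under these assumptions the 8-bit weights alone already exceed 6 GiB, which would be consistent with the spill into shared video RAM, while a 4-bit quantization leaves some headroom.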
- Guiding Language Models of Code with Global Context using Monitors
Language models of code (LMs) work well when the surrounding code in the vicinity of generation provides sufficient context. This is not true when it becomes necessary to use types or functionality defined in another module or library, especially those not seen during training. LMs suffer from limited awareness of such global context and end up hallucinating, e.g., using types defined in other files incorrectly. Recent work tries to overcome this issue by retrieving global information to augment the local context. However, this bloats the prompt or requires architecture modifications and additional training. Integrated development environments (IDEs) assist developers by bringing the global context at their fingertips using static analysis. We extend this assistance, enjoyed by developers, to the LMs. We propose a notion of monitors that use static analysis in the background to guide the decoding. Unlike a priori retrieval, static analysis is invoked iteratively during the entire decoding process, providing the most relevant suggestions on demand. We demonstrate the usefulness of our proposal by monitoring for type-consistent use of identifiers whenever an LM generates code for object dereference. To evaluate our approach, we curate PragmaticCode, a dataset of open-source projects with their development environments. On models of varying parameter scale, we show that monitor-guided decoding consistently improves the ability of an LM to not only generate identifiers that match the ground truth but also improves compilation rates and agreement with ground truth. We find that LMs with fewer parameters, when guided with our monitor, can outperform larger LMs. With monitor-guided decoding, SantaCoder-1.1B achieves better compilation rate and next-identifier match than the much larger text-davinci-003 model.
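The core idea can be illustrated with a toy sketch. This is not the paper's implementation: the symbol table, the scores and the function names below are all hypothetical, and decoding is greedy for simplicity. It only shows the general shape of the technique, where a monitor supplies the set of type-consistent identifiers at a dereference and every other candidate token is masked out before the next token is chosen.

```python
import math


def members_of(type_name: str) -> set[str]:
    """Stand-in for a static-analysis query: valid members of a type."""
    symbol_table = {"Path": {"exists", "read_text", "name"}}  # hypothetical
    return symbol_table.get(type_name, set())


def monitor_guided_step(logits: dict[str, float], allowed: set[str]) -> str:
    """Greedy pick of the best-scoring token among those the monitor allows."""
    if not allowed & logits.keys():
        # The monitor has nothing to say about these candidates:
        # fall back to unconstrained decoding.
        return max(logits, key=logits.get)
    masked = {tok: (score if tok in allowed else -math.inf)
              for tok, score in logits.items()}
    return max(masked, key=masked.get)


# Hypothetical LM scores for the token after `p.`, where `p` has type Path.
logits = {"readFile": 2.1, "size": 1.9, "read_text": 1.7, "exists": 0.4}

unguided = max(logits, key=logits.get)
guided = monitor_guided_step(logits, members_of("Path"))
print("unguided:", unguided)
print("guided:  ", guided)
```

In this toy setup the unguided choice is a plausible-looking but nonexistent member, while the guided choice is restricted to names the (mock) static analysis reports for the type.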
- [@gai](https://sopuli.xyz/c/gai) Adobe Firefly cannibalizes stock photo market for creators
@gai Adobe Firefly cannibalizes stock photo market for creators https://venturebeat.com/ai/adobe-stock-creators-arent-happy-with-firefly-the-companys-commercially-safe-gen-ai-tool/
- What are some differences between the kind of output you get from Microsoft's image generator and Midjourney?
With minimal tweaking, just giving relatively simple prompts to each, would you say one is measurably better than the other? In what ways? Or is it more of a subjective judgement?
- KoboldAI discussion allowed in this group?
A nice fork from a main dev: https://github.com/henk717/KoboldAI
Main release: https://github.com/KoboldAI/KoboldAI-Client
- So... Alignment problem.
Thoughts? Ideas? How do we align these systems? Some food for thought: when we have these systems do chain-of-thought reasoning, or use other methods of logically working through problems and coming to conclusions, we've found that they tell "lies" about their method. They aren't actually following the logic they state, even when that stated logic is coherent and makes sense.
Here's the study I'm poorly explaining; read that instead: https://arxiv.org/abs/2305.04388