rufus

7 hr. ago

GDPR for Dummies. GDPR is DSGVO in German, "Datenschutz-Grundverordnung".

8 hr. ago

Why English language is sometimes "lazy", sometimes not

I think what you mean is compound words vs other words?

Wikipedia says there are lots of compound words in English.

Plaintiff is borrowed from Old French. Litigation from Latin...

I suppose it boils down to when and under what circumstances a term was needed to describe something. Sometimes there was a word from another language available. Or the whole subject came from a different culture. And sometimes they just described it with a compound of what it resembles. And how to make up terms probably also depends on what is en vogue at the time.

11 hr. ago

Anon meets his gf's parents

images(1)

2 days ago

Is there any real physical proof that Jesus christ ever existed?

My summary is oversimplified. I still think it's the correct answer to OP's question: is there physical evidence. Because there isn't anything physical. But there are written records from a bit later, suggesting that somebody with that name must have existed. Glad someone else thinks I picked the correct article. Seems it's not that easy to find good information. The English-speaking internet is filled with low-quality efforts to portray the facts in a way they'd like to have them.

I have a few good books though. Back when I was young (and became an atheist,) I used to read a lot about philosophy, the political message of the New Testament. And what life was like in that time.

3 days ago

Is there any real physical proof that Jesus christ ever existed?

Agree. But that specific article seems pretty alright. It also talks about the relics and historical records, for example by Tacitus.

There also is a Wikipedia article which I think is not written that well. And a lot of education material by churches or religious organizations which I did not cite for obvious reasons.

(And the German Wikipedia article about sources for the historicity of Jesus seems very good. But it's not exactly OP's question and I don't know if it helps: https://de.wikipedia.org/wiki/Außerchristliche_antike_Quellen_zu_Jesus_von_Nazaret )

3 days ago

Is there any real physical proof that Jesus christ ever existed?

https://www.history.com/news/was-jesus-real-historical-evidence

Tl;dr: No.

My opinion: It's a nice story. And with stories the most important thing is what it teaches us or makes us feel. Not that it's true. Maybe they took inspiration from several preaching hippies who lived back then and made one story out of that. Exaggerated everything and made stuff up. Probably all of it because the bible wasn't even written close to his supposed lifetime. It'd be like you now writing a story about a dude who died in ~~1870~~. Without any previous records to get information from. [Edit: The first things have probably been written down like 40-50 years after his death.]

And I mean if Jesus existed, he would certainly disapprove of what people do (and did) in his name.

3 days ago

Why are Republicans/Conservatives embracing fanaticalism these days?

American values aren't important to them. They want different things. A strong leader, be heard, simple truths and/or some people below them to hate and pick on.

3 days ago

What games did you have a good time with that you just never finished?

Yeah, back when I was a kid we used to roam those point and click adventures a lot. We'd get stuck all the time. Sometimes a friend would have an idea and we'd make some progress. But the internet was slow and we still had dialup. I guess there weren't that many walkthroughs available. At least not in the parts of the internet I knew my way around... Yeah, and English is my second language. So information was kinda scarce anyways, before I learned the main language of the internet in school... Everything developed and now I have everything available. I even have those old games sitting on an SD card on a Raspberry Pi and waiting for me. I really should finish that game at some point. On the flipside, I'm an adult now and it's difficult to find the time to play point and click adventures all day long 😆

And regarding the items: I tried Maniac Mansion, which I believe is included in DOTT. That's properly impossible without an expert at your side.

4 days ago

What's the reason for the empty comments, lately?

It's a hashtag. Pretty much invisible but I can still see it.

4 days ago

What's the reason for the empty comments, lately?

Well that shouldn't happen. I guess it's technically possible because I can also change my own comments and they're not signed or anything... But changing other people's comments is kind of lying and being disingenuous. Either it's a technical issue, or your comment was so offensive that it warranted an exception to the rule, or the mods/admins don't know what is right and what is wrong.

4 days ago

What's the reason for the empty comments, lately?

Hmmh. I guess Lemmy should display that kind of information more prominently. But there isn't much progress UI wise... Guess you could just look it up in the modlog. Or do it like that 😆 Thanks for the report anyways.

4 days ago

Anyone tried a Matrix <-> Discord bridge?

bummer.

4 days ago

I was explaining to my daughter about the differences between Gimp and Photoshop and saw that Adobe had a page that claimed to compare the two. It never compares the two. It barely mentions Gimp.

I think they don't take inspiration from Photoshop. Either it's been a clone of a different product at some time or they developed it themselves. Hence the differences. I mean the whole UI doesn't really resemble Photoshop's.

4 days ago

What's the reason for the empty comments, lately?

Yes. I have like 3 different apps but I regularly use Eternity. I think you're right and a decent part of it is Eternity. Like half of the empty messages show up in other apps or the web interface. But not all of them. I don't quite think it's just deleted messages. Some others are definitely there and also don't show up in Eternity... Maybe it's a combination of factors. Honestly I didn't quite pay attention when I was using which app. I'm still trying to figure it out. But this definitely seems to be part of it.

4 days ago

Anyone tried a Matrix <-> Discord bridge?

Ah. Maybe hand it over to the next person? I suppose people still need to switch painlessly? But I get it. We used to host lots of stuff in my university years. A forum, chat, classifieds, filesharing... A big photo album for all our pictures and events... As far as I know all of that is gone. Either due to lack of interest or nobody was able and willing to pick it up.

4 days ago

Anybody know why Anna's Archive is torrents only and not IPFS?

There are some blog posts on annas-blog.org from 2022, talking about IPFS.

5 days ago

What games did you have a good time with that you just never finished?

Day of the Tentacle

5 days ago

What's the reason for the empty comments, lately?

Ah, thanks. Maybe there's a few people like that out there.

5 days ago

Five Men Convicted of Operating Massive, Illegal Streaming Service That Allegedly Had More Content Than Netflix, Hulu, Vudu and Prime Video Combined

The article doesn't talk much at all about all the interesting technical details.

The press release talks about trouble with payment providers... So I suppose they accepted credit card payment.

Maybe the court documents are publicly available if anyone is willing to dig them up in order to find out... I don't think I'm that interested. If it's a good story, maybe someone will do a documentary or podcast episode at some point. Would probably do for a "true crime" show.

5 days ago

is it illegal to create a ticket on reddit?

I think most people here have gone through the 5 stages of grief. And at this point they don't care anymore. At least not to the degree they used to. It's been a year. Life goes on. Don't waste your time on being negative and spamming someone who once let you down. Look forward and spend your time on something useful. At least that's my opinion.

But yeah, it's a question. I just think other people think it's pointless and they don't care. And some of them are going to downvote you for that even in No stupid questions. And lots of other people aren't going to upvote something like this. Hence resulting in that ratio.

Ask Lemmy @lemmy.world rufus @discuss.tchncs.de 5 days ago

What's the reason for the empty comments, lately?

In the last few weeks, I frequently see some empty comments. It's just the username and no text beneath.

Is there a deeper reason behind this? Do people nowadays strip away the text instead of deleting a comment? Or did some script surface that 'makes the internet forget'? First I thought people did this before deleting a comment and the deletion just didn't get federated. But I scrolled through some older posts and they also still have comments like that, so that can't be it. Right?

Can anyone educate me?

50

Matrix @lemmy.ml rufus @discuss.tchncs.de 5 days ago

Anyone tried a Matrix <-> Discord bridge?

Does it work well? Which one to choose? The official Matrix site shows 3 that seem maintained:

Does anyone have some insight? I don't want to try all of them.

Edit: I don't need anything super fancy like double puppeting. I just want the data from the several Discord communities I joined available through my Matrix server. And it's just me using it. But it should bridge the rooms properly and include the popular media formats, reactions etc.

15

LocalLLaMA @sh.itjust.works rufus @discuss.tchncs.de 2 wk. ago

[Paper] Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in SOTA Large Language Models

arxiv.org /abs/2406.02061

"Alice has N brothers and she also has M sisters. How many sisters does Alice’s brother have?"

The problem has a light quiz style and is arguably no challenge for most adult humans and probably to some children.

The scientists posed varying versions of this simple problem to various State-Of-the-Art LLMs that claim strong reasoning capabilities. (GPT-3.5/4/4o, Claude 3 Opus, Gemini, Llama 2/3, Mistral and Mixtral, including very recent Dbrx and Command R+)

They observed a strong collapse of reasoning and inability to answer the simple question as formulated above across most of the tested models, despite claimed strong reasoning capabilities. Notable exceptions are Claude 3 Opus and GPT-4 that occasionally manage to provide correct responses.

This breakdown can be considered to be dramatic not only because it happens on such a seemingly simple problem, but also because models tend to express strong overconfidence in reporting their wrong solutions as correct, while often providing confabulations to additionally explain the provided final answer, mimicking reasoning-like tone but containing nonsensical arguments as backup for the equally nonsensical, wrong final answers.
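
For reference, the expected answer is M + 1: the brother has Alice's M sisters plus Alice herself, and N doesn't matter. A minimal sketch that just counts it out, assuming all siblings share the same parents (the function name and the way the family is modelled are my own, not from the paper):

```python
def sisters_of_a_brother(n_brothers, m_sisters):
    """Count the sisters of one of Alice's brothers by listing the family."""
    # Model the siblings explicitly: Alice, her sisters, her brothers.
    family = [("Alice", "f")]
    family += [(f"sister{i}", "f") for i in range(m_sisters)]
    family += [(f"brother{i}", "m") for i in range(n_brothers)]

    brothers = [p for p in family if p[1] == "m"]
    if not brothers:
        raise ValueError("Alice needs at least one brother")
    brother = brothers[0]

    # Every female sibling (Alice included) is a sister of that brother.
    return sum(1 for p in family if p[1] == "f" and p is not brother)


print(sisters_of_a_brother(n_brothers=3, m_sisters=2))  # M + 1 = 3
```

Changing `n_brothers` never changes the result, which is exactly the part the models tend to get confused by.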

4

Matrix @lemmy.ml rufus @discuss.tchncs.de 1 mo. ago

Which Matrix server implementation do I choose in 2024?

Hey, I'm fortunate enough to upgrade my home-server. I'd like to make some future-proof decisions. Which Matrix server do I choose? Dendrite? Conduit?

I like the candidate to be halfway well maintained, have active development during the next 4 years... Would be nice if it had a solid technological base and wouldn't hog that many resources. I'm okay if it still has some rough edges as long as they get ironed out in the near future.

It needs to provide service to me and a few friends and family. Audio calls would be nice and I definitely need it to connect to the Mautrix-WhatsApp/-Signal bridges.

Did I miss something? Is there another good server implementation apart from Synapse/Dendrite/Conduit?

Bonus questions:

Is Synapse the only server that can connect to SSO? Ideally I would like to maintain the user accounts via Authentik/LDAP/...whatever...
Is there a server that can handle multiple domains? Like an e-mail server that I can just tell "you handle mail for the following 5 domain names?"

5

Piracy: ꜱᴀɪʟ ᴛʜᴇ ʜɪɢʜ ꜱᴇᴀꜱ @lemmy.dbzer0.com rufus @discuss.tchncs.de 2 mo. ago

Which *arr for file hosters?

I'm German and seems 'we' rely more on file hosters than torrenting. There are lots of tv series and movies with both the original audio track and the dubbed one on sites like funxd, serienjunkies, serienfans... They mostly redirect to a filecrypt.cc folder and then I get a DLC file to download the parts from turbobit or rapidgator (one-click hosters.)

What setup am I looking for, if I were to automate this? I'm aware of the Megathread but I didn't find the correct software to index those sites and then what kind of download manager people use nowadays. (Ah yes, and I don't want to pay for premium accounts.)

Edit: Replaced "one-click hosters" with "file hosters"

26

LocalLLaMA @sh.itjust.works rufus @discuss.tchncs.de 4 mo. ago

[Paper] The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

huggingface.co Paper page - The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Join the discussion on this paper page

From the abstract: "Recent research, such as BitNet, is paving the way for a new era of 1-bit Large Language Models (LLMs). In this work, we introduce a 1-bit LLM variant, namely BitNet b1.58, in which every single parameter (or weight) of the LLM is ternary {-1, 0, 1}."

Would allow larger models with limited resources. However, this isn't a quantization method you can convert models to after the fact. It seems models need to be trained from scratch this way, and to this point they only went as far as 3B parameters. The paper isn't that long and it seems they didn't release the models. It builds on the BitNet paper from October 2023.

"the matrix multiplication of BitNet only involves integer addition, which saves orders of energy cost for LLMs." (no floating point matrix multiplication necessary)

"1-bit LLMs have a much lower memory footprint from both a capacity and bandwidth standpoint"
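
To illustrate the two quotes: with weights restricted to {-1, 0, 1}, a matrix-vector product needs no multiplications at all, and one ternary value carries log2(3) ≈ 1.58 bits, hence the name. This is only a toy sketch in plain Python, not the paper's kernel. The function name and the example numbers are made up.

```python
import math


def ternary_matvec(weights, x):
    """Multiply a ternary matrix (entries -1, 0, 1) by a vector using only add/subtract."""
    out = []
    for row in weights:
        acc = 0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi      # +1 weight: add the activation
            elif w == -1:
                acc -= xi      # -1 weight: subtract the activation
            elif w != 0:
                raise ValueError("weights must be -1, 0 or 1")
            # 0 weight: skip the activation entirely
        out.append(acc)
    return out


# Information content of one ternary weight, in bits.
bits_per_weight = math.log2(3)

result = ternary_matvec([[1, 0, -1], [-1, -1, 1]], [2, 3, 5])
print(result)                     # [-3, 0]
print(round(bits_per_weight, 2))  # 1.58
```

With integer activations this stays entirely in integer addition, which is where the energy argument in the quote comes from.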

Edit: Update: additional FAQ published

18

Open Source @lemmy.ml rufus @discuss.tchncs.de 5 mo. ago

Is there an Online Shop / eCommerce solution that is entirely Free Software?

I'd like to play around a bit with an online shop. Nothing professional with proper requirements, just a hobby project. When googling for open source e-Commerce solutions, I can find the usual software. But I don't like open core models, and all the projects seem to want to make some money with an add-on marketplace. And most of the time the basic product seems very limited and they want you to extend it with proprietary extensions to get it usable in real-world scenarios.

Is there a project that does things differently? I mean for invoices I can choose between several platforms that won't push me to buy anything. I just can't find an online shop solution like that. My requirements would be something along these lines: Sells products and keeps track of remaining stock, maybe sells services like online courses and software/pdf downloads. Can generate invoices and ties into payment providers. Maybe generates shipping labels. Isn't too bloated, a small, nice and clean hobby project will do. I'd like to avoid running a Wordpress/Drupal/Joomla underneath it if possible.

I get that companies have different requirements and commercial products are somewhat the obvious thing if you're doing commerce. But there has to be something aligned with the virtues of the free software community. Something I'd like to use to sell Tux stickers and power my Etsy shop with.

20

LocalLLaMA @sh.itjust.works rufus @discuss.tchncs.de 8 mo. ago

New "Context Shifting" feature in KoboldCPP 1.48

github.com Release koboldcpp-1.48 · LostRuins/koboldcpp

koboldcpp-1.48 Harder Better Faster Stronger Edition NEW FEATURE: Context Shifting (A.K.A. EvenSmarterContext) - This feature utilizes KV cache shifting to automatically remove old tokens from con...

"This feature utilizes KV cache shifting to automatically remove old tokens from context and add new ones without requiring any reprocessing."

This means a major speed increase for people like me who rely on (slow) CPU inference (or big models). Consider a chatbot scenario and a long chat where old lines of dialogue need to be evicted from the context to stay within the (4096 token) context size. Previously the context had to be re-computed starting with the first changed/now missing token. This feature detects that, deletes the affected tokens from the KV cache and shifts the subsequent tokens in the KV cache so it can be re-used. Avoiding a computationally expensive re-calculation.
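
Roughly how I picture it, as a toy sketch: the cache is just a list of per-token entries, a chunk after a fixed prefix gets dropped, and everything behind it is kept instead of recomputed. This is not KoboldCPP's code. The names are made up, and a real KV cache holds tensors per layer and presumably also has to account for the changed token positions.

```python
def evict_and_shift(cache, keep_prefix, n_evict):
    """Drop n_evict entries right after the first keep_prefix entries.

    cache is a list of (token, cached_state) pairs, a stand-in for a real KV cache.
    The surviving entries are reused as they are, nothing is recomputed here.
    """
    if keep_prefix + n_evict > len(cache):
        raise ValueError("cannot evict more entries than the cache holds")
    return cache[:keep_prefix] + cache[keep_prefix + n_evict:]


def tokens_to_recompute_without_shifting(cache_len, keep_prefix, n_evict):
    # Without shifting, everything after the first removed token is reprocessed.
    return cache_len - keep_prefix - n_evict


# "sys" stands for a fixed system prompt, the rest for chat history.
cache = [(tok, f"kv({tok})") for tok in ["sys", "a", "b", "c", "d", "e"]]
shifted = evict_and_shift(cache, keep_prefix=1, n_evict=2)

print([tok for tok, _ in shifted])                      # ['sys', 'c', 'd', 'e']
print(tokens_to_recompute_without_shifting(6, 1, 2))    # 3
```

In this picture, with a 4096 token context and a few hundred evicted tokens, the old way would reprocess almost the whole context on every turn, while the shifted cache only needs the newly added tokens.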

This is probably also more or less related to recent advancements like Streaming-LLM

This won't help once text gets inserted "in the middle" or the prompt gets changed in another way. But I managed to connect KoboldCPP as a backend for SillyTavern/Oobabooga and now I'm able to have unlimited length conversations without waiting excessively, once the chat history hits max tokens and the frontend starts dropping text.

It's just a clever way to re-use the KV cache in one specific case. But I've wished for this for quite some time.

2

LocalLLaMA @sh.itjust.works rufus @discuss.tchncs.de 8 mo. ago

Nearly 10% of people ask AI chatbots for explicit content. Will it lead LLMs astray? [Article from October 3]

www.zdnet.com Nearly 10% of people ask AI chatbots for explicit content. Will it lead LLMs astray?

Beyond programming tips and writing help, a million conversations reflect people's desires for other kinds of 'unsafe' information. Here's what researchers are doing about it.

They are referencing this paper: LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset from September 30.

The paper itself provides some insight on how people use LLMs and the distribution of the different use-cases.

The researchers had a look at conversations with 25 LLMs. Data is collected from 210K unique IP addresses in the wild on their Vicuna demo and Chatbot Arena website.

18

LocalLLaMA @sh.itjust.works rufus @discuss.tchncs.de 9 mo. ago

Mistral 7B model

mistral.ai Mistral 7B

The best 7B model to date, Apache 2.0

Yesterday Mistral AI released a new language model called Mistral 7B. @justnasty@lemmy.kya.moe already posted the Sliding attention part here in LocalLLaMA, yesterday. But I think the model and the company behind that are even more noteworthy and the release of the model is worth its own post.

Mistral 7B is not based on Llama. And they claim it outperforms Llama2 13B on all benchmarks (at its size of 7B). It has additional coding abilities and an 8k sequence length. And it's released under the Apache 2.0 license. ~~So truly an 'open' model, usable without restrictions.~~ [Edit: Unfortunately I couldn't find the dataset or a paper. They call it 'open-weight'. So my conclusion regarding the open-ness might be a bit premature. We'll see.]

(It uses Grouped-query attention and Sliding Window Attention.)

Also worth noting: Mistral AI (the company) is based in Paris. They are one of the few big European AI startups and collected $113 million funding in June.

Details are on Mistral AI's Announcement
techcrunch news article including information about the company
They released a base/foundation model and an instruction-tuned one on HuggingFace
And llama.cpp is already compatible and GGUF versions are out there.

I've tried it and it indeed looks promising. It certainly has features that distinguish it from Llama. And I like the competition. Our world is currently completely dominated by Meta. And if it performs exceptionally well at its size, I hope people pick up on it and fine-tune it for all kinds of specific tasks. (The lack of a dataset and detail regarding the training could be a downside, though. These were not included in this initial release of the model.)

---

EDIT 2023-10-12: Paper released at: https://arxiv.org/abs/2310.06825 (But I'd say no new information in it, they mostly copied their announcement)

As of now, it is clear they don't want to publish any details about the training.

8

LocalLLaMA @sh.itjust.works rufus @discuss.tchncs.de 10 mo. ago

Pygmalion-2 has been released

pygmalionai.github.io Pygmalion-2

It’s been months upon months since a major announcement like this, but we’ve finally done it: new model releases. Introducing our new models: Pygmalion-2 in 7B, and 13B sizes. Where We’ve Been Link to this heading The burning question on many peoples’ minds is likely “where have we been?” Why haven’...

I might be a bit late to the party, but for those of you that like ERP and fiction writing:

Introducing Pygmalion-2

The people from Pygmalion have released a new model, usable for roleplaying, conversation and storywriting. It is based on Llama 2 and has been trained on SFW and NSFW roleplay, fictional stories and instruction following conversations. It is available in two sizes, 7b and 13b parameters. They're also releasing a mix with MythoMax-L2 called Mythalion 13B.

Furthermore they're (once again) announcing a website with character sharing and inference (later in October.)

For reference: Pygmalion-6b has been a well known dialogue model for (lewd) roleplay in the times before LLaMA. It had been followed up with an underwhelming successor based on LLaMA (Pygmalion-7b). In their new blogpost they promise to have improved with their new model.

(Personally, I'm curious how it performs compared to MythoMax. There aren't many models around that excel at roleplay or have been designed specifically for that use case.)

10

LocalLLaMA @sh.itjust.works rufus @discuss.tchncs.de 10 mo. ago

SeamlessM4T — Massively Multilingual and Multimodal Machine Translation

ai.meta.com Bringing the world closer together with a foundational multimodal model for speech translation

SeamlessM4T provides high-quality translation, allowing people from different linguistic communities to communicate effortlessly through speech and text.

Meta just released a multimodal model for speech translation. It can do speech recognition, translation into text and speech. Supporting nearly 100 input and output languages (35 for speech output). Seamless M4T is released under CC BY-NC 4.0

Abstract

What does it take to create the Babel Fish, a tool that can help individuals translate speech between any two languages? While recent breakthroughs in text-based models have pushed machine translation coverage beyond 200 languages, unified speech-to-speech translation models have yet to achieve similar strides. More specifically, conventional speech-to-speech translation systems rely on cascaded systems composed of multiple subsystems performing translation progressively, putting scalable and high-performing unified speech translation systems out of reach. To address these gaps, we introduce SeamlessM4T—Massively Multilingual & Multimodal Machine Translation—a single model that supports speech-to-speech translation, speech-to-text translation, text-to-speech translation, text-to-text translation, and automatic speech recognition for up to 100 languages. To build this, we used 1 million hours of open speech audio data to learn self-supervised speech representations with w2v-BERT 2.0. Subsequently, we created a multimodal corpus of automatically aligned speech translations, dubbed SeamlessAlign. Filtered and combined with human labeled and pseudo-labeled data (totaling 406,000 hours), we developed the first multilingual system capable of translating from and into English for both speech and text. On Fleurs, SeamlessM4T sets a new standard for translations into multiple target languages, achieving an improvement of 20% BLEU over the previous state-of-the-art in direct speech-to-text translation. Compared to strong cascaded models, SeamlessM4T improves the quality of into-English translation by 1.3 BLEU points in speech-to-text and by 2.6 ASR-BLEU points in speech-to-speech. On CVSS and compared to a 2-stage cascaded model for speech-to-speech translation, SeamlessM4T-Large’s performance is stronger by 58%. 
Preliminary human evaluations of speech-to-text translation outputs evinced similarly impressive results; for translations from English, XSTS scores for 24 evaluated languages are consistently above 4 (out of 5). For into English directions, we see significant improvement over WhisperLarge-v2’s baseline for 7 out of 24 languages. To further evaluate our system, we developed Blaser 2.0, which enables evaluation across speech and text with similar accuracy compared to its predecessor when it comes to quality estimation. Tested for robustness, our system performs better against background noises and speaker variations in speech-to-text tasks (average improvements of 38% and 49%, respectively) compared to the current state-of-the-art model. Critically, we evaluated SeamlessM4T on gender bias and added toxicity to assess translation safety. Compared to the state-of-the-art, we report up to 63% of reduction in added toxicity in our translation outputs. Finally, all contributions in this work—including models, inference code, finetuning recipes backed by our improved modeling toolkit Fairseq2, and metadata to recreate the unfiltered 470,000 hours of SeamlessAlign — are open-sourced and accessible at https://github.com/facebookresearch/seamless_communication.

Paper
Blog

0

Linux @lemmy.ml rufus @discuss.tchncs.de 10 mo. ago

Can you recommend a lightweight matrix chat client to me?

My laptop is getting old and I can't have Element eat up half of my RAM. There are many more clients out there but which one is good? aka "the best"? ;-)

My requirements: lightweight, encryption 100% supported, active development/community. runs neatly 24/7 in the background.

Should also support the latest features, let me customize when to get notifications: priorities / muted chatrooms. And ideally also look clean and run on the Pinephone. But that's optional.

I don't care which desktop environment or cli.

What do you use?

41

LocalLLaMA @sh.itjust.works rufus @discuss.tchncs.de 11 mo. ago

What have you been up to recently with your local LLMs?

Things are still moving fast. It's mid/late July now and I've spent some time outside, enjoying the summer. It's been a few weeks since things exploded in the month of May this year. Have you people settled down in the meantime?

I've since then moved from reddit and I miss the LocalLlama over there, which was/is buzzing with activity and AI news (and discussions) every day.

What are you people up to? Have you gotten tired of your AI waifus? Or finished indexing all of your data into some vector database? Have you discovered new applications for AI? Or still toying around and evaluating all the latest fine-tuned variations in constant pursuit of the best llama?

10