Social network Bluesky has exploded in popularity since Twitter users jumped ship en masse after the US election. We’ve been on Bluesky since it was invite-only and we can assure you: Bluesky users…
So they named the product sucking the data after the Facehugger? At least they know that they are in the abomination business. Will they be releasing an AI named Bursting Chest?
The company was named after the U+1F917 🤗 HUGGING FACE emoji.
HF is more of a platform for publishing this sort of thing, as well as the neural networks themselves and a specialized cloud service to train and deploy them, I think. They are not primarily a tool vendor, and they were around well before the LLM hype cycle.
my colleagues are kind, caring people & they were attacked (idc if I get attacked so long as it doesn't touch my company/colleagues)
we've always seen love for our work, this incident shocked me
we'll keep shipping 📦💗 can't satisfy all
Don't take out your frustration from election results on them, LOSERS
it’s really jarring seeing one of the biggest hosts for generative AI projects simultaneously do “we’re just an uwu smol bean open source passion project why are you attacking us” while boosting and officially supporting chan-coded fash shit from an e/acc account
Update: Following the publication of this article on Tuesday evening, van Strien removed the dataset. "I've removed the Bluesky data from the repo," he wrote on Bluesky. "While I wanted to support tool development for the platform, I recognize this approach violated principles of transparency and consent in data collection. I apologize for this mistake."
This is a bit of a nuanced issue though. The person merely published a dataset made from publicly available data that anyone can re-create themselves using the Bluesky Firehose API. Could it be used to train a model? Yes, but that isn't the only use case, and the person who posted it has no control over what other people use it for. If someone does train a model using it, then that's their legal issue to work out, not the publisher's.
It's the same argument billionaires were using to justify silencing people who posted the movements of their private jets. The billionaires argued that this data could be used to harass them, but the posters argued the data is public and they aren't responsible for what other people do with it.
The legal system is the perfect place for working out nuanced issues like this.
If I were a lawyer bringing this lawsuit, I would argue that "publicly available" does not mean "public domain", and that without acquiring usage rights for the data you don't have the right to use it.
If the courts rule against an argument like this, that would mean that any website hosting material that can be accessed without an account must provide that material, free of charge, to anyone who accesses it. That is a gigantic consequence for such a nuanced issue.