Reddit blocking all major search engines, except Google
Hi, I'm new here because of the bullshit with Reddit. Greetings, fellow Lemmy people.
Welcome aboard. It's not much, but she's got it where it counts.
Thank you very much. I'm liking it.
In the wubba-wubba
And me, hello!
Hi!
Welcome!
And my vuvuzela?
Welcome new lemmings!
Thanks
I got tired of the censorship and blatant disrespect for the end user. Also justiceserved and the constant spam messages from the mods there; I've never been a member of that community, and I just wanted them to stop harassing me. They called me a Nazi and some other stuff for participating in the Mandela Effect subreddit. Lots of quacks there, but really now, a Nazi?
Edit: I mean, it's deeper than that, but they were very hateful, and Reddit muted me for 3 days over... nothing? They even actively sought out my username on social media and attacked me there through private messages and fake accounts, and when I brought this to Reddit's attention, they muted me.
Welcome! Genuine advice for a newcomer: look around, figure out what instances you like, and shift away from lemmy.world to an instance that requires a sign-up request and which comports with your values. There is an account migration feature to make this as easy as possible.
It's different to what people are used to, but in my experience a huge number of the worst people migrating from reddit went straight to one of the open instances. A lot of them were banned over there for quite legitimate reasons.
They know that they can't operate their own asshole instances for long because they'll get defederated, and they don't want to deal with being known to an admin who has actual principles, so open sign up is their thing, and those instances are filling up with them.
Honestly I would like to see a feature that flags if a user's instance has open sign up.
It's getting to the point that if someone is still on an open instance, they're a little sus to me. It's easier to trust people who come from instances whose policies I agree with.
I mean I joined lemmy.world in the migration from Reddit and haven't really seen any problems with being here. I tried joining one of the ones that needed a sign up request when I first switched to Lemmy but I didn't want to have to deal with waiting to use Lemmy. I haven't really noticed any problems being on lemmy.world and personally I don't even look at what instances people are from. I just treat it like reddit, we're all using Lemmy at the end of the day.
Bro... What?!? I've only been here a day and I have no clue what any of that means lol
Honestly I would like to see a feature that flags if a user's instance has open sign up.
It's getting to the point that if someone is still on an open instance, they're a little sus to me. It's easier to trust people who come from instances whose policies I agree with.
You know people can just lie though, right? It's not like that's the one magical thing that would "fix Lemmy" or something lol.
Thanks for the info. I'll stay here for a while and see how everything goes.
I don't mind assholes as I think that's just a part of freedom of speech. And I'd rather not get too much moderated content as I think it creates too much of a filter bubble.
Let the two of them die together.
Blocking other search engines will hurt Reddit, all else held equal. But not by that much. Google is seriously dominant in the search engine market.
kagis
Yeah.
https://gs.statcounter.com/search-engine-market-share
According to this, Google has 91.06% of the search engine market. So for Reddit, they're talking about cutting themselves off from a little under 9% of people searching out there. Which...I mean, it isn't insignificant, but it isn't likely gonna hurt them all that badly.
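The arithmetic is easy to check. A minimal sketch, taking the 91.06% StatCounter figure quoted above as given (that number moves from month to month), and assuming everyone not on Google is effectively cut off from new Reddit results:

```python
# Google's share of the search engine market, as quoted above (StatCounter).
google_share_pct = 91.06

# Assumption: Reddit stays indexed only by Google, so the share it cuts itself
# off from is simply everything that isn't Google.
excluded_pct = round(100 - google_share_pct, 2)

print(f"Share of searchers cut off: {excluded_pct}%")
```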
It's also worth noting that the 9% they cut off was probably the group more inclined to already be using alternatives to Reddit anyways.
Yeah I thought the same so it’s good to see the numbers. I don’t think people realize that to support a search engine means letting them crawl your pages which means serving all your pages to them, which costs server resources. A lot of sites get more crawler load than load from actual users viewing pages. It’s a real cost.
Still, you’d think they could manage to support DuckDuckGo at least. Or a small set of search giants to give some appearance of supporting competition.
with threads too
One can only hope, but until people learn that they can use other browsers and other search engines, it's not likely (I'm talking about the Google side, of course; Reddit might be affected by this in the long run).
I've posted this elsewhere, but it bears repeating:
If you use DuckDuckGo, just use DDG bangs and you can search Reddit directly.
```sh
!reddit search term
```
or:
```sh
!r search term
```
It still picks up the latest posts on Reddit; it just searches Reddit directly instead of searching Bing's results. It's that simple.
You can even use a redirect extension like LibRedirect in conjunction with this DuckDuckGo feature to redirect your search to a privacy-respecting frontend like Redlib.
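If you want to wire the bang trick into a script or a browser keyword, the bang is just part of the query string sent to DuckDuckGo, and DuckDuckGo performs the redirect on its side. A minimal sketch, assuming the standard `https://duckduckgo.com/?q=` query URL; the helper `ddg_bang_url` is hypothetical, made up for illustration:

```python
from urllib.parse import urlencode


def ddg_bang_url(bang: str, terms: str) -> str:
    """Build a DuckDuckGo query URL that starts with a bang (hypothetical helper).

    DuckDuckGo sees the leading "!r" or "!reddit" and forwards the remaining
    terms to the target site's own search, here Reddit's.
    """
    query = f"!{bang} {terms}"
    # urlencode percent-encodes "!" as %21 and turns spaces into "+".
    return "https://duckduckgo.com/?" + urlencode({"q": query})


print(ddg_bang_url("r", "search term"))
# https://duckduckgo.com/?q=%21r+search+term
```

Because the redirect happens at DuckDuckGo, what you get back is Reddit's own search ranking, with the trade-offs discussed in the replies.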
DDG is awesome, been using it for years.
I used to sneer at the kids in my class that used it. Must have been fairly shortly after it launched, something like fourteen to fifteen years ago. I'm still grappling with a certain inertia when it comes to switching away from something I have relied on for so long, but I'm coming around to the idea of giving DDG a try at least (irrational as it is, I've been reluctant to even try - I suspect out of fear of liking it and having to change).
Past Me would be exasperated that Present Me is even toying with the idea. But then, Past Me had a lot of stupid takes anyway.
I think !reddit just sends you directly to reddit and uses reddit's search engine, which has been infamously bad. Has that changed? It doesn't seem to be quite the same as appending "reddit" to queries to search for reddit posts, but using better search engines.
Honestly, Reddit's search engine is okay, but it doesn't get as exact as standard search engines, because I think it prioritizes keywords in the post title over comments and also prioritizes recent posts over relevance to the subject. That said, old Reddit posts are still going to be accessible via standard non-Google search engines.
I'll admit this is something of a band-aid fix: should Reddit keep this deal with Google going, this workaround will eventually prove less effective than it currently is.
This workaround just gets you the newest posts related to your query; for older posts, adding the term "reddit" to your query in a search engine is still superior. So I don't know, it's the best solution I can think of for now.
LibRedirect is great, just added it to Firefox! I can finally watch all those TikTok links people send me lol
& for anyone else thinking of trying it, if a site won't load change your default proxy instance :)
Yeah, I do wish they incorporated nitter as well, but otherwise it's got every privacy respecting frontend and has a lot of public instances in their default listings. One of the best extensions I've come across.
FUCK u/spez
Reddit responded: "Only Google pays us." The content is not yours. You built this off a naive user base that just wanted to share, and now these fuckers are treating it as their entitlement. As an early Reddit user: fuck that place, I'm still angry.
Someone should argue in court that it's not Reddit's content. It belongs to the people, not Steve fuck face.
I'm sure the reddit TOS you agreed to during signup says otherwise....
Legally speaking, the content is theirs.
No, I don't think so. Just because you put a clause in ToS doesn't make it legally binding and most precedent is in favor of the original copyright owner.
If someone posts a copyright violation on YouTube, YouTube can go free under the safe harbor provisions of the DMCA. (In the US.) YouTube just points a finger at the user and says "it's their fault", because the user owns (or claims to own) the content. YouTube is just hosting it.
I don't know of any reason to think it's not the same for written works. User posts them, Reddit hosts them, user still owns them. Like YouTube, the user gives the host a broad license for that content, so that it can technically copy and transmit it. But ultimately the user owns it. I assume that by the time Reddit made the AI deal, they had put wording in the TOS to include "selling a copy of the data" to anyone they want.
Now, determining whether the TOS holds up in court is of course trickier. And did they even make us click our permission away again after they added it, or did they just change something we had already clicked? I don't recall.
Been on Reddit since like 2009-ish. You completely nailed the point.
That just means the dumbasses will get even less traffic. Way to shoot yourself in the foot, Spazz.
I wish we had a government that functioned. This shit is 100% antitrust. How is it that this shit is allowed to fly?
Antitrust would be the opposite.
Around here we love the idea of Reddit being totally devoid of life, but the fact is it's still one of the most active public-facing sites on the web. The attrition to sites like Lemmy is pretty negligible relative to overall Reddit activity, and bot/AI activity only really affects the largest subreddits, which have always been a bit spammy and clickbaity. The medium and small subreddits are still full of active people. Don't get me wrong, Lemmy is my daily driver for this content, but I won't pretend everyone fled Reddit for it.
Additionally, exclusivity with Google isn't necessarily just about keeping the search results, but about preventing their biggest AI competitor, ChatGPT, and its ties to Microsoft from getting access to what is the Internet's largest database of public-facing conversation.
I wonder what kind of contract they went with.
SAN FRANCISCO, Feb 21 (Reuters) - Social media platform Reddit has struck a deal with Google (GOOGL.O) to make its content available for training the search engine giant's artificial intelligence models, three people familiar with the matter said.
The contract with Alphabet-owned Google is worth about $60 million per year, according to one of the sources.
For perspective:
https://www.cbsnews.com/news/google-reddit-60-million-deal-ai-training/
In documents filed with the Securities and Exchange Commission, Reddit said it reported net income of $18.5 million — its first profit in two years — in the October-December quarter on revenue of $249.8 million.
So if you annualize that, Reddit's seeing revenue of about $1 billion/year, and net income of about $74 million/year.
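The annualization is just the single quarter multiplied by four. A minimal sketch using the two figures from the CBS quote; note that it assumes the other three quarters look like October-December, i.e. it ignores seasonality:

```python
# Reddit's October-December quarter, per the SEC filing quoted above (USD millions).
quarterly_revenue = 249.8
quarterly_net_income = 18.5

# Naive annualization: assume every quarter matches this one (no seasonality).
annual_revenue = quarterly_revenue * 4
annual_net_income = quarterly_net_income * 4

print(f"~${annual_revenue:,.1f}M revenue, ~${annual_net_income:,.1f}M net income per year")
```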
Given that Reddit granting exclusive indexing to Google happened at about the same time, I would assume that the AI-training deal included the exclusive indexing agreement, but maybe it's separate.
My gut feeling is that the exclusivity thing is probably worth more than $60 million/year, and that Google's probably getting a pretty good deal. Like, Google did not buy Reddit, and Google's done some pretty big acquisitions, like YouTube, and that'd have been another way for Google to get exclusive access. So I'd think that this deal is probably better for Google than buying Reddit. Reddit's market capitalization is $10 billion, so Google is maybe paying 0.6% of the value of Reddit per year to have exclusive training rights to their content and to be the only search engine indexing them; aside from Reddit users themselves running into content in subreddits, I'd guess that those two are probably the main ways in which one might leverage the content there.
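The 0.6% figure follows directly from the two numbers in the comment. A minimal sketch, taking the reported $60 million/year and the $10 billion market cap as given; the "years to match" line is just the same ratio inverted, to put the comparison with an outright purchase in perspective:

```python
# Figures from the thread, in USD.
deal_per_year = 60e6        # reported value of the Google deal per year
reddit_market_cap = 10e9    # market capitalization quoted above

share_pct = deal_per_year / reddit_market_cap * 100
years_to_match = reddit_market_cap / deal_per_year  # same ratio, inverted

print(f"Google pays roughly {share_pct:.1f}% of Reddit's value per year")
print(f"At that rate, payments would match the market cap after ~{years_to_match:.0f} years")
```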
Plus, my impression is that the idea that a number of companies have -- which may or may not be valid -- is that this is the beginning of the move away from search engines. Like, the idea is that down the line, the typical person doesn't use a search engine to find a webpage somewhere that's a primary source to find material. Instead, they just query an AI. That compiles all the data that it can see and spits out an answer. Saves some human searcher time and reduces complexity, and maybe can solve some problems if AIs can ultimately do a better job of filtering out erroneous information than humans. We definitely aren't there yet in 2024, but if that's where things are going, I think that it might make a lot of strategic sense for Google. If Google can lock up major sources of training data, keep Microsoft out, then it's gonna put Microsoft in a difficult spot if Microsoft is gunning for the same thing.
At least on some smaller subs, there seems to be a suspicious number of brand-new accounts asking one question to get human answers.
It would not surprise me if Reddit, or some other service, is seeding questions to get more LLM-able content. Of course, this might backfire if people start giving stupid answers to eff up the data.
If I'm not mistaken, Reddit has actual staff centered around asking questions to get engagement in small communities. Not so much for LLM reasons but to actually grow those communities (and thus edge out competition).
Is there a downside? I’m confused.
this is just going to cause indexers to ignore robots.txt
"We always obey the robots.txt"
Rate limiting could “fix” that unfortunately.
They're likely blocking user agents too, which I think also isn't legally enforceable (as in, DuckDuckGo can just identify as "Google" unless they said otherwise).
LinkedIn tried blocking scraping that way, but as long as the scraping isn't burdensome it's basically legal, though you can still be bound by the TOS and civil claims.
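To make the user-agent point concrete: the `User-Agent` header is whatever the client chooses to send, so a filter keyed on it only stops crawlers that identify themselves honestly. A minimal sketch of such a filter, assuming a hypothetical "Google only" policy; the allowlist, the marker substrings and `status_for` are all made up for illustration and say nothing about what Reddit actually runs:

```python
from http import HTTPStatus

# Hypothetical policy: the only crawler the site still wants to serve.
ALLOWED_CRAWLERS = ("googlebot",)
# Rough, hypothetical substrings that mark a self-identified crawler.
CRAWLER_MARKERS = ("bot", "crawler", "spider")


def status_for(user_agent: str) -> HTTPStatus:
    """Return the status a naive User-Agent filter would answer with."""
    ua = user_agent.lower()
    if any(name in ua for name in ALLOWED_CRAWLERS):
        return HTTPStatus.OK  # the privileged crawler gets through
    if any(marker in ua for marker in CRAWLER_MARKERS):
        return HTTPStatus.FORBIDDEN  # 403 for every other declared crawler
    return HTTPStatus.OK  # anything that looks like a browser is served


honest = "DuckDuckBot/1.1; (+http://duckduckgo.com/duckduckbot.html)"
# The header is self-reported, so a crawler can simply claim to be Googlebot.
spoofed = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

print(int(status_for(honest)))   # 403
print(int(status_for(spoofed)))  # 200
```

This weakness is why sites that care usually also check where the request comes from; Google, for example, documents a reverse-DNS check for confirming that a request claiming to be Googlebot really comes from Google.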
https://natlawreview.com/article/hiq-and-linkedin-reach-proposed-settlement-landmark-scraping-case
I wish Lemmy were more searchable. The search function actually works decently well, but it's not on the same level as actual search engines: it doesn't seem to look for related or similar terms, and the relevance ranking doesn't seem right either.
I do occasionally find Lemmy in web search results. The platform is not that big (or old), but as long as it sticks around then eventually searchability will improve.
Kagi has a fediverse search option, kinda nifty, wish it wasn't an either/or situation tho
Google just enshittifying even harder. Reddit results in Google searches are often old and anemic these days.
I used to want Reddit threads to show up in search results. Now I avoid them because they are so often a waste of time. More reason to use Duck Duck Go.
I saw Reddit results in a search last night using DDG. It just said something like "It's here on Reddit, but we're not allowed to show you." I wasn't planning on using Reddit (never again), but that just irritated me.
Ah, so Google signed a contract with the company that trained their AI to ... (checks notes) ... suggest putting glue on pizza.
Sounds like a perfect match.
I'd look at what will be, rather than what is. I think that it's probably not controversial to say that AI is going to improve; these are early days. The question is to what extent.
If one assumes that AI will improve very little over time, that the responses a computer generates to a question ten years from now will be about the same as they are today, then, yeah, it's probably an error to commit major resources to AI stuff or to expend resources acquiring training data for it.
But that assumption may not hold.
IMO, another good reason to not use Google!
Bing it is, then. I hate Microsoft with the intensity of a thousand suns, but Bing is now my jam as long as this lasts.
Try duckduckgo
DuckDuckGo also uses Bing under the hood.
Bing by any other name is still bing.
Edit: Awww, some people either don't know or don't like that Bing is what DuckDuckGo is. https://www.tomshardware.com/software/search-engines/microsoft-suffering-from-outage-bing-copilot-and-duckduckgo-inaccessible-for-several-hours
I've started a Kagi subscription for my new search engine. Basically $6 USD per month but because it's a user-pay model they have a really good privacy policy and don't sell/analyze your data.
It's currently better than Google (though I still use Google Maps search for reviews).
I work for a different sort of company that hosts some publicly available user generated content. And honestly the crawlers can be a serious engineering cost for us, and supporting them is simply not part of our product offering.
I can see how reddit users might have different expectations. But I just wanted to offer a perspective. (I'm not saying it's the right or best path.)
Can you use something like a DDoS filter to block automated AI scraping (too many requests per second)?
I'm not a tech person so probably don't even know what I'm talking about.
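What's being described is usually called rate limiting. A minimal sketch of a per-IP sliding-window limiter, assuming requests are keyed by client IP and timestamps are passed in explicitly; the class is hypothetical, and real deployments usually do this at a reverse proxy or CDN rather than in application code. As the reply below explains, crawlers that rotate through many IPs sidestep a per-IP limit, and users sharing one IP can get caught by it.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client IP within `window` seconds."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self.hits: dict[str, deque] = defaultdict(deque)  # IP -> recent timestamps

    def allow(self, ip: str, now: float) -> bool:
        q = self.hits[ip]
        # Forget requests that have slid out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False  # over the limit: a server would typically answer 429
        q.append(now)
        return True


limiter = SlidingWindowLimiter(limit=3, window=1.0)
results = [limiter.allow("203.0.113.7", t) for t in (0.0, 0.1, 0.2, 0.3, 2.0)]
print(results)  # [True, True, True, False, True]
```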
I worked with a company that used product data from competitors (you can debate the morals of it, but everyone is doing it). Their crawlers were set up so that each new line of requests came from a new IP. I don’t recall the name of the service, and it was not that many unique IPs, but it did allow their crawlers to live unhindered.
They didn’t do IP banning for the same reason, but they did notice that one of their competitors did not alter its IP when scraping them. If they had had malicious intent, they could have changed the data for that IP only, e.g. increasing or decreasing the prices so the competitor got bad data.
I’d imagine companies like OpenAI have many times as many IPs and could do something similar, meaning that if you try to ban IPs, you might hit real users as well, which would be unfortunate.
Blocking bots is hard, because with some work they can be made to look like users, down to simulating curved mouse movements from one button to the next if you are really ambitious.
We have a variety of tactics and are always adding more.
Still seems to work on Kagi
Kagi is a search aggregator, so those results are from Google.
Kagi is a search engine. They do their own indexing, and they aggregate search results.
It's right in their docs.
https://help.kagi.com/kagi/search-details/search-sources.html
You sure you’re not thinking of searxng?
Makes sense they've spent years curating other people's content and are now selling it..... Oh wait 😯.
Oh well. Time to post more questions on lemmy
With all the botting going on on Reddit, this whole Google AI deal makes me think of the recent paper demonstrating that, as common sense would suggest, deep learning models collapse when successive generations are trained on the previous generations' output.
Just like Reddit's changes last year, this seems like a clear and reasonably expected consequence of the 'our text is so valuable because AI' idea.
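The collapse effect can be seen in a toy setting. A minimal sketch, explicitly not the paper's setup: each "generation" fits a Gaussian to a handful of samples drawn from the previous generation's fit, so small estimation errors compound and the fitted spread drifts toward zero. The function name, sample size and generation count are arbitrary choices for illustration.

```python
import random
import statistics


def collapse_demo(generations: int, samples_per_gen: int, seed: int = 0) -> list[float]:
    """Toy model-collapse loop (hypothetical helper).

    Generation 0 is the "real" distribution N(0, 1). Every later generation is
    fitted only to samples produced by the previous generation's fit.
    Returns the fitted standard deviation after each generation.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0
    history = [sigma]
    for _ in range(generations):
        data = [rng.gauss(mu, sigma) for _ in range(samples_per_gen)]
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)  # maximum-likelihood fit, biased low
        history.append(sigma)
    return history


history = collapse_demo(generations=500, samples_per_gen=5)
print(f"start sigma = {history[0]:.3f}, final sigma = {history[-1]:.3e}")
```

With few samples per generation the shrinkage is fast; larger samples slow it down but do not remove the downward drift of the fitted spread.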
The web will probably continue to become more gated and more fragmented as a result of that, plus trying to get more control to force ads.
Reddit is asking for Europe to deem it a very large platform now that it's gatekeeping like this
Block Reddit!
But muh porn!
Exactly. You're addicted, Plopp.
The shackles and manacles were made of gold, but they were still there.
OK, so they're earning money from our data.
You just described every company
Honestly? I'd be happy to not see their trash in any search engine I use.
https://addons.mozilla.org/en-US/firefox/addon/g-search-filter/
Install this and exclude it from all search results.
This one works better: https://addons.mozilla.org/en-US/firefox/addon/hohser/ - more supported sites, and it doesn't break as often.
Why not change your search engine and set up a SearX instance? You can find all instances here: https://searx.space. For example, I have set it up like this: https://search.inetol.net/search?q=%s&category_general=1&language=en&time_range=&safesearch=0&theme=simple, and it works wonders. Results are still mostly from Google, or you can configure it to be whatever you want.
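In a URL like that, `%s` is the placeholder that browsers replace with your typed query when you add a custom search engine. A minimal sketch of that substitution, assuming the instance URL uses SearXNG's usual `&`-separated parameters (`category_general`, `language`, `time_range`, `safesearch`, `theme`); the helper `expand_template` is hypothetical:

```python
from urllib.parse import quote_plus

# Custom-search-engine template: %s marks where the query goes.
TEMPLATE = (
    "https://search.inetol.net/search?q=%s"
    "&category_general=1&language=en&time_range=&safesearch=0&theme=simple"
)


def expand_template(template: str, query: str) -> str:
    """Replace the %s placeholder with the URL-encoded query (hypothetical helper)."""
    return template.replace("%s", quote_plus(query), 1)


print(expand_template(TEMPLATE, "lemmy instances"))
```

Encoding the query first matters: a literal `&` or `+` in a search would otherwise be read as a parameter separator or a space.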
Thanks! Will give it a try.
am gonna exclude reddit
I'm seldom on reddit after the exodus, but when I am, I noscript the duck out of it.
You quack.
Actually, he doesn't, since he's removing the duck (and shipping it off to DuckDuckGo for reuse, no doubt).
I'm kind of curious to understand how they're blocking other search engines. I was under the impression that search engines just viewed the same pages we do to search through, and the only way to 'hide' things from them was to not have them publicly available. Is this something that other search engines could choose to circumvent if they decided to?
Search engine crawlers identify themselves (user agents), so they can be prevented by both honor-based system (robots.txt) and active blocking (error 403 or similar) when attempted.
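The honor-based half of that is easy to see with Python's standard `urllib.robotparser`, which runs the same check a well-behaved crawler makes before fetching a page. A minimal sketch with a hypothetical "Google only" robots.txt; Reddit's real file may be worded differently, and nothing here enforces anything (that is what the 403 responses are for):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt in the spirit of "Google only"; the real file may differ.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

url = "https://www.reddit.com/r/example/comments/abc123/"
for agent in ("Googlebot", "Bingbot", "DuckDuckBot"):
    # A polite crawler asks this before fetching; a rude one simply doesn't.
    print(agent, "may crawl:", rp.can_fetch(agent, url))
```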
Couldn't a search engine just aggregate the result from Google, filter the Reddit responses, and then add those results to their own organic results?
It's still possible to search with "site:reddit.com ..."
Has it been implemented yet or are they blocking non-flagged searches? Which seems odd.
You shouldn't be getting any new results if you do that, older posts will/may remain indexed.
Aha. I was wondering about that possibility.
Hot take here.
I do believe in free information.
Instead of investing money in stopping crawlers, why not make the data they are trying to crawl available to everyone for free, so we can have a better world altogether?
Data transfer isn't free. It costs real money and energy to respond to queries. Don't be surprised to see ~50% of all requests made to your server be from bots which you may have no interest in servicing outside of search engine indexers.
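If you want a rough idea of your own bot share, the simplest check is to count requests whose user agent declares itself a crawler. A minimal sketch over a handful of hypothetical user-agent strings (not real traffic data); the marker list is an assumption, and the result is only a lower bound, since scrapers that pose as browsers get counted as humans:

```python
# Hypothetical user-agent strings, as they might appear in an access log.
SAMPLE_USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0 Safari/537.36",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
    "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 17_4 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Mobile/15E148 Safari/604.1",
]

# Assumed substrings that mark a self-declared crawler.
BOT_MARKERS = ("bot", "crawler", "spider")


def bot_share(user_agents: list[str]) -> float:
    """Fraction of requests whose user agent self-identifies as a crawler."""
    if not user_agents:
        return 0.0
    bots = sum(any(m in ua.lower() for m in BOT_MARKERS) for ua in user_agents)
    return bots / len(user_agents)


print(f"{bot_share(SAMPLE_USER_AGENTS):.0%} of sampled requests came from self-declared bots")
```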
If you publish your data in a friendly manner bots would have no need to crawl your site.
Data that is more interesting and requested a lot could even be served over p2p.
This model would cost less than dealing with constant bot scrapers.
It is not a technical discussion. Or a discussion about associated cost. It's a discussion about morals and economic models.
They're also blocking posts by users who aren't banned or even got a warning. It appears to the user as though it's been posted, but it hasn't.
shadowbanning is a totally different issue that's existed for a long time though.
Net neutrality?
I don't have any more info on it, but I can prove it
How many times is this going to be posted? I've seen this several times now over the past few days.
Sorry, I haven't seen it. If it's been posted here before, send me the link to the previous post, and I'll take this one down. Even better, you can report the post, and the mods will investigate it.
Thank you!
Since you asked, here are the other four times it was posted.
There was a fifth one, but that one has since been removed.