Robots.txt has been around for a long time, and all the major search engines will honor it. Not having a full index of the Web is the norm.
That isn't to say that the practice of signing agreements isn't potentially a concern. Not sure that I like the idea of search engines paying sites money to degrade search results of competitors.
Kagi has a "search lens" specifically for searching the Threadiverse. They track Lemmy, kbin, and similar sites; you can include those in their own results section, and you can also set up a bang like "!threadiverse" (or whatever name you want) that searches only them.
They do the same for Usenet.
I suppose, given this new robots.txt Reddit development, that they'll probably never have a Reddit lens, though.
Kagi is a metasearch engine (apart from their homebrew small-web index, known as Teclis), so a Reddit lens will continue to function as long as one of the search engines it's querying is paying Reddit.
Older results will still show up, but these search engines are no longer able to “crawl” Reddit, meaning that Google is the only search engine that will turn up results from Reddit going forward.
Robots.txt lets you ask specific user-agents not to index the site. My guess is that that's how they restricted it. I don't know how those changes are reflected in existing indexed pages -- don't know if there's any standard there -- but it'll stop crawlers from downloading new pages.
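For anyone who wants to see what a per-user-agent rule looks like in practice, here's a minimal sketch using Python's standard-library `urllib.robotparser`. The rules, the bot names (`ExampleBot`, `OtherBot`), and the `example.com` URLs are all made up for illustration; this is not Reddit's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, invented for this example (not any real site's file).
# ExampleBot is asked to stay out entirely; everyone else only has to
# avoid /private/.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that we already have in memory."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(EXAMPLE_ROBOTS_TXT)

# The named agent is disallowed everywhere.
print(parser.can_fetch("ExampleBot", "https://example.com/r/some_thread"))
# Any other agent falls through to the "*" group.
print(parser.can_fetch("OtherBot", "https://example.com/r/some_thread"))
print(parser.can_fetch("OtherBot", "https://example.com/private/page"))
```

Note that this only tells a crawler what it has been asked to do; it says nothing about pages a search engine already has in its index.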
Try searching for new posts, see how DDG/Bing compares to Google.
EDIT: Yeah. They've got a sitewide ban for all crawlers. That'd normally block Google's bot too, but I bet that they have some offline agreement to have it ignore the thing, operate out-of-spec.
iirc, isn't robots.txt more of a gentlemen's agreement? I vaguely recall bots being able to crawl a site regardless; it's just that most devs respect robots.txt and don't crawl what it disallows. Could be wrong though, happy to be corrected.
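To make the "gentlemen's agreement" point concrete, here's a rough sketch of where the check actually lives: in the crawler's own code. It uses Python's standard-library `urllib.robotparser` with a sitewide ban of the kind mentioned in this thread; the function names, the `honor_robots` flag, and the `example.com` URL are hypothetical, just for illustration.

```python
from urllib.robotparser import RobotFileParser

# A sitewide ban: every agent, every path.
SITEWIDE_BAN = """\
User-agent: *
Disallow: /
"""


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Ask the parsed rules whether this agent may fetch this URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def crawl_decision(robots_txt: str, user_agent: str, url: str,
                   honor_robots: bool = True) -> str:
    """Return "skip" or "fetch".

    The decision is made entirely on the crawler's side. A crawler that
    chooses not to honor the file just never runs the check.
    """
    if honor_robots and not allowed_by_robots(robots_txt, user_agent, url):
        return "skip"
    return "fetch"


url = "https://example.com/r/some_thread"
print(crawl_decision(SITEWIDE_BAN, "Googlebot", url))
print(crawl_decision(SITEWIDE_BAN, "Googlebot", url, honor_robots=False))
```

Under a `User-agent: *` ban, a compliant crawler skips the page whatever it calls itself; the second call shows that ignoring the file is just a matter of not checking. Actually blocking a bot would need something server-side, which robots.txt isn't.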
Well, that's annoying. One workaround is to use a redirect extension like LibRedirect, and then you can still search via the !reddit bang on DuckDuckGo. So if I type this into my search bar, which has DuckDuckGo as the default:
!reddit some new post or topic
it will search Reddit for that term, and when the browser tries to load the Reddit page, the LibRedirect extension redirects it and shows the results.
It requires a bit of configuring and sure is annoying, but hey, no Google search necessary to get up-to-date Reddit threads.
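The redirect step is conceptually just a hostname swap. Here's a toy sketch of that idea in Python; it is not how LibRedirect is actually implemented, and the host list, the `frontend.example` host, and the search URL are all assumptions made up for the example.

```python
from urllib.parse import urlsplit, urlunsplit

# Hostnames to treat as "Reddit" (an assumed, incomplete list).
REDDIT_HOSTS = {"reddit.com", "www.reddit.com", "old.reddit.com"}


def redirect_to_frontend(url: str, frontend_host: str) -> str:
    """Swap a Reddit hostname for an alternative frontend's hostname.

    Path, query string, and fragment are kept as-is; non-Reddit URLs
    are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.hostname not in REDDIT_HOSTS:
        return url
    return urlunsplit(
        (parts.scheme, frontend_host, parts.path, parts.query, parts.fragment)
    )


# Illustrative URL only; "frontend.example" is a placeholder host.
print(redirect_to_frontend(
    "https://www.reddit.com/search?q=some+new+post", "frontend.example"
))
```

In the real setup the extension does this inside the browser, and which frontend it sends you to depends on how you've configured it.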