Inuyasha often said he was evil and played the tough guy so he would be left alone, but he was usually compassionate and had a soft side.
I've been doing this for 30+ years and it seems like the push lately has been towards oversimplification on the user side, but at the cost of resources and hidden complexity on the backend.
As an Assembly Language programmer I'm used to programming with consideration towards resource consumption. Did using that extra register just cause a couple of extra PUSH and POP commands in the loop? What's the overhead on that?
But now some people just throw in a JavaScript framework for a single feature and don't even worry about how it works or the overhead as long as the frontend looks right.
The same is true with computing. We're abstracting containers inside of VMs on top of base operating systems which is adding so much more resource utilization to the mix (what's the carbon footprint on that?) with an extremely complex but hidden backend. Everything's great until you have to figure out why you're suddenly losing packets that pass through a virtualized router to linuxbridge or OVS to a Kubernetes pod inside a virtual machine. And if one of those processes fails along the way, BOOM! it's all gone. But that's OK; we'll just tear it down and rebuild it.
I get it. I understand the draw, and I see the benefits. IaC is awesome, and the speed with which things can be done is amazing. My concern is that I've seen a lot of people using these things who don't know what's going on under the hood, so they often make assumptions or mistakes that lead to surprises later.
I'm not sure what the answer is other than to understand what you're doing at every step of the way, and always try to choose the simplest route (but future-proofed).
Technically, each time that it is viewed it is a republication from a copyright perspective. It's a digital copy that is redistributed; the original copy that was made doesn't go away when someone views it. There's not just one copy that people pass around like a library book.
Again, isn't that the site's prerogative?
I think there should at least be a recognized way to opt-out that archive.org actually follows. For years they told people to put
User-agent: ia_archiver
Disallow: /
in robots.txt, but they still archived content from those sites. They refuse to publish what IP addresses they pull content down from, but that would be a trivial thing to do. They refuse to use a UserAgent that you can filter on.
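For anyone checking their own robots.txt, the exact spelling of the rule matters: an empty `Disallow:` permits everything, while `Disallow: /` blocks the whole site for that user agent. Here is a minimal sketch using Python's standard `urllib.robotparser` to show how a compliant parser reads each form. The `is_allowed` helper, the constant names, and the example.com URL are made up for illustration. It only shows what the rules mean; it says nothing about whether a given crawler actually honors them.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant parser would let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt bodies for comparison.
EMPTY_RULE = "User-agent: ia_archiver\nDisallow:\n"    # empty value: allow all
BLOCK_RULE = "User-agent: ia_archiver\nDisallow: /\n"  # slash: block everything
PAGE = "https://example.com/articles/page.html"        # placeholder URL

if __name__ == "__main__":
    # Empty Disallow: the named agent may still fetch the page.
    print(is_allowed(EMPTY_RULE, "ia_archiver", PAGE))
    # Disallow: / blocks the named agent from the whole site.
    print(is_allowed(BLOCK_RULE, "ia_archiver", PAGE))
    # Agents not named in the file are unaffected by the rule.
    print(is_allowed(BLOCK_RULE, "SomeOtherBot", PAGE))
```

Running it against your own file contents is a quick way to confirm the rule says what you think it says before blaming the crawler.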
If you want to be a library, be open and honest about it. There's no need to sneak around.
Like I said, I have no problems with individuals archiving it and not republishing it.
If I take a newspaper article and republish it on my site I guarantee you I will get a takedown notice. That will be especially true if I start linking to my copy as the canonical source from places like Wikipedia.
It's a fine line. Is archive.org a library (wasn't there a court case about this recently...) or are they republishing?
Either way, it doesn't matter for me any more. The pages are gone from the archive, and they won't archive any more.
Shouldn't that be the content creator's prerogative? What if the content had a significant error? What if they removed the page because someone living in the EU requested it under their laws? What if the page was edited because someone accidentally made their address and phone number public in a forum post?
I'm thinking about it from the perspective of an artist or creator under existing copyright law. You can't just take someone's work and republish it.
It's not allowed with books, it's not allowed with music, and it's not even allowed with public sculpture. If a sculpture shows up in a movie scene, they need the artist's permission and may have to pay a licensing fee.
Why should the creation of text on the internet have lesser protections?
But copyright law is deeply rooted in damages, and if advertising revenue is lost that's a very real example.
And I have recourse; I used it. I used current law (DMCA) to remove over 1,000,000 pages because it was my legal right to remove infringing content. If it had been legal, they wouldn't have had to remove it.
How do you expect an archive to happen if they are not allowed to archive while it is still up?
I don't want them publishing their archive while it's up. If they archive but don't republish while the site exists then there's less damage.
I support the concept of archiving and screenshotting. I have my own linkwarden server set up and I use it all the time.
But I don't republish anything that I archive because that dilutes the value of the original creator.
Yes, some Wikipedia editors are submitting the pages to archive.org and then linking to that instead of to the actual source.
So when you go to the Wikipedia page it takes you straight to archive.org -- that is their first stop.
It’s user-driven. Nothing would get archived in this case. And what if the content changes but the page remains up? What then? Fairly sure this is why Wikipedia uses archives.
That's a good point.
Pretty sure mainstream ad blockers won’t block a custom in-house banner. And if it has no tracking, then it doesn’t matter whether it’s on Archive or not, you’re getting paid the same, no?
Some of them do block those kinds of ads -- I've tried it out with a few. If it's at archive.org I lose the ability to report back to the sponsor that their ad was viewed 'n' times (unless, ironically, I put a tracker in). It also means that if sponsorship changes, the main drivers of traffic like Wikipedia may not see that. It makes getting new sponsors more difficult because they want something timely for seasonal ads. Imagine sponsoring a page, but Wikipedia only links to the archived one. Your ad for gardening tools isn't reflected by one of the larger drivers of traffic until December, and nobody wants to buy gardening tools in December.
Yes, I could submit pages to archive.org as sponsorship changes if this model continues.
It was a much bigger deal when we used Google ads a decade ago, but we stopped in early 2018 because tracking was getting out of hand.
If I were submitting pages myself I'd be all for it because I could control when it happened. But there have been times when I've edited a page and totally screwed it up, and archive.org just happened to grab it at that moment when the formatting was all weird or the wrong picture was loaded. I usually fix the page and forget about it until I see it on archive.org later.
I asked for pages like that to be removed, but archive.org was unresponsive until I used a DMCA takedown notice.
I don't think you know what SEO is. I think you know what bad SEO is.
Anyhow, Wikipedia is always free to link somewhere else if they can find better content.
Someone asked a question and I answered honestly. I'm sorry that you can't understand my perspective.
What do you mean by “engagement”, exactly? Clicking on ads?
In SEO terms user engagement refers to how people interact with the website. Do they click on another link? Does a new blog posting interest them?
Lmao you think Google needs to go through Archive to scrape your site? Delusional.
Any activity from Google is easier to track, and I have a record of who downloaded content if it's coming from my servers.
The mechanisms used to serve ads over the internet nowadays are nasty in a privacy sense, and a psychological manipulation sense. And you want people to be affected by them just to line your pockets? Are you also opposed to ad blockers by any chance?
I agree that many sites use advertising in a different way. I use it in the older internet sense -- someone contacts me to sponsor a page or portion of the site, and that page gets a single banner, created in-house, with no tracking. I've been using the internet for 36 years. I'm well aware of many uses that I view as unethical, and I take great pains not to replicate them on my own site.
I disapprove of ad blockers. I approve of things that block tracking.
As far as "lining my own pockets" goes, I want to recoup my hosting costs. I spend hours researching for each article/showcase, make the content free to view, and then I'm expected to pay to share it with anyone who's interested? I have a day job. This is my hobby, but it's also my blood, sweat, and tears.
And how do you suggest a site which has been wiped off the face of the internet gets archived? Maybe we need to invest in a time machine for the Internet Archive?
archive.org could archive the content and only publish it if the page has been dark for a certain amount of time.
You misunderstood. If they view the site at Internet Archive, our site loses on the opportunity for ad revenue.
They say they want to link to something they know won't go away.
EDIT: Adding this because what you said irks me. There used to be only one banner page on the top, but that doesn't really matter.
The 'you dressed that way so you asked for it' argument really doesn't fly.
It's my content, and if I choose to wrap it in advertising, I should be allowed to. And if Wikipedia doesn't like that, they can always choose to not link to the content. But to just forcibly take what you want because you feel entitled to it... Why would that ever be OK?
I just sent a DMCA takedown last week to remove my site. They've claimed to follow meta tags and robots.txt since 1998, but no, they had over 1,000,000 of my pages going back that far. They even had the robots.txt configured for them archived from 1998.
I'm tired of people linking to archived versions of things that I worked hard to create. Sites like Wikipedia were archiving urls and then linking to the archive, effectively removing branding and blocking user engagement.
Not to mention that I'm losing advertising revenue if someone views the site in an archive. I have fewer problems with archiving if the original site is gone, but to mirror and republish active content with no supported way to prevent it short of legal action is ridiculous. Not to mention that I lose control over what's done with that content -- are they going to let Google train AI on it with their new partnership?
I'm not a fan. They could easily allow people to block archiving, but they choose not to. They offer a way to circumvent artist or owner control, and I'm surprised that they still exist.
So... That's what I think is wrong with them.
From a security perspective it's terrible that they were breached. But it is kind of ironic -- maybe they can think of it as an archive of their passwords or something.
I set up LinkWarden about a month ago for the first time and have been enjoying it. Thank you!
I do have some feature requests -- is GitHub the best place to submit those?
I'm not sure where you're getting your information.
I work there, have worked there for nearly three decades, and I can tell you that it's not the case.
(Also, it's just NCSA for trademark reasons, without 'the' in front)
It did get a lot of funding from the NSF in the early days, but the federal government didn't start pushing for public access to research done through grants and contracts until 2013. Before then it was only work done by federal agencies that was not copyrighted.
The National Science Foundation also didn't start funding Mosaic until 1994, which was after CGI had been released.
NCSA gets a lot of its funding from the private sector with partner programs, the University of Illinois, and the State of Illinois as well.
I guess I'm becoming a dinosaur, and now I don't know where to find out about new FOSS stuff being developed, when new releases are out, etc.
I used to get it all on USENET and mailing lists, and then later on sourceforge.net and freshmeat.net. Now I track some things on https://freshcode.club/, but I don't see much that's 'fresh'. Maybe new updates, but not too many new packages. sourceforge still exists, but it doesn't seem current.
If I know about a project I'll follow it on GitHub, but I'm looking for a place to find out about new things that I didn't know I wanted yet.
tl;dr: Where can I watch to see promising new FOSS software projects?
This may be old news to some, but maybe it will help a wayward soul somewhere....
Vivaldi was really slow when starting up, and it would stay slow with multiple cores pegged at 100% on my Linux system. Eventually it would crash and I'd have to start it back up again.
Slow in this case means delays in responses to clicks, scrolls, etc.
Anyhow, I discovered that scanning pages for RSS feeds was enabled. I disabled that and my browser starts up very quickly now.
If you have a lot of tabs and RSS scanning is enabled I believe it tries to load every page and scan the contents, but it was too much for my fairly beefy system.
tl;dr: disable scanning for RSS feeds if you don't use it.
I started migrating my servers from Linode to Hetzner Cloud this month, but noticed that my quota only gave me ten instances.
I need many more, probably on the order of 25 right now and probably more later. I'd also like the ability to create test servers, etc.
I asked for an increase with all of that in mind, and Hetzner replied:
"As we try to protect our resources we are raising limits step by step and on the actuall [sic] requirement. Please tell us your currently needed limit."
I don't understand. Does Hetzner not have enough servers to accommodate me? Wouldn't knowing the size of the server be relevant if it's an actual resource question?
I manage a very large OpenStack cluster for my day job and we just give people what they pay for. I'm having a hard time wrapping my head around this unless Hetzner might not be able to give me what I ultimately want to pay for, and if that's the case, I wonder if they're the right solution for me after all.
It also makes me worry about cloud elasticity.
Does anyone have any insights that can help me understand why keeping a low limit matters?