plunki

11 mo. ago

How to make the files on a new hard drive searchable?

Get Everything: https://www.voidtools.com/downloads/

Replace windows search with it: https://github.com/srwi/EverythingToolbar

It is a million times faster and better than windows search, I use it constantly. Easier than even navigating to files normally.

11 mo. ago

Always, always archive youtube videos you want to watch later...

My yt-dlp is running most nights, but I've gotta stop... I'm never going to get around to watching everything.

Could you share the Britney Spears??? :)

11 mo. ago

Checksum for original files came out unreadable

What? The hash is just a number in a text file. Open the md5 with notepad?

You aren't very clear... Are you saying it creates the hash, but then verifying fails? I don't know what you mean by unreadable

11 mo. ago

Checksum for original files came out unreadable

Are you saying it fails to make a checksum at all? What program? Try this one: https://github.com/gurnec/HashCheck/releases/tag/v2.4.0

12 mo. ago

Long-term storage of hyperlinks for future reference?

Seriously, this isn't link hoarding!!

But really OP, anything you think you might want should be downloaded, sites are vanishing all the time. Ramp up the hoarding!

12 mo. ago

Has anyone ever had drives that lasted for decades?

I have a 50MB HDD in my first computer, a 286 Digital VaxMate that still works, from 1989 maybe?

12 mo. ago

Using Teracopy as daily cut/copy paste tool?

It's all I use

12 mo. ago

How do i make wget not logout itself?

No file, it outputs in with the usually displayed info as it is processing, just adds more

12 mo. ago

High Command Timeout & Read Error Rate on ST8000NM000A-2KE101 - Seagate Exos 7E8 8TB - Disk access is just too slow painfully

The read and seek error rates appear to be zero. Any errors would be recorded in the top 4 digits. Different manufacturers use the fields differently.

Not sure about command timeout off the top of my head

The ultraDMA CRC actually looks bad on the 1st one I think.

Check some real documentation for seagate though to know what the fields mean

12 mo. ago

There is any app to download YouTube videos placed on a playlist?

Yt-dlp does pretty much everything. It can do a channels playlists, not sure about a custom saved one, but try feeding it a link to it? Then you can just keep re-running it to download any new additions

12 mo. ago

Extension recommendation for saving videos?

Video download helped (red/yellow/blue ball icon) is the real deal

12 mo. ago

How do i make wget not logout itself?

Is it the website kicking it off? Can you run with --debug and post what it says when the request fails?

I'll be back at my computer later and can send over an idea to try. I have a wgetrc file that has gotten around some issues before

12 mo. ago

Could someone please give me a walk through on how to crawl an entire web domain and scrape the images only?

WGET is awesome, I have scraped tons with it. So many options, you can even spoof all the request header info to get around sites that try to limit auto downloaders. Here is the manual: https://www.gnu.org/software/wget/manual/wget.html

webp or any file extension will work. (note on webp, most sites actually have jpgs still, but convert and serve webp to save bandwidth if the browser says it accepts them. There is a header you can disable in firefox to not accept webp unless it is the only option:

https://addons.mozilla.org/en-CA/firefox/addon/dont-accept-webp/

Wget is not behaving identically to a browser so im unsure what this part of the request looks like or if it needs modification. If it isnt working let me know.

5 might be enough, but maybe not. Scroll down in my first link comments, they show how to set to infinite: "-l inf".

For future scraping, look at the mirror command. It sets recursion to infinite and will make a full copy of the site. You can also use the --convert-links option, which changes all the links to point to the locally downloaded files. It then behaves the same as the real website.

You cant go too deep unless you use --span-hosts, it can grab external files from different domains to make the mirrored site a true copy, but yea, you often don't need that. You also want to be more careful with recursive depth here - it can go too deep and you end up with too much data.

I'm not sure about this. I think you can turn on logging, but I'm not sure what that gets you. I've used the no-clobber command to run wget again, without re-downloading existing files. This is handy for resuming or filling in gaps that were missed due to timeout, etc.

Some sites also need to use the wait or random-wait command to avoid detection.

12 mo. ago

Could someone please give me a walk through on how to crawl an entire web domain and scrape the images only?

WGET should work, see here: https://stackoverflow.com/questions/4602153/how-do-i-use-wget-to-download-all-images-into-a-single-folder-from-a-url

12 mo. ago

What's going on with my ssd?

Smart stats are cropped off?