I want to extract and process the metadata from PNG images and the first line of .safetensors files for LLMs and LoRAs. I could spend ages farting around with sed or awk, but file formats are constantly changing. I'd like a faster way to see a summary of training and a few other details when they are available.
Specifically this version of yq - there are other versions bundled with distros that look and act very differently and lack the potency of this version.
Yeah, I've been learning some nushell. If you're dealing with data, it's just a great tool. So many sharp edges in the POSIX shell come from it being stringly typed, so having a strongly typed shell is extremely helpful.
A week ago I would have said jq, but just the other day I discovered nushell and have been loving it. If you deal with structured data often, it's way easier. Just bear in mind it's not POSIX compatible.
I have perused it, but it's both so dense and so broad that it's not that helpful unless I know exactly what I'm looking for. I have also tried info and tldr. I actually like tldr the most, although the exhaustiveness of the man pages must be admired. I don't find it to be the best teacher.
An online JSON parser. Throw in some data and then structure a query.
It'll keep updating the results as you tweak your query. A simple search will probably give you twenty that'll work. I can't remember what I normally use off the top of my head.
There are probably pre-written awk scripts out there that already do what you want, not that I know where they'd be.
That said, you might be better off using one of the bigger but still fairly commonly installed languages. There are bound to be things on PyPI (for Python) or CPAN (for Perl) that could be bolted together, for example.
If you're really lucky there might even be something that covers your whole use-case, but I haven't checked.
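For the PNG half of the question, the Python standard library alone may be enough. Here's a minimal sketch, assuming the metadata lives in uncompressed `tEXt` chunks (compressed `zTXt` and `iTXt` chunks are skipped, and the CRC isn't checked). `read_png_text` is a name I made up for illustration, not something from an existing package.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text(path):
    """Return {keyword: text} for every tEXt chunk in a PNG file."""
    found = {}
    with open(path, "rb") as f:
        if f.read(8) != PNG_SIGNATURE:
            raise ValueError("not a PNG file")
        while True:
            head = f.read(8)
            if len(head) < 8:
                break  # ran out of file without seeing IEND
            # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
            length, ctype = struct.unpack(">I4s", head)
            if ctype == b"IEND":
                break
            if ctype == b"tEXt":
                data = f.read(length)
                # tEXt payload is: keyword, a NUL byte, then Latin-1 text.
                keyword, _, text = data.partition(b"\x00")
                found[keyword.decode("latin-1")] = text.decode("latin-1")
                f.seek(4, 1)  # skip the CRC
            else:
                f.seek(length + 4, 1)  # skip data and CRC of other chunks
    return found
```

Something like `read_png_text("image.png")` should give you a plain dict you can print or feed into whatever does the summarising. Image data chunks are seeked past rather than read, so big images aren't a problem.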
I found a Python project that does enough for my needs. jq looks super powerful though. Thanks. I managed to get yq working for PNGs, but I had trouble with both jq and yq on safetensors files. I couldn't figure out how to parse a string embedded after a binary prefix that varies from file to file, especially with massive files. I could get in and grab the first line with head. I tried some stuff with expansions, but that didn't work and sent me looking for others who have solved the issue better than I have.
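For anyone else who hits the same wall: the safetensors header isn't really a "first line". The file starts with an 8-byte little-endian unsigned integer giving the header length, followed by that many bytes of UTF-8 JSON. The varying binary prefix is just that length field. Reading only those bytes never touches the tensor data, so file size shouldn't matter. A rough sketch follows; the function names and the sanity cap are my own choices for illustration.

```python
import json
import struct


def read_safetensors_header(path, max_header_bytes=100_000_000):
    """Return the JSON header of a .safetensors file as a dict."""
    with open(path, "rb") as f:
        prefix = f.read(8)
        if len(prefix) != 8:
            raise ValueError("file too short to be a safetensors file")
        # First 8 bytes: little-endian unsigned 64-bit header length.
        (header_len,) = struct.unpack("<Q", prefix)
        # Sanity cap (an assumption) so a corrupt file can't trigger a huge read.
        if header_len > max_header_bytes:
            raise ValueError(f"implausible header length: {header_len}")
        raw = f.read(header_len)
        if len(raw) != header_len:
            raise ValueError("truncated header")
    return json.loads(raw.decode("utf-8"))


def training_summary(path):
    """Return the optional free-form metadata map, or {} if absent."""
    header = read_safetensors_header(path)
    # "__metadata__" is an optional string-to-string map; whatever a
    # training tool chose to record (if anything) ends up here.
    return header.get("__metadata__", {})
```

Every other key in the header is a tensor name mapped to its dtype, shape and data offsets, so the same dict also works for a quick look at what's in the file. What ends up in `__metadata__` depends entirely on the tool that wrote the file, so treat any particular key as something that may not be there.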
For me, a C# developer by trade, this is easily solved with a one-command C# call. It's possible you already have dotnet 6 or 8 on your distro, as there are many C# Linux apps now.
Probably not a popular opinion, but pwsh (PowerShell). It's got a lot of tooling built in and means I don't have to learn a different tool just because I'm on a different system.