They don't need to use Unicode, because it's just an estimate of the "information" contained. It doesn't matter what language it's in or which encoding gets chosen. It's not an exercise in optimization.
Why would they store 4K? They aren't trying to store a high-quality historical copy. The average drawing of those days would be estimated as equivalent to an image in a book or other media. 4K is not the average.
Storing information this way is not relevant, and it doesn't help make the study representative or meaningful. If you google this you'll find how they estimate it. It won't focus on the computer science of it.