About 20 years ago I wrote a script that converts pictures to HTML tables. Back then RAM was a severe problem for this, and even on more powerful hardware browsers tended to just crash on larger pictures.
I checked it again a few years later, and things looked way better. I guess with CSS it'd be rather trivial nowadays to do the same with a short video: one table per frame, cycling through them by showing one and hiding the rest.
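For anyone curious what the picture-to-table conversion above might look like, here is a minimal Python sketch. It is not the original script: the function name `pixels_to_html_table`, the `cell_px` parameter, and the input format (a list of rows of `(r, g, b)` tuples) are all assumptions made for illustration. Each pixel simply becomes one coloured `<td>`.

```python
def pixels_to_html_table(pixels, cell_px=1):
    """Render a 2D grid of (r, g, b) tuples as an HTML table.

    Hypothetical helper: one <td> per pixel, coloured via an inline style.
    `cell_px` is the on-screen size of each cell in CSS pixels.
    """
    rows = []
    for row in pixels:
        cells = "".join(
            f'<td style="width:{cell_px}px;height:{cell_px}px;'
            f'background:#{r:02x}{g:02x}{b:02x}"></td>'
            for r, g, b in row
        )
        rows.append(f"<tr>{cells}</tr>")
    # Collapse borders and padding so the cells tile without gaps.
    return (
        '<table cellspacing="0" cellpadding="0" style="border-collapse:collapse">'
        + "".join(rows)
        + "</table>"
    )


# A 2x2 test image: red, green / blue, white.
demo = pixels_to_html_table(
    [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 255, 255)]],
    cell_px=8,
)
```

The markup grows linearly with the pixel count (one `<td>` per pixel), which is probably why larger pictures were so heavy on memory back then.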
It's fairly common on onion websites, especially those that offer real-time interaction (e.g. some onion web chats). They use this Transfer-Encoding: chunked method to fetch messages and content, because JS is often discouraged and sometimes automatically blocked by onion-enabled browsers when surfing the dark web. HTML forms with submit buttons are also used for this kind of interaction.
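As a rough illustration of the chunked approach described above, here is a minimal sketch using only Python's standard library. The names `encode_chunk`, `ChatStreamHandler`, and `serve`, the port, and the placeholder messages are all assumptions; a real chat would pull messages from shared state and take new ones via a form POST. The server keeps the response open and writes one chunk per message, so the browser renders each one as it arrives, with no JS involved.

```python
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def encode_chunk(data: bytes) -> bytes:
    """Frame one chunk: hex length, CRLF, payload, CRLF."""
    return f"{len(data):x}".encode("ascii") + b"\r\n" + data + b"\r\n"


# A zero-length chunk followed by a blank line ends the chunked body.
LAST_CHUNK = b"0\r\n\r\n"


class ChatStreamHandler(BaseHTTPRequestHandler):
    # Chunked transfer encoding is an HTTP/1.1 feature.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Transfer-Encoding", "chunked")
        self.end_headers()
        self.wfile.write(encode_chunk(b"<!doctype html><title>chat</title>"))
        # Placeholder messages; a real chat would wait on new posts here.
        for n in range(3):
            self.wfile.write(encode_chunk(f"<p>message {n}</p>".encode("utf-8")))
            self.wfile.flush()
            time.sleep(1)
        self.wfile.write(LAST_CHUNK)


def serve(port: int = 8000) -> None:
    """Run the demo server until interrupted (port is an arbitrary choice)."""
    ThreadingHTTPServer(("127.0.0.1", port), ChatStreamHandler).serve_forever()
```

Calling `serve()` and opening `http://127.0.0.1:8000/` should show the messages appearing roughly one per second while the page is still loading.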