Skip Navigation

How to wget/curl files from OCI registries (docker, github packages)

tech.michaelaltfield.net /2024/09/03/container-download-curl-wget/
Embed prevented alt text

This article will describe how to download an image from a (docker) container registry.

Manual Download of Container Images with wget and curl
Manual Download of Container Images with wget and curl

Intro

Remember the good `'ol days when you could just download software by visiting a website and click "download"?

Even apt and yum repositories were just simple HTTP servers that you could just curl (or wget) from. Using the package manager was, of course, more secure and convenient -- but you could always just download packages manually, if you wanted.

But have you ever tried to curl an image from a container registry, such as docker? Well friends, I have tried. And I have the scars to prove it.

It was a remarkably complex process that took me weeks to figure-out. Lucky you, this article will break it down.

Examples

Specifically, we'll look at how to download files from two OCI registries.

  1. Docker Hub
  2. GitHub Packages

Terms

First, here's some terminology used by OCI

  1. OCI - Open Container Initiative
  2. blob - A "blob" in the OCI spec just means a file
  3. manifest - A "manifest" in the OCI spec means a list of files

Prerequisites

This guide was written in 2024, and it uses the following software and versions:

  1. debian 12 (bookworm)
  2. curl 7.88.1
  3. OCI Distribution Spec v1.1.0 (which, unintuitively, uses the '/v2/' endpoint)

Of course, you'll need 'curl' installed. And, to parse json, 'jq' too.

sudo apt-get install curl jq

What is OCI?

OCI stands for Open Container Initiative.

OCI was originally formed in June 2015 for Docker and CoreOS. Today it's a wider, general-purpose (and annoyingly complex) way that many projects host files (that are extremely non-trivial to download).

One does not simply download a file from an OCI-complianet container registry. You must:

  1. Generate an authentication token for the API
  2. Make an API call to the registry, requesting to download a JSON "Manifest"
  3. Parse the JSON Manifest to figure out the hash of the file that you want
  4. Determine the download URL from the hash
  5. Download the file (which might actually be many distinct file "layers")
One does not simply download from a container registry
One does not simply download from a container registry

In order to figure out how to make an API call to the registry, you must first read (and understand) the OCI specs here.

OCI APIs

OCI maintains three distinct specifications:

  1. image spec
  2. runtime spec
  3. distribution spec

OCI "Distribution Spec" API

To figure out how to download a file from a container registry, we're interested in the "distribution spec". At the time of writing, the latest "distribution spec" can be downloaded here:

The above PDF file defines a set of API endpoints that we can use to query, parse, and then figure out how to download a file from a container registry. The table from the above PDF is copied below:

ID Method API Endpoint Success Failure
end-1 GET /v2/ 200 404/401
end-2 GET / HEAD /v2/<name>/blobs/<digest> 200 404
end-3 GET / HEAD /v2/<name>/manifests/<reference> 200 404
end-4a POST /v2/<name>/blobs/uploads/ 202 404
end-4b POST /v2/<name>/blobs/uploads/?digest=<digest> 201/202 404/400
end-5 PATCH /v2/<name>/blobs/uploads/<reference> 202 404/416
end-6 PUT /v2/<name>/blobs/uploads/<reference>?digest=<digest> 201 404/400
end-7 PUT /v2/<name>/manifests/<reference> 201 404
end-8a GET /v2/<name>/tags/list 200 404
end-8b GET /v2/<name>/tags/list?n=<integer>&last=<integer> 200 404
end-9 DELETE /v2/<name>/manifests/<reference> 202 404/400/405
end-10 DELETE /v2/<name>/blobs/<digest> 202 404/405
end-11 POST /v2/<name>/blobs/uploads/?mount=<digest>&from=<other_name> 201 404
end-12a GET /v2/<name>/referrers/<digest> 200 404/400
end-12b GET /v2/<name>/referrers/<digest>?artifactType=<artifactType> 200 404/400
end-13 GET /v2/<name>/blobs/uploads/<reference> 204 404

In OCI, files are (cryptically) called "blobs". In order to figure out the file that we want to download, we must first reference the list of files (called a "manifest").

The above table shows us how we can download a list of files (manifest) and then download the actual file (blob).

Examples

Let's look at how to download files from a couple different OCI registries:

  1. Docker Hub
  2. GitHub Packages

Docker Hub

To see the full example of downloading images from docker hub, click here

GitHub Packages

To see the full example of downloading files from GitHub Packages, click here.

Why?

I wrote this article because many, many folks have inquired about how to manually download files from OCI registries on the Internet, but their simple queries are usually returned with a barrage of useless counter-questions: why the heck would you want to do that!?!

The answer is varied.

Some people need to get files onto a restricted environment. Either their org doesn't grant them permission to install software on the machine, or the system has firewall-restricted internet access -- or doesn't have internet access at all.

3TOFU

Personally, the reason that I wanted to be able to download files from an OCI registry was for 3TOFU.

Verifying Unsigned Releases with 3TOFU
Verifying Unsigned Releases with 3TOFU

Unfortunaetly, most apps using OCI registries are extremely insecure. Docker, for example, will happily download malicious images. By default, it doesn't do any authenticity verifications on the payloads it downloaded. Even if you manually enable DCT, there's loads of pending issues with it.

Likewise, the macOS package manager brew has this same problem: it will happily download and install malicious code, because it doesn't use cryptography to verify the authenticity of anything that it downloads. This introduces watering hole vulnerabilities when developers use brew to install dependencies in their CI pipelines.

My solution to this? 3TOFU. And that requires me to be able to download the file (for verification) on three distinct linux VMs using curl or wget.

⚠ NOTE: 3TOFU is an approach to harm reduction.

It is not wise to download and run binaries or code whose authenticity you cannot verify using a cryptographic signature from a key stored offline. However, sometimes we cannot avoid it. If you're going to proceed with running untrusted code, then following a 3TOFU procedure may reduce your risk, but it's better to avoid running unauthenticated code if at all possible.

Registry (ab)use

Container registries were created in 2013 to provide a clever & complex solution to a problem: how to package and serve multiple versions of simplified sources to various consumers spanning multiple operating systems and architectures -- while also packaging them into small, discrete "layers".

However, if your project is just serving simple files, then the only thing gained by uploading them to a complex system like a container registry is headaches. Why do developers do this?

In the case of brew, their free hosing provider (JFrog's Bintray) shutdown in 2021. Brew was already hosting their code on GitHub, so I guess someone looked at "GitHub Packages" and figured it was a good (read: free) replacement.

Many developers using Container Registries don't need the complexity, but -- well -- they're just using it as a free place for their FOSS project to store some files, man.

1

You're viewing a single thread.