Proposal, Extension to Blob Upload, Dedupe API #18
Something worth considering: I don't think this works well with the current (docker/distribution) auth model. In order to implement this, you'd need to determine whether a layer is present in any image the caller has access to pull. For certain registries, that might be impossible to calculate; for others, it might be a slow operation.
@jonjohnsonjr That's a good point. I guess the way to solve this would be via challenge-response: the server responds with some block (say, 4 kilobytes), and the client responds with a hash of the layer plus that 4K. That way, only 4K has to be transmitted, as opposed to the entire layer.
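A minimal sketch of what that might look like on the client side. Everything here is hypothetical: the challenge size, the hash construction, and the helper name are illustrations of the idea, not part of any spec. The registry would recompute the same digest over its stored copy of the layer and compare.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// proveLayerPossession is a hypothetical helper: the registry sends a random
// 4 KiB challenge, and the client proves it holds the full layer by hashing
// the challenge concatenated with the layer bytes. Only the 4 KiB challenge
// and a 32-byte digest cross the wire, not the layer itself.
func proveLayerPossession(challenge, layer []byte) string {
	h := sha256.New()
	h.Write(challenge) // 4 KiB nonce chosen by the registry
	h.Write(layer)     // full layer content held by the client
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	challenge := make([]byte, 4096) // in practice, random bytes from the server
	layer := []byte("layer contents...")
	fmt.Println(proveLayerPossession(challenge, layer))
}
```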
Regarding not allowing dedup when a user doesn't have access, what's the attack vector? It means a user who somehow finds a hash can use it as a base layer and then pull it without actually having the file?
@amouat Essentially, yes. We don't want layer digests to have to be secrets. You can imagine someone pulling an image that contains secret, proprietary code during a recorded presentation. Anyone watching that presentation could then upload an image with the same layers using the dedupe mechanism, then immediately re-pull that image, effectively bypassing the auth mechanism.
@sargun I don't think that completely solves the problem, since you would still be leaking the fact that a layer existed in a repository you don't have access to. I'm not sure how much we care about that.
@jonjohnsonjr Is there really a problem with leaking the fact that a layer exists with that hash? What does that tell you?
That someone else has uploaded this layer before.
Perhaps not, but it's worth pausing to reflect on it, since we would be leaking more information than before. As a strawman example: imagine a situation where someone (let's say Alice) might want to secretly upload an image to a private registry. Perhaps this image contains some proprietary code, and perhaps Alice's coworker (let's say Bob) also has access to this image. If Bob is suspicious that Alice has been uploading this image to, say, a registry he can also reach, he could use the dedupe mechanism to confirm that those layers already exist there.

Now... do we care about that? I don't know. I'm sure more security-minded people can come up with a better example (or perhaps a better dedupe mechanism), but that's all I've got 😄
There isn't really a problem with the registry auth model; it's more a result of the implementation of docker/distribution. Either way, I don't think you even need to change the protocol to support this functionality. If you can do it, just do it transparently.
The original API had this parameter, but we took it out for a few reasons.
In general, I don't see a huge issue with this. In CAS upload models, I've always struggled with whether the digest should be a pre-declared or a commit-time parameter, but it turns out it depends on the context. The key to making this work is defining the behavior when the blob already exists in a way that won't break old clients. I think the proposal is fine, except it should probably follow the mount behavior exactly. We can leave the policy up to the implementation.
De-duping, and how and where it is implemented, seems like a registry implementation detail. Some may want to share "public" layers across registries to give a better first impression, or to reduce overall storage costs. Some may be concerned about the security implications of sharing layers (please don't start a rant here :)).
Layer upload de-duplication is explicitly listed as part of the spec.
Today, this is scoped to a given image, as opposed to the whole registry itself. The current mechanism of de-duplication is to issue a HEAD request for each blob and see if it already exists; if it doesn't, then upload it.
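For illustration, a sketch of that existence check. The registry host, repository name, and digest are placeholders, and auth is omitted for brevity:

```go
package main

import (
	"fmt"
	"net/http"
)

// blobExists checks for a blob with a HEAD request, per the distribution API:
// HEAD /v2/<name>/blobs/<digest> returns 200 if the blob is already present
// in that repository.
func blobExists(registry, name, digest string) (bool, error) {
	url := fmt.Sprintf("https://%s/v2/%s/blobs/%s", registry, name, digest)
	resp, err := http.Head(url)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK, nil
}

func main() {
	// registry.example.com and the digest are placeholders.
	exists, err := blobExists("registry.example.com", "image-1", "sha256:...")
	if err != nil {
		panic(err)
	}
	if !exists {
		fmt.Println("blob missing; start an upload with POST /v2/image-1/blobs/uploads/")
	}
}
```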
Unfortunately, this doesn't work in the case where you have two images that are built on top of a common image. This is a common pattern, e.g. image-1 and image-2 both built on top of the same base image but pushed to different repositories.
On upload, the underlying registry may deduplicate these layers.
Although there exists a mechanism for image-1 and image-2 to have "cross-mounted" layers, this requires that the owners of image-1 and image-2 know of one another.
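For reference, a sketch of how a client might use the existing cross-repository mount endpoint, `POST /v2/<name>/blobs/uploads/?mount=<digest>&from=<repository>`, with the same placeholders and omitted auth as above:

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// mountBlob attempts the existing cross-repository mount:
// POST /v2/<name>/blobs/uploads/?mount=<digest>&from=<repository>
// A 201 Created means the layer was linked without re-uploading.
func mountBlob(registry, name, from, digest string) (bool, error) {
	u := fmt.Sprintf("https://%s/v2/%s/blobs/uploads/?mount=%s&from=%s",
		registry, name, url.QueryEscape(digest), url.QueryEscape(from))
	resp, err := http.Post(u, "application/octet-stream", nil)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusCreated, nil
}

func main() {
	mounted, err := mountBlob("registry.example.com", "image-2", "image-1", "sha256:...")
	if err != nil {
		panic(err)
	}
	fmt.Println("mounted:", mounted)
}
```

The limitation is visible in the signature: the caller has to name the `from` repository that already holds the blob, which is exactly the "owners must know of one another" problem.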
My proposal is to extend the Starting An Upload API: `POST /v2/<name>/blobs/uploads/` would accept an optional query parameter `digest`. If the query parameter is present, and we can find the layer in another image, the registry returns status code 201 Created and links the layer in. Given that this field would be optional, this would be a backwards-compatible change, and could be implemented optionally. Opinions?
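A minimal sketch of a client exercising the proposed extension. The `digest` parameter on this endpoint is the proposal itself, not the current spec, and the registry name and digest are placeholders:

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// startUploadWithDigest exercises the proposed extension:
// POST /v2/<name>/blobs/uploads/?digest=<digest>
// Under the proposal, 201 Created means the registry found the layer
// elsewhere and linked it in; 202 Accepted is today's behavior of opening
// a normal upload session, so old clients and registries are unaffected.
func startUploadWithDigest(registry, name, digest string) (linked bool, err error) {
	u := fmt.Sprintf("https://%s/v2/%s/blobs/uploads/?digest=%s",
		registry, name, url.QueryEscape(digest))
	resp, err := http.Post(u, "application/octet-stream", nil)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	switch resp.StatusCode {
	case http.StatusCreated: // deduplicated: layer linked in, nothing to send
		return true, nil
	case http.StatusAccepted: // normal upload session; PATCH/PUT the bytes next
		return false, nil
	default:
		return false, fmt.Errorf("unexpected status: %s", resp.Status)
	}
}

func main() {
	linked, err := startUploadWithDigest("registry.example.com", "image-1", "sha256:...")
	if err != nil {
		panic(err)
	}
	fmt.Println("layer linked without upload:", linked)
}
```

Because a registry that ignores the unknown query parameter would simply return 202 and proceed with a normal upload, this degrades gracefully on both old clients and old registries.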