Create image.md #293

duglin · 2016-01-05T21:41:24Z

This is far from complete but I wanted to get it out there in time for people to look it over prior to the f2f next week. Its kind of a merging of docker's image format and appc's image format.

Signed-off-by: Doug Davis [email protected]

wking · 2016-01-05T22:24:59Z

image.md

+
+## Notational Conventions
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this and other documents of the specification are to be interpreted as described in [RFC2119](http://tools.ietf.org/html/rfc2119).


I think the phrasing from #260 already covers this, so we can remove this section here.

wking · 2016-01-05T23:20:14Z

Feedback on tactics:

I'm not sure why we have all the explicit ordering requirements
(e.g. “The order in which the data is serialized in the tar MUST be
the same as the order specified in that section” and then back off
with “There is no requirement that importing an image into a compliant
OCI implementation and then exporting it will result in the same image
since each implementation might choose to create the root filesystem
differently”. Is the ordering just intended for more efficient
unpacking to the filesystem, and not for stable hashing? I'm in favor
of efficiency gains, but think that requiring them in a spec like this
may be a bit premature. If we want to encourage that ordering right
off the bat, I'd suggest we start off with SHOULD instead of MUST for
any tar-ordering requirements.

Without a reproducible way to go from a filesystem to a hash, I don't
see a need to name vendored dependencies by hash
(e.g. sha512-1d62d181e9c1322d56ccd3a29d05018399147a16188dbd861c0279ad0dd7e14c
in your example). On the other hand, hashing an image tarball so it
can be content-addressed from another image that does not vendor it
makes sense. Is there a reason to require hashed names for vendored
images?

Feedback on strategy:

This sounds like a fancy way to construct and distribute a root
filesystem, and I don't see a need to encode that in the runtime spec.
Implementing and testing tools that create and unpack images like this
seems to be a completely orthogonal problem to launching containers
from filesystem bundles. But Docker proves there is a large community
of people who prefer to distribute their rootfs' using this
layered/blacklisted tarball approach. I'd recommend defining it in a
separate repository so folks writing and using implementations of this
layering/distribution spec can collaborate on it without distracting
the set of folks writing and using implemenations of the
container-launching runtime. There may be considerable overlap
between those sets of people, but they aren't the same set (e.g. I'm
personally interested in the container-launching runtime, but not in
layered-filesystem distribution) 1.

 Subject: Re: Hashing and verifying a bundle
 Date: Wed, 02 Sep 2015 11:48:54 -0700
 Message-ID: <[email protected]>

mrunalp · 2016-01-05T23:22:52Z

The suggested changes imply that the runtime would have to construct a rootfs that could be used in the container (what we pivot the root to) either by using a union filesystem or just copying the files to a new root. Should we clarify that further?

wking · 2016-01-05T23:49:43Z

On Tue, Jan 05, 2016 at 03:22:55PM -0800, Mrunal Patel wrote:

The suggested changes imply that the runtime would have to construct
a rootfs that could be used in the container…

In the current commit (e021a7f), unpacking and image to a filesystem
bundle is a separate step 1. So why does “the runtime” (which
launches containers from bundles 2) need to be involved at all. I
see no benefit, and an increased runtime-author cost, if we require a
single entity to be both the container-launching implementation and
the image-unpacking implementation (which is why I want this image
spec in a different repository 3).

philips · 2016-01-06T09:31:31Z

I think large design things like this should be discussed on the mailing lists and with Google Docs. GitHub is a very difficult forum for this.

Also, here is a proposal I worked up that tries to break an image format into a number of different layers with hashes at each of those layers: https://docs.google.com/document/d/1JPQqH9m9fwZIVsvpB5FSyPJjH0RZLnFkI2J2gohfPu0/edit

My concern here is that the image format is too monolithic and includes the host specific config.json file by default.

mrunalp · 2016-01-06T17:42:41Z

I agree that we should discuss this on the mailing list first.

vbatts · 2016-01-07T02:30:18Z

image.md

+
+An image's `ID` MUST be a string of the form `sha512-<hash>`.
+
+QUESTION: Do we really want to mandate sha512?


It is a solid default. For a majority of x86{,_64} chips, it is faster than sha256. Though for architectures like ARM, this could be a hit.
Here is my laptop. Both are speedy, but sha512 is faster:

$ dd if=/dev/urandom of=rand.img bs=1M count=64 64+0 records in 64+0 records out 67108864 bytes (67 MB) copied, 9.47004 s, 7.1 MB/s $ time sha256sum rand.img 2d4e6875974352af656630e8ed76ee036847888d69c507f73809d672c58f439d rand.img real 0m1.068s user 0m1.026s sys 0m0.041s $ time sha512sum rand.img ba914581b051a017d9f7fc3169bd6036bfa24666eb692d891107ce237d60a091919f0cbce0408221c452bd0217ee1388a72ca33a2b65e6bb791714217f85a4d3 rand.img real 0m0.692s user 0m0.654s sys 0m0.036s

Here is a raspberry pi. The sha512 is an order of magnitude slower:

$ dd if=/dev/urandom of=rand.img bs=1M count=64 64+0 records in 64+0 records out 67108864 bytes (67 MB) copied, 54.8794 s, 1.2 MB/s $ time sha256sum rand.img d286218629fccb13c9c1568b6ddb9f2fc5452299a4ee307e3cff353836d8901a rand.img real 0m5.552s user 0m4.850s sys 0m0.440s $ time sha512sum rand.img 8dcaddd1337cc9dcef273b4a53243563a619747214256a7c6285e6e7a94c671d06121ba08c87ae8c0a33bde4eb142d80413782913f79d9fd2686799174be1061 rand.img real 1m38.730s user 1m33.010s sys 0m1.120s

But like the pattern of <algorithm>-<digest> could allow folks to choose alternates that suite their needs better, or once there is the next great algorithm to be switched to.

philips · 2016-02-09T16:52:01Z

image.md

+
+1. Layers.
+Each embedded layer MUST have a corresponding image (tar) file, at the root of the image.
+Each layer's tar file MUST be named with its ID followed by the appropriate file-extension based on how the tar file is written (e.g. `.tar` for uncompressed).


File extensions don't matter, we should just use magic numbers like ACI does.

I wasn't focusing on the extension for any purpose, other than to say it needs to have one after the ID part of the name.

Not sure what your "why?" was referring to, but if its the ID part of the filename then I did that because we needed a way to go from the "layers" to the embedded tar file in a consistent fashion. We can't use the "name" of the image because it might have characters that are not permitted on all filesystems - like a "/" - so having a tar file with that name might not be possible. Using, and requiring an ID, also forces us to ensure there's a crypto key for each embedded image.

I agree with @philips that having magic numbers might be a better option than having an extension. @duglin, could you explain the advantages of having an extension?

I think you guys are reading too much into the extension - all files (well most) have extensions, tar/tgz/etc...
In that sentence I was defining the format of the base filename nothing more - I can remove the comment about extensions - I just added it to make it clear that its still a normal file and whatever extension would normally be there should still be there.. I still expect there to be the appopriate magic number in the file - but that's out of scope of this doc - that's defined by 'tar.

vbatts · 2016-02-15T22:45:24Z

image.md

+
+1. A `manifest.json` file describing the contents of the image.
+This file is REQUIRED to be present in the image.
+This file extends the [`config.json`](config.md) file with some additional properties:


I'm struggling on this.
As the config.json has evolved to essentially be a instance configuration for a container, I don't see how the manifest.json is particularly an extension. This comes back to the F2F conversations about having templates, or "suggests" or "requests" for resources. But the config.json is really not portable and would come with a list of caveats and exceptions, right?

I tend to think of this as a separate discussion. Whether the json in an image becomes the exact values we see in config.json or a template, or ranges, or something else is something that we can work on in parallel. To me the key question for us in this PR is whether we're ok with a single json file (manifest.json) that contains: 1) what will be used to construct the config.json, and 2) extra fields to help process the image. Or do we want to split these into multiple files. I don't yet see value in splitting them. Then in a different PR we can twiddle the json to be what we want based on whatever usecases we're looking to support.

Perhaps people could enumerate the usecases for something more verbose than the existing config.json.

On Mon, Feb 15, 2016 at 06:40:49PM -0800, Doug Davis wrote:

Or do we want to split these into multiple files. I don't yet see
value in splitting them.

The value I see in splitting files is clear separation of concerns. A
manifest file contains only the bits needed for fetching and
validation. A config.json contains only the bits needed by the
‘start’ command. A process.json contains only the bits needed by the
‘exec’ command 1. The state JSON contains only the bits needed to
manipulate (join, signal, tear down) a container. I'm ok allowing
folks to wedge all this into one file 2, but it seems like it would
be easier to write the individual tools and to understand the results
of a particular action if each action had a separate config file.

Unless there's some overlap between the actions, I see no benefit to
configuring multiple actions in the same file. And “start a container
from this OCI-bundle directory” and “create and OCI-bundle directory
from this tarball” don't seem like they have any overlap.

Subject: Re: Labels and extension meta data in containers #108 Date: Wed, 30 Dec 2015 20:54:37 -0800 Message-ID: <[email protected]>

mrunalp · 2016-02-24T11:34:40Z

image.md

+An `ID` MUST be a string of the form `<algorithm>-<hash value>`.
+The `algorithm` part of the string MUST be in lower-case.
+For example, `sha512` not `SHA512`.
+It is RECOMMENDED that implementations use the sha512 algorithm.


We should perhaps back this with some reasoning and references and also suggest alternatives.

Signed-off-by: Doug Davis <[email protected]>

duglin · 2016-03-16T20:44:18Z

rebased

philips · 2016-04-06T17:04:51Z

This work is moving into the newly formed OCI Image Format. https://github.com/opencontainers/image-spec

wking reviewed Jan 5, 2016
View reviewed changes

philips mentioned this pull request Jan 6, 2016

config: Single, unified config file #284

Merged

vbatts reviewed Jan 7, 2016
View reviewed changes

duglin force-pushed the imageFormat branch 16 times, most recently from 923813e to c501cdb Compare January 8, 2016 00:54

duglin force-pushed the imageFormat branch 6 times, most recently from a842fc9 to 16e4366 Compare February 9, 2016 15:26

philips reviewed Feb 9, 2016
View reviewed changes

duglin force-pushed the imageFormat branch from 16e4366 to e9ff75a Compare February 15, 2016 16:19

vbatts reviewed Feb 15, 2016
View reviewed changes

philips modified the milestones: by-1.0.0, v0.4.0, v0.5.0 Feb 17, 2016

mrunalp reviewed Feb 24, 2016
View reviewed changes

duglin force-pushed the imageFormat branch from e9ff75a to 7c49de4 Compare February 24, 2016 14:05

duglin and others added 3 commits March 16, 2016 13:05

Create image.md

e1f25e7

Signed-off-by: Doug Davis <[email protected]>

Part 2

1271949

Signed-off-by: Doug Davis <[email protected]>

Some bigger changes

2e7485f

Signed-off-by: Doug Davis <[email protected]>

duglin force-pushed the imageFormat branch from 7c49de4 to 2e7485f Compare March 16, 2016 20:05

mrunalp closed this Apr 6, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Create image.md #293

Create image.md #293

duglin commented Jan 5, 2016

wking Jan 5, 2016

wking commented Jan 5, 2016

mrunalp commented Jan 5, 2016

wking commented Jan 5, 2016

philips commented Jan 6, 2016

mrunalp commented Jan 6, 2016

vbatts Jan 7, 2016

philips Feb 9, 2016

philips Feb 9, 2016

duglin Feb 9, 2016

mrunalp Feb 24, 2016

duglin Feb 24, 2016

vbatts Feb 15, 2016

duglin Feb 16, 2016

wking Feb 16, 2016

mrunalp Feb 24, 2016

duglin commented Mar 16, 2016

philips commented Apr 6, 2016


		## Notational Conventions

		The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this and other documents of the specification are to be interpreted as described in [RFC2119](http://tools.ietf.org/html/rfc2119).


		An image's `ID` MUST be a string of the form `sha512-<hash>`.

		QUESTION: Do we really want to mandate sha512?

Create image.md #293

Create image.md #293

Conversation

duglin commented Jan 5, 2016

Choose a reason for hiding this comment

wking commented Jan 5, 2016

mrunalp commented Jan 5, 2016

wking commented Jan 5, 2016

philips commented Jan 6, 2016

mrunalp commented Jan 6, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

duglin commented Mar 16, 2016

philips commented Apr 6, 2016