
Improve performance for build setup #1489

Open
catto opened this issue Feb 10, 2019 · 16 comments

@catto (Member) commented Feb 10, 2019

What happened:
The current launcher image contains many Habitat packages.
These files are copied to a temporary volume for each build, which takes a while depending on the environment. Even in my environment with an NVMe SSD, it takes over 10 seconds to complete.

/opt/sd # du -sh /hab
547.6M	/hab
/opt/sd # du -sh .
42.2M	.

What you expected to happen:
Setup for each build completes in just a few seconds.

I tried to find a Kubernetes equivalent of Docker's data volume containers, which attach one container's storage directly to another, but I haven't found one yet.

@jithine (Member) commented Feb 11, 2019

To work around this, we create symlinks. cc @minz1027

@minzcmu (Member) commented Feb 11, 2019

@catto (Member, Author) commented Feb 11, 2019

I see. In a VM, they are on the same volume, so we can use a symlink.
But in a k8s pod, they are on separate volumes, so using a symlink is impossible 😭

@minzcmu (Member) commented Feb 12, 2019

Are you talking about the init container part? https://github.com/screwdriver-cd/executor-k8s/blob/master/config/pod.yaml.tim#L37
In fact, it's the same situation for k8s-vm: we also need to copy the files to the base host, then mount and symlink: https://github.com/screwdriver-cd/executor-k8s-vm/blob/master/config/pod.yaml.tim

@jithine jithine added the feature label Apr 5, 2019
@minzcmu (Member) commented Apr 10, 2019

Update 04/10

For executor-k8s, a read-only PVC with the latest launcher dependencies may be the solution.

For executor-k8s-vm, this will be easier. Currently, each VM launcher pod puts the dependencies at /opt/screwdriver/{{build_id_with_prefix}}/sdlauncher on the base host, but this could instead be a generic path on the host, e.g. /opt/screwdriver/sdlauncher/v5.0.86 (launcher version). Each pod can then check whether the shared mount already has data; if not, it copies the dependencies over, otherwise it just proceeds (as sketched below).

And for the VM pod, we switch to mounting /opt/screwdriver/sdlauncher/v5.0.86.
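
A minimal sketch of that check-then-copy flow, assuming the launcher content ships in the image at /opt/sd and /hab (the marker file and exact paths here are illustrative, not the actual implementation):

# Hypothetical init logic: populate the versioned cache once per host,
# then let every later pod reuse it instead of copying ~550MB per build.
CACHE_DIR=/opt/screwdriver/sdlauncher/v5.0.86
if [ ! -f "$CACHE_DIR/launch" ]; then
  mkdir -p "$CACHE_DIR"
  cp -a /opt/sd/. "$CACHE_DIR/"   # launcher binaries
  cp -a /hab "$CACHE_DIR/"        # habitat packages
fi
# From here on the cache is warm; the VM pod mounts $CACHE_DIR directly.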

@minzcmu (Member) commented Apr 16, 2019

04/16

We got some interesting findings after this change.

k8s logs:

Pod Start Time / Scheduled Time if scheduled immediately:
Mon, 15 Apr 2019 23:26:40 +0000

Init-container :
Started:      Mon, 15 Apr 2019 23:26:55 +0000
Finished:     Mon, 15 Apr 2019 23:26:57 +0000

vm-launcher:
Started:      Mon, 15 Apr 2019 23:27:04 +0000

2019-04-15T23:27:04.747864479Z   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
2019-04-15T23:27:04.747938287Z                                  Dload  Upload   Total   Spent    Left  Speed
100   543    0     0  100   543      0   132k --:--:-- --:--:-- --:--:--  132k
2019-04-15T23:27:05.62344408Z 2.1.5: Pulling from sd_platform/docker-docker
2019-04-15T23:27:05.623511676Z Digest: sha256:2e223723a86ebed85740ae43e8d74a587d381d04092eaf1134538e0eaf7df3df
2019-04-15T23:27:05.623518318Z Status: Image is up to date for XXXX:4443/sd_platform/docker-docker:2.1.5
2019-04-15T23:27:07.37098662Z sha256:2e223723a86ebed85740ae43e8d74a587d381d04092eaf1134538e0eaf7df3df: Pulling from sd_platform/docker-docker
2019-04-15T23:27:07.371153288Z Digest: sha256:2e223723a86ebed85740ae43e8d74a587d381d04092eaf1134538e0eaf7df3df
2019-04-15T23:27:07.371176508Z Status: Image is up to date for xxxx:4443/sd_platform/docker-docker@sha256:2e223723a86ebed85740ae43e8d74a587d381d04092eaf1134538e0eaf7df3df
2019-04-15T23:27:07.373271865Z Successfully pulled the image
2019-04-15T23:27:07.380452396Z   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
2019-04-15T23:27:07.380626802Z                                  Dload  Upload   Total   Spent    Left  Speed
100   264    0     0  100   264      0  66000 --:--:-- --:--:-- --:--:-- 88000
2019-04-15T23:27:07.406191366Z Running hyperctl...
2019-04-15T23:27:07.412762024Z   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
2019-04-15T23:27:07.412909964Z                                  Dload  Upload   Total   Spent    Left  Speed
100   134    0     0  100   134      0  44666 --:--:-- --:--:-- --:--:-- 44666
2019-04-15T23:27:16.738842086Z [WARN  tini (3)] Tini is not running as PID 1 and isn't registered as a child subreaper.
2019-04-15T23:27:16.738876083Z Zombie processes will not be re-parented to Tini, so zombie reaping won't work.
2019-04-15T23:27:16.741810087Z To fix the problem, use the -s option or set the environment variable TINI_SUBREAPER to register Tini as a child subreaper, or run Tini as PID 1.
2019-04-15T23:27:17.975409253Z 2019/04/15 23:27:16 Launcher process only fetch token.
2019-04-15T23:27:18.026302278Z 2019/04/15 23:27:16 Processing logs for build 6645
2019-04-15T23:27:18.028277193Z 2019/04/15 23:27:16 Archiver started
2019-04-15T23:27:18.046029122Z 2019/04/15 23:27:16 Starting Build 6645
2019-04-15T23:27:18.681556752Z 2019/04/15 23:27:17 Setting Build Status to RUNNING
  • For the current setup, the init container runtime is reduced to 0 secs if the launcher is cached on the host, but the remaining steps take the same time.
Pod Start Time / Scheduled Time if scheduled immediately:
Mon, 15 Apr 2019 23:09:24 +0000

Init-container :
Started:      Mon, 15 Apr 2019 23:09:40 +0000
Finished:     Mon, 15 Apr 2019 23:09:41 +0000

vm-launcher:
Mon, 15 Apr 2019 23:09:52 +0000

2019-04-15T23:09:53.286590066Z   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
2019-04-15T23:09:53.286728225Z                                  Dload  Upload   Total   Spent    Left  Speed
100   543    0     0  100   543      0   106k --:--:-- --:--:-- --:--:--  106k
2019-04-15T23:09:54.150378991Z 2.1.5: Pulling from sd_platform/docker-docker
2019-04-15T23:09:54.150450901Z Digest: sha256:2e223723a86ebed85740ae43e8d74a587d381d04092eaf1134538e0eaf7df3df
2019-04-15T23:09:54.150458611Z Status: Image is up to date for XXXX:4443/sd_platform/docker-docker:2.1.5
2019-04-15T23:09:55.802929581Z sha256:2e223723a86ebed85740ae43e8d74a587d381d04092eaf1134538e0eaf7df3df: Pulling from sd_platform/docker-docker
2019-04-15T23:09:55.803240993Z Digest: sha256:2e223723a86ebed85740ae43e8d74a587d381d04092eaf1134538e0eaf7df3df
2019-04-15T23:09:55.80327804Z Status: Image is up to date for xxxx:4443/sd_platform/docker-docker@sha256:2e223723a86ebed85740ae43e8d74a587d381d04092eaf1134538e0eaf7df3df
2019-04-15T23:09:55.804995476Z Successfully pulled the image
2019-04-15T23:09:55.81221539Z   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
2019-04-15T23:09:55.812356596Z                                  Dload  Upload   Total   Spent    Left  Speed
100   264    0     0  100   264      0  52800 --:--:-- --:--:-- --:--:-- 52800
2019-04-15T23:09:55.916560024Z Running hyperctl...
2019-04-15T23:09:55.923216256Z   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
2019-04-15T23:09:55.92337709Z                                  Dload  Upload   Total   Spent    Left  Speed
100   134    0     0  100   134      0  44666 --:--:-- --:--:-- --:--:-- 44666
2019-04-15T23:10:01.470851688Z [WARN  tini (3)] Tini is not running as PID 1 and isn't registered as a child subreaper.
2019-04-15T23:10:01.470896815Z Zombie processes will not be re-parented to Tini, so zombie reaping won't work.
2019-04-15T23:10:01.473685282Z To fix the problem, use the -s option or set the environment variable TINI_SUBREAPER to register Tini as a child subreaper, or run Tini as PID 1.
2019-04-15T23:10:02.705225564Z 2019/04/15 23:10:01 Launcher process only fetch token.
2019-04-15T23:10:02.746122022Z 2019/04/15 23:10:01 Processing logs for build 6636
2019-04-15T23:10:02.746148506Z 2019/04/15 23:10:01 Archiver started
2019-04-15T23:10:02.763418942Z 2019/04/15 23:10:01 Starting Build 6636
2019-04-15T23:10:03.391819033Z 2019/04/15 23:10:02 Setting Build Status to RUNNING
  • The above two are just examples; the behavior is consistent across many builds. Based on these findings, the most time-consuming parts are pod created => init-container started, and hyperctl VM creation time.

  • For pod created => init-container started, we are seeing a difference between SSD and SATA machines. For SSD it takes around 5 secs; for SATA, 10~20 secs.

@minzcmu (Member) commented Apr 16, 2019

Best scenario

Config:

SSD; launcher content cached, launcher image cached, hyperctl image cached, build image cached

Time Breakdown:

Total 15 secs

  • pod created -> init container started: 6 secs
  • init container started -> finished: 0 secs
  • init container finished -> vm launcher started: 3 secs
  • hyperctl image pull: 1.5 sec (with cache, it still takes time to check)
  • vm creation: 3 sec
  • vm started -> set build status to RUNNING: 1 sec
Pod Start Time / Scheduled Time if scheduled immediately:
Start Time:         Tue, 16 Apr 2019 22:59:14 +0000

Init-container:
Started:      Tue, 16 Apr 2019 22:59:20 +0000
Finished:     Tue, 16 Apr 2019 22:59:20 +0000

vm-launcher
Started:      Tue, 16 Apr 2019 22:59:23 +0000

2019-04-16T22:59:23.843687414Z   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
2019-04-16T22:59:23.843816459Z                                  Dload  Upload   Total   Spent    Left  Speed
100   547    0     0  100   547      0   2707 --:--:-- --:--:-- --:--:--  2707
2019-04-16T22:59:24.429083994Z 2.1.5: Pulling from sd_platform/docker-docker
2019-04-16T22:59:24.429152489Z Digest: sha256:2e223723a86ebed85740ae43e8d74a587d381d04092eaf1134538e0eaf7df3df
2019-04-16T22:59:24.429168296Z Status: Image is up to date for docker.ouroath.com:4443/sd_platform/docker-docker:2.1.5
2019-04-16T22:59:25.162666725Z sha256:2e223723a86ebed85740ae43e8d74a587d381d04092eaf1134538e0eaf7df3df: Pulling from sd_platform/docker-docker
2019-04-16T22:59:25.162797827Z Digest: sha256:2e223723a86ebed85740ae43e8d74a587d381d04092eaf1134538e0eaf7df3df
2019-04-16T22:59:25.16280582Z Status: Image is up to date for XXX:4443/sd_platform/docker-docker@sha256:2e223723a86ebed85740ae43e8d74a587d381d04092eaf1134538e0eaf7df3df
2019-04-16T22:59:25.164126937Z Successfully pulled the image
2019-04-16T22:59:25.170707246Z   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
2019-04-16T22:59:25.170957436Z                                  Dload  Upload   Total   Spent    Left  Speed
100   266    0     0  100   266      0  66500 --:--:-- --:--:-- --:--:-- 66500
2019-04-16T22:59:25.177869232Z Running hyperctl...
2019-04-16T22:59:25.183803838Z   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
2019-04-16T22:59:25.183985905Z                                  Dload  Upload   Total   Spent    Left  Speed
100   135    0     0  100   135      0  45000 --:--:-- --:--:-- --:--:-- 45000
2019-04-16T22:59:28.385277906Z [WARN  tini (3)] Tini is not running as PID 1 and isn't registered as a child subreaper.
2019-04-16T22:59:28.385297943Z Zombie processes will not be re-parented to Tini, so zombie reaping won't work.
2019-04-16T22:59:28.387281094Z To fix the problem, use the -s option or set the environment variable TINI_SUBREAPER to register Tini as a child subreaper, or run Tini as PID 1.
2019-04-16T22:59:28.975889149Z 2019/04/16 22:59:27 Launcher process only fetch token.
2019-04-16T22:59:29.001257707Z 2019/04/16 22:59:27 Processing logs for build 1429071
2019-04-16T22:59:29.001266357Z 2019/04/16 22:59:27 Archiver started
2019-04-16T22:59:29.012217007Z 2019/04/16 22:59:27 Starting Build 1429071
2019-04-16T22:59:29.548594938Z 2019/04/16 22:59:28 Setting Build Status to RUNNING

@jithine jithine closed this as completed Apr 17, 2019
@minzcmu (Member) commented Jul 2, 2019

07/01

Performance with kata and executor-k8s on an SSD machine. The bottleneck here is the copy in the init container, which takes quite a long time (~16 secs). To speed it up, we can either make the emptyDir memory-backed (see the sketch after the timing data below) or use the same technique we have for k8s-vm and mount it from the base host: give the init container permission to write to the mount, and the main container read-only permission.

Total 25 secs

  • pod created -> init container started: 6 secs
  • init container started -> finished: 16 secs
  • init container finished -> container started: 2 secs
  • container started -> set build status to RUNNING: 1 sec
Pod Start Time / Scheduled Time if scheduled immediately:        
Tue, 02 Jul 2019 00:12:09 +0000

Init-container
Started:      Tue, 02 Jul 2019 00:12:15 +0000
Finished:     Tue, 02 Jul 2019 00:12:31 +0000
   
Containers:
Started:      Tue, 02 Jul 2019 00:12:33 +0000
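
For reference, making the emptyDir memory-backed is a one-line change in the pod spec. medium: Memory is standard Kubernetes API; the volume name is just illustrative:

volumes:
- name: screwdriver   # hypothetical volume holding the launcher dependencies
  emptyDir:
    medium: Memory    # tmpfs-backed, so the copy is bound by RAM speed, not disk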

@minzcmu (Member) commented Jul 3, 2019

07/03

Reopening the issue to work on improving the setup time for executor-k8s.
To reduce the setup time, we can cache the dependencies on the base host, similar to k8s-vm:

  • make the initContainer privileged
  • have the initContainer write to the shared mount on the base host if the data doesn't exist yet
  • make the build container non-privileged
  • mount the dependencies volume read-only into the build container

POC

Did a proof of concept, and the build setup time was reduced to ~10s.

apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: dedicated
            operator: In
            values:
            - beta-screwdriver
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - screwdriver-vm
          topologyKey: kubernetes.io/hostname
        weight: 100
  nodeSelector:
    kubernetes.io/hostname: node
  containers:
  - args:
    - |
      ls -la /opt/sd
      sleep 10000
    command:
    - /opt/sd/launcher_entrypoint.sh
    image: node:8
    name: build
    securityContext:
      privileged: false
    volumeMounts:
    - mountPath: /opt/sd
      name: screwdriver
      readOnly: true
    - mountPath: /sd
      name: workspace
      readOnly: true
  initContainers:
  - command:
    - /bin/sh
    - -c
    - |
      # Populate the host cache only once per launcher version.
      if ! [ -f /opt/launcher/launch ]; then
        # Stage into a temp dir first so concurrent pods never see a partial copy.
        TEMP_DIR=`mktemp -d -p /opt/launcher` \
          && cp -a /opt/sd/* $TEMP_DIR \
          && mkdir -p $TEMP_DIR/hab \
          && cp -a /hab/* $TEMP_DIR/hab \
          && mv $TEMP_DIR/* /opt/launcher \
          && rm -rf $TEMP_DIR || true
      else
        ls /opt/launcher
      fi
    image: screwdrivercd/launcher:v6.0.8
    name: launcher
    volumeMounts:
    - mountPath: /opt/launcher
      name: screwdriver
    securityContext:
      privileged: true
  tolerations:
    - effect: NoSchedule
      key: dedicated
      operator: Equal
      value: beta-screwdriver
    - effect: NoExecute
      key: node.kubernetes.io/not-ready
      operator: Exists
      tolerationSeconds: 300
    - effect: NoExecute
      key: node.kubernetes.io/unreachable
      operator: Exists
      tolerationSeconds: 300
  volumes:
  - name: screwdriver
    hostPath:
      type: DirectoryOrCreate
      path: /opt/screwdriver/test_sdlauncher/v6.0.8
  - emptyDir: {}
    name: workspace

To really make it work, we need the symlink logic that links the read-only hab packages to /hab and keeps /hab writable, like what k8s-vm does; a rough sketch follows.
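
Something along these lines, assuming the cached packages end up read-only under /opt/sd/hab (paths and layout are assumptions, not the actual k8s-vm implementation):

# Hypothetical: keep /hab itself a writable directory while its contents
# point at the read-only cached packages, so builds can still install into it.
mkdir -p /hab
for entry in /opt/sd/hab/*; do
  ln -s "$entry" "/hab/$(basename "$entry")"
done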

Limitation

With this method the cache lives on the host, so as time goes by we'll need a cronjob to clean up old dependencies.
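
One possible shape for that cleanup job (the retention window and path are assumptions):

# Hypothetical daily cronjob: drop cached launcher versions untouched for 30+ days.
find /opt/screwdriver/sdlauncher -mindepth 1 -maxdepth 1 -type d -mtime +30 -exec rm -rf {} +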

@catto What do you think about this approach? Let me know if you have any other ideas :D

@minzcmu minzcmu reopened this Jul 3, 2019
@catto (Member, Author) commented Jul 4, 2019

@minz1027 Sounds good to me. It would be better to provide both methods, the new one and the current one, so that users who cannot use privileged containers can continue using SD.cd.

I'm curious about the long time it takes to copy the launcher binaries in a kata container. Are you using kata-containers 1.7 and virtio-fs? The latest version supports virtio-fs with NEMU, and you can specify the nemu profile to use virtio-fs, which is much faster than the previous option (9pfs).
https://github.com/kata-containers/documentation/blob/master/how-to/how-to-use-virtio-fs-with-kata.md

ref: #818

@minzcmu (Member) commented Jul 9, 2019

@catto That's a nice suggestion! But unfortunately for RHEL, the latest version is 1.5... sadness. Once they provide 1.7, we can certainly try it out. But as long as we do the copy, it will take some time either way.

Let me discuss this solution with Jithin to see if we want to implement it now.

@catto (Member, Author) commented Jul 9, 2019

@minz1027 You can try the latest version with kata-deploy on various distros! Have you tried it?
https://github.com/kata-containers/packaging/blob/master/kata-deploy/README.md#install-kata-and-configure-docker

@catto (Member, Author) commented Jul 19, 2019

This CSI plugin should make setup faster, though it's in alpha stage.
https://github.com/kubernetes-csi/csi-driver-image-populator

The initContainer that copies files from the launcher container to the build container could be replaced with volumes created directly from the launcher image using this plugin.

@catto (Member, Author) commented Aug 28, 2019

I've tested csi-driver-image-populator and confirmed there is a significant performance improvement.

Modify

Note: --feature-gates=CSIInlineVolume is required for k8s 1.15

  • add a CSI inline volume created from the launcher image:

volumes:
  - name: data
    csi:
      driver: image.csi.k8s.io
      volumeAttributes:
        image: screwdrivercd/launcher:$version

  • modify the container's command to create symlinks to the attached volume, from /opt/sd to $mounted_dir/opt/sd and from /hab to $mounted_dir/hab.
    • for example:

command:
    - sh
    - "-c"
args: ["ln -s /opt/sdvol/opt/sd /opt/sd; ln -s /opt/sdvol/hab /hab; launch ...."]

Result

Before

  • Launching the build container takes at least 13 seconds, depending on how busy the disk is.
  • Copying hundreds of megabytes of data for every build on a node slows down other builds as well.
Events:
  Type    Reason     Age   From                                             Message
  ----    ------     ----  ----                                             -------
  Normal  Scheduled  15s   default-scheduler                                Successfully assigned default/no-csi-test to testnode01
  Normal  Pulled     14s   kubelet, testnode01  Container image "screwdrivercd/launcher:latest" already present on machine
  Normal  Created    9s    kubelet, testnode01  Created container launcher
  Normal  Started    9s    kubelet, testnode01  Started container launcher
  Normal  Pulling    2s    kubelet, testnode01  Pulling image "node:12"
  Normal  Pulled     2s    kubelet, testnode01  Successfully pulled image "node:12"
  Normal  Created    2s    kubelet, testnode01  Created container build
  Normal  Started    2s    kubelet, testnode01  Started container build

FYI: Launching build pods in my production environment takes 30+ seconds even though its nodes have higher-performance CPUs and disks than the test environment. I guess it's because of high disk I/O caused by the initContainer.

  Type    Reason     Age   From                                           Message
  ----    ------     ----  ----                                           -------
  Normal  Scheduled  38s   default-scheduler                              Successfully assigned screwdrivercd/buildpodname to prodnode01
  Normal  Pulled     33s   kubelet, prodnode01  Container image "screwdrivercd/launcher:v6.0.15" already present on machine
  Normal  Created    23s   kubelet, prodnode01  Created container
  Normal  Started    22s   kubelet, prodnode01  Started container
  Normal  Pulling    8s    kubelet, prodnode01  pulling image "node:8"
  Normal  Pulled     5s    kubelet, prodnode01  Successfully pulled image "node:8"
  Normal  Created    0s    kubelet, prodnode01  Created container
  Normal  Started    0s    kubelet, prodnode01  Started container

After

  • Launching the build container takes only 3 seconds.
  • The copying of hundreds of megabytes of data is gone.
Events:
  Type    Reason     Age   From                                             Message
  ----    ------     ----  ----                                             -------
  Normal  Scheduled  3s    default-scheduler                                Successfully assigned default/csi-test to testnode01
  Normal  Pulling    2s    kubelet, testnode01  Pulling image "node:12"
  Normal  Pulled     1s    kubelet, testnode01  Successfully pulled image "node:12"
  Normal  Created    1s    kubelet, testnode01  Created container build
  Normal  Started    1s    kubelet, testnode01  Started container build

Also confirmed that the build can invoke the launch binary and the user can write to the launcher volume, such as /hab:

+ ln -s /opt/sdvol/opt/sd /opt/sd
+ ln -s /opt/sdvol/hab /hab
+ /opt/sd/launch
NAME:
   launcher - launch a Screwdriver build

USAGE:
   launch [options] build-id

VERSION:
   6.0.15, commit 75f016895e37f13ad08cd4ffd5af08ec69d7b2ed, built at 2019-08-13T21:20:07Z

COMMANDS:
   help, h  Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --api-uri value        API URI for Screwdriver (default: "http://localhost:8080")
   --token value          JWT used for accessing Screwdriver's API [$SD_TOKEN]
   --workspace value      Location for checking out and running code (default: "/sd/workspace")
   --emitter value        Location for writing log lines to (default: "/var/run/sd/emitter")
   --meta-space value     Location of meta temporarily (default: "/sd/meta")
   --store-uri value      API URI for Store (default: "http://localhost:8081")
   --ui-uri value         UI URI for Screwdriver (default: "http://localhost:4200")
   --shell-bin value      Shell to use when executing commands (default: "/bin/sh") [$SD_SHELL_BIN]
   --build-timeout value  Maximum number of minutes to allow a build to run (default: 90) [$SD_BUILD_TIMEOUT]
   --only-fetch-token     Only fetching build token
   --help, -h             show help
   --version, -v          print the version

COPYRIGHT:
   (c) 2016-2019 Yahoo Inc.
+ /hab/bin/hab
hab 0.79.1/20190410220617

Authors: The Habitat Maintainers <[email protected]>
"A Habitat is the natural environment for your services" - Alan Turing

USAGE:
    hab [SUBCOMMAND]

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

SUBCOMMANDS:
    bldr             Commands relating to Habitat Builder
    cli              Commands relating to Habitat runtime config
    config           Commands relating to a Service's runtime config
    file             Commands relating to Habitat files
    help             Prints this message or the help of the given subcommand(s)
    origin           Commands relating to Habitat origin keys
    pkg              Commands relating to Habitat packages
    plan             Commands relating to plans and other app-specific configuration.
    ring             Commands relating to Habitat rings
    studio           Commands relating to Habitat Studios
    sup              The Habitat Supervisor
    supportbundle    Create a tarball of Habitat Supervisor data to send to support
    svc              Commands relating to Habitat services
    user             Commands relating to Habitat users


ALIASES:
    apply      Alias for: 'config apply'
    install    Alias for: 'pkg install'
    run        Alias for: 'sup run'
    setup      Alias for: 'cli setup'
    start      Alias for: 'svc start'
    stop       Alias for: 'svc stop'
    term       Alias for: 'sup term'
+ /hab/bin/hab pkg install core/git
» Installing core/git
☁ Determining latest version of core/git in the 'stable' channel
↓ Downloading core/git/2.21.0/20190826043848
☛ Verifying core/git/2.21.0/20190826043848
↓ Downloading core-20180119235000 public origin key
☑ Cached core-20180119235000 public origin key
→ Using core/acl/2.2.53/20190115012136
→ Using core/attr/2.4.48/20190115012129
→ Using core/bzip2/1.0.6/20190115011950
→ Using core/cacerts/2018.12.05/20190115014206
→ Using core/coreutils/8.30/20190115012313
↓ Downloading core/curl/7.65.3/20190826035620
☛ Verifying core/curl/7.65.3/20190826035620
→ Using core/db/5.3.28/20190115012845
→ Using core/expat/2.2.5/20190115012836
→ Using core/gcc-libs/8.2.0/20190115011926
→ Using core/gdbm/1.17/20190115012826
→ Using core/gettext/0.19.8/20190115013412
→ Using core/glibc/2.27/20190115002733
→ Using core/gmp/6.1.2/20190115003943
→ Using core/less/530/20190115013008
→ Using core/libcap/2.25/20190115012150
→ Using core/linux-headers/4.17.12/20190115002705
→ Using core/ncurses/6.1/20190115012027
→ Using core/nghttp2/1.34.0/20190115160823
→ Using core/openssh/7.5p1/20190305213650
→ Using core/openssl-fips/2.0.16/20190115014207
→ Using core/openssl/1.0.2r/20190305210149
→ Using core/pcre/8.42/20190115012526
→ Using core/perl/5.28.0/20190115013014
→ Using core/sed/4.5/20190115012152
→ Using core/xz/5.2.4/20190115013348
→ Using core/zlib/1.2.11/20190115003728
✓ Installed core/curl/7.65.3/20190826035620
✓ Installed core/git/2.21.0/20190826043848
★ Install of core/git/2.21.0/20190826043848 complete with 2 new packages installed.

@tkyi tkyi unassigned minzcmu Oct 9, 2019
@s-yoshika (Contributor) commented

We found a new improvement point in the build setup for the k8s(-vm) executors.
These executors have an init container whose image is the launcher.
The launcher image is built with Docker volumes, like below:
https://github.com/screwdriver-cd/launcher/blob/master/Dockerfile#L98-L99
The /hab volume, one of them, is fairly heavy and seems to add around 3.5s to the container start time.
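
Presumably the referenced Dockerfile lines declare these two volumes, roughly (an assumption based on the paths named above; see the link for the exact lines):

VOLUME /opt/sd
VOLUME /hab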

I created some images based on the launcher image and measured how long it takes just to run echo.

# normal launcher image, pulled beforehand
$ time docker run -it --rm --entrypoint=echo screwdrivercd/launcher:latest

real	0m5.011s
user	0m0.043s
sys	0m0.027s

# removed both docker volumes (/hab, /opt/sd)
$ time docker run -it --rm --entrypoint=echo launcher:no-vol

real 0m1.359s
user 0m0.048s
sys 0m0.021s

# removed only /opt/sd volume
$ time docker run -it --rm --entrypoint=echo launcher:removed-sd-vol

real 0m4.997s
user 0m0.044s
sys 0m0.029s

# removed only /hab volume
$ time docker run -it --rm --entrypoint=echo launcher:removed-hab-vol

real 0m1.622s
user 0m0.033s
sys 0m0.041s

As shown above, the /hab volume alone adds around 3.5s in our environment.

These Docker volumes are needed for the docker executor, but the k8s(-vm) executors never use them, because those executors define their own volumes in Kubernetes:
https://github.com/screwdriver-cd/executor-k8s/blob/master/config/pod.yaml.hbs#L62-L68

So we created a custom launcher image with the Docker volumes removed, using docker-copyedit, which can edit image metadata such as VOLUME.
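
For illustration, stripping the volumes with docker-copyedit looks roughly like this (syntax paraphrased from the tool's README; the target tag is hypothetical):

./docker-copyedit.py FROM screwdrivercd/launcher:latest INTO launcher:no-vol REMOVE ALL VOLUMES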

We confirmed this change improves the build setup time: in our environment, the average queued time including image pulls can now be below 30s, whereas the daily average was 40~60s and had never dropped below 30s before this change was deployed.

@jithine (Member) commented Jan 14, 2020

We can make Habitat configurable and provide a flag to turn it off SD-cluster-wide. This flag, SD_HABITAT_ENABLED, when off, should disable all Habitat-related processing.
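
A minimal sketch of how a launcher script might honor such a flag (the flag is only proposed here; the snippet is purely illustrative):

# Hypothetical gate: skip all habitat setup when the proposed flag is off.
if [ "${SD_HABITAT_ENABLED:-true}" = "true" ]; then
  /hab/bin/hab pkg install core/git   # example habitat-dependent step
fi
# With the flag off, all habitat-related processing is skipped.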
