Skip to content
This repository has been archived by the owner on May 12, 2021. It is now read-only.

virtcontainers: Fix unit tests #209

Merged
merged 3 commits into from
Apr 16, 2018
Merged

Conversation

sboeuf
Copy link

@sboeuf sboeuf commented Apr 13, 2018

Several failures have been reported through kata-containers/tests#225 regarding our CI. A panic was triggered after our unit tests were reaching the defined timeout of 10s, and we could have bumped this timeout to a bigger value, but the root cause was our bad management of the go routines from our code and from our tests, leaving a lot of go routines behind, and slowing down our tests.

Fixes #208

Sebastien Boeuf added 3 commits April 13, 2018 15:22
When using noopShim type from the unit tests, we were ending up
getting a PID 1000, and when checking if the shim was around, we
were always expecting the shim to be "not running", based on the
fact that the process was not there anymore. Unfortunately, this
was a very wrong assumption because we cannot control which PIDs
are running or not on the system. The way to simplify this is to
return a PID 0 in case of noopShim, processed as a special case
by the function waitForShim().

Fixes kata-containers#208

Signed-off-by: Sebastien Boeuf <[email protected]>
Because of the bad design of the cc_proxy_mock go routine, we were
leaving an infinite loop running into this go routine behind. This
was consuming a lot of resources and it was obviously slowing down
the tests being run in parallel. That's one of the reason we were
hitting the 10 seconds timeout when running go tests.

Fixes kata-containers#208

Signed-off-by: Sebastien Boeuf <[email protected]>
Those different files were all calling into a go routine that was
eventually reporting some result through a go channel. The problem
was the way those routine were implemented, as they were hanging
around forever. Indeed, nothing was actually listening to the channel
in some cases, and those routines never ended.

This was one of the problem detected by the fact that our unit tests
needed more time to pass because when they were all run in parallel,
the resources consumed by those routines were increasing the time
for other tests to complete.

Fixes kata-containers#208

Signed-off-by: Sebastien Boeuf <[email protected]>
@sboeuf sboeuf added the review label Apr 13, 2018
@sboeuf
Copy link
Author

sboeuf commented Apr 13, 2018

@devimc @grahamwhaley @egernst @bergwolf PTAL and let's merge this ;)

Copy link
Contributor

@grahamwhaley grahamwhaley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice hunting @sboeuf
lgtm

proxy.startListening()
go func() {
for {
proxy.serve()

proxy.Lock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious - do you need the locking if you only have one reader and one writer?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you do, otherwise Golang complains. I agree it is semantically overkill here.

@grahamwhaley
Copy link
Contributor

Couple of strange things on the failed 17.10 CI - I'll give it a nudge (or is this a known issue @chavafg ?).
From the logs:

[env],Tty:false,Stdin:false,Stdout:true,Stderr:true,}" 
# time="2018-04-14T02:06:14Z" level=fatal msg="execing command in container failed: unable to upgrade connection: 404 page not found"
# 1
# rm: cannot remove '/tmp/tmp.bRonNcxVBh/crio/devicemapper/mnt/39c165bd2c9a769327ce7504c6107baea42d08faab2a53f67d578c1b8ab9d820': Device or resource busy
# rm: cannot remove '/tmp/tmp.bRonNcxVBh/crio/devicemapper/mnt/0fdf931ad72a8ccc862ad2376cb0d3b95dc264908a1af5a6d7894546fb9db7a3': Device or resource busy
# rm: cannot remove '/tmp/tmp.bRonNcxVBh/crio-run/devicemapper-containers/200b9202d80ea95734851b07445c36344b5426d452a303317f31c20d760badb9/userdata/shm': Device or resource busy
Build timed out (after 5 minutes). Marking the build as aborted.

@codecov
Copy link

codecov bot commented Apr 16, 2018

Codecov Report

Merging #209 into master will decrease coverage by <.01%.
The diff coverage is 75%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #209      +/-   ##
==========================================
- Coverage   65.33%   65.33%   -0.01%     
==========================================
  Files          73       73              
  Lines        7702     7707       +5     
==========================================
+ Hits         5032     5035       +3     
- Misses       2127     2128       +1     
- Partials      543      544       +1
Impacted Files Coverage Δ
virtcontainers/noop_shim.go 100% <100%> (ø) ⬆️
virtcontainers/hook.go 94.59% <71.42%> (-2.51%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ad5669f...92577c6. Read the comment docs.

Copy link
Member

@egernst egernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good; please address Graham's comments, though.

@egernst egernst merged commit 8088a62 into kata-containers:master Apr 16, 2018
@egernst egernst removed the review label Apr 16, 2018
zklei pushed a commit to zklei/runtime that referenced this pull request Jun 13, 2019
The go netlink package has added more netlink socket protocols
than we need. Specify one to avoid failure due to unsupported
protocols.

Fixes: kata-containers#209

Signed-off-by: Peng Tao <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants