integration: soak: add rm soak test #414
Marking RFC/WIP right now, as I'd like to confirm some things with others... I have a sneaky feeling this could be related to kata-containers/runtime#396 - just a feeling. It feels like the old 'stuck lock' in vc issues we had a long long time ago - but, iirc, @sboeuf rewrote how all the locking works, so I suspect it has similar symptoms, but is not the same issue. /cc @jamiehannaford for completeness.
Oh, once we've worked over this somewhat I'll add a commit to add this into the QA CI scripts so it runs on every PR - if we agree (we should look at how long it takes to run for instance). Also, this PR is somewhat strongly related to #215 - that is, much of the sanity checking code from this PR can be lifted out into a lib maybe and then we can run that before/after each test.
@grahamwhaley I confirm that running the soak test triggers the error you're describing. I tried it on a simple Azure VM with 4 vCPUs and 16GiB of RAM, running the test for 20 containers only. The issue seems to be related to locking, but that might not be the root cause. This is worth some investigation.
OK - I've got some debug from the stuck |
As mentioned on this issue, it would be extremely useful to get this landed.
This'll still be mine then. Things to do are:
Hi @grahamwhaley - that list sounds good. But I don't think we need to wait until the script is checking all those things before landing a basic test? You could add features as we go along in separate PRs. The existing script seems to have proved itself, so the sooner we get it running regularly, the safer we'll be from those nightmare cross-repo bisects, right? 😄
302bb89 to 6f6b4d4
OK... I've dropped the RFC and DNM...
doh - I missed the |
6f6b4d4 to 5c6cc00
@grahamwhaley - nice! Hopefully the CI will be able to crunch through this a bit quicker (I see they are triggered so we'll need to look at the logs once finished). Hence, aside from the lgtm |
5c6cc00 to 9747600
phew, let's try one more time (fingers crossed!). This time:
/cc @bergwolf, as #578 will maybe have similar or benefit from all of those as well :-)
I think you'll need to check whether the docker service is started (and start it if not) before executing the tests.
Ah, I see:
@chavafg - is that the common 'idiom' for our test suites then - they have to check and start the services they need before they run?
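For illustration, the "check, then start" idiom being discussed could look roughly like the sketch below. This is not the code that landed in this PR; the function name `ensure_service_running` is hypothetical, and it assumes a systemd host where `systemctl` is available and `sudo` is permitted.

```shell
#!/bin/sh
# Hypothetical helper (not taken from this PR): start a systemd service
# only if it is not already active. Assumes systemctl and sudo exist.
ensure_service_running() {
	local service="$1"

	# 'systemctl is-active --quiet' exits 0 only when the unit is active.
	if systemctl is-active --quiet "$service"; then
		echo "$service already running"
		return 0
	fi

	echo "starting $service"
	sudo systemctl start "$service"
}

# Example call at the top of a test script:
#   ensure_service_running docker
```

Keeping the check inside a function means it can be called from a script, or the same two commands can be placed in a Makefile target ahead of the test invocation.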
9747600 to bfc1b99
OK, I see we do something similar for other test suites - sometimes in the Makefile (swarm), sometimes in the scripts themselves (crio). I've added the relevant |
network errors on the jobs :(
Heh heh. From the F27 CI:
and the same for Centos7 it seems. I'll check where/what the Now, otherwise in a way this is sort of good, as that is exactly the sort of situation the test is meant to pick up.
@grahamwhaley, the
Although not sure what is the best way to solve this. In https://github.com/kata-containers/tests/blob/master/integration/openshift/hello_world.bats, we use:
Add a function to show a number of kata relevant system information items such as what docker and the runtime thinks is running, and what components we can see alive. Useful as a diagnostic tool for if we fail a sanity check during testing. Signed-off-by: Graham Whaley <[email protected]>
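As a rough idea of what such a diagnostic dump might contain, here is a minimal sketch. It is not the function added by this commit: the name `show_system_state` and the process patterns `qemu`, `kata-proxy` and `kata-shim` are assumptions chosen for illustration, and `RUNTIME` is assumed to default to `kata-runtime`.

```shell
#!/bin/sh
# Hypothetical sketch of a diagnostic dump; names and patterns are
# assumptions, not the code from this commit.
RUNTIME="${RUNTIME:-kata-runtime}"

show_system_state() {
	echo "--- docker ps -a ---"
	docker ps -a

	echo "--- ${RUNTIME} list ---"
	sudo "$RUNTIME" list

	# Assumed component process names; adjust to match the installation.
	for component in qemu kata-proxy kata-shim; do
		echo "--- pgrep ${component} ---"
		# pgrep exits non-zero when nothing matches; that is not an
		# error here, so report it explicitly instead.
		pgrep -a -f "$component" || echo "(none)"
	done
}
```

A test would typically call this just before bailing out on a failed sanity check, so the CI log shows what docker, the runtime and the process table each believed at that moment.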
Add an 'rm' soak test. The test was originally written to capture 'stuck' docker rm's of many containers, but as it also does a lot of sanity checking of many other parts of the system (checks for runtime/qemu/proxy/shims running when they should, and not running when they should not, and that 'kata-runtime list' matches what we have asked docker to do, and check we don't leave dangling mounts around etc.), it has also been useful for general stability checking. Fixes: kata-containers#195 Signed-off-by: Graham Whaley <[email protected]>
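The kind of sanity check described here can be sketched as a comparison between the number of containers the test asked for and what the system reports. The helper names below are hypothetical, and the sketch assumes one qemu process per running container, which may not hold for every configuration; the real test checks more components than this.

```shell
#!/bin/sh
# Hypothetical helpers; only the idea of the checks comes from the
# commit message, not the names or the exact commands.
check_count() {
	local what="$1" expected="$2" actual="$3"

	if [ "$actual" -ne "$expected" ]; then
		echo "FAIL: $what: expected $expected, found $actual"
		return 1
	fi
	echo "OK: $what: $actual"
}

# Call with N after launching N containers, and with 0 after 'docker rm'.
check_container_counts() {
	local expected="$1"
	local failures=0

	check_count "docker containers" "$expected" \
		"$(docker ps -q | wc -l)" || failures=$((failures + 1))

	# pgrep -c prints the number of matches (and exits non-zero for 0).
	# Assumption: one qemu process per running container.
	check_count "qemu processes" "$expected" \
		"$(pgrep -c -f qemu)" || failures=$((failures + 1))

	[ "$failures" -eq 0 ]
}
```

Running every check before returning, rather than stopping at the first mismatch, gives a fuller picture in the log when something has gone wrong.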
Enable the docker soak test in the Makefile. Override the test default configuration to bring the test time down to something more acceptable in the CIs. Signed-off-by: Graham Whaley <[email protected]>
bfc1b99 to e99893b
Added a system info dump function to the common lib. Fixed the sudo RUNTIME invocation. Let's see how the CIs are feeling...
lgtm, CI happy. Merging