Skip to content
This repository has been archived by the owner on May 12, 2021. It is now read-only.

runtime: Ignore ENOENT in kill/delete #2960

Merged
merged 1 commit into from
Nov 4, 2020
Merged

runtime: Ignore ENOENT in kill/delete #2960

merged 1 commit into from
Nov 4, 2020

Conversation

keloyang
Copy link
Contributor

If sandbox/container's dir is not exist, the kill/delete will always fail,and this make kubelet/container delete it repeatedly
but fail always. In some kind of abnormal situation(e.g. kill -9 $pidofqemu),kata-runtime may kill/delete sandbox first, this make container'dir not exist, so kata-runtime should skip this error.

Fixes: #2959.

Signed-off-by: Shukui Yang [email protected]

Copy link
Contributor

@jodh-intel jodh-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for raising @keloyang !

Could you look at porting this to Kata 2.0 (https://github.com/kata-containers/kata-containers/tree/2.0-dev/src/runtime) ?

Also, it would be good to understand why the test below did not find this issue (and fix it if necessary):

I wonder if we need a new test to exercise this code.

cli/delete.go Outdated
@@ -75,6 +75,11 @@ func delete(ctx context.Context, containerID string, force bool) error {
kataLog.Warnf("Failed to get container, force will not fail: %s", err)
return nil
}
if err.Error() == syscall.ENOENT.Error() {
kataLog.Infof("delete, container %s is not exist, skip", containerID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you change these log lines to be "structured". Something like:

kataLog.WithField("container", containerID).Info("skipping delete as container does not exist")

@keloyang
Copy link
Contributor Author

@jodh-intel I used the old Kata environment, not shimv2, but I think we can really can improve that test case, we can check the return for "crictl stopp $podid" and "crictl rmp $podid". the "structured" log has been changed, ptal, thanks.

Copy link
Contributor

@jodh-intel jodh-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @keloyang.

lgtm

Note: That test does specify set -o errexit (aka set-e), so any failing command will fail the test fwiw).

@jodh-intel
Copy link
Contributor

@keloyang - Travis is failing due to a compile error:

# github.com/kata-containers/runtime/cli
./delete.go:78:21: undefined: syscall

@keloyang
Copy link
Contributor Author

# github.com/kata-containers/runtime/cli
./delete.go:78:21: undefined: syscall

updated, thanks for your patience,I feel sorry for my mistake.

@keloyang
Copy link
Contributor Author

the ci always fail, so I didn't check the CI results every time. sorry.

@jodh-intel
Copy link
Contributor

Thanks @keloyang.

/test

@codecov
Copy link

codecov bot commented Sep 22, 2020

Codecov Report

Merging #2960 into master will increase coverage by 0.06%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##           master    #2960      +/-   ##
==========================================
+ Coverage   50.26%   50.32%   +0.06%     
==========================================
  Files         119      120       +1     
  Lines       15789    15845      +56     
==========================================
+ Hits         7936     7974      +38     
- Misses       6772     6785      +13     
- Partials     1081     1086       +5     

@keloyang
Copy link
Contributor Author

ping @jodh-intel @pmores thanks for your review.

@fidencio
Copy link
Member

@keloyang, similarly to the other PR, this one was blocked tue to the metrics failure.

Would be possible to rebase it atop of current git main? It'd allow the metrics one to pass and we can finally have it merged.

@keloyang
Copy link
Contributor Author

Do you means I need to do a rebase ? @fidencio

@fidencio
Copy link
Member

/test

@fidencio
Copy link
Member

Let's see if this new push solves the metrics issue, thanks @keloyang.

If sandbox/container's dir is not exist, the kill/delete will
always fail,and this make kubelet/container delete it repeatedly
but fail always. In some kind of abnormal
situation(e.g. kill -9 $pidofqemu),
kata-runtime may kill/delete sandbox first, this make container'dir
not exist, so kata-runtime should skip this error.

Fixes: #2959.

Signed-off-by: Shukui Yang <[email protected]>
@liubin
Copy link
Member

liubin commented Nov 4, 2020

/test

Copy link
Member

@fidencio fidencio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@liubin liubin merged commit e50f23b into kata-containers:master Nov 4, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

kata-runtime cannot kill container
5 participants