-
Notifications
You must be signed in to change notification settings - Fork 373
shimv2: Add tracing #2980
shimv2: Add tracing #2980
Conversation
@egernst @jodh-intel could either of you test locally? also maybe @amshinde wants to take a look |
/test |
Codecov Report
@@ Coverage Diff @@
## master #2980 +/- ##
==========================================
- Coverage 50.35% 50.30% -0.06%
==========================================
Files 120 120
Lines 15918 15978 +60
==========================================
+ Hits 8016 8037 +21
- Misses 6812 6851 +39
Partials 1090 1090 |
b302c1f
to
003a824
Compare
1b06857
to
eddc957
Compare
The answer to why tracing doesn't work earlier is that tracer must be created after We could try to call |
eddc957
to
c328c4d
Compare
@jodh-intel @amshinde @egernst this traces from I still need to: (1) add check for error after initial |
8941634
to
0533bcf
Compare
/test |
Thanks @cmaf! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @cmaf. A few comments.
containerd-shim-v2/service.go
Outdated
// Stop tracing because a new tracer will be created when New is called again after StartShim | ||
defer katautils.StopTracing(s.ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if it might be better to check if s.tracer
in New()
and calling StopTracing()
there if tracing is already active.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jodh-intel I'm not sure I understand this. I could call StopTracing()
before the tracer is created in New()
, but s.tracer
doesn't exist until halfway through the function. A new context is used every time New()
is called. New()
is called twice per container, initially followed by StartShim()
, so there is one context used between those two, and a different context for the second New()
and subsequent functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jodh-intel I thought about this some more and tracing can't be stopped earlier in New()
because stopping tracing depends on using the specific context associated with the tracer, which isn't available since a new one is used in that function. It is possible I'm missing functions in between New()
and StartShim()
(or after), but StartShim()
is the only function that I've seen called after New()
using the same context associated with the tracer where we can end the first trace.
containerd-shim-v2/service.go
Outdated
func trace(ctx context.Context, tracer opentracing.Tracer, name string) opentracing.Span { | ||
if tracer != opentracing.Tracer(nil) { | ||
span, _ := opentracing.StartSpanFromContext(ctx, name) | ||
span.SetTag("subsystem", "runtime") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you could set this tag in the root span to avoid having to set every time here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jodh-intel I tried setting it only at New
where the tracer is created and don't see a subsystem for the other functions, so I'm leaving it in for now.
@egernst yes it starts at |
680c0a6
to
5c4e754
Compare
@jodh-intel Thanks, I updated it based on your feedback and am testing a few locations for |
Hi @cmaf - I've atlast had a chance to try out this PR. Although I'm seeing trace spans, they are a couple of problems:
Adding a bit of debug shows an initially somewhat confusing set of shimv2 calls being made:
However, adding more shows the real picture: the Kata v2 shim (which is loaded for the lifetime of a single container) is forked twice. The group of API calls divides into two sets:
This now makes more sense and leads to the following possibilities:
I think Option 2 would be the best. And if you changed I appreciate that we're not tracing as early as possible with Option 2 (we wouldn't be tracing |
I should have mentioned that I'm using containerd 1.4.1: $ ctr version
Client:
Version: v1.4.1
Revision: c623d1b36f09f8ef6536a057bd658b3aa8632828
Go version: go1.13.15 |
containerd-shim-v2/service.go
Outdated
ctx, cancel := context.WithCancel(ctx) | ||
|
||
s := &service{ | ||
id: id, | ||
pid: uint32(os.Getpid()), | ||
ctx: ctx, | ||
tracer: tracer, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cmaf Could you help me understand how this Tracer instance is used? I'm probably overlooking something but I don't see it referenced anywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pmores, thanks for catching this. You're not overlooking anything, I previously used s.tracer
in the code but that had been removed. I removed this instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cmaf Thanks!
9ce6522
to
1063b13
Compare
/test |
@jodh-intel Updated with a ppid check for scenario 2. If you generate a trace now you should see several ungrouped spans not specific to shimv2, and then a single The test for tracing in shimv2 will be in the 2.0-dev branch: kata-containers/tests#3038. |
1063b13
to
2de728b
Compare
containerd-shim-v2/service.go
Outdated
if os.Getppid() == 1 { | ||
span, _ := trace(s.ctx, "ResizePty") | ||
defer span.Finish() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cmaf Are the ppid checks useful though outside of New(), Shutdown(), Create() and Delete()? I would assume that the first fork exits after spawning the second one...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pmores you're right, they actually don't need to be in all functions. However I am removing all the checks because we decided to trace all functions irrespective of ppid.
Hi @cmaf, I just tried the latest version of your PR to test out the tracing output, but I am not seeing any traces related to CreateSandbox as I was seeing earlier. I do see that I have enabled tracing for the runtime in the configuration.toml file, but the only traces I am seeing are traces consisting of a single span for |
@amshinde yes it is working for me. I see |
@@ -274,7 +298,20 @@ func getTopic(e interface{}) string { | |||
return cdruntime.TaskUnknownTopic | |||
} | |||
|
|||
func trace(ctx context.Context, name string) (opentracing.Span, context.Context) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cmaf Why not just reuse the Trace() function in pkg/katautils/tracing.go and modify it if needed if you want to add the check for nil context. cc @jodh-intel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amshinde Trace()
in pkg/katautils/tracing.go
sets a tag for the cli, which we don't want in this case. I noticed virtcontainers had its own implementation of trace()
so I implemented one for shimv2. I think we should fix this in a subsequent PR so that we're not replicating code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, but this seems like a simple fix, and probably easier to do in this commit than at a later time when the two functions may have diverged (they already began diverging here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@c3d virtcontainers has 13 different trace()
implementations which exist to add different tags to spans (@jodh-intel, correct me if I'm wrong). I think this should be changed so that all use a general Trace()
function that takes a list of tags, but since it is not essential to shimv2 tracing functionality and applies to many other functions, it should have its own issue & PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #3093
2de728b
to
7fad6fe
Compare
With the latest changes, this PR looks good to me at this point. |
@@ -274,7 +298,20 @@ func getTopic(e interface{}) string { | |||
return cdruntime.TaskUnknownTopic | |||
} | |||
|
|||
func trace(ctx context.Context, name string) (opentracing.Span, context.Context) { | |||
if ctx == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test does not exist in the Trace()
function in katautils
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@c3d you're right, it does not exist there but it exists in the sandbox trace function and a few other ones in virtcontainers. The CI fails with a SIGSEGV without it so I kept it in.
@@ -274,7 +298,20 @@ func getTopic(e interface{}) string { | |||
return cdruntime.TaskUnknownTopic | |||
} | |||
|
|||
func trace(ctx context.Context, name string) (opentracing.Span, context.Context) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, but this seems like a simple fix, and probably easier to do in this commit than at a later time when the two functions may have diverged (they already began diverging here)
containerd-shim-v2/service.go
Outdated
ctx = context.Background() | ||
} | ||
span, ctx := opentracing.StartSpanFromContext(ctx, name) | ||
span.SetTag("subsystem", "runtime") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Trace()
function sets the source
tag to runtime
, not the subsystem
one. What is the reason for the difference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Trace()
function has an additional created span
message, which according to the comment, is essential for tracing testing. Thag begs the question: how do you plan to test this newly added tracing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@c3d I switched it to source
, it should be consistent.
Regarding the created span
message, the test is undergoing significant changes in a PR (kata-containers/tests#2129) and will not check for that string.
@@ -330,6 +367,9 @@ func (s *service) Cleanup(ctx context.Context) (_ *taskAPI.DeleteResponse, err e | |||
|
|||
// Create a new sandbox or container with the underlying OCI runtime | |||
func (s *service) Create(ctx context.Context, r *taskAPI.CreateTaskRequest) (_ *taskAPI.CreateTaskResponse, err error) { | |||
span, _ := trace(s.ctx, "Create") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you want to have a few span.LogField()
added giving more information about what is being created?
(This same comment applies to practically all functions below)
Or do you plan to add that later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think at least to start, the basic spans would be good -- it shows the trace of execution very clearly. if feasible, I think we should look at adding in follow on PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/test |
7fad6fe
to
82ede0e
Compare
/test |
Add trace calls to shimv2 that create spans for functions in service.go. Tracing starts in New(), which is forked twice and is followed by either StartShim() or Create(). Tracing cannot start without the value for Trace enabled from the runtime config so load the config in New(), which results in it being loaded every time New() is called in addition to where it is originally loaded after Create(). Fixes kata-containers#1807 Signed-off-by: Chelsea Mafrica <[email protected]>
82ede0e
to
0279c81
Compare
/test |
@c3d, thanks for the feedback! I think I responded to everything, please let me know if you still have concerns. |
Added tracing with tracer being created in Create, which should ideally
happen in New. Currently getting no trace or span output when the tracer
is created in New or StartShim, or Create before create() is called.
Also no log output from these functions.
To do:
(ctx passed in or ctx saved in service struct - currently implemented
using service ctx)
New if possible
Signed-off-by: Chelsea Mafrica [email protected]