-
Notifications
You must be signed in to change notification settings - Fork 373
FC: extract error info from firecracker built-in log and metrics scheme #2073
FC: extract error info from firecracker built-in log and metrics scheme #2073
Conversation
@jodh-intel @devimc plz review. ;) |
virtcontainers/fc.go
Outdated
for scanner.Scan() { | ||
fc.Logger().WithFields(logrus.Fields{ | ||
"level": "log", | ||
"contents": scanner.Text()}).Error("firecracker failed") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Pennyzct Why are you logging these as errors? Are these generic log messages or errors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @amshinde Since it's the error info extracted from firecracker built-in logging scheme. It will only have outputs when firecracker failed setting up vmm. I have already set the logging level as Error
here.
fcLogLevel := "Error"
for scanner.Scan() { | ||
fc.Logger().WithFields(logrus.Fields{ | ||
"level": "metrics", | ||
"contents": scanner.Text()}).Error("firecracker failed") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question as above. Why are metrics logged as errors as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, metrics data will be more about performance, but it also has data structure like drive_fails
, machine_cfg_fails
, ... etc for users to spot root cause of failure. And Like I said last comment, I have set the level as Error
, it will only output when firecracker fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @Pennyzct for the explanation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @Pennyzct
if err := fc.metricsFifo.Close(); err != nil { | ||
fc.Logger().WithError(err).Error("Failed closing firecracker metrics fifo file") | ||
} | ||
}() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can factorize a lot of code here, how about creating a new function to open and read the pipe and call it with different parameters depending on the pipe type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @devimc I have already used the API from containerd/fifo to open, read,..., close pipe files.
And you're right, reading from logs.fifo
and metrics.fifo
share a lot same logic and I should create one new function to summarize. ;)
9966589
to
f258d79
Compare
/test |
Codecov Report
@@ Coverage Diff @@
## master #2073 +/- ##
==========================================
- Coverage 50.98% 50.84% -0.15%
==========================================
Files 110 110
Lines 15224 15266 +42
==========================================
Hits 7762 7762
- Misses 6506 6548 +42
Partials 956 956 |
f258d79
to
6bec9d3
Compare
re-base and /test |
/test |
@Pennyzct sorry a re-re-rebase is required here , btw some test were failing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
@@ -579,6 +584,68 @@ func (fc *firecracker) fcSetVMBaseConfig(mem int64, vcpus int64, htEnabled bool) | |||
return err | |||
} | |||
|
|||
func (fc *firecracker) fcSetLogger() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like the fc
prefix as it's redundant but I see there is already precedent for this in the codebase.
Hi @amshinde |
6bec9d3
to
11c4026
Compare
/test |
Hi guys. |
MetricsFifo: &jailedMetricsFifo, | ||
} | ||
param.SetBody(cfg) | ||
_, err = fc.client().Operations.PutLogger(param) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this can only be be called after fc is started?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid not, it should belong to the configuration part, and should before the InstanceStart
command.
And if you are using curl
to reproduce, it works well. 😂
curl --unix-socket api.socket -i \
-X PUT "http://localhost/logger" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-d "{
\"log_fifo\": \"/root/firecracker/log.fifo\",
\"metrics_fifo\": \"/root/firecracker/metrics.fifo\",
\"level\": \"Info\"
}"
I have tried a few "plausible" reason:
Given that, I've given sleeping time between requests to give it time to detect the close, but, still, |
|
||
func (fc *firecracker) fcListenToFifo(fifoName string) (string, error) { | ||
fcFifoPath := filepath.Join(fc.vmPath, fifoName) | ||
fcFifo, err := fifo.OpenFifo(context.Background(), fcFifoPath, syscall.O_CREAT|syscall.O_RDONLY|syscall.O_NONBLOCK, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Pennyzct have you tried with a blocking io?
Hi @Pennyzct I spend some time investigating on this issue, and found that the root cause located in firecracker, which would be crashed when setting logger with options as null. I think you can workaround this issue in kata side by applying the following diff to your commits:
|
Hi @lifupan |
11c4026
to
bd633e1
Compare
/test |
Firecracker have its own logging scheme, providing two fifo files with log and metrics info. We should extract error info for better debugging. Fixes: kata-containers#2072 Signed-off-by: Penny Zheng <[email protected]> Signed-off-by: Haibo Xu <[email protected]>
bd633e1
to
daae1db
Compare
/test |
/test |
Which feature do you think can be improved?
Hi~ I've recently been working on enabling firecracker on kata containers.
If Firecracker exits abnormally after
InstantStart
request, the error output you could get is only timeout related, nothing useful.Firecracker has its own logging scheme, involving two fifo files. For better debugging, we should
extract those error info and re-play them in kata-runtime log.