Skip to content
This repository has been archived by the owner on May 12, 2021. It is now read-only.

FC: extract error info from firecracker built-in log and metrics scheme #2073

Merged
merged 1 commit into from
Nov 25, 2019

Conversation

Pennyzct
Copy link
Contributor

Which feature do you think can be improved?
Hi~ I've recently been working on enabling firecracker on kata containers.
If Firecracker exits abnormally after InstantStart request, the error output you could get is only timeout related, nothing useful.

docker: Error response from daemon: OCI runtime create failed: Failed to check if grpc server is working: context deadline exceeded: unknown.

Firecracker has its own logging scheme, involving two fifo files. For better debugging, we should
extract those error info and re-play them in kata-runtime log.

@Pennyzct
Copy link
Contributor Author

@jodh-intel @devimc plz review. ;)

for scanner.Scan() {
fc.Logger().WithFields(logrus.Fields{
"level": "log",
"contents": scanner.Text()}).Error("firecracker failed")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Pennyzct Why are you logging these as errors? Are these generic log messages or errors?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @amshinde Since it's the error info extracted from firecracker built-in logging scheme. It will only have outputs when firecracker failed setting up vmm. I have already set the logging level as Error here.

fcLogLevel := "Error"

for scanner.Scan() {
fc.Logger().WithFields(logrus.Fields{
"level": "metrics",
"contents": scanner.Text()}).Error("firecracker failed")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question as above. Why are metrics logged as errors as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, metrics data will be more about performance, but it also has data structure like drive_fails, machine_cfg_fails, ... etc for users to spot root cause of failure. And Like I said last comment, I have set the level as Error, it will only output when firecracker fail.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @Pennyzct for the explanation.

Copy link

@devimc devimc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @Pennyzct

if err := fc.metricsFifo.Close(); err != nil {
fc.Logger().WithError(err).Error("Failed closing firecracker metrics fifo file")
}
}()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can factorize a lot of code here, how about creating a new function to open and read the pipe and call it with different parameters depending on the pipe type

Copy link
Contributor Author

@Pennyzct Pennyzct Sep 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @devimc I have already used the API from containerd/fifo to open, read,..., close pipe files.
And you're right, reading from logs.fifo and metrics.fifo share a lot same logic and I should create one new function to summarize. ;)

@Pennyzct Pennyzct force-pushed the firecracker_fifo_log branch 3 times, most recently from 9966589 to f258d79 Compare September 20, 2019 09:54
@devimc
Copy link

devimc commented Sep 20, 2019

/test

@codecov
Copy link

codecov bot commented Sep 20, 2019

Codecov Report

Merging #2073 into master will decrease coverage by 0.14%.
The diff coverage is 0%.

@@            Coverage Diff             @@
##           master    #2073      +/-   ##
==========================================
- Coverage   50.98%   50.84%   -0.15%     
==========================================
  Files         110      110              
  Lines       15224    15266      +42     
==========================================
  Hits         7762     7762              
- Misses       6506     6548      +42     
  Partials      956      956

@Pennyzct Pennyzct force-pushed the firecracker_fifo_log branch from f258d79 to 6bec9d3 Compare September 26, 2019 02:01
@Pennyzct
Copy link
Contributor Author

re-base and /test

@jcvenegas
Copy link
Member

/test

@jcvenegas
Copy link
Member

@Pennyzct sorry a re-re-rebase is required here , btw some test were failing

Copy link
Contributor

@jodh-intel jodh-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@@ -579,6 +584,68 @@ func (fc *firecracker) fcSetVMBaseConfig(mem int64, vcpus int64, htEnabled bool)
return err
}

func (fc *firecracker) fcSetLogger() error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the fc prefix as it's redundant but I see there is already precedent for this in the codebase.

@Pennyzct
Copy link
Contributor Author

Pennyzct commented Oct 8, 2019

Hi @amshinde
So sorry for the delayed response, just got back from our national holiday. ;)
I will re-re-base asap. For the failing part, with arm CI, it's because the missing aarch64 entry in the latest image centos:latest. I have addressed specific details in kata-contains/ci:#222.
And for the other failing part, I will track down and try to find out how it's related. 😁

@Pennyzct Pennyzct force-pushed the firecracker_fifo_log branch from 6bec9d3 to 11c4026 Compare October 8, 2019 03:07
@Pennyzct
Copy link
Contributor Author

Pennyzct commented Oct 8, 2019

/test

@Pennyzct
Copy link
Contributor Author

Pennyzct commented Oct 10, 2019

Hi guys.
I really need some help here. 😭
The error docker: Error response from daemon: OCI runtime create failed: Put http://localhost/logger: EOF: unknown. really frustrated me. Have you guys ever encounter the similar error or have any clue where this comes from?
@devimc @amshinde @jcvenegas @jodh-intel @bergwolf

MetricsFifo: &jailedMetricsFifo,
}
param.SetBody(cfg)
_, err = fc.client().Operations.PutLogger(param)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this can only be be called after fc is started?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid not, it should belong to the configuration part, and should before the InstanceStart command.
And if you are using curl to reproduce, it works well. 😂

 curl --unix-socket api.socket -i \
         -X PUT "http://localhost/logger" \
         -H "accept: application/json" \
         -H "Content-Type: application/json" \
         -d "{
                  \"log_fifo\": \"/root/firecracker/log.fifo\",
                  \"metrics_fifo\": \"/root/firecracker/metrics.fifo\",
                   \"level\": \"Info\"
              }"

@Pennyzct
Copy link
Contributor Author

I have tried a few "plausible" reason:

The connection has already been closed. But when client initiates another request before it actually
detects the close, and the EOF that it reads get interpreted as an error for this new request rather
than a close that occurred prior to the request.

Given that, I've given sleeping time between requests to give it time to detect the close, but, still,
the error occured.


func (fc *firecracker) fcListenToFifo(fifoName string) (string, error) {
fcFifoPath := filepath.Join(fc.vmPath, fifoName)
fcFifo, err := fifo.OpenFifo(context.Background(), fcFifoPath, syscall.O_CREAT|syscall.O_RDONLY|syscall.O_NONBLOCK, 0)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Pennyzct have you tried with a blocking io?

@lifupan
Copy link
Member

lifupan commented Nov 8, 2019

Hi @Pennyzct

I spend some time investigating on this issue, and found that the root cause located in firecracker, which would be crashed when setting logger with options as null.
I had opened a defect to firecracker: firecracker-microvm/firecracker#1375 .

I think you can workaround this issue in kata side by applying the following diff to your commits:

diff --git a/virtcontainers/fc.go b/virtcontainers/fc.go
index 1d8e9f39..b3acdf56 100644
--- a/virtcontainers/fc.go
+++ b/virtcontainers/fc.go
@@ -630,6 +630,7 @@ func (fc *firecracker) fcSetLogger() error {
                Level:       &fcLogLevel,
                LogFifo:     &jailedLogFifo,
                MetricsFifo: &jailedMetricsFifo,
+               Options: []string{},
        }
        param.SetBody(cfg)
        _, err = fc.client().Operations.PutLogger(param)

@Pennyzct
Copy link
Contributor Author

Pennyzct commented Nov 8, 2019

Hi @lifupan
Thanks three thousand times!!!! update asap. ;).

@Pennyzct Pennyzct force-pushed the firecracker_fifo_log branch from 11c4026 to bd633e1 Compare November 12, 2019 08:38
@Pennyzct
Copy link
Contributor Author

/test

Firecracker have its own logging scheme, providing two fifo files with log
and metrics info.
We should extract error info for better debugging.

Fixes: kata-containers#2072

Signed-off-by: Penny Zheng <[email protected]>
Signed-off-by: Haibo Xu <[email protected]>
@Pennyzct Pennyzct force-pushed the firecracker_fifo_log branch from bd633e1 to daae1db Compare November 13, 2019 01:22
@Pennyzct
Copy link
Contributor Author

/test

@raravena80
Copy link
Member

/test

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants