This repository has been archived by the owner on May 12, 2021. It is now read-only.

[WIP] virtcontainers: netns_watcher: Monitor network changes #194

Closed
wants to merge 1 commit into from


@sboeuf sboeuf commented Apr 7, 2018

This commit introduces a new watcher dedicated to the monitoring
of a specific network namespace in order to detect any change that
could happen to the network.

As a result of such a detection, the watcher should call into the
appropriate runtime path with the proper arguments to modify the
pod network accordingly.

Fixes #170

Signed-off-by: Sebastien Boeuf [email protected]

@sboeuf sboeuf added the wip label Apr 7, 2018
@sboeuf sboeuf changed the title virtcontainers: netns_watcher: Monitor network changes [WIP] virtcontainers: netns_watcher: Monitor network changes Apr 7, 2018
@sboeuf sboeuf force-pushed the netns_watcher branch 3 times, most recently from bb4a8d2 to 058f461 Compare April 11, 2018 09:05
@sboeuf
Author

sboeuf commented Apr 11, 2018

@bergwolf @miaoyq @WeiZhang555 Still WIP, but I'd like to get a first review !

@bergwolf
Member

I guess it is better to put this in the cli directory?

@codecov

codecov bot commented Apr 11, 2018

Codecov Report

Merging #194 into master will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff           @@
##           master     #194   +/-   ##
=======================================
  Coverage   66.67%   66.67%           
=======================================
  Files          93       93           
  Lines        9580     9580           
=======================================
  Hits         6387     6387           
  Misses       2506     2506           
  Partials      687      687

Last update f4a7712...e17a516.

}

/* Enter network namespace */
ret = enter_netns((const char*) params.netns_path);
@miaoyq miaoyq Apr 11, 2018

I think we can run the monitor via network.run(networkNSPath string, cb func() error) error in virtcontainers, instead of enter_netns((const char*) params.netns_path) here.

Author

You mean that virtcontainers would start the process directly in the network namespace? I agree this would work, but I think it is better to have this performed by the C process directly, as entering namespaces is safer in C than in Go. See #148 for more details about namespace issues in Go.

Moreover, this makes this program more generic as it is able to enter a namespace on its own.
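
For readers following along, here is a minimal sketch of what entering a network namespace from C can look like. The body of enter_netns() is not shown in this thread, so the implementation below is an assumption built on the standard open(2) and setns(2) calls, not the code under review.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* needed for setns() and CLONE_NEWNET */
#endif
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Assumed implementation: join the network namespace referenced by
 * netns_path (for example /proc/<pid>/ns/net).
 * Returns 0 on success, -1 on failure.
 */
int enter_netns(const char *netns_path)
{
    int fd;
    int ret;

    if (!netns_path) {
        return -1;
    }

    fd = open(netns_path, O_RDONLY | O_CLOEXEC);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    /* CLONE_NEWNET makes setns() fail unless fd is a network namespace. */
    ret = setns(fd, CLONE_NEWNET);
    if (ret < 0) {
        perror("setns");
    }

    close(fd);
    return ret < 0 ? -1 : 0;
}
```

Because the C process is single-threaded when it calls setns(), the namespace switch applies to the whole process, which is the property the comment above relies on.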

ret = monitor_netns((const char*) params.pod_id,
                    (const char*) params.runtime_path);
if (ret) {
    goto exit;
@miaoyq miaoyq Apr 11, 2018

Redundant if statement.

Author

Sure I'll remove it.

@sboeuf
Author

sboeuf commented Apr 11, 2018

@bergwolf well I guess it can be either way, and I made the choice to put this under virtcontainers since it is virtcontainers' responsibility to spawn this at the right time (i.e. after the network has been configured through the pod creation).
@amshinde @egernst @miaoyq WDYT ?

@sboeuf sboeuf force-pushed the netns_watcher branch 2 times, most recently from 0e67822 to f008bff Compare April 11, 2018 22:01
@sboeuf
Author

sboeuf commented Apr 11, 2018

@bergwolf @miaoyq @egernst @amshinde I'll try to summarize how this program works so that it is easier for you to review the code.

This watcher is supposed to be started by virtcontainers after the initial network has been set up. For this reason, after it has entered the given network namespace, it takes a snapshot of what the network looks like. This will serve as a reference whenever a netlink event is received.
For route-related events, we don't need such a thing since the whole route information is provided through the event.
But in case we're adding a new IP address to an existing interface, or even when adding a new interface with IP addresses attached to it, we need to make sure we can provide what the interface should look like. This is part of the decision to go with the declarative way for the network, which means we describe what the interface should look like. Maintaining an accurate description of the network is needed in this case, otherwise we don't have the whole information about the interface through events like RTM_NEWADDR or RTM_DELADDR.
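
As an illustration of the description above, here is a minimal sketch of subscribing to rtnetlink notifications and deciding which events rely on the snapshot. The function names open_rtnl_socket() and event_uses_snapshot() are hypothetical and do not come from the PR; the RTMGRP_* groups and RTM_* message types are the standard Linux ones.

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/*
 * Open a NETLINK_ROUTE socket subscribed to link, IPv4 address and IPv4
 * route notifications. Returns the socket fd, or -1 on failure.
 */
int open_rtnl_socket(void)
{
    struct sockaddr_nl addr;
    int fd;

    fd = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, NETLINK_ROUTE);
    if (fd < 0) {
        return -1;
    }

    memset(&addr, 0, sizeof(addr));
    addr.nl_family = AF_NETLINK;
    addr.nl_groups = RTMGRP_LINK | RTMGRP_IPV4_IFADDR | RTMGRP_IPV4_ROUTE;

    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }

    return fd;
}

/*
 * Returns 1 if handling the event needs the interface snapshot, 0 if the
 * event is self-describing (routes), -1 if this sketch ignores the event.
 */
int event_uses_snapshot(unsigned short nlmsg_type)
{
    switch (nlmsg_type) {
    case RTM_NEWLINK:
    case RTM_DELLINK:
    case RTM_NEWADDR:
    case RTM_DELADDR:
        return 1;
    case RTM_NEWROUTE:
    case RTM_DELROUTE:
        return 0;
    default:
        return -1;
    }
}
```

The socket has to be opened after the process has entered the target namespace, since a netlink socket is bound to the namespace it was created in.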

List of tasks to consider:

  • Replace printf logging with syslog logs.
  • Provide the pod_id through the runtime command line.
  • Make sure we don't save the changes to our internal list of interfaces if the runtime call returned with an exit code different from 0.
  • Consider valid events only for a list of interface types (like veth), but ignore bridge for instance. Maybe this could be handled from virtcontainers directly, the monitoring process being more like a simple passthrough.
  • Complete the code implementing the calls to the runtime. But this needs the CLI/API extension regarding network hotplug to be implemented. Or at least properly defined.
  • Replace the table/list of interfaces with a hash table.
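
One of the tasks above is to avoid saving changes when the runtime call fails. A minimal sketch of that pattern follows, assuming a hypothetical iface_state snapshot entry and hypothetical helper names (run_command, apply_change); the real runtime command line is not defined in this thread.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical snapshot entry; the real structure is not shown here. */
struct iface_state {
    int idx;     /* kernel link index */
    int n_addrs; /* number of IP addresses currently known */
};

/*
 * Run the command described by the NULL-terminated argv and wait for it.
 * Returns the child's exit code, or -1 if it could not be started or did
 * not exit normally.
 */
int run_command(char *const argv[])
{
    pid_t pid;
    int status;

    if (!argv || !argv[0]) {
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127); /* only reached if exec failed */
    }

    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) {
            return -1;
        }
    }

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Commit `wanted` into the snapshot only if the runtime call succeeded. */
int apply_change(struct iface_state *saved, const struct iface_state *wanted,
                 char *const runtime_argv[])
{
    if (run_command(runtime_argv) != 0) {
        return -1; /* snapshot left untouched */
    }
    *saved = *wanted;
    return 0;
}
```

Keeping the commit step after the exit-code check means a failed runtime call leaves the snapshot describing what the pod network actually looks like.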

@bergwolf
Member

@sboeuf There are two things I am thinking differently:

  1. The workflow: I think the ns_watcher should be started by the kata cli instead of the virtcontainers library. Starting it from the virtcontainers library seems to be a layering violation to me, since the call chain becomes kata cli -> virtcontainers -> ns_watcher -> kata cli -> virtcontainers. I would suggest we avoid such jumping back and forth.
  2. We should put this in the cli directory. The ns_watcher calls the kata-runtime cli directly. Putting it in the virtcontainers directory logically introduces a cyclic dependency.

@miaoyq

miaoyq commented Apr 12, 2018

But in case we're adding a new IP address to an existing interface, or even when adding a new interface with IP addresses attached to it, we need to make sure we can provide what the interface should look like. This is part of the decision to go with the declarative way for the network, which means we describe what the interface should look like. Maintaining an accurate description of the network is needed in this case, otherwise we don't have the whole information about the interface through events like RTM_NEWADDR or RTM_DELADDR.

I think we only care about veths with an IP address; we should ignore a veth without an IP address until one is added. Also, the monitor only provides the pod ID and interface name, and virtcontainers can get all the network information from the pod ID.

  • Consider valid events only for a list of interface types (like veth), but ignore bridge for instance. Maybe this could be handled from virtcontainers directly, the monitoring process being more like a simple passthrough.

Agree.

  • Complete the code implementing the calls to the runtime. But this needs the CLI/API extension regarding network hotplug to be implemented. Or at least properly defined.

Has anybody started working on the sandbox API definition related to the document?

  • Replace the table/list of interfaces with a hash table.

What does this mean? I'm not clear on this. :-p @sboeuf

@sboeuf
Author

sboeuf commented Apr 12, 2018

@miaoyq

I think we only care about veths with an IP address; we should ignore a veth without an IP address until one is added. Also, the monitor only provides the pod ID and interface name, and virtcontainers can get all the network information from the pod ID.

We can say that we don't send an event if we only receive a new interface, but I think we still need to track this new interface internally, so that we can provide the whole interface when a new IP address is provided. I am saying this for two reasons:

  • The whole design relies on the fact that we want a declarative API, meaning that we expect the caller to provide an explicit description of the interface, and not only its name.
  • Based on the first reason, I don't want to scan the network after we received an event since this might not be the representation of what is expected (if several IP addresses are added, we might be scanning and finding 3 IP addresses, but there are more events to come after this). Also, scanning does not tell us if an interface has been updated, we need to keep the state of the network to compare it to the event received.

@WeiZhang555 @egernst I'd really like your input on this.
Either we can make this monitoring process very simple if we consider providing only the interface name and saying that it has to be added/updated/deleted, based on the fact that virtcontainers will scan the network to get the whole interface description.
Or we stick with the declarative way, and this means the monitoring process has to provide the full description of the network, for which it needs to maintain a full snapshot of the network. For instance, without this snapshot, we cannot determine if an interface needs to be updated or added in case we receive RTM_NEWLINK.
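
To make the RTM_NEWLINK point concrete: the kernel sends RTM_NEWLINK both when a link appears and when an existing link changes, so the only way to tell the two apart is to check the index against what is already known. A minimal sketch, with a hypothetical snapshot type and function name:

```c
#include <stddef.h>

enum link_action { LINK_ADD, LINK_UPDATE };

/* Hypothetical snapshot: the interface indexes recorded so far. */
struct snapshot {
    const int *known_idx;
    size_t count;
};

/*
 * Decide what an RTM_NEWLINK event for if_idx means: an update if the
 * index is already part of the snapshot, an addition otherwise.
 */
enum link_action classify_newlink(const struct snapshot *snap, int if_idx)
{
    size_t i;

    for (i = 0; i < snap->count; i++) {
        if (snap->known_idx[i] == if_idx) {
            return LINK_UPDATE;
        }
    }

    return LINK_ADD;
}
```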

Agree.

Just to confirm, what do you agree on ? Both the fact that we should limit the interface type to veth and the fact that the network hotplug API should be responsible for this ? Or only one of them ?

Has anybody started working on the sandbox API definition related to the document?

Not yet, but this is tied to the new network hotplug API. I can make a proposal about what is going to be needed here

What does this mean? I'm not clear on this. :-p @sboeuf

I am using a simple global array of structures to save the snapshot of the network, and this works only for interface indexes between 1 and 49. But if we want to make sure we can support 50 interfaces whatever their indexes are, using a hash table would work better.
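
A minimal sketch of what such a hash table could look like, keyed by the kernel link index so that large indexes no longer matter. The names (iface_entry, iface_insert, iface_lookup, iface_remove) and the bucket count are assumptions made for illustration; they are not taken from the PR.

```c
#include <stdio.h>
#include <stdlib.h>

#define IFACE_BUCKETS 64
#define IFACE_NAME_LEN 16 /* same size as IFNAMSIZ on Linux */

struct iface_entry {
    int idx;                   /* kernel link index, used as the key */
    char name[IFACE_NAME_LEN];
    struct iface_entry *next;  /* chaining for colliding indexes */
};

static struct iface_entry *iface_table[IFACE_BUCKETS];

static unsigned int iface_bucket(int idx)
{
    return (unsigned int) idx % IFACE_BUCKETS;
}

struct iface_entry *iface_lookup(int idx)
{
    struct iface_entry *e;

    for (e = iface_table[iface_bucket(idx)]; e; e = e->next) {
        if (e->idx == idx) {
            return e;
        }
    }
    return NULL;
}

/* Insert a new entry, or refresh the name of an existing one. */
struct iface_entry *iface_insert(int idx, const char *name)
{
    struct iface_entry *e = iface_lookup(idx);

    if (!e) {
        unsigned int b = iface_bucket(idx);

        e = calloc(1, sizeof(*e));
        if (!e) {
            return NULL;
        }
        e->idx = idx;
        e->next = iface_table[b];
        iface_table[b] = e;
    }
    snprintf(e->name, sizeof(e->name), "%s", name ? name : "");
    return e;
}

/* Returns 0 if the entry was found and freed, -1 otherwise. */
int iface_remove(int idx)
{
    struct iface_entry **pp = &iface_table[iface_bucket(idx)];

    while (*pp) {
        if ((*pp)->idx == idx) {
            struct iface_entry *dead = *pp;

            *pp = dead->next;
            free(dead);
            return 0;
        }
        pp = &(*pp)->next;
    }
    return -1;
}
```

With chaining, the number of tracked interfaces is bounded only by memory, and an index such as 1000 is handled the same way as index 2.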

@sboeuf
Author

sboeuf commented Apr 12, 2018

@bergwolf

The workflow: I think the ns_watcher should be started by the kata cli instead of the virtcontainers library. Starting it from the virtcontainers library seems to be a layering violation to me, since the call chain becomes kata cli -> virtcontainers -> ns_watcher -> kata cli -> virtcontainers. I would suggest we avoid such jumping back and forth.

The chain you're describing here is not really different (binary-wise) since virtcontainers is a simple library and not a binary (a different process) itself.

We should put this in the cli directory. The ns_watcher calls the kata-runtime cli directly. Putting it in the virtcontainers directory logically introduces a cyclic dependency.

There is no such thing as a cyclic dependency here since the binary is in C and it does not import anything from virtcontainers.

To be honest, I understand your concern and I am torn between the two possibilities. The main reason I'd like to keep it inside virtcontainers is that we need to know when we can start this from the CLI perspective. Virtcontainers knows better about this kind of thing. That being said, starting this after the call to CreateSandbox() has returned might be a way of doing this.
I'd like more input on this. @amshinde @devimc @grahamwhaley @egernst WDYT ?

@miaoyq

miaoyq commented Apr 12, 2018

Just to confirm, what do you agree on ? Both the fact that we should limit the interface type to veth and the fact that the network hotplug API should be responsible for this ? Or only one of them ?

@sboeuf Both.

@bergwolf
Member

@sboeuf If we put this in virtcontainers, we make virtcontainers depend on the kata cli because the ns_watcher binary depends on it. The functionality does not work without the kata cli. That's why I call it a logical dependency, not a Go import dependency.


nif.idx = if_idx;

if (if_idx >= MAX_IFACES) {
Member

I think MAX_IFACES=50 is far from enough.

if_idx is the link index. If the host has more than 50 interfaces and you then create a new container, its interface link index is easily larger than 50.
So I think the link index shouldn't be used as the array index; the array index should be a standalone counter.

Member

@WeiZhang555 WeiZhang555 Apr 13, 2018

And I get a segmentation fault when running the process.
To reproduce the error, you can create 50 containers, then try to run:

$ netns_watcher -d -n /proc/<pid>/ns/net -p hello -r /usr/local/bin/kata-runtime

<pid> should be the process ID of the last container, whose interface index is larger than 50.

Author

About the table of interfaces, that's exactly what I was mentioning when I said I'd like to replace this with a hash table ;). I will make the change.

Now about the segmentation fault, I am not sure, and I'll try to reproduce !

@WeiZhang555
Member

Some inputs from me:

Either we can make this monitoring process very simple if we consider providing only the interface name and saying that it has to be added/updated/deleted, based on the fact that virtcontainers will scan the network to get the whole interface description.
Or we stick with the declarative way, and this means the monitoring process has to provide the full description of the network, for which it needs to maintain a full snapshot of the network. For instance, without this snapshot, we cannot determine if an interface needs to be updated or added in case we receive RTM_NEWLINK.

I think the latter is better. ns_watcher should take full responsibility for watching network changes, and send all interface information to the kata cli. Letting the kata cli/virtcontainers do another scan isn't a good idea in my opinion.

And it looks good to me to put this into virtcontainers/ lib.

Most of the code looks quite cool; I only have some comments on details.


#define PROGRAM_NAME "netns-watcher"

#define MAX_IFACES 50
Contributor

I'd do the following for maximum build-time flexibility:

#ifndef MAX_IFACES
#define MAX_IFACES  50
#endif

Author

done

* Copyright (C) 2018 Intel Corporation
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
Contributor

  • GPL or Apache 2.0?
  • SPDX license header required ;)

Author

done

void print_iface_list() {
    int i;

    for (i = 0; i < MAX_IFACES; i++) {
Contributor

I'd rather this was:

int max_ifaces = (int)(sizeof(iface_list)/sizeof(iface_list[0]));

for (i = 0; i < max_ifaces; i++) {

Author

done

*/
void print_version(void)
{
    printf("%s v0.1\n", PROGRAM_NAME);
Contributor

  • Hard-coded version number. How about a #define for it?
  • Version should be in semver format.

Author

done

/*
* Free internal fields of the route structure.
*/
void free_route(struct route *rt) {
Contributor

I'd add a check on the param:

if (! rt) {
    return;
}

Author

done
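
For reference, a minimal sketch of free_route() with the suggested parameter check. The fields of struct route are not visible in this thread, so the layout below (dest, gateway, device) is purely hypothetical.

```c
#include <stdlib.h>

/* Hypothetical layout; the real struct route is not shown in this thread. */
struct route {
    char *dest;
    char *gateway;
    char *device;
};

/*
 * Free internal fields of the route structure.
 * Safe to call with NULL, and safe to call twice on the same structure.
 */
void free_route(struct route *rt)
{
    if (!rt) {
        return;
    }

    free(rt->dest);
    rt->dest = NULL;

    free(rt->gateway);
    rt->gateway = NULL;

    free(rt->device);
    rt->device = NULL;
}
```

Resetting each pointer to NULL after free() is an extra precaution beyond the reviewed suggestion; it turns an accidental second call into a no-op instead of a double free.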

.ip_addrs = NULL,
};

iface_list[idx] = nif;
Contributor

This looks wrong - you're appending a local variable to a global array?

Author

Well, this is simply to make sure every field of iface_list[idx] gets reset to the proper values. But I can do something different.
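
A note on the C semantics in question: assigning a struct copies it by value, so storing a local struct into a global array element does not leave the array pointing at stack memory. A minimal sketch, with a hypothetical struct iface layout and a hypothetical reset_iface() helper:

```c
#include <stddef.h>

#ifndef MAX_IFACES
#define MAX_IFACES 50
#endif

/* Hypothetical layout; only idx and ip_addrs appear in the reviewed diff. */
struct iface {
    int idx;
    char *name;
    char **ip_addrs;
};

static struct iface iface_list[MAX_IFACES];

/*
 * Reset one slot of the global list to a known state.
 * Returns 0 on success, -1 if idx is out of range.
 */
int reset_iface(int idx)
{
    struct iface nif = {
        .idx = idx,
        .name = NULL,
        .ip_addrs = NULL,
    };

    if (idx < 0 || idx >= MAX_IFACES) {
        return -1;
    }

    /* Struct assignment copies every field; nif can safely go out of scope. */
    iface_list[idx] = nif;
    return 0;
}
```

The copy is shallow: any heap memory the old slot pointed to must be freed before the reset, otherwise it leaks.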

* Free internal fields of the iface structure.
*/
void free_iface(struct iface *nif) {
    int i;
Contributor

Unused.

Author

done

*/
int fork_runtime_call(char *params[])
{
    int ret;
Contributor

Unused.

Author

done

int create_iface_from_ifaddrs(struct ifaddrs *ifa, struct iface *nif)
{
    int sock;
    int family;
Contributor

Unused.

Author

done

@@ -0,0 +1,1251 @@
/*
Contributor

How about giving it a prefix like kata or vc? :)

I assume it's intentional that this isn't actually mentioned in the Makefile yet too?

Author

Yes let's call this kata-netmon which stands for "Kata network monitor". And yes, the Makefile will come later. I need to finalize the program first.

@jodh-intel
Contributor

Since this PR introduces C code, we're also going to have to add C code static analysis checks specifically for this file. Ideally, we'd also register the project with https://scan.coverity.com/, but fwics that only works with Travis which we're trying to get rid of.

@jodh-intel
Contributor

Could you update the commit message to explain what this tool does when it detects a change to a network ns?

@sboeuf
Author

sboeuf commented May 31, 2018

Depends on implementation and decisions made on #287
Once #287 is merged, development on this PR can resume.

@katabuilder

PSS Measurement:
Qemu: 162111 KB
Proxy: 8749 KB
Shim: 10807 KB

Memory inside container:
Total Memory: 2045972 KB
Free Memory: 2001904 KB

@jodh-intel
Contributor

btw, I think you'll need to add a commit to enable Coverity Scan as this is the first piece of C code in the runtime repo. Basically crib what you can from:

.. and do:

@jodh-intel
Contributor

... for static analysis, we could start by setting up https://github.com/marketplace/lgtm which is free for OSS projects and has full github integration fwics.

@grahamwhaley
Contributor

I can't think of any reason not to give lgtm a tryout at least...

@jodh-intel
Contributor

Hi @sboeuf - please could you push your latest branch as your comments don't appear to match up with reality for us :)

@sboeuf
Author

sboeuf commented Jul 30, 2018

@jodh-intel yes you're right, but also what happened is that I have re-written this code in Go to prototype faster here. I will push another PR soon, replacing this one. The goal being to validate the whole behavior of what we expect from this binary, and maybe port it to C later.

@jodh-intel
Contributor

Sounds good - thanks for the update @sboeuf.

@sboeuf
Author

sboeuf commented Jul 30, 2018

I want to keep this PR around since we might move to the C implementation later. But for now, we want to prototype this through the new PR #534

@katacontainersbot
Contributor

PSS Measurement:
Qemu: 169529 KB
Proxy: 5891 KB
Shim: 8814 KB

Memory inside container:
Total Memory: 2043480 KB
Free Memory: 2003868 KB

@opendev-zuul

opendev-zuul bot commented Jul 30, 2018

Build succeeded (third-party-check pipeline).

/*
* Add route.
*/
int add_route(const struct nlmsghdr *nh, const char *sandbox_id,
Member

For this kind of change, I would insist that either we make change to network config via the sandbox network hotplug API, or we put this binary in the cli directory. I do not think we should make virtcontainers call the kata-runtime binary directly.

Contributor

Agreed. It sounds a bit recursive :)

Author

See my comment here, let me know what you think about it !

@sboeuf
Author

sboeuf commented Sep 10, 2018

Closing this PR as it's very unlikely that we'll use this C code. The code will still be available on my Github fork on the branch netns_watcher.

@sboeuf sboeuf closed this Sep 10, 2018
zklei pushed a commit to zklei/runtime that referenced this pull request Jun 13, 2019