This repository has been archived by the owner on Jun 20, 2024. It is now read-only.

weave_flows metric is fastdp-only, no way to see how many sleeve flows there are #3788

Closed
hairyhenderson opened this issue Mar 18, 2020 · 7 comments · Fixed by #3789

Comments

@hairyhenderson
Contributor

(note: I'm not very familiar with Weave internals, so apologies if I make terminology mistakes below - please correct me!)

We recently ran into the issues fixed in 2.6.2 (#3781, #3783, #3782), and while working through upgrading our clusters we were wondering how we could detect this sort of thing in the future.

We use Prometheus for our monitoring, and have the weave_flows metric graphed on a Grafana dashboard. It's obvious when looking over a period of time that the number of flows plummets (in our case, from an average of a few thousand per node to below ~100 per node).

However, we found no way to measure the number of sleeve flows.

Note that this may be related to #2557, but that may be a bit general, and it's been stale for a long time.

What you expected to happen?

A metric to measure the number of sleeve flows should be present, or perhaps a label to indicate which kind of flows are being measured.

What happened?

There is no indication of what weave_flows is actually measuring.

How to reproduce it?

Grab the output of the /metrics endpoint from a weave process.

Anything else we need to know?

Probably not relevant, but we're running on AWS with KOPS.

Versions:

Note: we were running Weave 2.6.1, but have now upgraded to 2.6.2. (can't exec into the weave-net pod, but we're running weaveworks/weave-npc:2.6.2).

$ docker version
Client:
 Version:           18.09.9
 API version:       1.38 (downgraded from 1.39)
 Go version:        go1.11.13
 Git commit:        039a7df9ba
 Built:             Wed Sep  4 16:51:48 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.3-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       d7080c1
  Built:            Wed Feb 20 02:26:45 2019
  OS/Arch:          linux/amd64
  Experimental:     false
$ uname -a
Linux ip-10-40-108-161 5.3.0-0.bpo.2-cloud-amd64 #1 SMP Debian 5.3.9-2~bpo10+1 (2019-11-13) x86_64 GNU/Linux
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.3", GitCommit:"06ad960bfd03b39c8310aaf92d1e7c12ce618213", GitTreeState:"clean", BuildDate:"2020-02-11T18:14:22Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.10", GitCommit:"1bea6c00a7055edef03f1d4bb58b773fa8917f11", GitTreeState:"clean", BuildDate:"2020-02-11T20:05:26Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}

Logs:

(not relevant IMO)

Network:

(not relevant IMO)

@bboreham
Contributor

"flow" is an object in the OVS kernel module used by fastdp; there is no analogous structure for sleeve.

But I think you are looking for some indication that peer-to-peer connections have stuck at sleeve when they should be fastdp. We could add the connection type as a label on weave_connections.

Does weave_connections{state="established"} - weave_flows give you the number of sleeve connections?

@hairyhenderson
Contributor Author

> "flow" is an object in the OVS kernel module used by fastdp; there is no analogous structure for sleeve.
>
> But I think you are looking for some indication that peer-to-peer connections have stuck at sleeve when they should be fastdp. We could add the connection type as a label on weave_connections.

Ah, thanks for clarifying. Yes, that's definitely what I'm looking for, and connection type on weave_connections sounds about right!

> Does weave_connections{state="established"} - weave_flows give you the number of sleeve connections?

That's kind of what I was wondering, but the scale seems off. From what you're implying weave_connections{state="established"} should include both fastdp and sleeve connections?

For example, during one of the instants when we were running Weave 2.6.1, for one of the pods, weave_connections{state="established"} is 15, where weave_flows is 42, though the weave_flows count seems to vary quite a bit more than connections:

[Screenshot: Screen Shot 2020-03-18 at 13 35 45]

And then, after the upgrade to 2.6.2, weave_flows is quite a bit larger than weave_connections{state="established"}:

[Screenshot: Screen Shot 2020-03-18 at 13 42 41]

So I suppose that means that no, weave_connections{state="established"} - weave_flows won't really mean anything 😉

@bboreham
Contributor

Ah, right: flows are per-MAC whereas connections are per-machine (or peer if you prefer).

Would you be interested to try a PR to add the connection-type label?

@hairyhenderson
Contributor Author

perhaps, though I'm not at all familiar with the weave code so it may take some time... I'll poke at it and see 😉

@bboreham
Contributor

Great! Start here:

for _, conn := range s.Router.Connections {

The status struct has a map, Attrs, in which the key "name" should hold a string such as "sleeve" or "fastdp". (There may also be keys such as "encrypted" and "mtu".)
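A minimal sketch of how that lookup might feed a per-type count, assuming a simplified stand-in for the status struct. The names connStatus, connType, labelKey and countByStateAndType are hypothetical, as is the "type" label name; this is not Weave's actual code or the change made in #3789.

```go
package main

import (
	"fmt"
	"sort"
)

// connStatus is a hypothetical stand-in for the per-connection status struct;
// only the fields needed for this sketch are modelled.
type connStatus struct {
	State string
	Attrs map[string]interface{}
}

// connType reads the "name" attribute. The comma-ok type assertion keeps a
// missing key, a nil map, or a non-string value from causing a panic.
func connType(attrs map[string]interface{}) string {
	if name, ok := attrs["name"].(string); ok && name != "" {
		return name
	}
	return "unknown"
}

// labelKey is the (state, type) pair that would become the metric's labels.
type labelKey struct{ State, Type string }

// countByStateAndType tallies connections per (state, type) pair.
func countByStateAndType(conns []connStatus) map[labelKey]int {
	counts := make(map[labelKey]int)
	for _, c := range conns {
		counts[labelKey{State: c.State, Type: connType(c.Attrs)}]++
	}
	return counts
}

func main() {
	// Made-up connections for illustration.
	conns := []connStatus{
		{State: "established", Attrs: map[string]interface{}{"name": "fastdp", "mtu": 1376}},
		{State: "established", Attrs: map[string]interface{}{"name": "sleeve", "encrypted": true}},
		{State: "established", Attrs: map[string]interface{}{"name": "fastdp"}},
		{State: "pending", Attrs: nil},
	}
	counts := countByStateAndType(conns)

	// Sort the keys so the output order is stable.
	keys := make([]labelKey, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool {
		if keys[i].State != keys[j].State {
			return keys[i].State < keys[j].State
		}
		return keys[i].Type < keys[j].Type
	})
	for _, k := range keys {
		fmt.Printf("weave_connections{state=%q,type=%q} %d\n", k.State, k.Type, counts[k])
	}
}
```

With counts keyed this way, sleeve connections can be read off directly instead of being inferred by subtracting weave_flows.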

@hairyhenderson
Contributor Author

@bboreham thanks - I was wondering what Attrs would be! map[string]interface{}s are the bane of my existence 😂

@hairyhenderson
Contributor Author

@bboreham I've issued #3789 - PTAL when you have a chance, it's perhaps a bit rough, let me know how I can improve it!

@bboreham added this to the 2.7 milestone on Aug 4, 2020