RFC: Add host time sync in Kata #1279
Comments
/cc @egernst @jcvenegas
OOI @mcastelino - does it start out in sync and then drift, or does it start out out of sync? Any feel for the magnitude as well?
/cc @bergwolf since this will impact agent-as-init: we'll have to make the agent itself start any extra services, rather than using systemd.
It starts out of sync and then drifts further by quite a bit. I wrote a little tool to model this. If you look at the zipkin traces you will see how bad the drift is right off the bat. Also some more details on timesync
I think this can be fixed using kata-containers/agent#425 and doing something like this #976
@bergwolf WDYT? Looks like we need an owner for this issue. Any takers?
@mcastelino The article suggests adding
@amshinde I think timesyncd only works with NTP sources, but I may be wrong.
@dylanzr You are right. I looked at this yesterday and realized that timesyncd is quite minimal and works only with NTP; it does not support PTP or hardware clocks.
This feature is really necessary; I love it. Adding chrony to the rootfs sounds good, though.
I agree that PHC is quite useful for us. It's also worth noting that the
@bergwolf good point, this needs to be handled and documented.
@bergwolf it is imperative that we have time sync. I see issues in maintaining consistency across the cluster without it. So we need a fallback for older kernels. We should not hold up this PR. The VM (i.e. the container) cannot assume NTP connectivity. So if the host kernel is older, we should fall back to the gRPC-based sync that was proposed.
I have raised PR kata-containers/osbuilder#256 to add chrony to the rootfs. The PR also configures chrony to use virtual PTP as a source. As @mcastelino mentioned, we cannot assume NTP connectivity for the VM, so we cannot rely on NTP sources for chrony or systemd-timesyncd; we can fall back to gRPC-based sync in that case. What do others think about this approach? I would also like some input on how to handle time sync for an initrd-based rootfs.
Hi~ @mcastelino @egernst @devimc @bergwolf
@Pennyzct gRPC sync is not really going to be accurate. We should be looking at adding PTP support for aarch64.
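To illustrate why a request/response sync is limited in accuracy, here is a minimal sketch of the generic midpoint method for estimating a clock offset over an RPC. It is not the gRPC sync proposed for Kata; `estimate_offset` and `read_host_clock` are hypothetical names, and the host call is stubbed with a local function. The estimate can be wrong by up to half the round-trip time, because the caller cannot tell where within the round trip the host read its clock.

```python
import time
from typing import Callable, Tuple


def estimate_offset(
    read_host_clock: Callable[[], float],
    read_local_clock: Callable[[], float] = time.time,
) -> Tuple[float, float]:
    """Estimate (host - guest) clock offset from one request/response exchange.

    read_host_clock is a hypothetical stand-in for an RPC that returns the
    host's wall-clock time in seconds. Returns (offset, uncertainty), where
    uncertainty is half the measured round-trip time.
    """
    t0 = read_local_clock()      # guest time just before the request
    host_time = read_host_clock()
    t1 = read_local_clock()      # guest time just after the response
    # Assume the host read its clock halfway through the round trip.
    offset = host_time - (t0 + t1) / 2
    uncertainty = (t1 - t0) / 2
    return offset, uncertainty


if __name__ == "__main__":
    # Stub host whose clock is 5 seconds ahead of the local clock.
    offset, uncertainty = estimate_offset(lambda: time.time() + 5.0)
    print(f"offset ~ {offset:.6f} s, +/- {uncertainty:.6f} s")
```

With a real RPC the round trip includes scheduling and serialization delays on both sides, so the uncertainty term is typically much larger than with a hardware clock read through PTP.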
@mcastelino yes, we should consider implementing kvm-ptp on aarch64. ;) @jongwu @justin-he
Is there a chance that the jump to QEMU 5.0 broke this? I'm having time-sync issues running on a 4.19 x86 box, running Kata 1.11.3 with QEMU 5.0 and the pre-packaged VM images.
@evanfoster - could you provide further details of the sync issues? Do you get errors from
@evanfoster - might be worth raising a fresh issue on it and referencing this one.
Hey @jodh-intel, I haven't set up a debug image to test this yet, so I'm not sure whether I'm seeing time drift issues, but the time may also be incorrect on startup. I've had some folks report a 20 second discrepancy on pod start (which breaks their application; they need ±2 seconds), and I have a pod that's running ~63 seconds slow after 15 days. I don't have quite enough data yet to justify creating a new issue, but I'll work on gathering that so I can do so.
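For a sense of scale, the figures above can be turned into an average drift rate. This is a rough sketch that assumes the pod's clock started in sync and drifted linearly, which the report itself does not establish (the time may already be off at startup); the function names are illustrative.

```python
SECONDS_PER_DAY = 86_400


def drift_ppm(offset_seconds: float, elapsed_seconds: float) -> float:
    """Average clock drift in parts per million over the elapsed interval."""
    return offset_seconds / elapsed_seconds * 1_000_000


def hours_to_exceed(tolerance_seconds: float, ppm: float) -> float:
    """Hours until a clock drifting at `ppm` is off by `tolerance_seconds`."""
    return tolerance_seconds / (ppm * 1e-6) / 3600


rate = drift_ppm(63, 15 * SECONDS_PER_DAY)   # 63 s slow after 15 days
print(f"{rate:.1f} ppm")                      # prints "48.6 ppm"
print(f"{hours_to_exceed(2, rate):.1f} h")    # prints "11.4 h"
```

Under that linear assumption, a guest drifting at this rate would leave a ±2 second tolerance in under half a day without any correction.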
Hi @evanfoster - yes, you'll need to build a debug image as documented in https://github.com/kata-containers/documentation/blob/master/Developer-Guide.md#set-up-a-debug-console. However, you could just enable full debug as a first step to see if that gives you anything interesting in the logs. I hope the move to QEMU 5 didn't break this since we do have a basic time drift test that should have caught the issue here: Could you possibly do a bit of digging and maybe open a new issue with the output of
Can do! It might be a few days, however.
Time is not accurate within Kata containers
When running containers using runc, as long as the host systems are time synchronized, time is accurate within the containers and consistent across the cluster.
However, when running Kata containers, time is no longer accurate.
Any end-to-end traces that involve Kata containers will yield wrong or inconsistent results.
We should consider adding the ptp_kvm kernel module to the default Kata kernel and setting it up for time sync with the host using chronyd.
The downside of this is that it will add an active component in the Kata VM in addition to the Kata agent.
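A minimal sketch of what the guest-side chrony configuration could look like. It assumes the ptp_kvm module is loaded in the guest and exposes the host clock as /dev/ptp0 (the device index may differ), and that chronyd reads /etc/chrony.conf (the path varies by distribution). The polling values are illustrative, not tuned.

```
# /etc/chrony.conf in the guest (path is an assumption)
# Use the KVM PTP hardware clock as the time source; no NTP servers are configured.
refclock PHC /dev/ptp0 poll 3 dpoll -2 offset 0
# Step the clock during the first 3 updates if it is off by more than 1 second,
# so a guest that boots out of sync is corrected quickly.
makestep 1.0 3
```

Because the source is a local device rather than a network server, this setup does not depend on the VM having NTP connectivity.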
Expected result
Time in Kata containers should be consistent with host time, to match runc behavior.