  1. #1

    Unhappy OpenVZ + Dovecot - "Time Moved Backwards" is killing the server

    Hey, y'all!

    My Dovecot setup keeps crashing because of time changes. Every few minutes, shortly after a login, the process stops because the time has moved back 4 to 5 minutes. It seems to be a known (and old) issue, but I can't find a way around it.

    Aug 24 22:16:32 raiz0 dovecot: master: Dovecot v2.0.13 starting up (core dumps disabled) - PROCESS IS UP
    ...
    Aug 24 22:25:19 raiz0 dovecot: imap-login: Login: user=<smeagle>, method=PLAIN, rip=127.0.0.1, lip=127.0.0.1, mpid=5228, secured - LAST ACTIVITY, 1 SECOND BEFORE CRASHING
    Aug 24 22:20:20 raiz0 dovecot: log: Warning: Time moved backwards by 299 seconds.
    Aug 24 22:20:20 raiz0 dovecot: config: Warning: Time moved backwards by 299 seconds.
    Aug 24 22:20:20 raiz0 dovecot: auth: Warning: Time moved backwards by 299 seconds.
    Aug 24 22:20:20 raiz0 dovecot: master: Warning: Time moved backwards by 299 seconds, waiting for 180 secs until new services are launched again. - DOVECOT IS DEAD FOR 3 MINUTES
    Aug 24 22:20:20 raiz0 dovecot: anvil: Warning: Time moved backwards by 299 seconds.
    Aug 24 22:20:20 raiz0 dovecot: auth: Warning: Time moved backwards by 299 seconds.
    Right after the 3-minute pause everything is back to normal for 5 to 10 minutes, then the time moves backwards again.
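    Judging by the log above, each Dovecot process notices that the current clock reading is earlier than the previous one and logs a warning, and the master then waits 180 seconds before launching new services. A minimal POSIX shell sketch of that kind of check, useful for watching the clock outside Dovecot, is below. It is only an imitation under that assumption, not Dovecot's implementation, and "check_step" and "watch_clock" are hypothetical names.

```shell
#!/bin/sh
# Illustrative sketch only: imitates the *kind* of comparison behind Dovecot's
# "Time moved backwards" warning. This is not Dovecot's code, and the function
# names are hypothetical.

# check_step PREV NOW: given two wall-clock readings in epoch seconds, print a
# warning if the newer reading is earlier than the older one.
check_step() {
    prev=$1
    now=$2
    if [ "$now" -lt "$prev" ]; then
        echo "Warning: Time moved backwards by $((prev - now)) seconds."
    fi
}

# watch_clock: read the wall clock once per second (forever) and report any
# backwards step. Stop it with Ctrl+C.
watch_clock() {
    last=$(date +%s)
    while sleep 1; do
        current=$(date +%s)
        check_step "$last" "$current"
        last=$current
    done
}
```

    Running "watch_clock" in a spare terminal should print a warning at roughly the same moments Dovecot does, if the container clock really is stepping backwards.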

    Feedback from people who experienced the same issue suggests this might have something to do with the node settings, but I find that hard to believe, as I'm hosted by a well-known budget VPS company.

    Anyway, I've tried both chrony and ntpd (with the -x and -g options), with no success. Things are unusable like this.

    For what it's worth, this is a 64-bit Debian Squeeze box running kernel 2.6.18, with Dovecot 2.0.13 (the latest stable release).

    Thanks a lot for any suggestions, fellas!

  2. #2
    Join Date: May 2006 | Location: San Francisco | Posts: 7,325
    Have you contacted your host about the node's time?

  3. #3
    Hey, thanks for the prompt reply!

    Hmm, not really. Is there really any chance a misconfiguration on the host could be causing this?

  4. #4
    Join Date: May 2006 | Location: San Francisco | Posts: 7,325
    Quote Originally Posted by Smeagle
    Hey, txs for the prompt reply!

    Hmm not really. Is there really any chance a misconfiguration on the host could be causing this?
    Quite possibly, since on OpenVZ the individual containers get their time from the main node.

  5. #5
    Quote Originally Posted by Orien
    Quite possibly since time is delegated to individual containers via the main node on VZ.
    OK, I'm opening a ticket right now. I'll keep this thread updated on how it goes.

  6. #6
    Quote Originally Posted by Smeagle
    Is there really any chance a misconfiguration on the host could be causing this?
    Yes. Since your VPS runs on OpenVZ, only the main node controls the time, and it does so for all the containers. These jumps back in time are truly horrendous.

  7. #7
    Join Date: May 2006 | Location: NJ, USA | Posts: 6,645
    The company should just run ntpd on the nodes.
    AS395558

  8. #8
    I've never played with OpenVZ before, so it's kind of new to me. Please, guys, tell me something: what's the expected behavior for this kind of error? Because I'm NOT seeing the time move backwards at all!

    Take a look at the attached image. The terminal on the right is running "top -d 1", whilst the one on the left is tailing Dovecot's log file. Both of those highlighted events actually occurred around 3 seconds apart, but for some reason the log skipped 309 seconds at that point.

    I was watching them closely, and the OS time itself (according to "top") simply *never* jumped; instead, Dovecot's logged time is moving forwards!

    A few seconds after that "time travel", the log reports that the OS time moved backwards, as if Dovecot can't realize that it is the one running ahead! That leads me to think OpenVZ may have nothing to do with it.

    Or maybe someone shouldn't be messing with the server's flux capacitors...
    Attached thumbnail: bug.jpg

  9. #9
    Join Date: Oct 2006 | Location: Indiana, USA | Posts: 72
    I had this problem with an OpenVZ provider earlier this year.

    The problem seems to be with certain OpenVZ kernels and is exhibited when Debian is run in a container.

    It took a lot to convince my provider of the problem and that it was a problem on their end of things.

    One thing I did to demonstrate the problem was simply to create a script that writes the current time to a file, then run it at one-minute intervals with cron. The file, which should have shown the time incrementing by one minute every minute, would show it incrementing normally for about three minutes, then jumping ahead by five minutes, then incrementing normally again, then jumping backwards by five minutes.
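    The cron-based test described above could look something like this. A minimal sketch, assuming a POSIX shell and cron; the script name timelog.sh and the log path /var/tmp/timelog.txt are hypothetical, not what was actually used here.

```shell
#!/bin/sh
# Hypothetical timelog.sh: append one clock reading to a log file.
# Make it executable (chmod +x) and call it once a minute from cron,
# e.g. with this crontab entry (paths are assumptions):
#   * * * * * /usr/local/bin/timelog.sh /var/tmp/timelog.txt
# Each line holds epoch seconds followed by a readable date, taken from a
# single "date" call so both columns describe the same instant.

log_time() {
    logfile=${1:-/var/tmp/timelog.txt}
    date '+%s %Y-%m-%d %H:%M:%S %Z' >> "$logfile"
}

# Only log when invoked as timelog.sh (e.g. by cron); sourcing the file
# just defines the function.
case "${0##*/}" in
    timelog.sh) log_time "$1" ;;
esac
```

    On a healthy clock, consecutive epoch values should differ by about 60; differences near 360 or -240 would match the five-minute jumps described above.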

    After many support tickets back and forth, the host seemed willing to try a different kernel on the main node, but then they backed out because they didn't want to affect other users on the box. Although they acknowledged that the system time was jumping around, they said nobody else had complained about it, and therefore they didn't want to fix it. They refunded my money but never fixed it, and that was a shame, because everything else about that host was excellent. But having the time kept properly on a server is important to me.

    I tried to send you a PM because I wanted to ask you which host you use, but the forum software says you can't receive PMs.

  10. #10
    Quote Originally Posted by mojojuju
    ...
    I tried to send you a PM because I want to ask you which host you use, but the forum software says you can't receive PMs.
    That's weird, I just found out I can't access the PM panel over here... Anyway, my provider is "BN" (that massive host mostly known over here for their affordable VPS offers).

    Yeah, the situation you described is exactly what I'm experiencing. In any case, support responded to my ticket today saying they haven't heard other complaints regarding this node. Additionally, they've set it to use 0.pool.ntp.org instead of their internal time server, but I'm afraid it didn't work. I don't know if we're talking about the same provider here, but I happen to like them a lot as well. Everything else has been working like a charm so far.

    Earlier today I removed my Dovecot 2 setup and installed 1.2.17 (the latest stable 1.x release), yet the time was still skipping. Then I tried Courier... same problem. Finally, I worked around the issue by removing the time-checking function in Dovecot's ioloop.c and rebuilding Dovecot. The mail log time is still moving around, but at least it doesn't crash the IMAP server now.

    The point is that this VM will be running time-sensitive services like OpenVPN, master-master MySQL replication, Samba and remote directory syncing. So even though Dovecot is now running fine, I can't rely on a workaround for everything, and I'm still looking into this issue.

    You mentioned something seems to be broken between OpenVZ and Debian specifically. Have you tried reloading a different distro on that node before you moved out?

    Thank you for sharing!

  11. #11
    Join Date: Oct 2006 | Location: Indiana, USA | Posts: 72
    I think the problem you and I both experienced is documented at http://forum.parallels.com/showthread.php?t=110074

    The vdso=0 boot option mentioned in that thread is supposed to fix the time skipping problem. The host I was with (which is not the one you mentioned BTW) was going to fix the problem with that method, then at the last minute decided not to, but they refunded my money. You might want to bring that up with your host too.

    Also, there's a kernel update for Virtuozzo (http://kb.parallels.com/en/111398) that says it fixes this problem:

    The timer in a Container may sometimes be 300 seconds fast. (PCLIN-29182)
    I don't know how fast such fixes become available to OpenVZ though.

    Anyhow, I hope your host proves better than mine at delivering a solution. There's no way I'd stay with a VPS provider whose system time jumps around like that. You shouldn't have to resort to modifying Dovecot's ioloop.c to get popular software to run on your server.

    Good luck to you, and I hope you'll report back if you manage to get things fixed.

    P.S. As for this being a Debian-specific problem, I think I confused this one with another, totally unrelated problem, so please disregard what I said about Debian.

  13. #13
    Yep, it seems like that's what is going on. But there's one thing: the problem doesn't occur with the available 32-bit ISOs. Here's what I just found out...

    I suspected that "top", refreshing every second, might not be catching the time changes, so instead I set up a macro (cron can't schedule jobs more often than once a minute) to append the output of "date" to a file every second for 15 minutes (pretty much what you did), in an attempt to capture any change in date, time or timezone. There it is! The OS timestamp is indeed jumping around!

    To try to reproduce the error in a "fresh" environment, I created a restore point for my VPS and rebuilt the VM no fewer than 6 times over the past few hours, trying both the x86 and the x64 builds available for Debian 6, Fedora 12 and CentOS 6. The conclusion is pretty simple: every 64-bit OS I tested jumps 5 minutes either backwards or forwards roughly every 2 minutes 30 seconds. On the other hand, the 32-bit ones never showed a single "time skip".

    The shell command used was:

    Code:
    echo "$(date)" >> log.txt
    There's no point in posting all 6 logs, given that the results were consistently the same, so let me just show what happened on CentOS 6.0:

    64-bit:
    Fri Aug 26 01:59:00 EDT 2011
    Fri Aug 26 01:59:01 EDT 2011
    Fri Aug 26 01:59:02 EDT 2011
    ...
    Fri Aug 26 02:00:57 EDT 2011
    Fri Aug 26 02:00:58 EDT 2011 RUNNING FOR 119 SECONDS STRAIGHT
    Fri Aug 26 02:05:59 EDT 2011 TIME JUMPED 301 SECONDS FORWARDS
    Fri Aug 26 02:06:00 EDT 2011
    ...
    Fri Aug 26 02:08:27 EDT 2011
    Fri Aug 26 02:08:28 EDT 2011 RUNNING FOR 149 SECONDS STRAIGHT
    Fri Aug 26 02:03:30 EDT 2011 TIME JUMPED 298 SECONDS BACKWARDS
    Fri Aug 26 02:03:31 EDT 2011
    ...
    Fri Aug 26 02:05:57 EDT 2011
    Fri Aug 26 02:05:58 EDT 2011 RUNNING FOR 148 SECONDS STRAIGHT
    Fri Aug 26 02:10:59 EDT 2011 TIME JUMPED 301 SECONDS FORWARDS
    Fri Aug 26 02:11:00 EDT 2011
    ...
    Fri Aug 26 02:13:27 EDT 2011
    Fri Aug 26 02:13:28 EDT 2011 RUNNING FOR 149 SECONDS STRAIGHT
    Fri Aug 26 02:08:29 EDT 2011 TIME JUMPED 299 SECONDS BACKWARDS
    Fri Aug 26 02:08:30 EDT 2011
    ...
    Fri Aug 26 02:10:57 EDT 2011
    Fri Aug 26 02:10:58 EDT 2011 RUNNING FOR 149 SECONDS STRAIGHT
    Fri Aug 26 02:15:59 EDT 2011 TIME JUMPED 301 SECONDS FORWARDS
    Fri Aug 26 02:16:01 EDT 2011
    ...
    (As the macro was running in a terminal client, network latency doesn't allow 100% accuracy, so any deviation of up to 2 seconds is acceptable.)

    32-bit:
    Fri Aug 26 02:26:30 EDT 2011
    Fri Aug 26 02:26:31 EDT 2011
    Fri Aug 26 02:26:32 EDT 2011
    ...
    Fri Aug 26 02:41:28 EDT 2011
    Fri Aug 26 02:41:29 EDT 2011
    Fri Aug 26 02:41:30 EDT 2011 RUNNING FOR 900 SECONDS STRAIGHT. PERFECT!
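    Spotting the jumps by eye in a 900-line log is tedious. A minimal sketch, assuming a POSIX shell with awk and a "date" that supports %s (GNU coreutils does): "sample_clock" replaces the terminal macro with a loop that logs epoch seconds next to each readable timestamp, and "find_jumps" flags any step that differs from the expected interval by more than a tolerance (2 seconds by default, matching the allowance above). Both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for logging the clock and finding jumps in the log.

# sample_clock COUNT FILE: append COUNT readings, about one per second.
# Each line is "EPOCH YYYY-MM-DD HH:MM:SS TZ".
sample_clock() {
    count=$1
    file=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        date '+%s %Y-%m-%d %H:%M:%S %Z' >> "$file"
        sleep 1
        i=$((i + 1))
    done
}

# find_jumps FILE [INTERVAL] [TOLERANCE]: compare the epoch value (first
# field) of each line with the previous one and report steps that differ
# from INTERVAL (default 1) by more than TOLERANCE (default 2) seconds.
find_jumps() {
    awk -v interval="${2:-1}" -v tol="${3:-2}" '
        NR > 1 {
            step = $1 - prev
            dev = step - interval
            if (dev > tol || dev < -tol) {
                if (step < 0)
                    printf "line %d: jumped %d seconds backwards\n", NR, -step
                else
                    printf "line %d: jumped %d seconds forwards\n", NR, step
            }
        }
        { prev = $1 }
    ' "$1"
}
```

    For example, "sample_clock 900 log.txt" covers roughly the same 15 minutes as the test above (each pass takes slightly over a second), and "find_jumps log.txt" should then print nothing on an unaffected box. For a once-a-minute cron log, an interval of 60 does the same job, e.g. "find_jumps log.txt 60 5".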
    This is serious business; I can't imagine a production server running that way. From log files to access statistics, the accuracy of the services would be compromised by the server's erratic clock.

    I'm pretty sure it's an isolated problem with my account or the node I'm on. It's safe to say they are hosting thousands (tens of thousands, maybe?) of VPSs, so it's hard to believe that every single 6-month-old 64-bit setup on their network is experiencing this sort of issue. If that were the case, I guess WHT would be crowded with people complaining about it.

    That's it. I linked this topic in my ticket, and I hope they look into this issue, unlike your provider. Otherwise, I'm sticking with 32-bit Debian.

    To anyone still reading this who runs a 64-bit OpenVZ box: it may be worth performing this check, just to be on the safe side.
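    Since only the 64-bit templates showed the jumps in these tests, it helps to know which kind of userland a container runs before doing the check. A small sketch, assuming a POSIX shell on Linux; "report_arch" is a hypothetical name. "getconf LONG_BIT" prints 32 or 64 for the userland, and "uname -m" prints the machine type the kernel reports.

```shell
#!/bin/sh
# report_arch (hypothetical helper): print the userland word size and the
# machine type reported by the kernel.
#   getconf LONG_BIT -> 32 or 64, depending on the userland ABI
#   uname -m         -> e.g. x86_64 or i686
report_arch() {
    echo "userland: $(getconf LONG_BIT)-bit, machine: $(uname -m)"
}
```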

    Thanks, dude! I'll keep this topic updated with whatever they say about it.

  14. #14
    Great news! Here's the reply I got from support:

    Hi,

    It appears as you noted this is most likely related to a kernel bug. We scheduled some downtime today on all our Miami VPS server's for a few updates, that has been completed at this time. You should not notice this clock drift any further as long as they corrected this bug in the latest OpenVZ kernel release.

    Thanks!
    It's worth noting that they arranged and performed the kernel update on the servers less than 11 hours after I last contacted them. Brilliant!

    Debian now reports kernel version 2.6.32, everything's still running smoothly, and there's no more "time traveling" over here! YAY!!!

    Thanks, everyone!

  15. #15
    Join Date: Oct 2006 | Location: Indiana, USA | Posts: 72
    It's good to hear that you're no longer time traveling.

    I'm surprised, though, that I've seen very little mention of this problem on this and other forums.

    I think it's safe to say that you're using BurstNet, and they seem to have handled the problem well.

  16. #16
    Join Date: Jun 2010 | Location: Northern Ireland | Posts: 45
    Was it BurstNet? Whoever it was, they deserve kudos for not doing what 99% of companies do: burying their heads in the sand and pretending it's the customer's fault.

    I wonder where that comes from, and why it is the predominant attitude among companies. Nobody likes to be wrong, and maybe the kind of people who run companies like being wrong even less than us mere mortals.

    Glad you had a good outcome!

  17. #17
    Quote Originally Posted by mojojuju
    It's good to hear that you're no longer time traveling.

    I'm surprised though, that I've seen very little mention of this problem in this and other forums..

    I think it's safe to say that you're using BurstNet, and they seem to have handled the problem well.
    It feels great to be back in the present!

    Anyway, I'm surprised too that pretty much nobody complained about the clock on their boxes being wrong half the time...

    And yes, it's BurstNet indeed! I was keeping their name private until it was clear I wasn't messing something up myself, but it's all figured out now. I'm glad it turned out not to be a misconfiguration on their end, but a reasonably recent kernel bug, which they quickly fixed as soon as it was spotted.

    Quote Originally Posted by tentimes
    Was it Burstnet? Whoever it was, they deserve kudos for not doing what 99% of companies do, putting heads in the sand and pretend it's the customers fault

    I wonder where that comes from and why it is the predominant attitude of companies? Nobody likes to be wrong, and maybe the type of people that run companies like to be wrong even less than us mere mortals

    Glad you had a good outcome!
    Totally agreed! That's what I said in my feedback after closing that ticket!

    They could have tried to get away with it, since I was the only one experiencing this issue. To be honest, as soon as they told me that, my first thought was that a refund would be the most I could expect from that conversation (just like what mojojuju's host did for him)... Why would they even bother upgrading a node's kernel because a random guy reported an error on a $5.95 VM?

    Not only did they do it, they also immediately patched the other boxes in the DC to keep anyone else from running into the same headache I had.

    We can't expect computers never to fail, but it feels great to know how far our provider is willing to go to back us up.

    Being this helpful is not the rule in the web hosting market, so I really appreciate what they did for me.
