  1. #1
    Join Date
    Feb 2008
    Location
    Wilkes-Barre, PA
    Posts
    1,119

    Supermicro CentOS 6.5 NIC bug

    tl;dr: If you're running a Supermicro X8SIE and the NIC keeps crashing on CentOS 6.5, you're not going crazy; there's a bug.

    Just wanted to post this for anyone else who is having similar issues.

    A client ordered a Xeon X3430 server from us with CentOS 6.4. A day or two later, he opened a ticket stating that he couldn't connect to the server. I took a look and couldn't figure out what was going on. Assuming he had made a configuration change that I couldn't find, I offered to reinstall the server.

    About a day later, the same thing happened. The server was fully booted and everything seemed fine, but I couldn't figure out what was going on. This time I dug into it a little deeper and discovered that the NIC was crashing, which led me to this bug: http://bugs.centos.org/view.php?id=6810

    After updating CentOS from 6.4 to 6.5, a bug in the NIC driver was causing the network to drop.
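    To confirm which point release a box actually ended up on after an update, you can read /etc/redhat-release. A minimal sketch, assuming the usual "CentOS release 6.x (Final)" wording; the function names are hypothetical helpers, not part of any package:

```shell
#!/bin/sh
# Hypothetical helpers (illustrative names, not from any package).
# el_release_minor [FILE]: print the minor version from a RHEL/CentOS release
# file, e.g. "5" for "CentOS release 6.5 (Final)". Assumes "release X.Y" wording.
el_release_minor() {
    release_file="${1:-/etc/redhat-release}"
    sed -n 's/.*release [0-9][0-9]*\.\([0-9][0-9]*\).*/\1/p' "$release_file" | head -n 1
}

# is_el6_5_or_later [FILE]: succeed only on a 6.x release with minor version >= 5.
is_el6_5_or_later() {
    release_file="${1:-/etc/redhat-release}"
    grep -q 'release 6\.' "$release_file" || return 1
    minor=$(el_release_minor "$release_file")
    [ -n "$minor" ] && [ "$minor" -ge 5 ]
}
```

    For example: is_el6_5_or_later && echo "6.5 or later: check the e1000e NICs"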
    NEPA Fiber
    AS 394868 - Wilkes-Barre, PA
    █ Fiber Internet, Dedicated Servers, Colocation, Cloud
    100% Uptime SLA - 24/7/365 Support

  2. #2
    Join Date
    Nov 2006
    Location
    USA
    Posts
    762
    Yes, I have seen the e1000e module issue. Disabling PCIe Active State Power Management (ASPM) and updating the e1000e module fixed it.

  3. #3
    Join Date
    Jan 2008
    Location
    Raleigh, NC
    Posts
    1,230
    Quote Originally Posted by PersonalJ
    Yes, I have seen the e1000e module issue. Disabling PCIe Active State Power Management (ASPM) and updating the e1000e module fixed it.
    This. We had the same issue and found that this corrected the problem. It hasn't happened since.
    █| VEEROTECH.NET - Shared, Reseller & VPS Hosting
    █| High Performance *Pure SSD* CloudLinux & LiteSpeed Powered Web Hosting
    █| cPanel & WHM - Softaculous - Website Builder - R1Soft - SpamExperts - Let's Encrypt

  4. #4
    Join Date
    Aug 2006
    Location
    Ashburn VA, San Diego CA
    Posts
    4,571
    This affects almost all SM (Supermicro) boards with 82574L NICs, some worse than others. The other day we had an X8SIL board that couldn't make it through a net install. It seems to have become far worse on 6.5 than in previous releases.

    You can easily patch the hardware and/or use a kernel switch (pcie_aspm=off) to resolve it:

    http://djlab.com/2012/10/x9scm-x9scl-network-timeout/

    As an added measure, I'd also recommend using the e1000e module from ELRepo rather than building and maintaining your own.

    http://elrepo.org/tiki/kmod-e1000e
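    To find out which boxes actually have an 82574L and which driver is bound to it, the output of lspci -k (which lists the kernel driver in use for each device) can be filtered. A rough sketch; list_82574l is a hypothetical helper name, and it assumes the usual lspci -k layout of an unindented device line followed by indented detail lines:

```shell
#!/bin/sh
# list_82574l: read `lspci -k` output on stdin and print "SLOT DRIVER" for each
# Intel 82574L controller ("none" if no kernel driver is bound to it).
list_82574l() {
    awk '
        # Emit the previous device if it was an 82574L.
        function flush() { if (dev != "") print dev, (drv != "" ? drv : "none") }
        # Unindented line = new device; remember its slot only if it is an 82574L.
        /^[^ \t]/ { flush(); dev = ($0 ~ /82574L/) ? $1 : ""; drv = ""; next }
        # Indented detail line naming the bound driver.
        /Kernel driver in use:/ { drv = $NF }
        END { flush() }
    '
}
```

    Run it as: lspci -k | list_82574l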
    Fast Serv Networks, LLC | AS29889 | Fully Managed Cloud, Streaming, Dedicated Servers, Colo by-the-U
    Since 2003 - Ashburn VA + San Diego CA Datacenters

  5. #5
    Join Date
    Mar 2003
    Location
    California USA
    Posts
    13,294
    Quote Originally Posted by FastServ
    This affects almost all SM (Supermicro) boards with 82574L NICs, some worse than others. The other day we had an X8SIL board that couldn't make it through a net install. It seems to have become far worse on 6.5 than in previous releases.

    You can easily patch the hardware and/or use a kernel switch (pcie_aspm=off) to resolve it:

    http://djlab.com/2012/10/x9scm-x9scl-network-timeout/

    As an added measure, I'd also recommend using the e1000e module from ELRepo rather than building and maintaining your own.

    http://elrepo.org/tiki/kmod-e1000e
    Typically this issue has been easily resolved. However, I have a pair of servers that, for the life of me, I couldn't fix with any solution: the latest BIOS, the firmware patch, different drivers, different kernels, kernel flags, etc. Nothing would keep the NIC from timing out. I ended up putting dual-port NICs in each server.

    It was really odd. I've never had servers get it as badly as those two.
    Steven Ciaburri | Industry's Best Server Management - Rack911.com
    Software Auditing - 400+ Vulnerabilities Found - Quote @ https://www.RACK911Labs.com
    Fully Managed Dedicated Servers (Las Vegas, New York City, & Amsterdam) (AS62710)
    FreeBSD & Linux Server Management, Security Auditing, Server Optimization, PCI Compliance

  6. #6
    Join Date
    Aug 2006
    Location
    Ashburn VA, San Diego CA
    Posts
    4,571
    Quote Originally Posted by Steven
    Typically this issue has been easily resolved. However, I have a pair of servers that, for the life of me, I couldn't fix with any solution: the latest BIOS, the firmware patch, different drivers, different kernels, kernel flags, etc. Nothing would keep the NIC from timing out. I ended up putting dual-port NICs in each server.

    It was really odd. I've never had servers get it as badly as those two.
    Yeah, it's something we've learned to live with over the years on most SM boards, and the fix has become part of our automated deployment process.

    The older X8 boards USED to be unaffected by this until CentOS 6.5. We couldn't even complete a 6.5 net install without adding pcie_aspm=off to the PXE boot args, and I fear that as existing installs are auto-updated to 6.5, it's going to be fun when they are rebooted.
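    For the net-install case, pcie_aspm=off has to be on the kernel command line that the PXE menu passes to the installer. A minimal sketch, assuming a PXELINUX-style menu where kernel arguments live on APPEND lines; the function name is hypothetical and the path in the usage line is only an example:

```shell
#!/bin/sh
# add_aspm_to_pxe_cfg FILE: append pcie_aspm=off to every APPEND line (any case)
# in a PXELINUX-style menu file that doesn't already carry it. Safe to re-run.
add_aspm_to_pxe_cfg() {
    cfg=$1
    tmp=$(mktemp) || return 1
    # On APPEND lines lacking pcie_aspm=off, add it at the end of the line.
    sed '/^[[:space:]]*[Aa][Pp][Pp][Ee][Nn][Dd][[:space:]]/{
/pcie_aspm=off/!s/$/ pcie_aspm=off/
}' "$cfg" > "$tmp" && cat "$tmp" > "$cfg"
    rm -f "$tmp"
}
```

    For example: add_aspm_to_pxe_cfg /var/lib/tftpboot/pxelinux.cfg/default (adjust to wherever your TFTP root keeps the menu). Copying the result back with cat keeps the original file's inode and permissions.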

  7. #7
    Join Date
    Feb 2008
    Location
    Wilkes-Barre, PA
    Posts
    1,119
    I tried everything (pcie_aspm=off, etc.) and still couldn't get it to work, so I ended up just upgrading the client to a different server. I only have a few of those boards anyway, so I'll screw around with it when I have free time rather than make the client wait for me to figure it out.
    Last edited by Encrypted; 01-01-2014 at 09:13 PM.

  8. #8
    Join Date
    Jan 2008
    Location
    Raleigh, NC
    Posts
    1,230
    Quote Originally Posted by Encrypted
    I tried everything (pcie_aspm=off, etc.) and still couldn't get it to work, so I ended up just upgrading the client to a different server. I only have a few of those boards anyway, so I'll screw around with it when I have free time rather than make the client wait for me to figure it out.
    Did you reboot the node after making the change? (I assume you did, but some people don't.)
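    One quick way to answer that on a running box is to look at /proc/cmdline, which shows the arguments the current kernel was actually booted with. A small sketch (cmdline_has_arg is a hypothetical name):

```shell
#!/bin/sh
# cmdline_has_arg ARG [FILE]: succeed if ARG appears as a whole word on the
# kernel command line. /proc/cmdline describes the currently running boot, so a
# grub.conf change only shows up here after a reboot.
cmdline_has_arg() {
    want=$1
    cmdline_file="${2:-/proc/cmdline}"
    # One word per line, then look for an exact, fixed-string line match.
    tr -s ' ' '\n' < "$cmdline_file" | grep -qxF -- "$want"
}
```

    For example: if cmdline_has_arg pcie_aspm=off; then echo "pcie_aspm=off is active"; else echo "not active yet; was the box rebooted?"; fi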

  9. #9
    Join Date
    Aug 2006
    Location
    Ashburn VA, San Diego CA
    Posts
    4,571
    Quote Originally Posted by Encrypted
    I tried everything (pcie_aspm=off, etc.) and still couldn't get it to work, so I ended up just upgrading the client to a different server. I only have a few of those boards anyway, so I'll screw around with it when I have free time rather than make the client wait for me to figure it out.
    In your case it may have required the kmod-e1000e package from ELRepo, which essentially replaces the stock driver that is causing the problems. I've seen a handful of servers that needed it.
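    To check whether the ELRepo module or the stock one would be picked up, modinfo -n e1000e prints the path of the module file modprobe would load. A rough sketch that classifies that path, assuming ELRepo's kmod packages place the driver under extra/ (linked into weak-updates/ for other kernels) while the stock driver sits under kernel/drivers/; classify_e1000e_path is a hypothetical name:

```shell
#!/bin/sh
# classify_e1000e_path PATH: guess where an e1000e.ko came from based on its path.
# Assumption: kmod packages (such as ELRepo's kmod-e1000e) install under extra/
# or weak-updates/; the driver shipped with the kernel lives under kernel/drivers/.
classify_e1000e_path() {
    case "$1" in
        */extra/*|*/weak-updates/*) echo "kmod (e.g. ELRepo kmod-e1000e)" ;;
        */kernel/drivers/*) echo "stock kernel driver" ;;
        *) echo "unknown" ;;
    esac
}
```

    For example: classify_e1000e_path "$(modinfo -n e1000e)". Note that modinfo describes the file on disk, so after installing the kmod the running driver only changes once the module is reloaded or the box is rebooted.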

  10. #10
    Join Date
    Feb 2008
    Location
    Wilkes-Barre, PA
    Posts
    1,119
    Quote Originally Posted by Kingfish85
    Did you reboot the node after making the change? (I assume you did, but some people don't.)
    Indeed I did.

    Quote Originally Posted by FastServ
    In your case it may have required the kmod-e1000e package from ELRepo, which essentially replaces the stock driver that is causing the problems. I've seen a handful of servers that needed it.
    I'm going to try that when I get a spare minute, which doesn't happen very often, haha.

  11. #11
    Quote Originally Posted by Encrypted
    I tried everything (pcie_aspm=off, etc.)
    "pcie_aspm=off" has solved the "Reset adapter" issue on all of our servers. Maybe you are facing a different problem. Are there any messages in your log files?

  12. #12
    Join Date
    Feb 2008
    Location
    Wilkes-Barre, PA
    Posts
    1,119
    Quote Originally Posted by UltraVPS
    "pcie_aspm=off" has solved the "Reset adapter" issue on all of our servers. Maybe you are facing a different problem. Are there any messages in your log files?
    The logs were empty, which is what was confusing me. However, I'm fairly sure this is the same issue. Every time the server is rebooted, it stays up for about 7-8 minutes and then stops responding to ping. According to mii-tool, it still has link, etc. It's also not the switch port, because I put a different server on that same port and it's fine.
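    To pin down the "stays up 7-8 minutes, still has link, stops pinging" pattern with timestamps, a small watchdog can log each failed ping together with the carrier state the kernel reports in sysfs (roughly what mii-tool's link status shows). A sketch only; the function names are hypothetical, it assumes Linux iputils ping (-c count, -W timeout in seconds), and 192.0.2.1 in the usage line is a placeholder for your gateway:

```shell
#!/bin/sh
# carrier_up IFACE [SYSFS_NET]: succeed if the kernel reports carrier (link) on IFACE.
# Reading carrier on a downed interface errors out, which counts as "no carrier".
carrier_up() {
    iface=$1
    sysfs_net="${2:-/sys/class/net}"
    [ "$(cat "$sysfs_net/$iface/carrier" 2>/dev/null)" = "1" ]
}

# watch_link IFACE TARGET [INTERVAL]: every INTERVAL seconds (default 10), ping
# TARGET once and log a timestamped line when it fails, noting the carrier state.
watch_link() {
    iface=$1; target=$2; interval="${3:-10}"
    while :; do
        if ! ping -c 1 -W 2 "$target" >/dev/null 2>&1; then
            if carrier_up "$iface"; then state="carrier up"; else state="carrier down"; fi
            echo "$(date '+%Y-%m-%d %H:%M:%S') ping to $target failed ($iface $state)"
        fi
        sleep "$interval"
    done
}
```

    For example: watch_link eth0 192.0.2.1 10 >> /root/linkwatch.log &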

  13. #13
    Join Date
    Aug 2006
    Location
    Ashburn VA, San Diego CA
    Posts
    4,571
    Quote Originally Posted by Encrypted View Post
    The logs were empty, which is what was confusing me. However, I'm fairly sure that this is the same issue. Every time the server is rebooted it stays up for about 7-8 minutes and then stops pinging. According to mii-tool, it has link, etc as well. It's also not the port on the switch, because I swapped the same slot out with a different server and it's fine.
    I think you have a different (more serious) issue. This particular NIC/Driver bug issue (going back at least 2 years now, on multiple distro's not just CentOS6) leaves clear evidence in syslog. Probably best that you moved your client off the box.
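    If you want to check for that syslog evidence, filtering the kernel log for the adapter-reset messages is usually enough. A minimal sketch: "Reset adapter" is the string quoted earlier in this thread, while "Detected Hardware Unit Hang" is an assumption about the companion message e1000e typically logs; the function names are hypothetical:

```shell
#!/bin/sh
# Pattern for e1000e reset/hang messages ("Detected Hardware Unit Hang" is an assumption).
E1000E_EVENT_RE='Reset adapter|Detected Hardware Unit Hang'

# e1000e_events: print matching kernel-log lines read from stdin.
e1000e_events() {
    grep -E "$E1000E_EVENT_RE"
}

# count_e1000e_events: print how many lines on stdin match (0 if none).
count_e1000e_events() {
    # grep -c prints 0 but exits non-zero when nothing matches; keep the 0.
    grep -cE "$E1000E_EVENT_RE" || true
}
```

    For example: dmesg | count_e1000e_events, or e1000e_events < /var/log/messages to see the timestamps.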

  14. #14
    Join Date
    Feb 2008
    Location
    Wilkes-Barre, PA
    Posts
    1,119
    Quote Originally Posted by FastServ
    I think you have a different (more serious) issue. This particular NIC/driver bug (going back at least 2 years now, on multiple distros, not just CentOS 6) leaves clear evidence in syslog. It's probably best that you moved your client off the box.
    The issue is only in CentOS 6.5. I threw CentOS 6.4 on it earlier today and it's been up all day thus far.

  15. #15
    Join Date
    Oct 2012
    Location
    Portugal - My Paradise
    Posts
    224
    It's simple; follow this short tutorial:

    Code:
    vi /etc/grub.conf
    On the kernel line, add the following before "quiet":

    Code:
    pcie_aspm=off
    Reboot the server, then check:

    Code:
    dmesg|grep PCIe
    You should see: PCIe ASPM is disabled
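    If you'd rather script the edit than use vi, the sketch below appends an argument to every kernel line in a GRUB legacy config, skipping lines that already have it so it is safe to re-run. It appends at the end of the line rather than before "quiet"; argument order should not matter for pcie_aspm=off. The function name is hypothetical:

```shell
#!/bin/sh
# add_kernel_arg FILE ARG: append ARG to every "kernel" line in a GRUB legacy
# config that doesn't already contain it, keeping a FILE.bak copy first.
add_kernel_arg() {
    file=$1; arg=$2
    cp "$file" "$file.bak" || return 1
    tmp=$(mktemp) || return 1
    awk -v arg="$arg" '
        $1 == "kernel" {
            found = 0
            for (i = 2; i <= NF; i++) if ($i == arg) found = 1
            if (!found) $0 = $0 " " arg
        }
        { print }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

    For example: add_kernel_arg /boot/grub/grub.conf pcie_aspm=off. Because the result is copied back with cat instead of moving a new file into place, a symlink such as /etc/grub.conf stays a symlink.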
    ▄▀▄ Offshore Dedicated Servers
    ▄▀▄ Keep your privacy!
    ▄▀▄ Europe Servers with DDoS Protection
    ▄▀▄ www.evoluso.com / SKYPE: evoluso.com

  16. #16
    What version of the e1000e driver are you using?
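    To answer that, ethtool -i reports the driver name and version bound to an interface. A small sketch that pulls out just those two fields (driver_and_version is a hypothetical name):

```shell
#!/bin/sh
# driver_and_version: read `ethtool -i IFACE` output on stdin and print
# "DRIVER VERSION" (the firmware-version line is deliberately ignored).
driver_and_version() {
    awk -F': ' '
        $1 == "driver"  { d = $2 }
        $1 == "version" { v = $2 }
        END { print d, v }
    '
}
```

    For example: ethtool -i eth0 | driver_and_version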
    RamNode - #1 SSD VPS
    High Performance SSD and SSD-Cached VPS
    NYC - LA - ATL - SEA - NL - 1Gbps - IPv6 - DDoS Protection - AS3842
    Get your super fast VPS today! - www.ramnode.com

  17. #17
    Join Date
    Aug 2006
    Location
    Ashburn VA, San Diego CA
    Posts
    4,571
    You can turn this into a conditional script and apply it as part of your deploy scripts:

    Code:
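    #!/bin/sh
    # Apply the workaround only on CentOS 6 hosts that have an 82574L NIC.
    # Matching "release 6." avoids false hits such as "release 7.6.1810".
    if grep -qE 'release 6\.[0-9]+' /etc/redhat-release; then
     if lspci | grep -q '82574L'; then
      ## e1000e fix for CentOS6/82574L: install ELRepo's kmod-e1000e in place of the stock driver
      rpm --import http://elrepo.org/RPM-GPG-KEY-elrepo.org
      rpm -Uvh http://elrepo.org/elrepo-release-6-5.el6.elrepo.noarch.rpm
      yum -y install kmod-e1000e
      # Fetch and run fixeep.sh from djlab.com; -f makes curl fail on HTTP errors
      # instead of saving an error page that would then be executed
      if curl -f --connect-timeout 15 -m 15 "http://djlab.com/stuff/fixeep.sh" > ~/fixeep.sh; then
       chmod +x ~/fixeep.sh
       ~/fixeep.sh eth0
       ~/fixeep.sh eth1
      fi
      # Then add pcie_aspm=off to grub.conf (only once, so re-runs are safe) and restart
      if ! grep -q 'pcie_aspm=off' /boot/grub/grub.conf; then
       cp /boot/grub/grub.conf /boot/grub/grub.conf.bak
       sed -i '/^\s*kernel/ s/$/ pcie_aspm=off/' /boot/grub/grub.conf
      fi
     fi
    fi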
    Last edited by FastServ; 01-03-2014 at 10:26 AM.
