  1. #1
    Join Date
    Jul 2005
    Location
    Australia
    Posts
    128

    ARP limits on the MLX chassis

    There have been a number of threads talking about the MLX router chassis and the fact that the published specs on Brocade gear leave a lot to be desired.

    I've got a first-hand report of 3% CPU at 6k ARP entries on the MLX, but we also know from the FESX switches that this doesn't scale linearly and that there is an effective ARP "barrier" at some point where the MLX is likely to just die in a fire.

    I'm just wondering if anyone is currently using an MLX in their network, what level of CPU usage they're seeing, and at what ARP table size, so that we can get some idea of a realistic and reliable maximum.
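    For anyone who wants to gather those numbers, here's a minimal polling sketch in Python. It assumes the net-snmp command-line tools and SNMP read access to the router; the ARP count comes from the standard ipNetToMediaPhysAddress table, while the CPU OID below is only a placeholder to be swapped for the correct one from your platform's MIB.
    Code:
    #!/usr/bin/env python3
    # Log a router's ARP table size and CPU load together over SNMP.
    # Assumes the net-snmp CLI tools (snmpget/snmpwalk) are installed.
    import subprocess
    import time

    ROUTER = "192.0.2.1"        # hypothetical management address
    COMMUNITY = "public"        # read-only community string
    ARP_TABLE_OID = "1.3.6.1.2.1.4.22.1.2"     # ipNetToMediaPhysAddress (standard IP-MIB)
    CPU_OID = "1.3.6.1.4.1.1991.1.1.2.1.52.0"  # PLACEHOLDER -- use the CPU-util OID from your MIB

    def snmp(cmd, oid):
        # Run snmpget/snmpwalk against the router and return the output lines.
        out = subprocess.run(
            ["snmp" + cmd, "-v2c", "-c", COMMUNITY, "-On", ROUTER, oid],
            capture_output=True, text=True, check=True)
        return out.stdout.strip().splitlines()

    while True:
        arp_entries = len(snmp("walk", ARP_TABLE_OID))        # one output line per ARP entry
        cpu = snmp("get", CPU_OID)[0].split("=")[-1].strip()  # e.g. "Gauge32: 3"
        print(f"{time.strftime('%H:%M:%S')} arp={arp_entries} cpu={cpu}")
        time.sleep(60)
    Run it while the ARP table is growing and you get a CPU-versus-ARP curve rather than a single data point.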
    IOFLOOD.com -- We Love Servers
    Phoenix, AZ Dedicated Servers
    ★ 28-core Dual E5-2680v4 ★ 10G / 40G Bandwidth ★
    Contact Us: sales [at] ioflood.com ★ Skype: ioflood_michael_b ★

  2. #2
    Join Date
    Aug 2006
    Location
    Ashburn VA, San Diego CA
    Posts
    4,615
    Might want to post this on f-nsp.
    Fast Serv Networks, LLC | AS29889 | DDOS Protected | Managed Cloud, Streaming, Dedicated Servers, Colo by-the-U
    Since 2003 - Ashburn VA + San Diego CA Datacenters

  3. #3
    Join Date
    Dec 2009
    Posts
    2,297

    Re: ARP limits on the MLX chassis

    Just curious.. why don't you architect your network in a way that eliminates these giant arp tables at the "core?"
    REDUNDANT.COM | Equinix Data Centers | Performance Optimized Network
    Managed & Unmanaged
    • Servers • Colocation • Cloud • VEEAM
    sales@redundant.com

  4. #4
    Join Date
    Jan 2003
    Location
    Chicago, IL
    Posts
    6,957
    Quote Originally Posted by Ionity View Post
    Just curious.. why don't you architect your network in a way that eliminates these giant arp tables at the "core?"
    6k is giant?

    And in a typical network configuration the MLXs could often be used at the distribution layer, which is where you'd expect the largest ARP tables to be. If these were being used as a "core" router, then I'd agree, it should be all Layer 3 by that point.
    Karl Zimmerman - Founder & CEO of Steadfast
    VMware Virtual Data Center Platform

    karl @ steadfast.net - Sales/Support: 312-602-2689
    Cloud Hosting, Managed Dedicated Servers, Chicago Colocation, and New Jersey Colocation

  5. #5
    Join Date
    Jan 2010
    Posts
    308
    Quote Originally Posted by Ionity View Post
    Just curious.. why don't you architect your network in a way that eliminates these giant arp tables at the "core?"
    Since when is 6k ARP entries considered giant?

  6. #6
    Join Date
    Dec 2009
    Posts
    2,297
    Quote Originally Posted by KarlZimmer View Post
    6k is giant?

    And in a typical network configuration the MLXs could often be used at the distribution layer, which is where you'd expect the largest ARP tables to be. If these were being used as a "core" router, then I'd agree, it should be all Layer 3 by that point.
    No, 6k isn't giant. But trying to find the upper limit and push up towards it leads to the wrong network architecture.
    REDUNDANT.COM | Equinix Data Centers | Performance Optimized Network
    Managed & Unmanaged
    • Servers • Colocation • Cloud • VEEAM
    sales@redundant.com

  7. #7
    Join Date
    Mar 2003
    Location
    Sioux Falls, SD
    Posts
    1,282
    Code:
    System Parameters      Default    Maximum    Current    Actual     Bootup     Revertible
    ip-arp                 8192       65536      8192       8192       8192       No
    The maximum is 65536 on this MLX8 with NI-MLX-MR management module
    James Cornman
    365 Data Centers - AS19151/AS29838
    Colocation • Network Connectivity • Managed Infrastructure Services

  8. #8
    Quote Originally Posted by amc-james View Post
    Code:
    System Parameters      Default    Maximum    Current    Actual     Bootup     Revertible
    ip-arp                 8192       65536      8192       8192       8192       No
    The maximum is 65536 on this MLX8 with NI-MLX-MR management module
    The maximum is also 65k on the FESX448, but it falls over dead from CPU exhaustion around 4,500 entries. We're setting up a lab to hammer IP usage on the device to see how it holds up to a 20k+ ARP table, which is what we're going to need for this device to make sense anywhere in our network -- core or distribution level.
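    For what it's worth, here's a rough sketch of how that kind of lab load can be generated with Scapy. The interface name, test subnet, and gateway address are made-up lab values, and the pacing is deliberately slow so you're measuring ARP table size rather than the router's ARP-storm handling.
    Code:
    #!/usr/bin/env python3
    # Populate a lab router's ARP table with many synthetic hosts (Scapy sketch).
    # Each packet is an ARP request for the gateway from a unique sender IP/MAC;
    # most routers cache the sender while answering, so the table grows.
    from scapy.all import Ether, ARP, sendp
    import ipaddress
    import time

    IFACE = "eth1"                                  # lab interface facing the router
    GATEWAY = "10.50.0.1"                           # router's VE/gateway IP in the test VLAN
    SUBNET = ipaddress.ip_network("10.50.0.0/18")   # ~16k synthetic hosts

    for i, ip in enumerate(SUBNET.hosts()):
        if str(ip) == GATEWAY:
            continue
        mac = "02:00:00:%02x:%02x:%02x" % ((i >> 16) & 0xFF, (i >> 8) & 0xFF, i & 0xFF)
        pkt = Ether(src=mac, dst="ff:ff:ff:ff:ff:ff") / ARP(
            op=1, hwsrc=mac, psrc=str(ip), pdst=GATEWAY)    # "who-has GATEWAY, tell ip"
        sendp(pkt, iface=IFACE, verbose=False)
        if i % 500 == 0:
            time.sleep(1)       # pace the flood; a burst of ARPs is a different test
    Watching CPU on the management module while this runs, and again when the entries start aging out and re-ARPing, is the part that actually answers the 20k question.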
    IOFLOOD.com -- We Love Servers
    Phoenix, AZ Dedicated Servers in under an hour
    ★ Ryzen 9: 7950x3D ★ Dual E5-2680v4 Xeon ★
    Contact Us: sales@ioflood.com

  9. #9
    Join Date
    Aug 2004
    Location
    Kauai, Hawaii
    Posts
    3,799
    Quote Originally Posted by funkywizard View Post
    The maximum is also 65k on the FESX448, but it falls over dead from CPU exhaustion around 4,500 entries. We're setting up a lab to hammer IP usage on the device to see how it holds up to a 20k+ ARP table, which is what we're going to need for this device to make sense anywhere in our network -- core or distribution level.
    I have 14 ARP entries on my distribution switches (uplinks to each other and uplinks to routers). I believe we work with a few more IPs than you. As Alec pointed out, I also suspect you're doing something very wrong with your network design. Customer VLANs can be established on the edge switches and bridged to other devices if needed for portability etc.

  10. #10
    Quote Originally Posted by gordonrp View Post
    Customer VLANs can be established on the edge switches and bridged to other devices if needed for portability etc.
    The problem is that if you want portability and you have the gateway live on the edge switch, you get a lot of traffic ping-ponging across the network.

    So you have two servers, and they belong to VPS hosting providers, so they want to share IPs, say two /26s. You put the gateway on the top-of-rack switch connected to one server; the other server is on a different top-of-rack switch, so any time that second server has to reach its gateway, the traffic has to go:

    Server2 -> TOR2 -> distribution1 -> core -> distribution2 -> TOR1 -> distribution2 -> core -> internet

    That's not very efficient, and the number of single points of failure there is insane unless you want to trust spanning tree for layer 2 failover. And if you get a DDoS aimed at server 2, that traffic ping-pongs between two separate top-of-rack switches and two distribution switches, doubling the traffic along the way AND affecting twice as much of your network.

    Obviously it's preferable to do your routing at the distribution or core layer IF you need portability, but it's also undesirable to put a lot of load onto your distribution or core if you can help it. At this point, you're right, your network is a lot larger than ours, so solutions that will work for us will not necessarily work for you. The question becomes: will the MLX support a 20k ARP table (meaning we can get by with all the gateways living on what is currently our core device and will eventually be our distribution device), or is the maximum reasonable ARP table for an MLX more in the 10k range, in which case we have no choice but to have our gateways "live" on the top-of-rack switches?

    We're going to do some benchmarking to see how the device holds up with a large ARP table before we put it into production, but of course there's no substitute for real-world experience if anyone has any to share.
    IOFLOOD.com -- We Love Servers
    Phoenix, AZ Dedicated Servers in under an hour
    ★ Ryzen 9: 7950x3D ★ Dual E5-2680v4 Xeon ★
    Contact Us: sales@ioflood.com

  11. #11
    Join Date
    Aug 2004
    Location
    Kauai, Hawaii
    Posts
    3,799
    Interconnect the switches to each other as well as to distribution, using OSPF or iBGP. No ping-pong needed. You also end up with a more redundant network: if the optic from switch1 to distribution/core fails (pretending you're not doing LAGs), you still have another route from switch1 via switch2 to distribution/core. Regardless, I doubt all your customers need VLAN/IP bridging. For us it has to be less than 5% of customers. Get your VLANs off your core = sleep more.
    Last edited by gordonrp; 10-23-2013 at 08:37 PM.

  12. #12
    Quote Originally Posted by gordonrp View Post
    Interconnect the switches to each other as well as to distribution, using OSPF or iBGP. No ping-pong needed. You also end up with a more redundant network: if the optic from switch1 to distribution/core fails (pretending you're not doing LAGs), you still have another route from switch1 via switch2 to distribution/core. Regardless, I doubt all your customers need VLAN/IP bridging. For us it has to be less than 5% of customers. Get your VLANs off your core = sleep more.
    Each server needs to be able to talk to the default gateway for the IP block it's trying to use, and it needs to talk to that gateway at layer 2, sending traffic to the gateway's MAC address. If you want customers on two different switches to be able to share the same IP block, you need a layer 2 path between the two switches, because the gateway lives on one switch or the other, not both, but the IPs are in use on both. This means you can't take advantage of the layer 3 redundancy protocols like OSPF/iBGP you've suggested.

    You cannot have more than one active layer 2 path between two devices or you will have all manner of issues (broadcast storms, MAC addresses appearing on two ports on the same switch, so you get MAC table flip-flopping and brownouts, the list goes on). So no, you can't just cross-connect the switches all to each other *and* to the distribution layer and expect it all to work and avoid ping-ponging the traffic.

    If you want IP portability between servers on different top-of-rack switches, you either need to put up with traffic ping-pong, or you have to put your gateways/VEs at the distribution or core layer. You really can't have it both ways.

    edit: as to IP bridging/sharing, we target VPS hosting providers, so a higher percentage of our customers need this. More important than the percentage of customers who need it is the percentage of IP addresses that need it, since the VPS hosts use dramatically more IPs than any other kind of customer. If I took all the customers with only one server and put their VEs on their top-of-rack switch, I'd be lucky if that accounted for a third of our IP address usage.
    IOFLOOD.com -- We Love Servers
    Phoenix, AZ Dedicated Servers in under an hour
    ★ Ryzen 9: 7950x3D ★ Dual E5-2680v4 Xeon ★
    Contact Us: sales@ioflood.com

  13. #13
    Join Date
    Oct 2002
    Location
    Vancouver, B.C.
    Posts
    2,699
    Either
    1) Just connect servers that need to share subnets to the same TOR switch. If you can't have the servers in the same rack, just run a cable over from another cabinet. At the very least, put them on TOR switches going to the same distribution switch.
    or
    2) Don't allow sharing subnets across multiple VPS hosting servers.
    or
    3) Spend the $$$ on a network architecture that can handle traffic going back and forth, and that doesn't have such single points of failure. Efficiency isn't as much of a concern if you have massive capacity well beyond what you run at.

    Personally, I'd advocate running layer 3 at the access layer when you can, and at the distribution layer in the cases where you have to deal with sharing subnets and such. Having all your layer 3 at the core is not very scalable, as you've already discovered. There's a reason why most new switches are layer 3 capable.
    ASTUTE INTERNET: Advanced, customized, and scalable solutions with AS54527 Premium Performance and Canadian Optimized Network (Level3, Shaw, CogecoPeer1, GTT/Tinet),
    AS63213 Cost Effective High Performance Network (Cogent, HE, GTT/Tinet)
    Dedicated Hosting, Colo, Bandwidth, and Fiber out of Vancouver, Seattle, LA, Toronto, NYC, and Miami

  14. #14
    Join Date
    Oct 2002
    Location
    Vancouver, B.C.
    Posts
    2,699
    Quote Originally Posted by funkywizard View Post
    You cannot have more than one active layer 2 path between two devices or you will have all manner of issues (broadcast storms, MAC addresses appearing on two ports on the same switch, so you get MAC table flip-flopping and brownouts, the list goes on). So no, you can't just cross-connect the switches all to each other *and* to the distribution layer and expect it all to work and avoid ping-ponging the traffic.
    There's something called spanning tree, which actually works quite well when you do it right.
    ASTUTE INTERNET: Advanced, customized, and scalable solutions with AS54527 Premium Performance and Canadian Optimized Network (Level3, Shaw, CogecoPeer1, GTT/Tinet),
    AS63213 Cost Effective High Performance Network (Cogent, HE, GTT/Tinet)
    Dedicated Hosting, Colo, Bandwidth, and Fiber out of Vancouver, Seattle, LA, Toronto, NYC, and Miami

  15. #15
    Quote Originally Posted by hhw View Post
    There's something called spanning tree, which actually works quite well when you do it right.
    Spanning tree disables one path or the other. There would be two layer 2 paths in the sense that if one failed the other could take over, but both would not be active simultaneously, and you'd still have traffic ping-pong if the gateway were on a different TOR than the server that needs it. And that's assuming STP is even working correctly. I honestly don't trust STP.
    IOFLOOD.com -- We Love Servers
    Phoenix, AZ Dedicated Servers in under an hour
    ★ Ryzen 9: 7950x3D ★ Dual E5-2680v4 Xeon ★
    Contact Us: sales@ioflood.com

  16. #16
    Join Date
    Aug 2004
    Location
    Kauai, Hawaii
    Posts
    3,799
    Hehe, I'm not going to keep beating a dead horse. Good luck.

  17. #17
    Quote Originally Posted by hhw View Post
    Either
    1) Just connect servers that need to share subnets to the same TOR switch. If you can't have the servers in the same rack, just run a cable over from another cabinet. At the very least, put them on TOR switches going to the same distribution switch.
    or
    2) Don't allow sharing subnets across multiple VPS hosting servers.
    or
    3) Spend the $$$ on a network architecture that can handle traffic going back and forth, and that doesn't have such single points of failure. Efficiency isn't as much of a concern if you have massive capacity well beyond what you run at.

    Personally, I'd advocate running layer 3 at the access layer when you can, and at the distribution layer in the cases where you have to deal with sharing subnets and such. Having all your layer 3 at the core is not very scalable, as you've already discovered. There's a reason why most new switches are layer 3 capable.
    I agree these are the 3 possibilities you can work with.

    Option 1) leads to a ton of cables going all over the place, or needing to move servers around depending on cancellations and orders from existing customers. We've seriously considered this option, and may need to rely on it at least to some degree.

    Option 2) is clearly the popular option for hosting providers larger than us, who are facing the same problems we're currently facing. We call this option "telling the customer to F themselves". Naturally we consider that a last resort.

    Option 3) sounds good in theory: yes, you can avoid the bandwidth cost of the ping-ponging if you just throw capacity at the problem. But the underlying issue, that you magnify the consequences of network failures, is still there. You make the network less reliable if traffic is traveling back and forth between devices it doesn't need to touch. This is especially true at layer 2, where the only failover protocol you have to work with is spanning tree, which is honestly a terrible protocol.

    The only other option is having a shared device that can handle all of the ARP for a group of customers, and that group of customers is physically connected to that device. So you could say, ok, I put the ARP at distribution and I have a distribution device with 20k ARP entries, and if you want to share IPs you need to be physically connected to the same distribution device.

    That's not really much different than the "must share top of rack" solution except that the capacity of the distribution device needs to be proportionally larger. It does have the advantage of being more realistic. Making sure all of a customer's servers are on a network segment supporting 500 servers is a lot more straightforward than making sure all of a given customer's servers are on a network segment supporting 40 servers.

    The original question still stands: will the MLX support a 20k ARP table, and therefore allow us to have a "segment" 500 servers large, or do we have to make a really stupid tradeoff of having each segment of our network support a maximum of 40 servers, which in turn means it's not practical to allow customers to share IPs between servers?
    Last edited by funkywizard; 10-23-2013 at 09:17 PM.
    IOFLOOD.com -- We Love Servers
    Phoenix, AZ Dedicated Servers in under an hour
    ★ Ryzen 9: 7950x3D ★ Dual E5-2680v4 Xeon ★
    Contact Us: sales@ioflood.com

  18. #18
    Quote Originally Posted by gordonrp View Post
    Hehe, I'm not going to keep beating a dead horse. Good luck.
    Well, it's a tough technical problem, otherwise there would be no point in vigorous debate. Obviously there's no simple solution that gives you everything you could possibly want, so we're working out which tradeoffs make the most sense. If the MLX supports a 20k ARP table, putting all the gateways on that device is the correct tradeoff for us. If it doesn't, some other configuration would make more sense. I wouldn't call that discussion beating a dead horse.
    IOFLOOD.com -- We Love Servers
    Phoenix, AZ Dedicated Servers in under an hour
    ★ Ryzen 9: 7950x3D ★ Dual E5-2680v4 Xeon ★
    Contact Us: sales@ioflood.com

  19. #19
    Join Date
    Dec 2009
    Posts
    2,297
    I would think potential customers would consider some of these discussions red flags??

    Anyhow...

    You know, you could always run layer 3 to the access switches, and in the scenario where a customer requires IP portability, run those as layer 2 VLANs back to the "core" device.
    REDUNDANT.COM | Equinix Data Centers | Performance Optimized Network
    Managed & Unmanaged
    • Servers • Colocation • Cloud • VEEAM
    sales@redundant.com

  20. #20
    Quote Originally Posted by Ionity View Post
    I would think potential customers would consider some of these discussions red flags??

    Anyhow...

    You know, you could always run layer 3 to the access switches, and in the scenario where a customer requires IP portability, run those as layer 2 VLANs back to the "core" device.
    If a customer chooses us because they have the wrong idea about who we are and what we offer, we don't deserve that business. I think it's more honest and transparent to be willing to talk in public about these things. I'm sure other people have worse networks than us, and not talking about their network in public doesn't make their network better than it is.

    As to the second suggestion, that's certainly a valid strategy, but how do we find out who needs this feature? Customers appreciate it when things just work, without having to answer questions they don't understand or ask for things they don't know they need. So that really means assigning everything to the core except customers with only one server. That doesn't offload a lot of IP space, so you have much the same problem still.

    We could potentially monitor the network to see who is actually sharing IPs and who isn't, and for the people who are, move the gateway to the core, and for the people who aren't, leave the gateway on the edge. That would get a few more IPs off the core, but even then the core needs a fair bit of oomph regardless, so the problem itself is the same; it just kicks in at a higher level of usage.
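    Here's a rough sketch of that kind of audit, assuming per-switch ARP/MAC data can be dumped into a simple CSV with a switch,ip,mac header and one row per learned entry; the file format and the /26 allocation size are just assumptions for illustration.
    Code:
    #!/usr/bin/env python3
    # Flag IP allocations whose hosts show up behind more than one switch.
    # Input: a CSV with a "switch,ip,mac" header and one row per learned entry
    # (e.g. collected from each top-of-rack switch's ARP/MAC tables).
    import csv
    import ipaddress
    import sys
    from collections import defaultdict

    ALLOC_PREFIX = 26       # assumption: customer blocks are allocated as /26s

    switches_per_block = defaultdict(set)

    with open(sys.argv[1], newline="") as f:
        for row in csv.DictReader(f):
            block = ipaddress.ip_network(f"{row['ip']}/{ALLOC_PREFIX}", strict=False)
            switches_per_block[block].add(row["switch"])

    for block, switches in sorted(switches_per_block.items()):
        if len(switches) > 1:
            print(f"{block}: seen behind {len(switches)} switches -> gateway belongs upstream")
        else:
            print(f"{block}: single switch ({next(iter(switches))}) -> gateway can stay on the edge")
    Anything flagged as spanning multiple switches is the IP space that genuinely has to live at distribution/core; the rest can keep its gateway on the TOR.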
    IOFLOOD.com -- We Love Servers
    Phoenix, AZ Dedicated Servers in under an hour
    ★ Ryzen 9: 7950x3D ★ Dual E5-2680v4 Xeon ★
    Contact Us: sales@ioflood.com

  21. #21
    Join Date
    Oct 2009
    Location
    Canada
    Posts
    485
    This would be easier to do if the current infrastructure you have wasn't already in place, but I tend to always look for the practical solution to things, which doesn't always need a flashy technical answer...

    Generally speaking (and this may differ depending on your customer base), there are always certain customers who grow quickly and need more nodes right away, and some customers who buy a bunch too fast and then reduce their number of servers.

    If you were to leave 5-8U of space per rack, a simple approach would be to just physically move nodes from rack to rack and group them together physically per customer. Assuming all your nodes are the same chassis type and they're all tagged with some form of serial number, this would allow for layer 3 on the TOR switches as much as possible.

    Most customers also recognize the added redundancy they get if they're presented with the option to buy clusters in two completely separate network segments as they grow. So it can be a double positive.

    Never forget the elbow grease solution. I'm sure it's gotten everyone here out of a bind more than once.

    Going back to your point about not trusting spanning tree... I'm not sure I would be sleeping easy knowing that if one core router bricked itself overnight it would have a widespread impact, versus having each switch route upwards through two distinct paths with only L3 traffic to worry about...
    █ Pentester & IT Security Consultant

  22. #22
    Join Date
    Dec 2009
    Posts
    2,297
    Quote Originally Posted by funkywizard View Post
    If a customer chooses us because they have the wrong idea about who we are and what we offer, we don't deserve that business. I think it's more honest and transparent to be willing to talk in public about these things. I'm sure other people have worse networks than us, and not talking about their network in public doesn't make their network better than it is.

    As to the second suggestion, that's certainly a valid strategy, but how do we find out who needs this feature? Customers appreciate it when things just work, without having to answer questions they don't understand or ask for things they don't know they need. So that really means assigning everything to the core except customers with only one server. That doesn't offload a lot of IP space, so you have much the same problem still.

    We could potentially monitor the network to see who is actually sharing IPs and who isn't, and for the people who are, move the gateway to the core, and for the people who aren't, leave the gateway on the edge. That would get a few more IPs off the core, but even then the core needs a fair bit of oomph regardless, so the problem itself is the same; it just kicks in at a higher level of usage.
    If they have a single server then it's a no-brainer. If they have multiple servers, ask them if they want the IPs to be portable between servers. How hard a question is that to ask?
    REDUNDANT.COM | Equinix Data Centers | Performance Optimized Network
    Managed & Unmanaged
    • Servers • Colocation • Cloud • VEEAM
    sales@redundant.com

  23. #23
    Quote Originally Posted by media-hosts_com View Post
    This would be easier to do if the current infrastructure you have wasn't already in place, but I tend to always look for the practical solution to things, which doesn't always need a flashy technical answer...

    Generally speaking (and this may differ depending on your customer base), there are always certain customers who grow quickly and need more nodes right away, and some customers who buy a bunch too fast and then reduce their number of servers.

    If you were to leave 5-8U of space per rack, a simple approach would be to just physically move nodes from rack to rack and group them together physically per customer. Assuming all your nodes are the same chassis type and they're all tagged with some form of serial number, this would allow for layer 3 on the TOR switches as much as possible.

    Most customers also recognize the added redundancy they get if they're presented with the option to buy clusters in two completely separate network segments as they grow. So it can be a double positive.

    Never forget the elbow grease solution. I'm sure it's gotten everyone here out of a bind more than once.

    Going back to your point about not trusting spanning tree... I'm not sure I would be sleeping easy knowing that if one core router bricked itself overnight it would have a widespread impact, versus having each switch route upwards through two distinct paths with only L3 traffic to worry about...
    As to the idea of leaving spare space in each rack, that's obviously an expensive solution to the problem, both in terms of labor to move things around all the time and in terms of the spare space itself. If, on the other hand, layer 3 were at the distribution level, say every 10 racks or so, it's pretty easy to have 5U free out of 10 racks without even trying, just based on the pace of cancellations, so there's no need to change cabling and rarely would you need to move servers around. At the per-rack level you're only dealing with 40 servers, so the amount of slack capacity you'd have to leave everywhere would get expensive, and the workload needed to continuously balance everything to make sure there's sufficient capacity for new orders without wasting space is also too expensive.

    As to the impacts of failures, diversity, and failover: STP is certainly less reliable than a routing protocol. A routing protocol will fail over to another link if the full path from A to B is not working in some way. Something like spanning tree will only fail over from one link to another if a specific link has been detected as having failed, which is much less robust. There are plenty of horror stories out there about STP not working properly, or even being the cause of network failures in the first place. For now we have a "warm spare" mentality. Given that all of the failures we've seen to date have been issues with configurations or platforms or DDoS or some other problem, and not physical failures, we've never seen a problem that having spare physical hardware would have solved. Obviously as our network gets bigger, these low-probability events have a much greater chance of happening, and we plan to build out our network with increasing redundancy as appropriate. Unfortunately, portable IPs are mutually exclusive with the most effective means of redundancy, which is routing protocols like OSPF and iBGP. You could possibly get by with VRRP and STP, but I think the added complexity is just as likely to cause a failure as the kinds of issues those technologies are designed to protect against.
    IOFLOOD.com -- We Love Servers
    Phoenix, AZ Dedicated Servers in under an hour
    ★ Ryzen 9: 7950x3D ★ Dual E5-2680v4 Xeon ★
    Contact Us: sales@ioflood.com

  24. #24
    Quote Originally Posted by Ionity View Post
    If they have a single server then it's a no-brainer. If they have multiple servers, ask them if they want the IPs to be portable between servers. How hard a question is that to ask?
    You'd be surprised what kinds of simple questions many customers seem incapable of answering. Like, what OS do you want? Well, it's on the order form, right? Anyone can answer that. Or whether you want RAID or not. How many IPs you want. It's all on the order form. But we have to go through each order we get, compare it to what that customer ordered in the past, and if it differs from what they normally order, or if the combination of specs differs from what customers normally want, ask them if they ordered what they intended to receive. You'd be surprised how often it doesn't match up. Even then, even when we ask, we might get no reply, provision the server as they ordered it, and then they'll come back to say they didn't get what they wanted, because their order form didn't reflect what they wanted. It happens every day.

    Our point of view is, if we can find a way to get the customer the correct result without relying on them to give us the correct answer to a question, we need to do it that way. If we don't take that point of view about things, they won't get the right result, and they'll either be disappointed (but never complain), or they'll ask us to fix it (even though they got what they asked for in the first place). Correct-by-default is the only way to do it if you want things to run smoothly.
    IOFLOOD.com -- We Love Servers
    Phoenix, AZ Dedicated Servers in under an hour
    ★ Ryzen 9: 7950x3D ★ Dual E5-2680v4 Xeon ★
    Contact Us: sales@ioflood.com

  25. #25
    Join Date
    Oct 2009
    Location
    Canada
    Posts
    485
    Quote Originally Posted by funkywizard View Post
    As to the idea of leaving spare space in each rack, that's obviously an expensive solution to the problem, both in terms of labor to move things around all the time and in terms of the spare space itself. If, on the other hand, layer 3 were at the distribution level, say every 10 racks or so, it's pretty easy to have 5U free out of 10 racks without even trying, just based on the pace of cancellations, so there's no need to change cabling and rarely would you need to move servers around. At the per-rack level you're only dealing with 40 servers, so the amount of slack capacity you'd have to leave everywhere would get expensive, and the workload needed to continuously balance everything to make sure there's sufficient capacity for new orders without wasting space is also too expensive.

    As to the impacts of failures, diversity, and failover: STP is certainly less reliable than a routing protocol. A routing protocol will fail over to another link if the full path from A to B is not working in some way. Something like spanning tree will only fail over from one link to another if a specific link has been detected as having failed, which is much less robust. There are plenty of horror stories out there about STP not working properly, or even being the cause of network failures in the first place. For now we have a "warm spare" mentality. Given that all of the failures we've seen to date have been issues with configurations or platforms or DDoS or some other problem, and not physical failures, we've never seen a problem that having spare physical hardware would have solved. Obviously as our network gets bigger, these low-probability events have a much greater chance of happening, and we plan to build out our network with increasing redundancy as appropriate. Unfortunately, portable IPs are mutually exclusive with the most effective means of redundancy, which is routing protocols like OSPF and iBGP. You could possibly get by with VRRP and STP, but I think the added complexity is just as likely to cause a failure as the kinds of issues those technologies are designed to protect against.
    100% in agreement with the majority of these points.

    But if you have a 42U rack with 2x 1U switches and 40 servers at 1U each, it doesn't leave you many options with regard to flexibility.

    Having a few extra Us available across a series of racks, and some servers sitting idle (even though they don't bring in money), can be a good thing and is very hard to measure from a monetary perspective.

    For example:
    Having an extra backup chassis per rack makes it easy to migrate a customer if their hardware fails. Being able to fix that problem quickly, because there's physical space in the rack with room to work and/or spare idle servers in the rack (within reason), has a direct correlation to your reputation: the less waiting time the client has, the more likely they are to refer you. It's like an advertising budget on the colocation expense line of your income statement.

    Saying each rack absolutely must be 100% utilized from a cost-savings perspective can sometimes cost more down the road, by not being able to achieve the above result in a reasonable amount of time due to constraints (physical, inventory, etc.).

    The only other way to do it and fully utilize 100% of all racks all the time, while still having excess capacity, is to get a solution like QFabric deployed (which costs boatloads of money), or to get a big, expensive, chassis-based redundant core switch that everything plugs into, like an EX8k series with 100k+ ARP entries and redundant REs, PSUs, etc., and re-architect based on those needs. Then you can be 100% certain ARP won't be the issue at hand.

    The best you can do is give your clients the best value for the price they pay. Better to do it right and support growth now vs. doing it in stages where you're constantly buying hardware to keep up.
    █ Pentester & IT Security Consultant
