# [Workaround implemented] dhcpd overload  (aka, Stupid Things

## arithon-kelis

So, I've got this system acting as a router and DHCP server for 96 different subnets (scavenged Cisco hardware w/ such an old CatOS version that it doesn't allow me to filter multicasting... VLANs and trunking are my friend at the moment!), but whenever I start the DHCP server (dhcp-3.0.1-r1), it consumes pretty much all of a single processor.  It will continue like this for at least two hours (that's when I got impatient and killed it).

At this particular moment, all I really care about is providing DHCP services for all 96 subnets... I'm just out of ideas for what could be taking so long.  Any ideas, recommendations or helpful snickers?  :)

The dhcpd.conf is essentially as follows:

(note:  This file is generated via script; I didn't feel like typing it all in.  Essentially, it's a loop where ${id} ranges from 100 to 196.)

```
lease-file-name "/var/db/dhcpd.leases";
ddns-update-style ad-hoc;
authoritative;

option domain-name-servers 10.0.0.1,10.0.0.2;
option time-servers 10.0.0.1,10.0.0.2;
option time-offset -8;

default-lease-time 21600;
max-lease-time 43200;

# vlan ${id}
subnet 10.${id}.0.0 netmask 255.255.0.0 {
    range 10.${id}.1.0 10.${id}.255.253;
    option routers 10.${id}.255.254;
}
```
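Though the generator itself wasn't posted, the loop described in the note might be sketched roughly like this (hypothetical reconstruction; the output filename and exact structure are assumptions):

```shell
#!/bin/sh
# Hypothetical reconstruction of the generator loop described above
# (the actual script was not posted; the output filename is an assumption).
{
  cat <<'HEADER'
lease-file-name "/var/db/dhcpd.leases";
ddns-update-style ad-hoc;
authoritative;
option domain-name-servers 10.0.0.1,10.0.0.2;
option time-servers 10.0.0.1,10.0.0.2;
option time-offset -8;
default-lease-time 21600;
max-lease-time 43200;
HEADER
  for id in $(seq 100 196); do
    cat <<SUBNET
# vlan ${id}
subnet 10.${id}.0.0 netmask 255.255.0.0 {
    range 10.${id}.1.0 10.${id}.255.253;
    option routers 10.${id}.255.254;
}
SUBNET
  done
} > dhcpd.conf.generated
```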

{edit:  As a temporary measure, I'm now running 96 instances of udhcpd, but I'd still like to understand what causes the other DHCP daemon to just devour processor and memory like there's no tomorrow...  (OK, so it's not as if the system is exactly hurting for resources; it's a mostly unused dual-proc system with four gigs of memory.)}

*Last edited by arithon-kelis on Thu Oct 20, 2005 8:00 am; edited 1 time in total*
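The udhcpd workaround can be sketched roughly as follows - one tiny config file and one daemon per vlan (hypothetical: the option keywords follow udhcpd's sample config, while the interface names, lease ranges, and paths here are assumptions):

```shell
#!/bin/sh
# Hypothetical sketch of the workaround: one tiny config and one udhcpd
# instance per vlan (option keywords follow udhcpd's sample config; the
# interface names, lease ranges, and paths are assumptions).
mkdir -p udhcpd-confs
for id in $(seq 100 196); do
  cat > "udhcpd-confs/udhcpd-vlan${id}.conf" <<EOF
interface vlan${id}
start 10.${id}.1.1
end 10.${id}.1.250
opt router 10.${id}.255.254
opt dns 10.0.0.1 10.0.0.2
opt lease 21600
lease_file /var/lib/misc/udhcpd-vlan${id}.leases
EOF
  # udhcpd "udhcpd-confs/udhcpd-vlan${id}.conf"   # one daemon per vlan
done
```

Starting the daemons is left commented out, so the sketch only generates the configs.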

----------

## adaptr

Filter multicasting ? What does that have to do with DHCP ?

Never mind - it's probably memory related.

free output ? vmstat ? dhcpd logs ?

----------

## arithon-kelis

The two are related only peripherally - due to the nature of the lab, I need to prevent multicast packets from crossing subnet boundaries.  To do this, I need a later version of the Catalyst OS - which my switch doesn't support - or I need to break the switch down into a bunch of VLANs and route between them (minimal traffic).  If I could filter multicast packets, I would be able to run everything under a single subnet.  Since I cannot, I have to resort to the VLAN approach.  The downside is that now, instead of one subnet, I have 96 subnets that need a DHCP server.  (I can get away with this approach for now, since cross-VLAN traffic is, for all intents and purposes, nonexistent.)

Physically, this system has two interfaces in it: eth0, eth1.  

Logically, this system has 108 interfaces in it (two physical interfaces, 106 virtual interfaces):  eth0, eth1, vlan*

(OK, I know, I'm ignoring loopback for the moment.)

The maximum memory usage by the dhcpd process was ~250MB, noted about a minute before I terminated the process.  At no time did the unused system memory go below 3.3GB.  The dhcpd logs are equally unenlightening:

```
Aug  2 02:44:59 extra-crunchy dhcpd: Internet Systems Consortium DHCP Server V3.0.1
Aug  2 02:44:59 extra-crunchy dhcpd: Copyright 2004 Internet Systems Consortium.
Aug  2 02:44:59 extra-crunchy dhcpd: All rights reserved.
Aug  2 02:44:59 extra-crunchy dhcpd: For info, please visit http://www.isc.org/sw/dhcp/
```

And that's it. *baffled*  It simply sits there, utilizing 99% of a single processor (system is a dual 3.2GHz Xeon), for hours.  No network activity comes out of the process.  As far as I can tell, not even any disk access aside from the initial loading.
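One quick way to confirm a quiet-but-busy daemon really is spinning on the CPU (rather than blocked in a syscall) is to peek at /proc; a sketch, using the current shell's own PID as a stand-in for dhcpd's:

```shell
#!/bin/sh
# Quick check that a quiet-but-busy process is spinning rather than blocked:
# field 3 of /proc/<pid>/stat is the state (R = running, S = sleeping,
# D = blocked on I/O); fields 14/15 are user/system jiffies consumed.
# This shell's own PID stands in for dhcpd's here.
pid=$$
read -r statline < "/proc/${pid}/stat"
set -- $statline
state=$3 utime=$14 stime=$15
echo "state=${state} utime=${utime} stime=${stime}"
```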

----------

## adaptr

Since you are the only one potentially generating multicast packets in the first place, why not use different IGMP groups to separate the "subnets" instead of flooding the entire network with multicast ?

Well, it looks like it's time for debug/strace...

----------

## arithon-kelis

 *adaptr wrote:*   

> Since you are the only one potentially generating multicast packets in the first place, why not use different IGMP groups to separate the "subnets" instead of flooding the entire network with multicast ?
> 
> Well, it looks like it's time for debug/strace...

 

Not quite - nearly all of the systems connected to this switch generate multicast traffic, all for the same multicast group.  I can't change the multicast group, unfortunately (requirement of the end...).  The point is to limit the internal distribution of multicast packets; this is essentially a two-layer star topology - the main switch connects to individual switches, and the individual switches connect to multiple systems.  One of the network requirements is that every node share the same multicast group - but no node may "contaminate" any other node.

I was rather hoping that this was a known issue (or that I was just doing something stupid)...  Time to debug is going to be hard to find.  :(  I'll let you know what I find...

----------

## adaptr

How would distributing a single multicast group over N switches contaminate anything ?

Multicast traffic is purely one-to-many, isn't it ?

If it isn't - all the more reason to use multiple groups.

----------

## arithon-kelis

 *adaptr wrote:*   

> How would distributing a single multicast group over N switches contaminate anything?
> 
> Multicast traffic is purely one-to-many, isn't it ?
> 
> If it isn't - all the more reason to use multiple groups.

 

I should have mentioned this earlier, but didn't feel it relevant to the dhcpd issue at hand:  I have no control over the multicast group used.  It is the same multicast group for every single station.  However, individual "clusters" of machines must not communicate with any other cluster.  (And yes, this is "broken" functionality.)  Thus, I must either perform some kind of multicast filtering at the root switch level, or I must break the network into the appropriate number of subnets.

That being said, I just left dhcpd running for over 24 hours.  It was still consuming 100% of one CPU.

----------

## adaptr

Assuming that more than one of these machines runs Linux, wouldn't it be more efficient to keep the subnets in place but delegate dhcp to one machine on each subnet, or over a few subnets, instead of using one machine for all of them?

I am curious how you are routing between these 96 subnets with nothing but switches...

----------

## arithon-kelis

 *adaptr wrote:*   

> Assuming that more than one of these machines runs Linux, wouldn't it be more efficient to keep the subnets in place but delegate dhcp to one machine on each subnet, or over a few subnets, instead of using one machine for all of them?
> 
> I am curious how you are routing between these 96 subnets with nothing but switches...

 

O.o

Not entirely certain how you got that one, but the switch is just being a normal switch.  Nothing special there.

Perhaps I need to draw a diagram...

```
annotation:
single-lines denote a regular connection.
double-lines denote an 802.1q trunked connection.

                                                 ,-(unmanaged computer)
         (file server)=.\      ,(unmanaged switch)-(unmanaged computer)
                       ||     /                  '-(unmanaged computer)
         (file server)=++   (vlan1)
                       ||   /                              ,-(unmanaged computer)
(uplink)-(dual xeon)=(Cisco 5500)-(vlan2)-(unmanaged switch)-(unmanaged computer)
                            \                              '-(unmanaged computer)
                            (vlan<n>)
                              \                  ,-(unmanaged computer)
                               '(unmanaged switch)-(unmanaged computer)
                                                 '-(unmanaged computer)
```

That, of course, is greatly simplified, but should convey the right idea.

Essentially, I can control the Cisco 5500 and anything "before" it.  The dual xeon box handles what little inter-vlan routing there may be.  (There is effectively no reason for any vlan to talk to another vlan.)

That being said, since the dual xeon box is already handling routing, it makes the most sense to use it as the DHCP server.  The DHCP, firewall, routing and VLAN configuration files can easily be automatically generated, simplifying administration.
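As a sketch of that generation step for the vlan interfaces (hypothetical - modern `ip` syntax shown, where a setup of that era would more likely have used vconfig; interface names and addresses are assumptions):

```shell
#!/bin/sh
# Hypothetical sketch of generating the per-vlan interface setup (modern
# `ip` syntax shown; a 2005-era setup would more likely have used vconfig.
# The interface name and addressing scheme are assumptions.)
{
  for id in $(seq 100 196); do
    echo "ip link add link eth0 name vlan${id} type vlan id ${id}"
    echo "ip addr add 10.${id}.255.254/16 dev vlan${id}"
    echo "ip link set vlan${id} up"
  done
} > setup-vlans.sh
```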

Back to the original topic:

I still see no reason for dhcpd to utilize 100% of one processor for 24 hours.

----------

## makmortiv

After reading this thread... I sorta started to get the idea of what you were trying to accomplish (and I can understand the Cat5500 being a beast... they really are).

I have to ask... what kind of overhead are 96 instances of udhcpd taking?  If it's nowhere near what dhcpd was taking... why not just use that instead?  Sure, you'll have to manage the 96 vlan configs separately... but considering the alternative of having dhcpd slamming the machine for 48+ hours... that sounds like a quicker, easier fix.

Now, your assumption about multicast packets is correct - they are point-to-multipoint (in terms of how Cisco sees it) - but that also means that since you're using vlans to pair off all the segments, you shouldn't need any subnetting (though it was a greatly over-paranoid idea  ;)  ).  As you know, vlans are filtered at the data link layer... so the traffic never even gets the chance to propagate up to the other OSI layers at the switch.  And speaking of vlans... I did notice you started at vlan "1"... as in the public, everyone-gets-to-see-it vlan that is normally used for management... have you considered using something a little higher... say 100-196?  :D

So ultimately, the config for the port going to the dual xeon box should carry vlans 1 and 100 through 196... and each other port should have a vlan of its own plus vlan 1.  If you are not using managed switches to feed the workgroups... totally bypass my next sentence.  If you are using managed switches: on each 2nd-tier switch (the one feeding a workgroup node), put vlan 1 and the respective vlan on the port uplinking to the 5500, and only the workgroup vlan on the ports heading to the workgroup.

Hope that helped.  :)

----------

## saturas

I think I have a better idea, but I never tried it on Linux.

I would try to migrate the DHCP server to a Linux box and configure one interface (on Linux) for multiple vlans (a trunk), then plug a cable from a trunk port on the switch to this interface.  You would run the DHCP server on this interface so that it can give DHCP service to all your subnets (vlans) without propagating broadcasts through the other segments.

Let me know if you try this.

----------

## arithon-kelis

 *saturas wrote:*   

> i think i have a better ideea but i never tried it on linux. 
> 
> i would try to migrate the dhcp server on a linux box and configure an interface (on linux)for multiple vlans(trunk). then plug a cable from a trunk port on the sw to this interface. you should run this dhcp server on this interface so that this interface would give dhcpservice to all your subnets(vlans), without propagating broadcast through the other segments.
> 
> let me know if you try this

 

That's essentially how it's currently set up.  The server is trunked (via gigabit now, woo!) to the switch.  However, each VLAN is a different subinterface of the NIC, so it doesn't seem possible (nor particularly desirable at this point in time) to have a single DHCP scope for all the VLANs.  I've ended up just invoking 96 instances of udhcpd; it seems to work perfectly for what I need.

----------

## arithon-kelis

 *makmortiv wrote:*   

> After reading this thread...I sorta started to get the idea of what you were trying to accomplish (and I can understand the Cat5500 being a beast...they really are).
> 
> {snip}
> 
> Hope that helped. 

 

Actually, in an odd way, it did help.  I just now got a notice of the replies, but it was good to see that people had some good ideas.  

Since my initial post, I've changed a few things up.  I've trimmed down a few of the "userland" vlans (actually, about half of 'em) and thrown in a gigabit fiber blade for the servers.  "Userland" vlans are configured from 100-147.  VLAN 1 is currently not even used (console connection to the switch from the server, just for good measure).

After experimenting with a couple different network subnetting/non-subnetting configurations, I've stuck with the multiple subnets.  Some of the DHCP clients don't play by the rules and that confuses the @#(*$ out of the NAT...

So, in short... the workaround is to use udhcpd; the overhead (in my case) is negligible.

----------

## Chris W

I tried your example config and saw exactly the same CPU/RAM issue.  However, I found this was related to the number of distinct IP addresses in the pools.  You have close to 65000 in each subnet.  When I allocated only 250 addresses for each subnet, it all worked respectably.  Do you really need 65000x100 addresses?
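For illustration, a trimmed pool along those lines might look like this (a sketch only - the particular 250-address window is arbitrary):

```
# vlan 100: keep the /16 for routing, but offer only ~250 leases
subnet 10.100.0.0 netmask 255.255.0.0 {
    range 10.100.1.1 10.100.1.250;
    option routers 10.100.255.254;
}
```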

----------

## arithon-kelis

 *Chris W wrote:*   

> I tried your example config and saw exactly the same CPU/RAM issue.   However, I found this was related to the number of distinct IP addresses in the pools.   You have close to 65000 in each subnet.  When I allocated only 250 address for each subnet it all worked respectably.   Do you really need 65000x100 addresses?

 

Interesting.  I was just being lazy, to be honest - I need more than a /24 in a couple of them, so I just allocated everything at /16.  *curious*  It doesn't make sense to me why it would eat CPU, though.  I'll mess with that on a secondary system and see what I come up with.

----------

## egberts

In my testing of various commercial and Linux routers... I've seen CPU utilization spike under heavy multicast traffic.  The primary reason for this spike is packet replication (memory copy, memory copy, memory copy).

The best way to fix this is to do it at the hardware level.  Some examples of software/hardware fixes are:

1.  4 ethernet ports per PCI card.  You can then program these quad-cards to do the switching (best method: CPU offloading).

2.  Bump the packet's use-count, transmit on port one, yank it back, transmit on port two, yank it back, transmit on port three.  (Penalty: higher ports have higher latency.)

3.  Hardware support for a multi-port memory copy operation.  (Best latency.)

4.  Scatter-gather drivers (use two mallocs per packet, and still do #2 above for the smaller malloc containing the headers; the bigger malloc stays the same - except for appending a changing CRC32 value, if applicable).

----------

## Chris W

 *arithon-kelis wrote:*   

> Interesting.  I was just being lazy, to be honest - need more than a /24 in a couple of them, so I just allocated everything at /16.  *curious* Doesn't make sense to me as to why it would eat CPU, though.  I'll mess with that on a secondary system and see what I come up with.

 

All the CPU load is during initialisation, not operation.  I surmise that it is allocating a small data structure for each address.  You are asking for 6.5 million of them, which in the worst case means 6.5 million small mallocs.  I'm not inclined to track through the code to be sure, though.
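The back-of-the-envelope arithmetic is easy to check (assuming one pool per vlan for ids 100-196, i.e. 97 pools, as in the original post):

```shell
#!/bin/sh
# Sanity-check of the numbers above: how many addresses does each declared
# range contain, and how many small allocations would 97 such pools imply?
lo=$(( 1*256 + 0 ))            # last two octets of 10.X.1.0
hi=$(( 255*256 + 253 ))        # last two octets of 10.X.255.253
per_subnet=$(( hi - lo + 1 ))  # inclusive size of each range
total=$(( per_subnet * 97 ))   # one pool per vlan, ids 100..196
echo "${per_subnet} addresses per pool, ${total} in total"
```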

----------

