# Previously working network just...  borked...  strangely...

## Pobtastic

Okay I've had this Gentoo server running flawlessly for about five years now and it's very rarely given me any problems which I couldn't fix myself...  This time...  I'm stumped...  I figured at first the problem must have been a badly configured kernel, but I booted back into the old kernel and the problem persists?

The problem is that no name resolution is taking place.  I can ping IPs just fine, internal and external, but I can't, for instance, ping www.gentoo.org (unknown host)...  Yet I can ping 209.177.148.228 (gentoo.org's IP address) no problem...

```
/etc/resolv.conf

# Generated by net-scripts for interface lo

domain mshome

nameserver 192.168.1.1
```

192.168.1.1 is my router but I've also tried using the 'actual' DNS of my ISP with no luck.
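One way to separate "DNS is broken" from "the network is broken" is to ask each nameserver from resolv.conf directly instead of going through the system resolver. The following is only a minimal sketch: `list_nameservers` and `probe_nameservers` are hypothetical helper names, and it assumes `dig` is installed, which nothing in this thread confirms.

```shell
#!/bin/sh
# Print each nameserver address from a resolv.conf-style file
# (defaults to /etc/resolv.conf).
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Query every listed nameserver directly for one name.
# Assumption: dig is available. dig exits non-zero when a server
# does not reply at all, which is what this check relies on.
probe_nameservers() {
    name=$1
    conf=${2:-/etc/resolv.conf}
    command -v dig >/dev/null 2>&1 || { echo "dig not found" >&2; return 127; }
    for ns in $(list_nameservers "$conf"); do
        if dig +time=2 +tries=1 "@$ns" "$name" >/dev/null 2>&1; then
            echo "$ns: replied"
        else
            echo "$ns: no reply"
        fi
    done
}
```

Running `probe_nameservers www.gentoo.org` on both the working and the broken server would show whether the router at 192.168.1.1 answers one machine but not the other.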

```
lspci

00:00.0 Host bridge: Intel Corporation 82810E DC-133 GMCH [Graphics Memory Controller Hub] (rev 03)

00:01.0 VGA compatible controller: Intel Corporation 82810E DC-133 CGC [Chipset Graphics Controller] (rev 03)

00:1e.0 PCI bridge: Intel Corporation 82801AA PCI Bridge (rev 02)

00:1f.0 ISA bridge: Intel Corporation 82801AA ISA Bridge (LPC) (rev 02)

00:1f.1 IDE interface: Intel Corporation 82801AA IDE Controller (rev 02)

00:1f.2 USB Controller: Intel Corporation 82801AA USB Controller (rev 02)

00:1f.3 SMBus: Intel Corporation 82801AA SMBus Controller (rev 02)

00:1f.5 Multimedia audio controller: Intel Corporation 82801AA AC'97 Audio Controller (rev 02)

01:0c.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 78)
```

```
ifconfig

eth0      Link encap:Ethernet  HWaddr 00:B0:D0:D4:D8:18

          inet addr:192.168.1.98  Bcast:192.168.1.255  Mask:255.255.255.0

          inet6 addr: fe80::2b0:d0ff:fed4:d818/64 Scope:Link

          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

          RX packets:11751 errors:0 dropped:0 overruns:70 frame:0

          TX packets:12011 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:1000

          RX bytes:1860056 (1.7 Mb)  TX bytes:3746983 (3.5 Mb)

          Interrupt:5 Base address:0xc00

lo        Link encap:Local Loopback

          inet addr:127.0.0.1  Mask:255.0.0.0

          inet6 addr: ::1/128 Scope:Host

          UP LOOPBACK RUNNING  MTU:16436  Metric:1

          RX packets:3698 errors:0 dropped:0 overruns:0 frame:0

          TX packets:3698 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:0

          RX bytes:1550783 (1.4 Mb)  TX bytes:1550783 (1.4 Mb)
```

```
dmesg | grep eth0

eth0:  setting full-duplex.

eth0: no IPv6 routers present
```

I've completely run out of ideas...  Please help!  Incidentally, I have two servers on the same network (one for web, one for mail) and both are identical, one is an rsync server and both are updated exactly the same (so both have the same versions of everything installed)...  One works...  One doesn't...

Thanks,

Pobster

----------

## Xanadu

 *Pobtastic wrote:*   

> I have two servers on the same network (one for web, one for mail) and both are identical, one is an rsync server and both are updated exactly the same (so both have the same versions of everything installed)...  One works...  One doesn't...

 

Well, the very first thing I think is most ISPs these days don't like their customers running their own services.  I'm only assuming that you're on BT (but I don't know enough about the ISPs on your side of The Pond  :Smile:  ).  Has BT recently made any changes to their Customers Running Servers part of their TOS?  

Also, which of the two machines running web services IS ABLE to resolve names?  The one running mail or the one running apache (I assume)?  

It may just be my paranoid American mind, but I can see an ISP blocking certain machines from resolving names, either as a way of telling you to turn the stuff off or so that the service won't work and is thus less of a "risk" in their netblock.

I dunno, though.  I'm just guessing a little bit here, but...

M.

----------

## Pobtastic

Hiya, thanks for replying.  It's the mail server which can't resolve names but I don't think it's anything to do with which servers I'm running as they've both got apache and postfix configured and running so I can switch between servers when one or the other goes down.  Don't forget, ping works fine on one but not on the other...  So I'm thinking it's unlikely to be either ISP or server related as they're both sat next to each other on my router - it's got to be some stupid configuration thing that I've messed up...  I just can't find where...

Pobster

edit:  I use Nildram btw, but I'm sure the issue isn't with them seeing as one server is fine?

----------

## Xanadu

 *Pobtastic wrote:*   

> Hiya, thanks for replying.  It's the mail server which can't resolve names but I don't think it's anything to do with which servers I'm running as they've both got apache and postfix configured and running so I can switch between servers when one or the other goes down.  Don't forget, ping works fine on one but not on the other...  So I'm thinking it's unlikely to be either ISP or server related as they're both sat next to each other on my router - it's got to be some stupid configuration thing that I've messed up...  I just can't find where...
> 
> Pobster
> 
> edit:  I use Nildram btw, but I'm sure the issue isn't with them seeing as one server is fine?

 

Are either/both out in a DMZ?  Or are all the services for internal use only (and explicitly denied from the Internet)?  But still, yes, you make a good point.  If you can resolve names on one machine and not another, then I'd have to agree (at face value), it seems your fingers typed something that your brain forgot about.  :Very Happy:   Can you traceroute or similar to your external DNS server(s) from both machines?  Can you "nmap -PU -p53 $ISP_DNS_SERVER"?  Don't worry about that nmap command, it sends one and ONLY one UDP blip to port 53 (DNS) to one IP.  I don't see how that would trigger anything saying you're trying to "hack" them.

All that BS aside, and I'm just throwing stuff out there now hoping that something I type may remind you of something you were playing with:

Were you playing with any DNS caching stuff?

Did you export some odd proxy stuff to your environment that you're not using any more?

Were you playing with iptables, had Gentoo save it, and forgot to take out some stuff related to DNS?  Just a note there: I have this in my iptables rules:

```
# Accept DNS replies *ONLY* from the proper nameserver(s)
$IPT -A INPUT -i $LAN_IFACE -p udp -s $NAMESERVER1 --sport 53 -j ACCEPT

# Add the second rule only when resolv.conf lists a second nameserver
if [ -n "$NAMESERVER2" ]; then
    $IPT -A INPUT -i $LAN_IFACE -p udp -s $NAMESERVER2 --sport 53 -j ACCEPT
fi
```

I define $NAMESERVER1 and $NAMESERVER2 at the top of my script:

```
# First and second "nameserver" entries from /etc/resolv.conf (empty if absent)
NAMESERVER1=$(awk '$1 == "nameserver" { print $2 }' /etc/resolv.conf | sed -n '1p')

NAMESERVER2=$(awk '$1 == "nameserver" { print $2 }' /etc/resolv.conf | sed -n '2p')
```

So my machine(s) will accept DNS info *ONLY* from my router (one nameserver) or "proper" resolv.conf info if I'm on a more "normal" connection (two nameservers).  I'm only throwing it out there since I thought if you "hard-coded" IPs in a similar statement and something changed on your or their network...
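The same idea can be written so it copes with any number of nameservers. This is a sketch rather than the script described above: `dns_accept_rules` is a hypothetical name, the interface defaults to eth0 as an assumption, and the rules are only printed so they can be reviewed before anything is loaded.

```shell
#!/bin/sh
# Print (do not run) one ACCEPT rule per nameserver found in a
# resolv.conf-style file. Arguments: file (default /etc/resolv.conf)
# and interface (default eth0, an assumption for this sketch).
dns_accept_rules() {
    conf=${1:-/etc/resolv.conf}
    iface=${2:-eth0}
    awk -v iface="$iface" '$1 == "nameserver" {
        printf "iptables -A INPUT -i %s -p udp -s %s --sport 53 -j ACCEPT\n", iface, $2
    }' "$conf"
}
```

Calling `dns_accept_rules /etc/resolv.conf eth0` prints the rules. They only cover UDP, so a DNS reply that falls back to TCP would still need a matching rule of its own.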

Anyway, I'm now out of ideas.  I'll post more as I think of them, but I'll still continue to follow this as I'm curious too.

----------

## Pobtastic

Nope no DMZ set and iptables is actually off at the moment (I figured it a likely culprit, but it appears to be innocent this time!)  I unfortunately haven't been messing around with anything lately and I can tell that everything was fine only two days ago (Wednesday) - it was possibly fine yesterday as well, but it's hard to tell as I'm only judging that on the last email I sent (so DNS was working then)...

I unfortunately can't emerge nmap on the borked machine   :Sad:   I'm just popping out right now but I'm going to mess around with it again when I get back...  Weirdly, I can't seem to use my NFS shares at the moment either...  Which is strange as they appear to mount fine with no error messages?

Anyways...  Will take a look later, it's strange about the NFS...  As I can connect to the mail server fine and look at the emails present there...

Pobster

----------

## Xanadu

At a closer look at your original post, I noticed this:

```
/etc/resolv.conf 

 

 # Generated by net-scripts for interface lo

 domain mshome 

 nameserver 192.168.1.1
```

That says "interface lo".  Is that right?  Shouldn't that be eth0 or whatever?  Here's mine right now:

```
$ cat /etc/resolv.conf

# Generated by dhcpcd for interface eth1

search home

nameserver 192.168.1.1
```

(I'm behind a Verizon FiOS connection ATM with an Orinoco PCMCIA card and this laptop has a builtin 10/100 nic at eth0 thus eth1)

Are you purposely routing everything through lo for a reason (that I'm just not thinking of at the moment)?  Still, it should realize that (your) 192.168.1.1 is NOT your machine and send the traffic out the nic to the proper place.  What's your routing table look like?

----

As an aside, I may not see your replies to this until later myself.  I'm putting the laptop to sleep and heading out also, but I will continue to watch this thread for more info and the like.

----------

## Pobtastic

Nah I think that "lo" is a red herring, it's always said that and the other (working) server says it too.  I think I read somewhere that to set your domainname you have to add dns_domain_lo="domain" to your /etc/conf.d/net file, it never worked for me, but it never made any difference either so I just left it there...

Right now I'm having trouble understanding why my ssh works fine, yet my ftp doesn't...  Both use IP addresses to connect, so...  Weird...

Anyways, both working and non-working servers show exactly the same;

```
route -n

Kernel IP routing table

Destination     Gateway         Genmask         Flags Metric Ref    Use Iface

192.168.1.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0

127.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 lo

0.0.0.0         192.168.1.1     0.0.0.0         UG    0      0        0 eth0
```
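To compare the two servers mechanically rather than by eye, the default route can be pulled out of the `route -n` output. A minimal sketch follows; `default_routes` is a hypothetical helper name, and it assumes the eight-column layout shown in the table above.

```shell
#!/bin/sh
# Read `route -n` output on stdin and print "gateway interface"
# for every default route (destination and genmask 0.0.0.0, G flag set).
default_routes() {
    awk '$1 == "0.0.0.0" && $3 == "0.0.0.0" && $4 ~ /G/ { print $2, $8 }'
}
```

Usage would be `route -n | default_routes` on each machine. Identical output on both, as here, suggests the routing table is not where the difference lies.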

Another dead end...  It's really odd that ssh, postfix (incoming mail) and webmin are all working fine across the network/internet, yet things like ntp, ftp and samba are unreachable...

Pobster

----------

## Pobtastic

Hmmm I found I had this already installed;

```
mii-diag -v

mii-diag.c:v2.11 3/21/2005 Donald Becker (becker@scyld.com)

 http://www.scyld.com/diag/index.html

Using the default interface 'eth0'.

  Using the new SIOCGMIIPHY value on PHY 24 (BMCR 0x3000).

 The autonegotiated capability is 01e0.

The autonegotiated media type is 100baseTx-FD.

 Basic mode control register 0x3000: Auto-negotiation enabled.

 You have link beat, and everything is working OK.

   This transceiver is capable of  100baseTx-FD 100baseTx 10baseT-FD 10baseT.

   Able to perform Auto-negotiation, negotiation complete.

 Your link partner advertised cde1: Flow-control 100baseTx-FD 100baseTx 10baseT-FD 10baseT, w/ 802.3X flow control.

   End of basic transceiver information.

libmii.c:v2.11 2/28/2005  Donald Becker (becker@scyld.com)

 http://www.scyld.com/diag/index.html

 MII PHY #24 transceiver registers:

   3000 782d 0040 6177 05e1 cde1 000b 0000

   0000 0000 0000 0000 0000 0000 0000 0000

   1000 0321 0000 0001 0000 0f38 0100 0000

   003f fd3e 0f00 ff40 002f 0000 80a0 000b.

 Basic mode control register 0x3000: Auto-negotiation enabled.

 Basic mode status register 0x782d ... 782d.

   Link status: established.

   Capable of  100baseTx-FD 100baseTx 10baseT-FD 10baseT.

   Able to perform Auto-negotiation, negotiation complete.

 Vendor ID is 00:10:18:--:--:--, model 23 rev. 7.

   No specific information is known about this transceiver type.

 I'm advertising 05e1: Flow-control 100baseTx-FD 100baseTx 10baseT-FD 10baseT

   Advertising no additional info pages.

   IEEE 802.3 CSMA/CD protocol.

 Link partner capability is cde1: Flow-control 100baseTx-FD 100baseTx 10baseT-FD 10baseT.

   Negotiation  completed.
```

Doesn't look like it really helps, but I thought I'd post it anyway.  I'm starting to wonder whether I'm looking in the wrong place for my problem...  I'm wondering whether it's actually referencing my /etc/resolv.conf file at all for DNS...  I'm not entirely sure where to look to check though?
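On a glibc system, whether /etc/resolv.conf is consulted at all is decided by the `hosts:` line in /etc/nsswitch.conf: the resolver only reads resolv.conf when `dns` appears among the sources. A small sketch for checking that line, with `hosts_lookup_order` as a hypothetical helper name:

```shell
#!/bin/sh
# Print the lookup sources on the "hosts:" line of an
# nsswitch.conf-style file (defaults to /etc/nsswitch.conf).
hosts_lookup_order() {
    awk '$1 == "hosts:" { $1 = ""; sub(/^ +/, ""); print }' "${1:-/etc/nsswitch.conf}"
}
```

If `hosts_lookup_order` prints something like `files dns`, then /etc/hosts is tried first and DNS second. Running `getent hosts www.gentoo.org` afterwards exercises that same lookup path without involving ping.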

Pobster

----------

## Pobtastic

Okay...  I just checked /var/log/emerge.log and it seems that iptables was one of 4 packages I updated on the 12th (Wednesday, the same day I assume everything stopped working)...  /etc/init.d/iptables status confirmed that it *wasn't* running.  So...  On a whim, I got rid of iptables completely with emerge --unmerge iptables and now everything is fine again!!!!  That's really odd...  Still just relieved to have gotten to the bottom of it.
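For anyone retracing this, the check against emerge.log can be scripted. The sketch below is hedged: `completed_merges` is a hypothetical helper name, and the line format in the comment is an assumption written from memory with illustrative values, so compare it with your own log before trusting the output.

```shell
#!/bin/sh
# List "epoch-timestamp package" for every completed merge in an
# emerge.log-style file (defaults to /var/log/emerge.log).
# Assumed line format (illustrative values, not taken from the thread):
#   1163000000:  ::: completed emerge (1 of 4) net-firewall/iptables-1.3.5 to /
completed_merges() {
    awk '/::: completed emerge/ {
        ts = $1
        sub(/:$/, "", ts)   # drop the colon that follows the timestamp
        print ts, $8        # field 8 is the category/package-version
    }' "${1:-/var/log/emerge.log}"
}
```

With GNU date, a timestamp from the first column can be turned into a readable date using `date -d @TIMESTAMP`, which makes it easy to see what was merged on the day things broke.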

Thanks for your help,

Pobster

----------

## Xanadu

 *Pobtastic wrote:*   

> Okay...  I just checked /var/log/emerge.log and it seems that iptables was one of 4 packages I updated on the 12th (Wednesday, the same day I assume everything stopped working)...  /etc/init.d/iptables status confirmed that it *wasn't* running.  So...  On a whim, I got rid of iptables completely with emerge --unmerge iptables and now everything is fine again!!!!  That's really odd...  Still just relieved to have gotten to the bottom of it.
> 
> Thanks for your help,
> 
> Pobster

 

First, I'm sorry for the very late posting.  I don't have an Internet connection in my Apartment yet (long story...  :Sad:  ) so I can't reply to things as fast as I'd like.

That aside, that's some VERY interesting stuff.  I wonder what could possibly have been / is in your iptables rules that would've b0rked that.  Your iptables rules are saved (by default) in /var/lib/iptables/rules-save.  You may want to look that over and see what your machine thought it was supposed to do.  I personally don't use the save/load thing and run my own script when net.(whichever iface) comes up.
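A quick way to look that file over is to list any built-in chain whose default policy is not ACCEPT, since a saved DROP policy silently blocks everything that no rule allows. This is a sketch that assumes the file is in iptables-save format; `non_accept_policies` is a hypothetical helper name.

```shell
#!/bin/sh
# Print "chain policy" for every chain in an iptables-save style file
# whose default policy is something other than ACCEPT.
# User-defined chains carry "-" as their policy and are skipped.
non_accept_policies() {
    awk '/^:/ && $2 != "ACCEPT" && $2 != "-" { print substr($1, 2), $2 }' \
        "${1:-/var/lib/iptables/rules-save}"
}
```

Output such as `INPUT DROP` from `non_accept_policies` would be worth comparing with what `iptables -L -n -v` reports for the live ruleset.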

Anyway, I'm very glad to hear that you got everything working well again!  I wish you luck in the future!

M.

----------

## Pobtastic

No worries, I appreciate you posting at all!

I've unfortunately already re-compiled the kernel, emerged iptables and redone all the rules again from scratch so the real answer is probably lost now...  It's likely just that I'd forgotten to enable some iptables stuff in the kernel and for whatever reason it borked certain network traffic (I used to roll my own kernels, but nowadays it's much less bother to just use genkernel).  I'm just relieved to have fixed it!

Thanks,

Pobster

----------

## Xanadu

 *Pobtastic wrote:*   

> No worries, I appreciate you posting at all!
> 
> I've unfortunately already re-compiled the kernel, emerged iptables and redone all the rules again from scratch so the real answer is probably lost now...  It's likely just that I'd forgotten to enable some iptables stuff in the kernel and for whatever reason it borked certain network traffic (I used to roll my own kernels, but nowadays it's much less bother to just use genkernel).  I'm just relieved to have fixed it!
> 
> Thanks,
> ...

 

I've tried to use genkernel a few times over the years (the latest being just a few weeks ago) and I have one huge thing I hate about it: I can't seem to start my machines in a different runlevel.  Like if at my lilo prompt I type:

```
gentoo 2
```

It will still boot into my default runlevel (5 on most machines, but that's irrelevant).  It seems to ignore what I manually type on my kernel command line and just do ONLY what's in the MBR (lilo.conf).  I have no idea if it acts different with grub since I had gotten used to lilo in the late 90's and have yet to see a reason to switch (that and, frankly, the grub config is kinda confusing, lilo.conf is straight-forward and human readable...).

Anyway, that's food for another topic I guess.  It's not that big of a deal either way, it doesn't matter much to me WHAT boots my machine as long as it BOOTS!  :Very Happy:  Ya know what I mean?

Well, all that and I do honestly have to admit that I like the "Geek Factor" of:

```
1. emerge $NewKernelSource

2. zcat /proc/config.gz > /usr/src/linux/.config

3. make oldconfig

4. make modules modules_install bzImage install

5. update lilo.conf and update the MBR

6. reboot
```

I dunno.  Maybe it's just me.  :Smile:   Yea, It's all a lot more of a pain-in-the-rump than using genkernel, but there is that bit of "Geek Fun" that's added by doing it myself. 

/me shrugs.

Anyway, I'm starting to ramble again  :Embarassed: , I'm glad that you've got your machine(s) sorted!  

/me shakes his fist at netfilter

 :Laughing: 

----------

