# Need help diagnosing flaky wireless

## kwesadilo

For approximately the past year, I have had extreme difficulty connecting to the wireless network on my college campus. It is an unsecured 802.11b/g/n network, and I have an Atheros AR5212 wireless chipset. Most recently, I have been using Wicd to handle my connection, but I have also experienced this issue with NetworkManager and with manual configuration using iw and ifconfig. In Wicd, when I open the list of wireless networks and tell it to connect to one, it quickly gets through (or says that it gets through) the steps before requesting an IP address. It stays there for a long time (~30 seconds) before popping up a notification that says it failed to get an IP address. If I try one of the other access points with the same SSID, it does the same thing. Very occasionally (maybe 1 in 20 times), it has the same delay while requesting an IP address and then actually gets one.

Here's some background: After I got my kernel set up correctly, I was able to connect to the campus network using Gentoo for several months. As far as I know, I did not do anything significant to my computer before the failure occurred. Last October, I was sitting in the lab using the wireless network when I lost my connection. No big deal, happens all of the time. But it failed to reconnect. I was using GNOME and NetworkManager at the time, so I didn't have any diagnostic information. I tried reconnecting for the next half hour or so, trying different access points, restarting NetworkManager, and powering the wireless adapter and computer on and off. Nothing helped. Since then, most of my attempts to connect to the campus wireless network fail as described above.

I initially thought that DHCP was just slow, so I increased the timeout for dhcpcd to 2 minutes. At first that seemed to make things better, but I don't think it really did. I tried messing around with iwconfig, iw, wpa_supplicant, and ifconfig to connect manually. I think it failed at the association level, causing dhcpcd to time out later, but I don't remember what I did well enough to reproduce it. I only seem to have problems on campus, which up until this year has meant that I'm never trying to connect to the network when I'm busy with stuff and can't troubleshoot this. Once or twice, I have observed somewhat similar behavior on other networks, but I can never reproduce it. I have seen the wireless fail and then work on the same access point, so I don't think it's access-point dependent.

I really have no idea what the problem is. Since it only happens on campus, I'm guessing something the school is doing is at least partially responsible. I doubt that they will be inclined to take a bunch of time to chase down some problem that only affects a few people. (A couple of other Linux-using people have similar but not identical issues.) To get them to do anything, I'll probably need to point out a specific problem. If it's a problem with my setup, I'd also like to chase it down and fix it. I don't know where to start, though.

Here is a dump of my /var/log/everything/current from a successful connection:

```
Sep 14 13:42:43 [kernel] [   44.600896] ADDRCONF(NETDEV_UP): wlan0: link is not ready
Sep 14 13:42:48 [kernel] [   50.387560] ADDRCONF(NETDEV_UP): eth0: link is not ready
Sep 14 13:42:55 [dhcpcd] dhcpcd not running
Sep 14 13:42:55 [kernel] [   56.978756] ADDRCONF(NETDEV_UP): wlan0: link is not ready
Sep 14 13:42:55 [dhcpcd] dhcpcd not running
Sep 14 13:42:56 [kernel] [   57.584531] ADDRCONF(NETDEV_UP): eth0: link is not ready
Sep 14 13:42:56 [dhcpcd] dhcpcd not running
Sep 14 13:42:56 [kernel] [   57.607052] ADDRCONF(NETDEV_UP): wlan0: link is not ready
Sep 14 13:42:58 [dhcpcd] version 5.2.12 starting
Sep 14 13:42:58 [dhcpcd] wlan0: waiting for carrier
Sep 14 13:43:28 [dhcpcd] timed out
Sep 14 13:43:28 [dhcpcd] allowing 8 seconds for IPv4LL timeout
Sep 14 13:43:36 [dhcpcd] timed out
Sep 14 13:43:41 [dhcpcd] dhcpcd not running
Sep 14 13:43:41 [kernel] [  103.504021] ADDRCONF(NETDEV_UP): wlan0: link is not ready
Sep 14 13:43:41 [dhcpcd] dhcpcd not running
Sep 14 13:43:42 [kernel] [  103.715204] ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
Sep 14 13:43:42 [kernel] [  104.093578] ADDRCONF(NETDEV_UP): eth0: link is not ready
Sep 14 13:43:42 [kernel] [  104.105066] cfg80211: Calling CRDA for country: US
Sep 14 13:43:42 [dhcpcd] dhcpcd not running
Sep 14 13:43:42 [kernel] [  104.131472] ADDRCONF(NETDEV_UP): wlan0: link is not ready
Sep 14 13:43:43 [dhcpcd] dhcpcd not running
Sep 14 13:43:43 [kernel] [  105.135312] ADDRCONF(NETDEV_UP): wlan0: link is not ready
Sep 14 13:43:43 [dhcpcd] dhcpcd not running
Sep 14 13:43:44 [kernel] [  105.726554] ADDRCONF(NETDEV_UP): eth0: link is not ready
Sep 14 13:43:44 [dhcpcd] version 5.2.12 starting
Sep 14 13:43:44 [kernel] [  106.154478] ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
Sep 14 13:43:44 [dhcpcd] wlan0: broadcasting for a lease
Sep 14 13:43:45 [dhcpcd] wlan0: offered x.x.x.x from y.y.y.y
Sep 14 13:43:45 [dhcpcd] wlan0: acknowledged x.x.x.x from y.y.y.y
Sep 14 13:43:45 [dhcpcd] wlan0: checking for x.x.x.x
Sep 14 13:43:50 [dhcpcd] wlan0: leased x.x.x.x for 240 seconds
Sep 14 13:43:50 [dhcpcd] forked to background, child pid 4318
Sep 14 13:45:50 [dhcpcd] wlan0: renewing lease of x.x.x.x
Sep 14 13:45:50 [dhcpcd] wlan0: acknowledged x.x.x.x from y.y.y.y
Sep 14 13:45:50 [dhcpcd] wlan0: leased x.x.x.x for 900 seconds
```

and here is one from a failed connection:

```
Sep 13 17:19:48 [dhcpcd] wlan0: removing interface
Sep 13 17:19:48 [kernel] [97928.931801] ADDRCONF(NETDEV_UP): wlan0: link is not ready
Sep 13 17:19:48 [dhcpcd] dhcpcd not running
Sep 13 17:19:48 [kernel] [97929.524551] ADDRCONF(NETDEV_UP): eth0: link is not ready
Sep 13 17:19:50 [kernel] [97930.846927] e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
Sep 13 17:19:50 [kernel] [97930.846931] e1000e 0000:00:19.0: eth0: 10/100 speed: disabling TSO
Sep 13 17:19:50 [kernel] [97930.847362] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Sep 13 17:20:22 [kernel] [97962.881178] e1000e: eth0 NIC Link is Down
Sep 13 17:20:39 [dhcpcd] dhcpcd not running
Sep 13 17:20:39 [kernel] [97979.658915] ADDRCONF(NETDEV_UP): wlan0: link is not ready
Sep 13 17:20:39 [dhcpcd] dhcpcd not running
Sep 13 17:20:39 [kernel] [97980.248583] ADDRCONF(NETDEV_UP): eth0: link is not ready
Sep 13 17:20:39 [dhcpcd] dhcpcd not running
Sep 13 17:20:39 [kernel] [97980.282053] ADDRCONF(NETDEV_UP): wlan0: link is not ready
Sep 13 17:20:41 [dhcpcd] version 5.2.12 starting
Sep 13 17:20:41 [dhcpcd] wlan0: waiting for carrier
Sep 13 17:21:11 [dhcpcd] timed out
Sep 13 17:21:11 [dhcpcd] allowing 8 seconds for IPv4LL timeout
Sep 13 17:21:19 [dhcpcd] timed out
Sep 13 17:21:24 [dhcpcd] dhcpcd not running
Sep 13 17:21:24 [kernel] [98025.530689] ADDRCONF(NETDEV_UP): wlan0: link is not ready
Sep 13 17:21:24 [dhcpcd] dhcpcd not running
Sep 13 17:21:25 [kernel] [98026.122565] ADDRCONF(NETDEV_UP): eth0: link is not ready
Sep 13 17:21:25 [dhcpcd] dhcpcd not running
Sep 13 17:21:25 [kernel] [98026.152404] ADDRCONF(NETDEV_UP): wlan0: link is not ready
Sep 13 17:21:26 [dhcpcd] dhcpcd not running
Sep 13 17:21:26 [kernel] [98027.157448] ADDRCONF(NETDEV_UP): wlan0: link is not ready
Sep 13 17:21:26 [dhcpcd] dhcpcd not running
Sep 13 17:21:27 [kernel] [98027.750562] ADDRCONF(NETDEV_UP): eth0: link is not ready
Sep 13 17:21:27 [dhcpcd] version 5.2.12 starting
Sep 13 17:21:27 [dhcpcd] wlan0: waiting for carrier
Sep 13 17:21:57 [dhcpcd] timed out
Sep 13 17:21:57 [dhcpcd] allowing 8 seconds for IPv4LL timeout
Sep 13 17:22:05 [dhcpcd] timed out
Sep 13 17:22:11 [dhcpcd] dhcpcd not running
Sep 13 17:22:11 [kernel] [98072.551857] ADDRCONF(NETDEV_UP): wlan0: link is not ready
Sep 13 17:22:11 [dhcpcd] dhcpcd not running
Sep 13 17:22:12 [kernel] [98073.144558] ADDRCONF(NETDEV_UP): eth0: link is not ready
Sep 13 17:22:12 [dhcpcd] dhcpcd not running
Sep 13 17:22:12 [kernel] [98073.174725] ADDRCONF(NETDEV_UP): wlan0: link is not ready
Sep 13 17:22:13 [dhcpcd] dhcpcd not running
Sep 13 17:22:13 [kernel] [98074.175324] ADDRCONF(NETDEV_UP): wlan0: link is not ready
Sep 13 17:22:13 [dhcpcd] dhcpcd not running
Sep 13 17:22:14 [kernel] [98074.767582] ADDRCONF(NETDEV_UP): eth0: link is not ready
Sep 13 17:22:14 [dhcpcd] version 5.2.12 starting
Sep 13 17:22:14 [dhcpcd] wlan0: waiting for carrier
Sep 13 17:22:44 [dhcpcd] timed out
Sep 13 17:22:44 [dhcpcd] allowing 8 seconds for IPv4LL timeout
Sep 13 17:22:52 [dhcpcd] timed out
Sep 13 17:22:58 [dhcpcd] dhcpcd not running
Sep 13 17:22:58 [kernel] [98119.560414] ADDRCONF(NETDEV_UP): wlan0: link is not ready
Sep 13 17:22:58 [dhcpcd] dhcpcd not running
Sep 13 17:22:59 [kernel] [98120.152554] ADDRCONF(NETDEV_UP): eth0: link is not ready
Sep 13 17:22:59 [dhcpcd] dhcpcd not running
Sep 13 17:22:59 [kernel] [98120.183738] ADDRCONF(NETDEV_UP): wlan0: link is not ready
Sep 13 17:23:00 [dhcpcd] dhcpcd not running
Sep 13 17:23:00 [kernel] [98121.186855] ADDRCONF(NETDEV_UP): wlan0: link is not ready
Sep 13 17:23:00 [dhcpcd] dhcpcd not running
Sep 13 17:23:01 [kernel] [98121.778583] ADDRCONF(NETDEV_UP): eth0: link is not ready
Sep 13 17:23:01 [dhcpcd] version 5.2.12 starting
Sep 13 17:23:01 [dhcpcd] wlan0: waiting for carrier
Sep 13 17:23:31 [dhcpcd] timed out
Sep 13 17:23:31 [dhcpcd] allowing 8 seconds for IPv4LL timeout
Sep 13 17:23:39 [dhcpcd] timed out
Sep 13 17:23:41 [dhcpcd] dhcpcd not running
Sep 13 17:23:41 [kernel] [98162.054164] ADDRCONF(NETDEV_UP): wlan0: link is not ready
Sep 13 17:23:41 [dhcpcd] dhcpcd not running
Sep 13 17:23:42 [kernel] [98162.677576] ADDRCONF(NETDEV_UP): eth0: link is not ready
```

Unfortunately, they are not from the same access point. However, I initially failed to connect to the access point from the good connection. I rebooted and connected on the first try. But rebooting usually doesn't fix this.
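One way to compare dumps like these is to count how each dhcpcd run ended. What follows is only a minimal sketch, assuming the log line format shown in the two dumps; the function name `summarize_dhcpcd_runs` and the output labels are made up for illustration. A run that logs "waiting for carrier" and never reaches "leased" never had a link on wlan0, which would point at association rather than at DHCP itself.

```sh
# Summarize dhcpcd runs found in a syslog-style dump read from stdin.
# A "run" starts at each "version ... starting" line and is classified by
# what was logged before the next run started (or the input ended).
summarize_dhcpcd_runs() {
    awk '
        function record() {
            if (!started) return
            if (leased)            ok++
            else if (carrier_wait) nocarrier++
            else                   other++
        }
        /\[dhcpcd\] version .* starting/   { record(); started = 1; leased = 0; carrier_wait = 0; attempts++ }
        /\[dhcpcd\] .*waiting for carrier/ { carrier_wait = 1 }
        /\[dhcpcd\] .* leased /            { leased = 1 }
        END {
            record()
            printf "attempts=%d leased=%d no_carrier=%d other=%d\n", attempts, ok, nocarrier, other
        }
    '
}

# Example (the path is the one mentioned in the post):
#   summarize_dhcpcd_runs < /var/log/everything/current
```

A lease renewal logs a second "leased" line inside the same run, so the sketch classifies per run instead of counting matching lines.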

I'm not sure if what I posted is helpful. Please let me know if you need any other information, and thanks for your help.

Edit: Updated subject to reflect the kind of assistance I'm looking for

----------

## kwesadilo

Bump. I'm still experiencing this problem. I'm not expecting someone to know exactly what's going on from the information I've provided, but it would be nice to know other logs or diagnostics that I could look at to track this down. Or other tests that I could run. Even something I could ask for the network administrators to look at. I'm at my wit's end here.

----------

## cach0rr0

I am...not going to be of much help here, unfortunately

but I can at least confirm you aren't crazy. "Welcome to the world of Linux wireless"

I have one ath9k (AR928x) rig, and one iwlagn rig (WiFi Link 1000BGN)

Both have been pretty well hit or miss, sometimes I hit a kernel version where everything is great, sometimes I update and things go to utter shit. Funny enough, it's rarely the same version for most machines. On these laptops, I decide on what kernel I'm going to run based on nothing other than how stable wireless seems to be with that release. New nasty vulnerability? Don't care, got my wireless. 

Now, one thing I will say, most of my problems were with my AP at the house, a Linksys WRT410N. I would at times run into AP's outside that were comparably flaky, but these were normally in very saturated areas (as in, areas with heaps of hotspots around). 

What stabilized it for me at the house was switching up the channel settings on my router. I set one of its wireless NICs to "G only", with a channel width of "Full (20MHz)" and the channel on Auto (which, I assume, looks for an unoccupied channel and picks that). The other I set to "N only (5GHz)", "Turbo (40MHz)", channel on Auto. 

And things have been hunky dory ever since. Mind you, I *do* only ever connect to the 802.11g network, deciding I'd rather have slower and stable (but not painfully slow, my downstream at home is ~50Mbit tops) than fast and flaky. 

In the past, I'd all but narrowed it down specifically to Wireless-N being unstable as all hell under Linux. Even madwifi was just a clusterfuck (pardon my French). So I moved away from it at the house, and in your case I can't help but think you may be hitting that as well - maybe the AP's you're trying to connect to are trying to do BGN on one interface/channel/ssid, as opposed to simply one of them, or at least just BG. 

Thankfully, with my iwlagn machine, there's an option for that driver such that you can "modprobe iwlagn 11n_disable=1" (or something like that). Before I changed up my router, when I'd load the driver in that fashion, it was much, much more stable. And of course, now that it only has the option of connecting to 802.11g, the issue is all gone. 

So I don't know if we're looking at "Wireless under Linux is still flaky as hell", or "Wireless-N under Linux is flaky as hell". I tend to lean towards the latter, and avoid it where/when I can. 

As far as troubleshooting goes, always kill off things like wicd and/or networkmanager. Instead, fashion a wpa_supplicant.conf, run wpa_supplicant from the command line in one console, and fire off dhcpcd wlan0 in another console (or screen session, like I do when I do a Gentoo install normally). If errors are going to surface, this should at least reveal them. 
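For an open network like the one described in this thread, that setup could look roughly like the sketch below. The helper name `open_network_conf`, the SSID `CampusWiFi`, and the `/tmp/open.conf` path are placeholders, and the `nl80211` driver choice is an assumption (older setups may need `-D wext` instead).

```sh
# Print a minimal wpa_supplicant configuration for an open (unencrypted)
# network. key_mgmt=NONE tells wpa_supplicant not to use WPA key management.
open_network_conf() {
    ssid=$1
    printf 'network={\n\tssid="%s"\n\tkey_mgmt=NONE\n}\n' "$ssid"
}

# Usage sketch (run as root, with wicd/NetworkManager stopped):
#   open_network_conf "CampusWiFi" > /tmp/open.conf
#   wpa_supplicant -d -D nl80211 -i wlan0 -c /tmp/open.conf   # console 1, -d = debug output
#   dhcpcd -d -B wlan0                                         # console 2, stays in the foreground
```

Keeping both programs in the foreground means association messages and DHCP messages show up in separate consoles, so it is easier to see which layer gives up first.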

Sorry that's not hugely useful. I don't have a ton of wonderful ways of helping try to "debug" this. I am not a coder, so I, presumably like you, am depending on program output to show me useful errors, which it often doesn't in the case of what we use with wireless. But the information is worth sharing, I suppose.

----------

## kwesadilo

Well, at least I'm not crazy.

The network is an 802.11n network, but my card doesn't do N, so there's nothing (I think) for me to turn off to decrease flakiness. One thing I have observed that's weird is that Wicd and iw wlan0 scan dump will list networks with channels over 11, up into the 50s. I guess that's the G protocol running in the N band for backwards compatibility or something. I would have thought that my radio would be unable to communicate in the 5 GHz band that N uses, making it physically impossible for me to see such networks. But I can see them and (if I'm lucky) connect to them. I don't know what the deal is.
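For reference, 802.11 channel numbers map to center frequencies in a fixed way, so a channel number alone tells you which band a scan result is in. The sketch below is a small helper (the name `channel_to_mhz` is made up) covering the commonly used channel ranges. Channels in the 30s to 60s come out above 5000 MHz, which would be consistent with the card having a 5 GHz-capable radio, though that is an inference and not something confirmed in this thread.

```sh
# Convert an 802.11 channel number to its center frequency in MHz.
# 2.4 GHz band: channels 1-13 are 2407 + 5*ch, channel 14 is 2484.
# 5 GHz band:   channels 36-165 are 5000 + 5*ch.
channel_to_mhz() {
    ch=$1
    if [ "$ch" -eq 14 ]; then
        echo 2484
    elif [ "$ch" -ge 1 ] && [ "$ch" -le 13 ]; then
        echo $((2407 + 5 * ch))
    elif [ "$ch" -ge 36 ] && [ "$ch" -le 165 ]; then
        echo $((5000 + 5 * ch))
    else
        echo "unknown channel: $ch" >&2
        return 1
    fi
}

# Example: channel_to_mhz 11 and channel_to_mhz 52
```

The scan output from iw also prints a `freq:` line for each BSS, which can be compared directly against these values.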

This is an unsecured network, so I could try to connect using the new iw tool, the old iwconfig tool, or wpa_supplicant. Which do you think produces the most useful diagnostic output when things go wrong?

The thing that annoys me the most about this is that it just happened when I wasn't changing anything. I am a coder, and I wouldn't mind getting into one of the tool or driver projects and trying to fix this. Inconveniently, I am pursuing a degree in computer engineering and am thus too busy to work on real projects that anyone actually cares about. My next best idea is to go back through all of the kernel versions I've ever used and do regression testing. But that's not exactly a quick check, either.

----------

## kwesadilo

I came up with an annoying but functional workaround several weeks ago. It is only now that I am on break from school and have time to post my findings. I should actually be studying for an exam right now, but I don't want to.  :Cool: 

In an attempt to drill down to the cause of my problems, I tried to connect to the network using iw and dhcpcd without running wicd. When I awaken my computer at a new location, I run

```
iw wlan0 scan
```

If that completes successfully (and it almost always does), I run

```
iw wlan0 connect -w <SSID>
```

This typically fails like this:

```
wlan0 (phy #0): failed to connect to xx:xx:xx:xx:xx:xx, status: 1: Unspecified failure
```

except when it fails like this:

```
wlan0 (phy #0): failed to connect to xx:xx:xx:xx:xx:xx, status: 17: Association denied because AP is unable to handle additional associated STA
```

I think there is one other failure mode that I have seen, but I did not have the presence of mind to record it. The first two are by far the most common. The interesting thing about this is that I can do it over and over and over again, and it will (usually) eventually succeed. I haven't kept really good track of this, but I think it tries another AP after failure 1 but tries the same one again after failure 17. Once I have successfully associated with an access point, I can get layer 3 connectivity with
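Keeping track of which AP each attempt hit, and with which status code, can be done by saving the iw output and pulling out the two fields. This is a minimal sketch assuming the failure line format quoted above; the function name `connect_failures` and the file name `connect.log` are placeholders.

```sh
# Read saved `iw ... connect -w` output on stdin and print one line per
# failed attempt, in order: "<BSSID> <status code>".
connect_failures() {
    sed -n 's/.*failed to connect to \([^,]*\), status: \([0-9][0-9]*\):.*/\1 \2/p'
}

# Usage sketch:
#   iw wlan0 connect -w <SSID> 2>&1 | tee -a connect.log    # repeat per attempt
#   connect_failures < connect.log                          # attempts in order
#   connect_failures < connect.log | sort | uniq -c         # tally per AP and status
```

Reading the in-order output shows whether a status 1 failure is followed by a different BSSID and a status 17 failure by the same one.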

```
dhcpcd wlan0
```

This works every time, except when it times out waiting for carrier, and it turns out I'm no longer associated with the AP. I can usually rescan and re-associate at that point and continue the process.

Seemingly, the root of the problem is those association failures I'm getting. It's not just some of the time. I usually fail to associate 10 or 20 times before succeeding. However, if I am able to react to association failures and keep trying to associate until I succeed, I can generally get online in not much more than a minute, because association only takes a few seconds to fail. By contrast, DHCP takes 90 seconds to fail, which means that I could spend (have spent) an hour and a half in a densely covered room without getting online.
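The "keep trying to associate until it works" idea can be scripted. The following is a sketch only: the function names, the default of 30 attempts, and the 1 second pause are arbitrary choices. Because it is not verified here that the exit status of `iw ... connect -w` reflects an association failure, the loop checks `iw dev <interface> link` after each attempt instead.

```sh
# Succeeds if `iw dev <iface> link` output on stdin reports an association.
is_associated() {
    grep -q '^Connected to '
}

# associate_with_retry IFACE SSID [MAX_ATTEMPTS]
# Retries association until the link is up or MAX_ATTEMPTS is reached.
associate_with_retry() {
    iface=$1
    ssid=$2
    max=${3:-30}
    n=1
    while [ "$n" -le "$max" ]; do
        iw dev "$iface" connect -w "$ssid"
        if iw dev "$iface" link | is_associated; then
            echo "associated after $n attempt(s)"
            return 0
        fi
        n=$((n + 1))
        sleep 1
    done
    echo "gave up after $max attempts" >&2
    return 1
}

# Usage sketch (as root): only ask for a lease once association succeeded.
#   iw dev wlan0 scan > /dev/null
#   associate_with_retry wlan0 "<SSID>" && dhcpcd wlan0
```

Running dhcpcd only after the link check passes avoids spending the long DHCP timeout on an interface that never associated.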

As far as I can tell, Wicd just assumes that association succeeded and doesn't complain until it fails to get an IP address. If it could check the result of the association attempt, there would still be a bug somewhere, but I would be able to connect in a minute or so, which I can live with. I only recently discovered that Wicd has a configuration option that seems to change this: "Ping static gateways after connecting to verify association." But when I select this option in the configurator, it doesn't seem to stick past me closing the window. Do I have to be root to change this?

For now, this is my ugly workaround procedure to get online but avoid having to leave a terminal open to keep dhcpcd alive:

1. Have Wicd running.
2. When I open my laptop, run `iw wlan0 scan`.
3. Run `iw wlan0 connect -w <SSID>`. Wicd may already do this, and I may interrupt it. Either way, it will probably fail.
4. Wicd will request an IP address. As quickly as possible, before this operation times out, run `iw wlan0 connect -w <SSID>` over and over until it works.
5. If I did it fast enough, Wicd will time out and try again to get an IP address, this time succeeding.
6. Otherwise, repeat step 4, possibly manually commanding Wicd to connect to a wireless network.
7. I now have a pretty reliable connection to the wireless network.

----------

## cach0rr0

somewhere in here i could have sworn i said life with wireless under linux is a royal pain in the ass  :Smile: 

i feel your pain. 

i was getting downright terrible performance from an older zen-sources kernel on a laptop i rarely fire up (iwl-1000/iwlagn). It just flat out wouldn't stay connected, neither with wicd, nor with wpa_supplicant running on its own in a screen session. Once it died off, it was just 'failed to scan ap' over and over and over again, and some bug mentioning mlme.c in cfg80211

So i upgrade to the latest zen-stable.git (3.1.2), first give it a go with wicd, then wpa_supplicant; same behaviour 

i go back to 2.6.37, try both with and without compat-wireless, linux-firmware-9999, try the same in 3.1.2

right...so that was epic fail too

havent used gentoo-sources in a good long while, but i tried it out, 3.1.1. wicd still just plain flaky, so im pessimistic

i try wpa_supplicant from the command line, and...hey wait a second, ive been connected to wireless for the last 12 hours with no interruptions. 

(this is all from the last 36 hours, by the way, not some story from ages ago)

it's so damn hit or miss, and it's damn painful trying to figure out "is it wicd on the fritz this time, or is it the kernel, or is it buggy firmware?" - in this case i chalked it up to wicd

This is, thankfully, not my main laptop. I mean im glad i have its wireless working reliably now (for now), but that's unliveable. 

All that to say - I *still* feel your pain. I'm to the point where I'm going to do with this laptop what I do with the other laptop: only fire up wicd when I'm on the road and connecting to foreign APs. While I'm at home, it's not too painful for me to just fire up wpa_supplicant, let it run, and start the dhcpcd init script (thus leaving it running, in case for whatever reason things drop out and I need to redo DHCP again)

fun stuff

----------

