# lighttpd gets killed on first incoming connection.

## Napalm Llama

I think PaX/grsecurity are to blame.

What happens is as follows:

Start lighttpd

ps aux shows its process, along with php-cgi child processes.

Attempt to connect to it from another machine: SYN sent; ACK RST received.

ps aux now shows no lighttpd process, although all the php ones are still there.

This shows up in /var/log/grsec.log:

```
May 29 22:52:43 muttley grsec: From 10.0.0.2: signal 6 sent to /usr/sbin/lighttpd[lighttpd:14830] uid/euid:102/102 gid/egid:1002/1002, parent /sbin/init[init:1] uid/euid:0/0 gid/egid:0/0
```

These are my first dabblings with hardened Linux, so I don't really know what to do.  I don't particularly want to chpax my webserver process, as I feel that would eliminate a large part of the protection the hardening was supposed to provide.  Can anybody help?

[edit:]

In the interests of science, I disabled all the protection using chpax anyway, but the problem still occurred.  Maybe I have too restrictive GrSec settings in my kernel?

----------

## Napalm Llama

This just in (and bump, sorry):

PaX/grsecurity are not to blame!  I just booted a non-hardened kernel without these features, and I still get the same problem.  I'm now blaming uClibc.  Does anybody know why this might be happening?

----------

## Hu

A bump which provides new information is quite forgivable, in my opinion.

According to kill -l, signal 6 is SIGABRT.  Can you attach sys-devel/gdb to the process before it aborts?  That should let you get a stack backtrace, which will let you identify the function that is calling abort.  Depending on your build settings, you may need to re-emerge some packages with debugging symbols in order to get a useful backtrace.  See How to get meaningful backtraces in Gentoo for instructions.
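To illustrate why signal 6 points at a call to `abort()`, here is a minimal sketch, assuming a POSIX system. The function names `signal_from_abort` and `abort_raises_sigabrt` are made up for this example and have nothing to do with lighttpd. It forks a child that calls `abort()` and reports which signal terminated it.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that calls abort(), then return the
 * number of the signal that killed it, or -1 if it was not killed by a
 * signal (or if fork/waitpid failed). */
int signal_from_abort(void) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        abort();            /* raises SIGABRT in the child */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}

/* Hypothetical helper: non-zero when the child died from SIGABRT. */
int abort_raises_sigabrt(void) {
    return signal_from_abort() == SIGABRT;
}
```

On Linux, `SIGABRT` is signal 6, which matches the grsec log line above. So the question becomes which function in lighttpd (or its libraries) is calling `abort()`.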

Also, on general principle, can you post the output of emerge --info from that box?  The results of emerge --pretend --verbose <list of all suspect packages> would be good too, so that we can see what USE flags and versions you are using for those packages.

----------

## Napalm Llama

OK, I've got another slight problem - I can't seem to get gdb to resolve the symbol names.  I'm using the -ggdb CFLAG, and because I'm on a hardened system I'm also trying to use the -nopie LDFLAG.  But when I enable the -nopie LDFLAG, the configure stage of the lighttpd install says this:

```
checking for C compiler default output file name...
configure: error: C compiler cannot create executables
See `config.log' for more details.
!!! Please attach the following file when filing a report to bugs.gentoo.org:
!!! /var/tmp/portage/www-servers/lighttpd-1.4.15/work/lighttpd-1.4.15/config.log
!!! ERROR: www-servers/lighttpd-1.4.15 failed.
Call stack:
  ebuild.sh, line 1615:   Called dyn_compile
  ebuild.sh, line 972:   Called qa_call 'src_compile'
  ebuild.sh, line 44:   Called src_compile
  lighttpd-1.4.15.ebuild, line 118:   Called econf '--libdir=/usr/lib/lighttpd' '--enable-lfs' '--enable-ipv6' '--with-bzip2' '--without-fam' '--without-gdbm' '--without-lua' '--without-ldap' '--without-memcache' '--with-mysql' '--with-pcre' '--with-openssl' '--without-webdav-props' '--without-webdav-locks' '--without-attr'
  ebuild.sh, line 578:   Called die
!!! econf failed
!!! If you need support, post the topmost build error, and the call stack if relevant.
!!! A complete build log is located at '/var/tmp/portage/www-servers/lighttpd-1.4.15/temp/build.log'.
```

In the meantime, this is all I get from gdb:

```
muttley ~ # gdb /usr/sbin/lighttpd
GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "powerpc-gentoo-linux-uclibc"...
Using host libthread_db library "/lib/libthread_db.so.1".
(gdb) set args -Df /etc/lighttpd/lighttpd.conf
(gdb) run
Starting program: /usr/sbin/lighttpd -Df /etc/lighttpd/lighttpd.conf
Program received signal SIGABRT, Aborted.
0x302a7608 in ?? ()
(gdb) quit
The program is running.  Exit anyway? (y or n) y
```

Not very useful.

emerge -pv lighttpd gives this:

[ebuild   R   ] www-servers/lighttpd-1.4.15  USE="bzip2 fastcgi ipv6 mysql pcre php ssl -doc -fam -gdbm -ldap -lua -memcache -minimal -rrdtool -test -webdav -xattr" 0 kB

emerge --info gives this:

```
Portage 2.1.2.7 (uclibc/ppc/hardened, gcc-3.4.6, uclibc-0.9.28.3-r0, 2.6.21-gentoo-r2-muttley-1 ppc)
=================================================================
System uname: 2.6.21-gentoo-r2-muttley-1 ppc G2_LE
Gentoo Base System release 1.12.9
Timestamp of tree: Wed, 30 May 2007 16:20:01 +0000
dev-lang/python:     2.4.4-r4
dev-python/pycrypto: 2.0.1-r5
sys-apps/sandbox:    1.2.17
sys-devel/autoconf:  2.61
sys-devel/automake:  1.6.3, 1.9.6-r2, 1.10
sys-devel/binutils:  2.17
sys-devel/gcc-config: 1.3.16
sys-devel/libtool:   1.5.22
virtual/os-headers:  2.6.20-r2
ACCEPT_KEYWORDS="ppc"
AUTOCLEAN="yes"
CBUILD="powerpc-gentoo-linux-uclibc"
CFLAGS="-O2 -mcpu=603e -pipe -ggdb"
CHOST="powerpc-gentoo-linux-uclibc"
CONFIG_PROTECT="/etc"
CONFIG_PROTECT_MASK="/etc/env.d /etc/gconf /etc/php/apache2-php5/ext-active/ /etc/php/cgi-php5/ext-active/ /etc/php/cli-php5/ext-active/ /etc/revdep-rebuild /etc/terminfo"
CXXFLAGS="-O2 -mcpu=603e -pipe -ggdb"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig distlocks fixpackages metadata-transfer nodoc noinfo noman nostrip parallel-fetch sandbox sfperms strict"
GENTOO_MIRRORS="http://distfiles.gentoo.org http://distro.ibiblio.org/pub/linux/distributions/gentoo"
LDFLAGS="-Wl,-z,relro,-nopie"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --filter=H_**/files/digest-*"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://10.0.0.2/gentoo-portage"
USE="alsa bitmap-fonts bzip2 cgi cli cracklib dri embedded fastcgi gd hardened ipv6 mudflap mysql ncurses openmp pcre php pic ppc readline reflection session spl ssl syslog truetype-fonts type1-fonts uclibc unicode xorg zlib" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol" ELIBC="uclibc" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" USERLAND="GNU" VIDEO_CARDS="dummy fbdev v4l"
Unset:  CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LC_ALL, LINGUAS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
```

Even though my toolchain is hardened (and I've rebuilt world with it), I'm not actually running a hardened kernel at the moment.  I usually do, but I'm just trying to simplify matters while I fix this problem.

Thanks for your attention  :Smile: 

----------

## Hu

Instead of using -nopie, try switching the system compiler to a non-hardened version using gcc-config.  You should switch back before building any other packages, of course.  :Smile: 

When GDB breaks in, try to print a backtrace using bt.  This may work even if the current instruction pointer is bogus.  Also, run info registers to print out the contents of the processor registers at the time of the abort.  I am guessing at what information to retrieve.  Hopefully, one of these will turn up useful information.

----------

## Napalm Llama

Ah, that's a bit better.  Thanks for the pointers.

Ahem.  Here's the backtrace:  :Smile: 

```
(gdb) set args -Df /etc/lighttpd/lighttpd.conf
(gdb) run
Starting program: /usr/sbin/lighttpd -Df /etc/lighttpd/lighttpd.conf
Program received signal SIGABRT, Aborted.
0x302a7608 in kill () from /lib/libc.so.0
(gdb) bt
#0  0x302a7608 in kill () from /lib/libc.so.0
#1  0x30294fcc in raise () from /lib/libc.so.0
#2  0x302a1000 in abort () from /lib/libc.so.0
#3  0x10016dac in fdevent_get_handler (ev=0x0, fd=6) at fdevent.c:171
#4  0x100068a4 in main (argc=0, argv=0x10020000) at server.c:1301
(gdb) info registers
r0             0x25     37
r1             0x7fa5e0a0       2141577376
r2             0x0      0
r3             0x0      0
r4             0x6      6
r5             0x28004082       671105154
r6             0x3e36   15926
r7             0x302a6fb4       808087476
r8             0xf932   63794
r9             0x0      0
r10            0x1032   4146
r11            0x0      0
r12            0xc2852000       3263504384
r13            0xbfffffff       3221225471
r14            0x10022b68       268577640
r15            0x10022b54       268577620
r16            0x10040000       268697600
r17            0x0      0
r18            0x10040000       268697600
r19            0x0      0
r20            0x10020000       268566528
r21            0x10020000       268566528
r22            0x1      1
---Type <return> to continue, or q <return> to quit---
r23            0x10020000       268566528
r24            0x0      0
r25            0x10020000       268566528
r26            0x0      0
r27            0x1      1
r28            0x100413b8       268702648
r29            0x6      6
r30            0x302c40ec       808206572
r31            0x302c42a8       808207016
pc             0x302a7608       808089096
cr             0x28004082       671105154
lr             0x30294fcc       808013772
ctr            0x0      0
xer            0x20000000       536870912
```

Make of that what you will  :Confused: 

----------

## Hu

fdevent_get_handler is calling abort() from fdevent.c:171.  If I had to guess, I would say some sanity check is failing, resulting in lighttpd aborting.  Check what conditions can cause it to reach that line.  Based on the shallowness of the call stack, the file is probably part of lighttpd, rather than one of its dependencies.

----------

## Napalm Llama

fdevent.c is indeed part of lighttpd, and the offending function looks like this:

```
fdevent_handler fdevent_get_handler(fdevents *ev, int fd) {
   if (ev->fdarray[fd] == NULL) SEGFAULT();
   if (ev->fdarray[fd]->fd != fd) SEGFAULT(); // Line 171 [my comment]
   return ev->fdarray[fd]->handler;
}
```

All it takes to crash the server is an ordinary HTTP connection to it.  Unfortunately I only have a passing knowledge of C, and I don't understand enough of it to work out the route through the code to get to that point.  Is this a bug I should take up with the lighttpd folk?

----------

## Hu

Like I suspected, some sanity check is failing.  The authors expect that the fd (file descriptor) in a particular array slot has the same value as the index of that slot.  When it does not, they abort the program because something is wrong.  A quick scan of the Gentoo patches for this suggests that Gentoo does not patch this function, so this is probably something you need to take upstream.
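To make that invariant concrete, here is a minimal sketch, assuming a hypothetical fd-indexed table. The names `fd_table`, `fd_node`, `fd_table_lookup` and friends are invented for illustration and are not lighttpd's real types. The lookup performs the same two checks as the function quoted above, but returns NULL where lighttpd's `SEGFAULT()` ends up in `abort()`.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical miniature of an fd-indexed handler table. */
typedef int (*handler_fn)(int fd);

typedef struct {
    int fd;               /* must equal the index of the slot holding this node */
    handler_fn handler;
} fd_node;

typedef struct {
    fd_node **slots;      /* slots[fd] is the node registered for fd, or NULL */
    size_t maxfds;
} fd_table;

/* Trivial handler used only for demonstration. */
int example_handler(int fd) { return fd; }

fd_table *fd_table_new(size_t maxfds) {
    fd_table *t = malloc(sizeof(*t));
    if (t == NULL) return NULL;
    t->slots = calloc(maxfds, sizeof(*t->slots));
    if (t->slots == NULL) { free(t); return NULL; }
    t->maxfds = maxfds;
    return t;
}

/* Store a node for fd in slot number fd; returns 0 on success, -1 on error. */
int fd_table_register(fd_table *t, int fd, handler_fn h) {
    if (fd < 0 || (size_t)fd >= t->maxfds) return -1;
    fd_node *n = malloc(sizeof(*n));
    if (n == NULL) return -1;
    n->fd = fd;
    n->handler = h;
    free(t->slots[fd]);   /* replace any earlier registration */
    t->slots[fd] = n;
    return 0;
}

/* Same two sanity checks as the quoted function, reported as NULL. */
handler_fn fd_table_lookup(const fd_table *t, int fd) {
    if (fd < 0 || (size_t)fd >= t->maxfds) return NULL;
    const fd_node *n = t->slots[fd];
    if (n == NULL) return NULL;      /* nothing registered for this fd */
    if (n->fd != fd) return NULL;    /* slot contents disagree with its index */
    return n->handler;
}

void fd_table_free(fd_table *t) {
    if (t == NULL) return;
    for (size_t i = 0; i < t->maxfds; i++) free(t->slots[i]);
    free(t->slots);
    free(t);
}
```

In this sketch, looking up an fd that was never registered fails the first check, and a node whose stored fd differs from its slot index fails the second. Either way, the caller asked about an fd the table does not agree with, which is the kind of state worth capturing for the maintainers.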

If possible, you should try to give them the contents of those variables.  I cannot believe that they would leave such a serious bug in if they knew about it and it could be easily reproduced, so there must be something in your setup that exposes (or maybe even causes) the problem.  I do not have lighttpd, so this will be from memory, but try the following:

```
# Attach gdb to the running lighttpd.  We need to set a breakpoint before you cause the abort.
$ gdb -p $(pgrep lighttpd)
# Break when this function is hit
(gdb) break fdevent_get_handler
```

Now make the connection.  gdb should break in when fdevent_get_handler is reached, allowing you to:

```
(gdb) print fd
(gdb) print *ev
(gdb) print *ev->fdarray[fd]
```

Hopefully, the output from those commands will give the lighttpd maintainer some insight into how your process got into this state.  As always when reporting serious bugs, be prepared to go through a few iterations if the maintainer cannot reproduce the problem.  Good luck, and let us know how it goes (or if you need more help for reporting this).

----------

## Napalm Llama

Hmm, doesn't seem to work:

```
[ Loading / Reading lots of symbols ]
0x302a6288 in epoll_wait () from /lib/libc.so.0
(gdb) break fdevent_get_handler
Breakpoint 1 at 0x10016d54: file fdevent.c, line 170.
(gdb) print fd
No symbol "fd" in current context.
(gdb) print *ev
No symbol "ev" in current context.
(gdb) print *ev->fdarray[fd]
No symbol "ev" in current context.
```

However, a bug appears to exist for this already:

http://trac.lighttpd.net/trac/ticket/1197

The reporter is running OSX, so I'm thinking maybe it's a PPC problem, rather than a uClibc one, as that seems to be the only common element (other than the Unix-like operating system, which is kind of obvious). I'll add my 2¢ to the bugtracker there, and see what happens.  I guess I'm stuck without an HTTP server until a fix comes through...

----------

## Hu

After setting the breakpoint, you need to continue the program.  Make the connection, then execute the print statements when gdb breaks in.  Since the bug report already exists, it may be a moot point whether you can get the values of those variables.

----------

## Napalm Llama

OK, that produced something a bit more interesting:

```
(gdb) break fdevent_get_handler
Breakpoint 1 at 0x10016d54: file fdevent.c, line 170.
[ At this point I made the connection ]
(gdb) continue
Continuing.
Breakpoint 1, fdevent_get_handler (ev=0x10048a80, fd=0) at fdevent.c:170
170     fdevent.c: No such file or directory.
        in fdevent.c
(gdb) print fd
$1 = 0
(gdb) print *ev
$2 = {type = FDEVENT_HANDLER_LINUX_SYSEPOLL, fdarray = 0x10048ee8,
  maxfds = 1015, in_sigio = 0, signum = 0, sigset = {__val = {
      0 <repeats 32 times>}}, siginfo = {si_signo = 0, si_errno = 0,
    si_code = 0, _sifields = {_pad = {0 <repeats 29 times>}, _kill = {
        si_pid = 0, si_uid = 0}, _timer = {_timer1 = 0, _timer2 = 0}, _rt = {
        si_pid = 0, si_uid = 0, si_sigval = {sival_int = 0, sival_ptr = 0x0}},
      _sigchld = {si_pid = 0, si_uid = 0, si_status = 0, si_utime = 0,
        si_stime = 0}, _sigfault = {si_addr = 0x0}, _sigpoll = {si_band = 0,
        si_fd = 0}}}, sigbset = 0x0, epoll_fd = 6, epoll_events = 0x10049ec8,
  pollfds = 0x0, size = 0, used = 0, unused = {ptr = 0x0, used = 0, size = 0},
  select_read = {__fds_bits = {0 <repeats 32 times>}}, select_write = {
    __fds_bits = {0 <repeats 32 times>}}, select_error = {__fds_bits = {
      0 <repeats 32 times>}}, select_set_read = {__fds_bits = {
      0 <repeats 32 times>}}, select_set_write = {__fds_bits = {
      0 <repeats 32 times>}}, select_set_error = {__fds_bits = {
      0 <repeats 32 times>}}, select_max_fd = 0, reset = 0,
  free = 0x1001c9bc <fdevent_linux_sysepoll_free>,
  event_add = 0x1001caa8 <fdevent_linux_sysepoll_event_add>,
  event_del = 0x1001c9f4 <fdevent_linux_sysepoll_event_del>,
  event_get_revent = 0x1001cbd4 <fdevent_linux_sysepoll_event_get_revent>,
  event_get_fd = 0x1001cbe4 <fdevent_linux_sysepoll_event_get_fd>,
  event_next_fdndx = 0x1001cbf8 <fdevent_linux_sysepoll_event_next_fdndx>,
  poll = 0x1001cba0 <fdevent_linux_sysepoll_poll>, fcntl_set = 0}
(gdb) print *ev->fdarray[fd]
Cannot access memory at address 0x0
```

I'll see what the lighttpd folk want though.  If necessary, I'll point them back to this thread.

Thanks for all your patience  :Very Happy: 

----------

