# Serious problem with Cyrus and its database (BDB)

## drrrl

Hi,

I have a serious recurring problem with my Cyrus databases. It happens irregularly, about once a month on average. There is no visible reason for this behaviour and I have no idea why it happens.

Symptoms:

- the logs start showing something like this:

```
Nov 25 09:27:05 [ctl_cyrusdb] checkpointing cyrus databases
Nov 25 09:27:05 [ctl_cyrusdb] DBERROR db4: DB_LOGC->get: log record checksum mismatch
Nov 25 09:27:05 [ctl_cyrusdb] DBERROR db4: DB_LOGC->get: catastrophic recovery may be required
Nov 25 09:27:05 [ctl_cyrusdb] DBERROR db4: PANIC: DB_RUNRECOVERY: Fatal error, run database recovery
Nov 25 09:27:05 [ctl_cyrusdb] DBERROR: critical database situation
Nov 25 09:27:14 [imap] DBERROR db4: fatal region error detected; run recovery
Nov 25 09:27:14 [imap] DBERROR: error exiting application: DB_RUNRECOVERY: Fatal error, run database recovery
Nov 25 09:27:18 [imap] DBERROR db4: fatal region error detected; run recovery
Nov 25 09:27:18 [imap] DBERROR: dbenv->open '/var/imap/db' failed: DB_RUNRECOVERY: Fatal error, run database recovery
Nov 25 09:27:18 [imap] DBERROR: init() on berkeley
Nov 25 09:29:28 [imap] DBERROR db4: fatal region error detected; run recovery
Nov 25 09:29:28 [imap] DBERROR: dbenv->open '/var/imap/db' failed: DB_RUNRECOVERY: Fatal error, run database recovery
Nov 25 09:29:28 [imap] DBERROR: init() on berkeley
.
```

(repeating every 1-2 minutes)

About 10 minutes after the original problem appeared, the following messages from the delivery side (Cyrus lmtpd, which receives mail from Postfix) began to appear as well, which is more or less a logical consequence:

```
Nov 25 09:43:17 [lmtpunix] DBERROR db4: fatal region error detected; run recovery
Nov 25 09:43:17 [lmtpunix] DBERROR: dbenv->open '/var/imap/db' failed: DB_RUNRECOVERY: Fatal error, run database recovery
Nov 25 09:43:17 [lmtpunix] DBERROR: init() on berkeley
Nov 25 09:43:17 [lmtpunix] DBERROR db4: environment not yet opened
Nov 25 09:43:17 [lmtpunix] DBERROR: opening /var/imap/deliver.db: Invalid argument
Nov 25 09:43:17 [lmtpunix] DBERROR: opening /var/imap/deliver.db: cyrusdb error
Nov 25 09:43:17 [lmtpunix] FATAL: lmtpd: unable to init duplicate delivery database
Nov 25 09:43:17 [master] service lmtpunix pid 14560 in READY state: terminated abnormally
```

(repeating constantly, roughly every second)

- local mail delivery stops (as expected)

- server load goes up (metalog uses a lot of CPU)
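Since the first panic went unnoticed until delivery broke, a small log check could flag it early. This is a minimal sketch, assuming the metalog-style line layout shown above (`Mon DD HH:MM:SS [service] message`); the function name `find_bdb_panics` and the marker list are illustrative choices, not part of Cyrus or Berkeley DB.

```python
import re
from typing import Iterable, List, Tuple

# Markers taken from the log excerpts above; each one means the Berkeley DB
# environment in /var/imap/db has to be recovered before Cyrus works again.
PANIC_MARKERS = (
    "DB_RUNRECOVERY",
    "fatal region error detected",
    "log record checksum mismatch",
)

# Assumed metalog-style layout: "Nov 25 09:27:05 [ctl_cyrusdb] message"
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\[([^\]]+)\]\s+(.*)$")


def find_bdb_panics(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (timestamp, service, message) for each line containing a panic marker."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # blank lines, "Last output repeated" notices, wrapped lines
        timestamp, service, message = m.groups()
        if any(marker in message for marker in PANIC_MARKERS):
            hits.append((timestamp, service, message))
    return hits


if __name__ == "__main__":
    sample = [
        "Nov 25 09:27:05 [ctl_cyrusdb] checkpointing cyrus databases",
        "Nov 25 09:27:05 [ctl_cyrusdb] DBERROR db4: DB_LOGC->get: log record checksum mismatch",
        "Nov 25 09:27:14 [imap] DBERROR db4: fatal region error detected; run recovery",
    ]
    for ts, service, msg in find_bdb_panics(sample):
        print(ts, service, msg)
```

In practice it would be fed the lines of the syslog file from a cron job; how to alert on a hit is left open.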

Fortunately, there is a way to recover the services: a simple '/etc/init.d/cyrus restart' fixes it immediately:

```
Nov 25 11:25:55 [master] exiting on SIGTERM/SIGINT
Nov 25 11:25:55 [deliver] backend_connect(): couldn't read initial greeting: Connection reset by peer
                - Last output repeated 2 times -
Nov 25 11:25:55 [postfix/pipe] 14FAE53092: to=<anyuser@anything.com-example>, relay=cyrus, delay=3559, status=deferred (temporary failure. Command output: couldn't connect to lmtpd: Connection reset by peer_ 421 4.3.0 deliver: couldn't connect to lmtpd_ )
Nov 25 11:25:55 [deliver] connect(/var/imap/socket/lmtp) failed: Connection refused
Nov 25 11:25:55 [postfix/pipe] E2B7D5309A: to=<anyuser@anything.com-example>, relay=cyrus, delay=572, status=deferred (temporary failure. Command output: couldn't connect to lmtpd: Connection reset by peer_ 421 4.3.0 deliver: couldn't connect to lmtpd_ )
Nov 25 11:25:55 [postfix/pipe] 459BA53091: to=<anyuser@anything.com-example>, relay=cyrus, delay=3811, status=deferred (temporary failure. Command output: couldn't connect to lmtpd: Connection refused_ 421 4.3.0 deliver: couldn't connect to lmtpd_ )
.
.
.
Nov 25 11:25:56 [master] setrlimit: Unable to set file descriptors limit to -1: Operation not permitted
Nov 25 11:25:56 [master] retrying with 1024 (current max)
Nov 25 11:25:56 [master] process started
Nov 25 11:26:00 [ctl_cyrusdb] recovering cyrus databases
Nov 25 11:26:00 [ctl_cyrusdb] skiplist: recovered /var/imap/mailboxes.db (152 records, 11764 bytes) in 0 seconds
Nov 25 11:26:00 [ctl_cyrusdb] skiplist: recovered /var/imap/annotations.db (0 records, 144 bytes) in 0 seconds
Nov 25 11:26:00 [ctl_cyrusdb] done recovering cyrus databases
Nov 25 11:26:01 [master] ready for work
Nov 25 11:26:01 [ctl_cyrusdb] checkpointing cyrus databases
Nov 25 11:26:01 [tls_prune] tls_prune: purged 1 out of 3 entries
Nov 25 11:26:01 [cyr_expire] duplicate_prune: pruning back 3 days
Nov 25 11:26:01 [cyr_expire] duplicate_prune: purged 0 out of 431 entries
Nov 25 11:26:01 [cyr_expire] expunged 0 out of 0 messages from 0 mailboxes
Nov 25 11:26:04 [ctl_cyrusdb] done checkpointing cyrus databases
```

So the real question is: what causes these unexpected problems?

Is it a problem with Cyrus or with BDB? Maybe it would be worth switching the database from BDB to something else (as proposed in this topic for LDAP), but to be honest I have no idea whether that is possible or how to do it. The Cyrus package does not offer any USE flags for using another database:

```
# equery u cyrus-imapd
[ Searching for packages matching cyrus-imapd... ]
[ Colour Code : set unset ]
[ Legend    : Left column  (U) - USE flags from make.conf              ]
[           : Right column (I) - USE flags packages was installed with ]
[ Found these USE variables for net-mail/cyrus-imapd-2.2.12 ]
 U I
 - - afs      : Adds OpenAFS support (distributed file system)
 - - drac     : Enable dynamic relay support in the cyrus imap server
 - - idled    : Enable idled vs poll IMAP IDLE method
 - - kerberos : Adds kerberos support
 + + pam      : <unknown>
 - - snmp     : Adds support for the Simple Network Management Protocol if available
 + + ssl      : Adds support for Secure Socket Layer connections
 + + tcpd     : Adds support for TCP wrappers
```

So - please help!

The current versions of my software are:

```
# eix net-mail/cyrus-imapd
* net-mail/cyrus-imapd
     Available versions:  2.2.10 ~2.2.10-r1 2.2.12 ~2.2.12-r1 ~2.2.12-r2 ~2.2.12-r3
     Installed:           2.2.12
     Homepage:            http://asg.web.cmu.edu/cyrus/imapd/
     Description:         The Cyrus IMAP Server.

# eix sys-libs/db
* sys-libs/db
     Available versions:  1.85-r1 1.85-r2 ~1.85-r3 3.2.9-r7 3.2.9-r10 4.0.14-r2 4.0.14-r3 4.1.25_p1-r3 4.1.25_p1-r4 ~4.2.52_p1 4.2.52_p2 [M]4.3.27
     Installed:           1.85-r2 4.1.25_p1-r4 4.2.52_p2
     Homepage:            http://www.sleepycat.com/
     Description:         Berkeley DB
```

Thanks in advance, Grzegorz

PS. Half a year ago my Cyrus databases were corrupted and a restart did not help (see that topic), so I had to recover them manually. Now that I have DB 4.2 installed, recovery is easier (a Cyrus restart is enough), but the problem still exists...

----------

## Janne Pikkarainen

I had this problem with older versions of Cyrus, but 2.2.12 has been rock-solid for me since day one (about March of this year). If you're still running an older Cyrus, I urge you to upgrade as soon as possible. Newer versions use skiplist as the backend for most databases, and that has helped me enormously from both a reliability and a performance point of view.

For me, older versions kept crashing with default settings because too many concurrent users were accessing all the db files. deliver.db was especially problematic: once it became very badly corrupted, causing Cyrus to malfunction every 30 minutes. Deleting deliver.db (it's not a critical file) helped that time. The problem is that by default BDB uses very small values for its cache (256 kilobytes) and its in-memory transaction log buffer (32 kilobytes), which soon become a major pain if your server has lots of users and traffic.
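To put those defaults in numbers: in a DB_CONFIG file, `set_cachesize` takes three fields (gigabytes, bytes, number of cache regions) and `set_lg_bsize` takes the log buffer size in bytes. The sketch below renders those two lines from sizes in bytes and compares them with the 256 KB / 32 KB defaults quoted above; `render_db_config` is a hypothetical helper, and the 16 MB / 512 KB values are the ones suggested in this post.

```python
GIB = 1024 ** 3
DEFAULT_CACHE = 256 * 1024      # default cache size quoted in the post
DEFAULT_LOG_BUFFER = 32 * 1024  # default log buffer size quoted in the post


def render_db_config(cache_bytes: int, log_buffer_bytes: int, ncache: int = 0) -> str:
    """Render DB_CONFIG lines; set_cachesize expects the size split into GB + bytes."""
    gbytes, rest = divmod(cache_bytes, GIB)
    return (
        f"set_cachesize {gbytes} {rest} {ncache}\n"
        f"set_lg_bsize {log_buffer_bytes}\n"
    )


if __name__ == "__main__":
    cache = 16 * 1024 * 1024   # 16 MB
    log_buf = 512 * 1024       # 512 KB
    print(render_db_config(cache, log_buf), end="")
    print(f"cache: {cache // DEFAULT_CACHE}x default, "
          f"log buffer: {log_buf // DEFAULT_LOG_BUFFER}x default")
```

With these values the cache is 64 times and the log buffer 16 times the defaults.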

First of all, if you encounter crashes like the ones you've seen with Cyrus:

   - stop Cyrus

   - cd /var/imap/db && db4.2_recover

   - start Cyrus
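The three steps above can be scripted. This is a minimal sketch, assuming the Gentoo init script and the `db4.2_recover` tool name used in this thread; the runner is injectable so the sequence can be dry-run, and the script deliberately stops before restarting Cyrus if recovery fails.

```python
import subprocess
from typing import Callable, List, Optional, Sequence

# Values taken from this thread (Gentoo init script, BDB 4.2 tools);
# adjust them for other systems.
INIT_SCRIPT = "/etc/init.d/cyrus"
DB_HOME = "/var/imap/db"
RECOVER_CMD = "db4.2_recover"

# A runner takes (command, working directory) and returns the exit status.
Runner = Callable[[Sequence[str], Optional[str]], int]


def run(cmd: Sequence[str], cwd: Optional[str]) -> int:
    """Default runner: execute the command for real."""
    return subprocess.run(list(cmd), cwd=cwd).returncode


def recover_cyrus(runner: Runner = run) -> List[str]:
    """Stop Cyrus, run db_recover in the BDB home, then start Cyrus again.

    Raises RuntimeError at the first failing step, so Cyrus is not
    restarted on top of an environment that failed to recover.
    """
    steps = [
        ([INIT_SCRIPT, "stop"], None),
        ([RECOVER_CMD], DB_HOME),  # equivalent of: cd /var/imap/db && db4.2_recover
        ([INIT_SCRIPT, "start"], None),
    ]
    executed = []
    for cmd, cwd in steps:
        status = runner(cmd, cwd)
        executed.append(" ".join(cmd))
        if status != 0:
            raise RuntimeError(f"step failed with status {status}: {' '.join(cmd)}")
    return executed
```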

If that doesn't help:

   - stop Cyrus

   - back up your current /var/imap/*.db and everything under /var/imap/logs/, just in case.

   - dump your current mailboxes.db with ctl_mboxlist -d >some_temp_file.txt

   - remove your deliver.db

   - If using TLS sessions, also remove tls_sessions.db

   - Put this in /var/imap/db/DB_CONFIG (to give BDB a 16 MB cache and a 512 KB transaction log buffer):

```
set_cachesize   0    16777216    0

set_lg_bsize   524288
```

   - Reconstruct your mailboxes.db with ctl_mboxlist -u < some_temp_file.txt

   - Make sure all the file permissions are OK

   - Restart Cyrus and make sure it still works. If it doesn't, immediately restore your backups.
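The "just in case" backup step is the one most worth automating. A minimal sketch, assuming Cyrus is already stopped; the destination `/root/cyrus-backup` and the function name `backup_cyrus_state` are hypothetical, while the `*.db` and `logs/` paths are the ones listed above.

```python
import shutil
import time
from pathlib import Path


def backup_cyrus_state(imap_dir: str = "/var/imap",
                       dest_root: str = "/root/cyrus-backup") -> Path:
    """Copy <imap_dir>/*.db and <imap_dir>/logs/ into a new timestamped directory.

    Run only while Cyrus is stopped, so the files do not change during the copy.
    """
    src = Path(imap_dir)
    dest = Path(dest_root) / time.strftime("%Y%m%d-%H%M%S")
    dest.mkdir(parents=True, exist_ok=False)  # refuse to overwrite an existing backup
    for db_file in src.glob("*.db"):
        if db_file.is_file():
            shutil.copy2(db_file, dest / db_file.name)  # copy2 also keeps mode and mtime
    logs = src / "logs"
    if logs.is_dir():
        shutil.copytree(logs, dest / "logs")
    return dest
```

Restoring is then a matter of copying the files back with Cyrus stopped.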

Berkeley DB has caused me reliability problems with other programs and distributions (Red Hat), too, so I try to avoid it whenever possible. For example, when I used BDB as the backend for SpamAssassin's Bayesian filtering and auto-whitelisting, it caused all kinds of deadlocks (not for the whole server, only for amavisd-new/SpamAssassin) and the mail spool started to grow... and grow... Ever since I moved all SA data to MySQL + InnoDB, it has been rock-solid and fast.

Edit: Whoops, I didn't notice at first that you're already using Cyrus 2.2.12. Since your mailboxes.db already seems to be in skiplist format, you should consider migrating deliver.db to skiplist, or alternatively disabling duplicatesuppression in /etc/imapd.conf.

----------

## drrrl

Hello,

*Janne Pikkarainen wrote:*

> BerkeleyDB has caused me reliability problems under other programs and distributions (Red Hat), too, so I try to avoid it whenever possible.

That's true: I had a lot of problems with BDB and slapd. It was one of the reasons I gave up on OpenLDAP...

*Quote:*

> Whoops, didn't notice at first you're already using Cyrus 2.2.12. And since your mailboxes.db seems to be already in skiplist format, you should consider migrating deliver.db to skiplist or alternatively disable duplicatesuppression in /etc/imapd.conf.

I disabled duplicatesuppression a long time ago; I like to see every single email that comes in, especially when there are problems :)

By migrating deliver.db to skiplist, do you mean the following?

*Quote:*

> If that doesn't help:
>
>    - stop Cyrus
> ...

What I don't quite understand is why a simple Cyrus restart helps. It seems that either the database corruption is not that bad after all, or Cyrus has a very powerful self-repair mechanism. Or (the third option) there is something I don't know ;)

Maybe there is a way to migrate Cyrus from BDB to some other database?

G.

----------

## Janne Pikkarainen

You can migrate any Cyrus .db file to another format with the cvt_cyrusdb command.

http://asg.web.cmu.edu/cyrus/download/imapd/install-upgrade.html

Look under "Upgrading from 2.1.x or earlier": it only tells you how to upgrade mailboxes.db, but the same trick works for the other .db files as well.
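A minimal sketch of that conversion, assuming the argument order `cvt_cyrusdb <old db> <old backend> <new db> <new backend>` and the `/usr/lib/cyrus/cvt_cyrusdb` path that appear later in this thread; `convert_db` is a hypothetical helper that keeps the original file as `<name>.old` so the change can be rolled back. Converting the file does not by itself tell Cyrus which backend to open it with.

```python
import os
import subprocess
from typing import Callable, List, Sequence

CVT = "/usr/lib/cyrus/cvt_cyrusdb"  # path used later in this thread (Gentoo)

Runner = Callable[[Sequence[str]], int]


def convert_db(db_path: str, old_backend: str, new_backend: str,
               runner: Runner = lambda cmd: subprocess.run(list(cmd)).returncode) -> List[str]:
    """Convert db_path to new_backend, keeping the original as <db_path>.old.

    Cyrus must be stopped before calling this.
    """
    tmp_path = db_path + ".new"
    cmd = [CVT, db_path, old_backend, tmp_path, new_backend]
    if runner(cmd) != 0:
        raise RuntimeError("cvt_cyrusdb failed: " + " ".join(cmd))
    os.replace(db_path, db_path + ".old")  # keep the old-format file for rollback
    os.replace(tmp_path, db_path)          # put the converted file in place
    return cmd
```

For example, `convert_db("/var/imap/deliver.db", "berkeley", "skiplist")` runs the same conversion as the manual commands used later in the thread.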

At first I was also very frustrated with the OpenLDAP + BDB backend, but after tuning the DB_CONFIG file it has been reliable for me. I still don't trust it 100%, so I take frequent plain-text backups of all my BDB databases. But I honestly think BDB becomes more reliable just by increasing its cache size and transaction log buffer size; that prevents many deadlock situations and reduces disk I/O.

Anyway, I believe that corruption only occurs whenever Cyrus commits its changes from the transaction logs to actual database files, and even then only if some corrupted part is accessed. That's why crashes can be very rare.

As for why a simple restart helps... well, maybe newer versions of Cyrus do that db4.2_recover thing for us during the restart. Your restart log does show ctl_cyrusdb "recovering cyrus databases" right after master starts.
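One plausible explanation, consistent with that restart log: the stock cyrus.conf has a START section whose `recover` entry runs `ctl_cyrusdb -r` every time master starts. The sketch below checks whether a given cyrus.conf does that. The parser is a simplified assumption (one entry per line, `START {` on its own line, `#` comments), not a full cyrus.conf grammar, and the function names are hypothetical.

```python
import os
import re
from typing import List


def start_commands(conf_text: str) -> List[str]:
    """Return the cmd="..." values found inside the START { ... } section."""
    cmds = []
    in_start = False
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding space
        if not in_start:
            if re.match(r"^START\s*\{", line):
                in_start = True
            continue
        if line.startswith("}"):
            break  # end of the START section
        m = re.search(r'cmd="([^"]*)"', line)
        if m:
            cmds.append(m.group(1))
    return cmds


def recovers_on_start(conf_text: str) -> bool:
    """True if some START entry runs "ctl_cyrusdb -r" (database recovery)."""
    for cmd in start_commands(conf_text):
        parts = cmd.split()
        if parts and os.path.basename(parts[0]) == "ctl_cyrusdb" and "-r" in parts[1:]:
            return True
    return False
```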

----------

## drrrl

Thanks !

I'll try this on Friday night ;)

G.

----------

## drrrl

*Janne Pikkarainen wrote:*

> You can migrate any Cyrus .db file to another format with command cvt_cyrusdb.
>
> http://asg.web.cmu.edu/cyrus/download/imapd/install-upgrade.html
>
> Look under "Upgrading from 2.1.x or earlier" - it only tells you how to upgrade mailboxes.db, but the magic can be applied to other .db files as well.
> ...

Unfortunately, the migration alone is not a solution.

I ran:

```
# /etc/init.d/cyrus stop
# /usr/lib/cyrus/cvt_cyrusdb /var/imap/deliver.db berkeley /var/imap/deliver.db.new skiplist
# mv deliver.db.new deliver.db
# /etc/init.d/cyrus start
```

The conversion was successful:

```
Dec  3 23:43:57 [cvt_cyrusdb] skiplist: checkpointed /var/imap/deliver.db.new (449 records, 42832 bytes) in 1 second
```

but after starting Cyrus I got:

```
Dec  3 23:45:41 [master] setrlimit: Unable to set file descriptors limit to -1: Operation not permitted
Dec  3 23:45:41 [master] retrying with 1024 (current max)
Dec  3 23:45:41 [master] process started
Dec  3 23:45:42 [ctl_cyrusdb] DBERROR db4: /var/imap/deliver.db: unexpected file type or format
                - Last output repeated 6 times -
.
.
.
Dec  3 23:45:43 [cyr_expire] DBERROR db4: /var/imap/deliver.db: unexpected file type or format
Dec  3 23:45:43 [cyr_expire] DBERROR: opening /var/imap/deliver.db: Invalid argument
Dec  3 23:45:43 [cyr_expire] DBERROR: opening /var/imap/deliver.db: cyrusdb error
```

So... it seems that Cyrus somehow "knows" that deliver.db must be in BDB format, while mailboxes.db, annotations.db and /var/imap/user/?/*.seen are all in skiplist format. The question is: how do I change that?

G.
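A possible answer, assuming this 2.2 build supports the per-database backend options documented in imapd.conf(5): the backend for deliver.db is selected by the `duplicate_db` option (a BDB variant by default), separately from `mboxlist_db`. That would explain why the converted file is still opened with db4. The sketch below sets such an option in the imapd.conf text; `set_imapd_option` is a hypothetical helper, and whether `duplicate_db: skiplist` fixes this installation remains untested.

```python
import re


def set_imapd_option(conf_text: str, key: str, value: str) -> str:
    """Return conf_text with "key: value" set, replacing an active entry or appending one.

    Commented-out lines ("#key: ...") are left alone.
    """
    pattern = re.compile(rf"^[ \t]*{re.escape(key)}[ \t]*:.*$", re.MULTILINE)
    new_line = f"{key}: {value}"
    if pattern.search(conf_text):
        # A callable replacement avoids backslash escapes in new_line being interpreted.
        return pattern.sub(lambda _m: new_line, conf_text, count=1)
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + new_line + "\n"
```

The edited text would be written back to /etc/imapd.conf with Cyrus stopped, after the cvt_cyrusdb conversion, and Cyrus then restarted.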

----------

