Massive postfix

One of the most critical systems I manage is the mail gateway that handles e-mail for 20 domains of the Catalan healthcare system.
This system consists of two ProLiant DL380 G5 servers running Red Hat Enterprise Linux 5, Postfix, amavisd-new, ClamAV and some other anti-spam tools.
The infrastructure relays e-mail for about 12000 users (when it started it served around 25000, but there is currently a massive migration under way to a centralized system for the whole Catalan Government, which runs Exchange and is giving more problems than solutions).
As you may imagine, at peak hours there are quite a lot of e-mails going in and out, and each e-mail being scanned uses a large number of file descriptors.
About two weeks ago I started getting error messages like this one in my inbox:

Transcript of session follows.

Out: 220 xxxxxxx ESMTP Postfix
In: EHLO xxxxxxx
Out: 250-xxxxxxx
Out: 250-PIPELINING
Out: 250-SIZE 20480000
Out: 250-VRFY
Out: 250-ETRN
Out: 250-ENHANCEDSTATUSCODES
Out: 250-8BITMIME
Out: 250 DSN
In: MAIL From:<xxxxxxx@xxxxxxx> SIZE=30638
Out: 250 2.1.0 Ok
In: RCPT To:<xxxxxxx@xxxxxxx>
Out: 250 2.1.5 Ok
In: RCPT To:<xxxxxxx@xxxxxxx>
Out: 250 2.1.5 Ok
In: RCPT To:<xxxxxxx@xxxxxxx>
Out: 250 2.1.5 Ok
In: RCPT To:<xxxxxxx@xxxxxxx>
Out: 250 2.1.5 Ok
In: RCPT To:<xxxxxxx@xxxxxxx>
Out: 250 2.1.5 Ok
In: DATA
Out: 354 End data with <CR><LF>.<CR><LF>
Out: 451 4.3.0 Error: queue file write error
In: QUIT
Out: 221 2.0.0 Bye

I got some spare time this morning to investigate. As the mails were still arriving and nobody had complained about lost e-mails, I had left it as a "not so critical" issue.
In my setup, when Postfix reports "queue file write error" it normally means that amavisd-new was not able to handle the request. So I reviewed the amavisd-new log files and found quite a lot of errors like this one:

Jun 18 11:17:40 xxxxxxx /usr/sbin/amavisd[8631]: (08631-95) (!)ESMTP: NOTICE: ABORTING the session: Can't write to mail file: Bad file descriptor at (eval 82) line 653, <GEN370> chunk 6.

Basically, this pointed to the process running out of available file descriptors. I checked the limit:

[root@p0030 postfix]# ulimit -a |grep open
open files (-n) 1024
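
Note that ulimit only reports the limit of the shell it runs in. A running daemon may have a different one, so it can help to read the limit of the process itself. A minimal sketch, assuming a Linux kernel that exposes /proc/<pid>/limits (I am not sure every RHEL 5 kernel does); the function name proc_nofile_limit is made up for illustration:

```shell
#!/bin/bash
# Print the soft "Max open files" limit of a running process.
# Assumes /proc/<pid>/limits exists; prints nothing if it does not.
proc_nofile_limit() {
    local pid=$1
    # Line format: "Max open files  <soft>  <hard>  files"
    awk '/^Max open files/ { print $4 }' "/proc/$pid/limits" 2>/dev/null
}

# Soft and hard limits of the current shell, for comparison.
echo "shell soft limit: $(ulimit -Sn)"
echo "shell hard limit: $(ulimit -Hn)"
echo "shell limit as seen in /proc: $(proc_nofile_limit $$)"
```

For the daemon, something like proc_nofile_limit "$(pgrep -o amavisd)" should show its limit, assuming pgrep is available and the process name matches.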

And then how many fds were open:

[root@p0030 var]# ll /proc/[0-9]*/fd | grep -c "root 64"
1018
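
One caveat about the count above: the nofile limit applies to each process separately, while that one-liner adds up the descriptors of every process whose fd entries are listed with group root. A per-process count is closer to what the limit actually measures. A rough sketch (the function name count_fds is invented here, and I am assuming the daemon shows up as amavisd in pgrep):

```shell
#!/bin/bash
# Count the open file descriptors of one process by listing /proc/<pid>/fd.
# Reading another user's fd directory needs root.
count_fds() {
    local pid=$1
    ls "/proc/$pid/fd" 2>/dev/null | wc -l
}

# Report the count for every process named amavisd (assumed name).
# Prints nothing if no such process is running.
for pid in $(pgrep amavisd 2>/dev/null); do
    echo "amavisd pid $pid: $(count_fds "$pid") open fds"
done

echo "this shell: $(count_fds $$) open fds"
```

Comparing each number against the limit of that same process shows which one is actually close to exhaustion.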

So yes, it looked like the limit was too low. I increased it by adding the following line to the /etc/security/limits.conf file:

* - nofile 4096
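
As far as I know, limits.conf is applied by pam_limits when a session is opened, so the change only affects new logins. A daemon started outside a login session (for example from an init script at boot) may not get the new value, so one option is to set the limit in the shell that starts the daemon. A hedged sketch, where set_nofile is a hypothetical helper and not something shipped with amavisd-new or RHEL:

```shell
#!/bin/bash
# Set the soft nofile limit of this shell (and of anything it starts later)
# to the requested value, capped at the hard limit.
set_nofile() {
    local want=$1 hard
    hard=$(ulimit -Hn)
    if [ "$hard" != "unlimited" ] && [ "$want" -gt "$hard" ]; then
        want=$hard   # the soft limit cannot exceed the hard limit
    fi
    ulimit -Sn "$want"
}

set_nofile 4096
echo "soft nofile limit is now $(ulimit -Sn)"
```

Anything launched from that shell afterwards inherits the value it printed.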

Then I logged in again to pick up the new limit and restarted amavis. The issue was fixed!
Currently I have nearly 3500 fds open on those systems:

[root@p0030 var]# ll /proc/[0-9]*/fd | grep -c "root 64"
3465
