Solution to the problem SSH(pam_limits.so)
Photo by Florian Krumm on Unsplash
Colleagues, good afternoon!
Having worked in support since 2007, I faced many problems, one of which I would like to talk about.
Introduction
One Friday at the end of the working day, a message is received about errors from one production server serving clients from Kaliningrad to Sakhalin.
A quick analysis of the server status showed that the problem is only with triggers that call the Bash shell to run, which seemed very strange. The second oddity seemed that the server was in production, and the problem had been fixed for about 30 minutes, there were no user requests! A survey of colleagues in a working chat, determined that today on this server they solved the problem with “too many open files” and solved it. We launch an SSH session on the server, and ... NO connection !!!
Problem
Analysis of the SSH client logs showed that the channel breaks when trying to start the command interpreter, i.e. authorization and authentication goes through, but the console does not start. Connecting via iLo - the result is the same.
[svutkin@vlg-jan-db01 ~]$ ssh 10.108.176.252 -l svutkin –vvv … debug1: Authentication succeeded (keyboard-interactive). Authenticated to 10.108.176.252 ([10.108.176.252]:22). debug2: shell request accepted on channel 0 … * ******************************************************************** * Welcome to ******************* running RedHat 7.4 * Puppet hostgroup: Base/Oracle/RDBMS * AD OU: OU=DWH,OU=DB,OU=Linux,OU=UNIX,OU=Servers,DC=*******,DC=ru * This system supported by IT-Platform-OS-Unix@*********. * Run 'sudo -l' to view your privileges * ******************************************************************** … debug2: channel 0: almost dead debug2: channel 0: gc: notify user debug2: channel 0: gc: user detached debug2: channel 0: send close debug3: send packet: type 97 debug2: channel 0: is dead … debug1: Exit status 254
Solution
The first search results came up with a solution - Disable PAM.
And change /etc/security/limits.conf
After gaining access to the server (restart and Live-CD) we were surprised, there are no lifted limits!
What happened
- The SSH connection to the server can be broken down into the following steps:
Authorization/Authentication
Starting a command shell
Allocating resources to the user
At the last step, the problem arose, it is responsible for this - pam_limits.so.
pam_limits serves to allocate resources to the user,
the setrlimit
system call is launched, in addition to allocating restrictions
on CPU and RAM, the maximum number of open files is also allocated.
The strace
log confirmed that an error occurs when allocating NOFILE
resources:
setrlimit(RLIMIT_NOFILE, {rlim_cur=1685744, rlim_max=16815744}) = -1
We understand that the value 1685744 (/etc/security/limits.conf
) is not correct,
but we do not understand why.
The next hour, solving the problem, was devoted to studying the documentation
and the Internet, and this is what we learned:
The value set in setrlimit
for NOFILE
should not exceed the value
/proc/sys/fs/nr_open
core, our value was almost 3 times higher.
Aligning the values, the problem was solved.
Why did it happen?
When solving the error “too many open files”, the parameter was increased based
on the value of /proc/sys/fs/file-max
, as well as this value was added to
the file /etc/security/limits.conf
which exceeded /proc/sys/fs/nr_open
.
Why can this happen to me?
On my machine on which I am typing this text now
Conclusions
file-max & file-nr
: The value in file-max denotes the maximum number of file-
handles that the Linux kernel will allocate. When you get lots
of error messages about running out of file handles, you might
want to increase this limit.
nr_open
: This denotes the maximum number of file-handles a process can
allocate. Default value is 1024*1024 (1048576) which should be
enough for most machines. Actual limit depends on RLIMIT_NOFILE
resource limit.
pam_limits.so
- uses setrlimit
, to allocate resources to the user,
which in turn uses the kernel's nr_open
value to determine the maximum value
for file descriptors.
What other applications use the setrlimit
call is not known,
so BE CAREFUL ON DEPLOYING !!!