Fixing Slow SSH Login Times
tl/dr: rm /var/log/btmp
Over time, I’ve noticed that logging into my VPS keeps taking longer and longer. And for once, the internet was not that helpful in finding a solution, so some good old troubleshooting was in order.
Looking at the client Logs
Running ssh -vvv host.name id
, yields that we are stuck for a couple of seconds after debug3: client_repledge: enter
Authenticated to the.fqdn.of.the.server ([12.34.56.78]:22) using "publickey".
debug1: channel 0: new session [client-session] (inactive timeout: 0)
debug3: ssh_session2_open: channel_new: 0
debug2: channel 0: send open
debug3: send packet: type 90
debug1: Requesting no-more-sessions@openssh.com
debug3: send packet: type 80
debug1: Entering interactive session.
debug1: pledge: filesystem
debug3: client_repledge: enter
So we can see that the connection is quickly established, but then the server is stuck doing something.
Time to check what the server is doing
Using strace, I attach to the main ssh process (with relative timestamps -rT
and follow fork mode -f
):
strace -rTfp $(pgrep -f /usr/sbin/sshd) 2>log.txt
and then trying to ssh in. After 30s of being stuck, I cancel the tracing. This is what is in the logs:
[pid 3067932] 0.000265 openat(AT_FDCWD, "/var/log/btmp", O_RDONLY) = 5 <0.000108>
[pid 3067932] 0.000313 read(5, "\6\0\0\0\207\330\t\0ssh:notty\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 384) = 384 <0.000719>
[pid 3067932] 0.000881 read(5, "\6\0\0\0\207\330\t\0ssh:notty\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 384) = 384 <0.000092>
[pid 3067932] 0.000248 read(5, "\6\0\0\0006\332\t\0ssh:notty\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 384) = 384 <0.000129>
[pid 3067932] 0.000319 read(5, "\6\0\0\0006\332\t\0ssh:notty\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 384) = 384 <0.000103>
[pid 3067932] 0.000269 read(5, "\6\0\0\0006\332\t\0ssh:notty\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 384) = 384 <0.000088>
[pid 3067932] 0.000238 read(5, "\6\0\0\0006\332\t\0ssh:notty\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 384) = 384 <0.000088>
[pid 3067932] 0.000233 read(5, "\6\0\0\0\315\4\n\0ssh:notty\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 384) = 384 <0.000101>
[pid 3067932] 0.000281 read(5, "\6\0\0\0\315\4\n\0ssh:notty\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 384) = 384 <0.000102>
... hundreds-of-thousands of further read calls follow.
The Issue
We are reading a file /var/log/btmp
character by character, but what’s the harm in that?
~# ls -lah /var/log/btmp
-rw-rw---- 1 root utmp 1.2G Jul 18 11:13 /var/log/btmp
oh.
/var/log/btmp
is a log of failed ssh login attempts and for each login, we are parsing a 1.2GiB file.
I had long known that my server having an exposed port 25 on a public ipv4 address is being hammered by bots, but I was shocked by the amount of login attempts.
My monitoring also did not catch this slowly but steadily growing logfile.
The fix
For now I just truncated the logfile, then set up logrotate on it.
I may look into enabling fail2ban
again in future, but I’ve managed to lock myself multiple times using it.