Fixing Slow SSH Login Times

July 18, 2024 - 2 mins read

tl/dr: rm /var/log/btmp

Over time, I’ve noticed that logging into my VPS keeps taking longer and longer. And for once, the internet was not that helpful in finding a solution, so some good old troubleshooting was in order.

Looking at the client Logs

Running ssh -vvv host.name id, yields that we are stuck for a couple of seconds after debug3: client_repledge: enter

Authenticated to the.fqdn.of.the.server ([12.34.56.78]:22) using "publickey".
debug1: channel 0: new session [client-session] (inactive timeout: 0)
debug3: ssh_session2_open: channel_new: 0
debug2: channel 0: send open
debug3: send packet: type 90
debug1: Requesting no-more-sessions@openssh.com
debug3: send packet: type 80
debug1: Entering interactive session.
debug1: pledge: filesystem
debug3: client_repledge: enter

So we can see that the connection is quickly established, but then the server is stuck doing something.

Time to check what the server is doing

Using strace, I attach to the main ssh process (with relative timestamps -rT and follow fork mode -f):

strace -rTfp $(pgrep -f /usr/sbin/sshd) 2>log.txt

and then trying to ssh in. After 30s of being stuck, I cancel the tracing. This is what is in the logs:

[pid 3067932]      0.000265 openat(AT_FDCWD, "/var/log/btmp", O_RDONLY) = 5 <0.000108>
[pid 3067932]      0.000313 read(5, "\6\0\0\0\207\330\t\0ssh:notty\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 384) = 384 <0.000719>
[pid 3067932]      0.000881 read(5, "\6\0\0\0\207\330\t\0ssh:notty\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 384) = 384 <0.000092>
[pid 3067932]      0.000248 read(5, "\6\0\0\0006\332\t\0ssh:notty\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 384) = 384 <0.000129>
[pid 3067932]      0.000319 read(5, "\6\0\0\0006\332\t\0ssh:notty\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 384) = 384 <0.000103>
[pid 3067932]      0.000269 read(5, "\6\0\0\0006\332\t\0ssh:notty\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 384) = 384 <0.000088>
[pid 3067932]      0.000238 read(5, "\6\0\0\0006\332\t\0ssh:notty\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 384) = 384 <0.000088>
[pid 3067932]      0.000233 read(5, "\6\0\0\0\315\4\n\0ssh:notty\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 384) = 384 <0.000101>
[pid 3067932]      0.000281 read(5, "\6\0\0\0\315\4\n\0ssh:notty\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 384) = 384 <0.000102>
... hundreds-of-thousands of further read calls follow.

The Issue

We are reading a file /var/log/btmp character by character, but what’s the harm in that?

~# ls -lah /var/log/btmp
-rw-rw---- 1 root utmp 1.2G Jul 18 11:13 /var/log/btmp

oh.

/var/log/btmp is a log of failed ssh login attempts and for each login, we are parsing a 1.2GiB file.

I had long known that my server having an exposed port 22 on a public ipv4 address is being hammered by bots, but I was shocked by the amount of login attempts.

My monitoring also did not catch this slowly but steadily growing logfile.

The fix

For now I just truncated the logfile, then set up logrotate on it.

~# cat /etc/logrotate.d/btmp
/var/log/btmp {
    missingok
    daily
    create 0660 root utmp
    rotate 1
}

I may look into enabling fail2ban again in future, but I’ve managed to lock myself multiple times using it.