this post was submitted on 28 Apr 2024

Proxmox Disk Performance Problems (lemmy.procrastinati.org)

submitted 4 months ago* (last edited 4 months ago) by [email protected] to c/[email protected]

I've started encountering a problem that I could use some help troubleshooting. I've got a Proxmox system that primarily hosts my OPNsense router. I've had this specific setup for about a year.

Recently, I've been experiencing sluggishness and noticed that the IO wait is through the roof. Rebooting the OPNsense VM, which normally takes only a few minutes, now takes upwards of 15-20 minutes. The entire time, my IO wait sits between 50% and 80%.

The system has one disk in it, formatted with ZFS. I've checked dmesg and the syslog for indications of disk errors (this feels like a failing disk) and found none. I also checked the SMART statistics and they all "PASSED".

Any pointers would be appreciated.

Example of my most recent host reboot.

Edit: I believe I've found the root cause of the change in performance, and it was a bit of shooting myself in the foot. I've been experimenting with different tools for log collection, and the most recent one is a SIEM tool called Wazuh. I didn't realize that upon reboot it runs an integrity check that generates a ton of disk I/O. So when I rebooted this Proxmox server, that integrity check was running on Proxmox, my pihole, and (I think) OPNsense concurrently, all against a single consumer-grade HDD.

Thanks to everyone who responded. I really appreciate all the performance tuning guidance. I've also made the following changes:

Added a 2nd drive (I have several of these lying around, don't ask), converting the ZFS pool into a mirror. This gives me redundancy and should also improve read performance.
Configured a 2nd storage target on the same zpool in Proxmox, with compression enabled and a 64k block size. I then migrated the 2 VMs to that storage.
Since I'm collecting logs in Wazuh, I set OPNsense to use RAM disks for /tmp and /var/log.

Rebooted OPNsense and it was back up in 1:42.

top 23 comments

[–] [email protected] 3 points 4 months ago (1 children)

There was a recent conversation on the Practical ZFS discourse site about poor disk performance in Proxmox (https://discourse.practicalzfs.com/t/hard-drives-in-zfs-pool-constantly-seeking-every-second/1421/). Not sure if you’re seeing the same thing, but it could be that your VMs are running into the same problem: the too-small volblocksize that PVE uses when it creates zvols for its VMs under ZFS.

If that’s the case, the solution is pretty easy. In your PVE datacenter view, go to storage and create a new ZFS storage pool. Point it to the same zpool/dataset as the one you’ve already got and set the block size to something like 32k or 64k. Once you’ve done that, move the VM’s disk to that new storage pool.

Like I said, not sure if you’re seeing the same issue, but it’s a simple thing to try.
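
For a rough sense of why the block size suggested above matters on a spinning disk, here is a back-of-the-envelope sketch in Python. It only counts how many zvol blocks are needed to hold 1 GiB at different volblocksize values. The 8k starting point is an assumption about an older PVE default (you can check your own zvols with `zfs get volblocksize`), and the model ignores compression, caching, and metadata, so treat it as an illustration rather than a benchmark.

```python
KIB = 1024
GIB = 1024 ** 3


def blocks_needed(data_bytes: int, volblocksize_bytes: int) -> int:
    """Number of zvol blocks needed to hold data_bytes, rounded up."""
    return -(-data_bytes // volblocksize_bytes)


if __name__ == "__main__":
    # 8k is an assumed older default; 32k and 64k are the values suggested above.
    for size_k in (8, 16, 32, 64):
        count = blocks_needed(GIB, size_k * KIB)
        print(f"volblocksize={size_k}k: {count} blocks per GiB")
```

Fewer, larger blocks generally mean less per-block bookkeeping and fewer separate I/O operations for the same amount of data, which is where a single HDD tends to struggle.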

[–] [email protected] 1 points 4 months ago

This was really interesting, thanks for the info.

[–] [email protected] 2 points 4 months ago (1 children)

Check the ZFS pool status. You could have lots of errors that ZFS is correcting.

[–] [email protected] 1 points 4 months ago

I'm starting to lean towards this being an I/O issue, but I haven't figured out what or why yet. I don't often make changes to this environment since it's running my OPNsense router.

root@proxmox-02:~# zpool status
  pool: rpool
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 00:56:10 with 0 errors on Sun Apr 28 17:24:59 2024
config:

        NAME                                    STATE     READ WRITE CKSUM
        rpool                                   ONLINE       0     0     0
          ata-ST500LM021-1KJ152_W62HRJ1A-part3  ONLINE       0     0     0

errors: No known data errors

[–] [email protected] 1 points 4 months ago (1 children)

(If it's not failing, which would be the first thing I'd check)

Do you have any new VMs up and running? IO was the bane of my existence with Proxmox, but I realized it's just that VMs eat a ton of IO, especially with ZFS. A standard HDD won't cut it (unless you have one and only one VM using that disk). Even SATA SSDs just didn't cut it over time; I had to build a full RAID that would support 5-10 VMs before I saw IO wait drop enough.

[–] [email protected] 1 points 4 months ago (1 children)

I'm trying to think of anything I may have changed since the last time I rebooted the opnsense VM. But I try to keep up on updates and end up rebooting pretty regularly. The only things on this system are the opnsense VM and a small pihole VM. At the time of the screenshot above, the opnsense VM was the only thing running.

If it's not a failing HDD, my next step is to try and dig into what's generating the I/O to see if there's something misbehaving.

[–] [email protected] 2 points 4 months ago

I had bad luck with ZFS on Proxmox because of all of the overhead. I found that with my tiny cluster it was better to use good old ext4 and just do regular backups. ZFS actually killed quite a few of my drives because of how heavyweight it is. Not saying that's your problem, but I wouldn't be surprised if it was.

[–] [email protected] 1 points 4 months ago (1 children)

It could be a disk slowly failing but not throwing errors yet. Some drives really do their best to hide that they're failing. So even a passing SMART test I would take with some salt.

I would start by making sure you have good recent backups ASAP.

You can test the drive performance by shutting down all VMs and using tools like fio to do some disk benchmarking. It could be a VM causing it. If it's an HDD in particular, the random reads and writes from VMs can really cause seek latency to shoot way up. Could be as simple as a service logging some warnings due to junk incoming traffic, or an update that added some more info logs, etc.
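
As a starting point for the fio benchmark suggested above, here is a minimal Python sketch that only builds a random-read fio command line and prints it, so you can review it before running anything. The test file path, size, and runtime are hypothetical placeholders, and the choice of 4k random reads is an assumption about what best mimics VM traffic on this box.

```python
import shlex


def fio_randread_cmd(test_file: str, size: str = "1G", runtime_s: int = 60) -> list[str]:
    """Build an fio command for a time-limited 4k random-read test."""
    return [
        "fio",
        "--name=randread-test",
        f"--filename={test_file}",   # hypothetical scratch file, not a real disk
        "--rw=randread",
        "--bs=4k",
        f"--size={size}",
        f"--runtime={runtime_s}",
        "--time_based",
        "--ioengine=psync",
        "--output-format=json",
    ]


if __name__ == "__main__":
    # Print the command instead of running it; the path is a placeholder.
    print(shlex.join(fio_randread_cmd("/rpool/fio-test.bin")))
```

On ZFS, reads can be served from the ARC cache, so results may look better than what the raw disk can do. Running with all VMs shut down, as suggested above, at least keeps other workloads out of the measurement.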

[–] [email protected] 1 points 4 months ago

> I would start by making sure you have good recent backups ASAP.

I do.

> Could be as simple as a service logging some warnings due to junk incoming traffic, or an update that added some more info logs, etc.

Possible. It's a really consistent (and stark) degradation in performance tho and is repeatable even when the opnsense VM is the only one running.

[–] [email protected] 0 points 4 months ago (2 children)

Did you do a smart test?

[–] [email protected] 1 points 4 months ago

Short test completed without error.

[–] [email protected] 0 points 4 months ago (1 children)

Kinda feel dumb that my answer is no. Let me do that and report back.

[–] [email protected] 2 points 4 months ago (1 children)

While you're waiting for that, I'd also look at the smart data and write the output to a file, then check it again later to see if any of the numbers have changed, especially reallocated sectors, pending sectors, corrected and uncorrected errors, stuff like that.

Actually, I'm pretty sure that Proxmox will notify you of certain smart issues if you have emails configured.

You should also shut the host down and reseat the drives, and check the cables to make sure they're all properly seated too. It's possible that one has come loose but not enough to drop the link.
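
The "save the SMART data and compare it later" idea above can be sketched in Python. This assumes you saved the attribute table from `smartctl -A` to text twice, and that the table uses the usual ten-column layout with the raw value in the last column. The two sample snapshots below are made up for illustration and are not from the drive in this thread.

```python
# Made-up sample snapshots in the usual `smartctl -A` table layout.
BEFORE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""

AFTER = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       8
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""


def parse_smart_attributes(text: str) -> dict[str, int]:
    """Map attribute name to its integer raw value, skipping non-table lines."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # header, blank line, or something unexpected
        try:
            attrs[fields[1]] = int(fields[9])
        except ValueError:
            continue  # raw value is not a plain integer; ignore it here
    return attrs


def changed_attributes(before: str, after: str) -> dict[str, tuple[int, int]]:
    """Return {name: (old_raw, new_raw)} for attributes whose raw value changed."""
    old, new = parse_smart_attributes(before), parse_smart_attributes(after)
    return {n: (old[n], new[n]) for n in old if n in new and old[n] != new[n]}


if __name__ == "__main__":
    for name, (old_raw, new_raw) in changed_attributes(BEFORE, AFTER).items():
        print(f"{name}: {old_raw} -> {new_raw}")
```

A growing raw value for reallocated or pending sectors between two snapshots is the kind of change worth watching for, even while the overall health check still reports a pass.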

[–] [email protected] 1 points 4 months ago

> While you’re waiting for that, I’d also look at the smart data and write the output to a file, then check it again later to see if any of the numbers have changed, especially reallocated sectors, pending sectors, corrected and uncorrected errors, stuff like that.

That's a good idea. Thanks.

[–] [email protected] 0 points 4 months ago (1 children)

iowait is indicative of storage not being able to keep up with the performance of the rest of the system. What hardware are you using for storage here?
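
To make the iowait number concrete, here is a minimal Python sketch of how the percentage can be derived from two samples of the aggregate `cpu` line in `/proc/stat` on Linux. It assumes the standard field order (user, nice, system, idle, iowait, irq, softirq, steal) and only uses those first eight counters. The function names are my own, not part of any tool.

```python
import time


def parse_cpu_line(line: str) -> list[int]:
    """Parse the aggregate 'cpu' line into its first eight counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    return [int(value) for value in fields[1:9]]


def iowait_percent(before: str, after: str) -> float:
    """Share of CPU time spent in iowait between two samples, in percent."""
    deltas = [b - a for a, b in zip(parse_cpu_line(before), parse_cpu_line(after))]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Read the first line of /proc/stat (Linux only)."""
    with open(path) as handle:
        return handle.readline()


def measure(interval_s: float = 1.0) -> float:
    """Sample /proc/stat twice, interval_s apart, and return iowait percent."""
    first = read_cpu_line()
    time.sleep(interval_s)
    return iowait_percent(first, read_cpu_line())
```

Calling `measure()` on the Proxmox host while the VM reboots should give a figure comparable to the 50-80% reported in the post, though monitoring tools may average over different intervals.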

[–] [email protected] 1 points 4 months ago (2 children)

It's an old Optiplex SFF with a single HDD. Again, my concern isn't that it's "slow". It's that performance has rather suddenly tanked and the only changes I've made are regular OS updates.

[–] [email protected] 2 points 4 months ago* (last edited 4 months ago) (1 children)

If I had to guess there was a code change in the PVE kernel or in their integrated ZFS module that led to a performance regression for your use case. I don't really have any feedback there, PVE ships a modified version of an older kernel (6.2?) so something could have been backported into that tree that led to the regression. Same deal with ZFS, whichever version the PVE folks are shipping could have introduced a regression as well.

Your best bet is to identify which kernel version introduced the regression and then raise an issue with the PVE folks. You'll want to do a binary search between the current version and the last known good one to determine exactly when the issue started; then you can open an issue describing the regression.
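
The binary search described above can be sketched in Python. The version labels are hypothetical placeholders, and `is_bad` stands in for the manual step of booting a given kernel and checking whether the slowdown appears. The sketch assumes the list is ordered oldest to newest and that the regression, once introduced, is present in every later version.

```python
from typing import Callable, Optional, Sequence


def first_bad(versions: Sequence[str], is_bad: Callable[[str], bool]) -> Optional[str]:
    """Return the earliest version for which is_bad is true, or None."""
    lo, hi = 0, len(versions)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid        # regression is at mid or earlier
        else:
            lo = mid + 1    # regression is after mid
    return versions[lo] if lo < len(versions) else None


if __name__ == "__main__":
    # Hypothetical labels; in practice each check is a reboot and a manual test.
    candidates = ["kernel-A", "kernel-B", "kernel-C", "kernel-D", "kernel-E"]
    known_bad = {"kernel-D", "kernel-E"}
    print(first_bad(candidates, lambda version: version in known_bad))
```

With N candidate versions this needs roughly log2(N) boot-and-test cycles instead of N, which matters when every check means rebooting the router's host.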

Or just throw a cheap SSD at the problem and move on, that's what I'd do here. Something like this should outlast the machine you put it in.

Edit: the Samsung 863a also pops up cheaply from time to time; it has good endurance and PLP. Basically just search fleaBay for SATA drives with capacities of 400/480GB, 800/960GB, 1.6T/1.92T or 3.2T/3.84T and check their datasheets for endurance info and PLP capability. Anything in the 400/800/1600/3200GB sequence is a model with more overprovisioning and higher endurance (usually referred to as mixed use). Those often have 3 DWPD or 5 DWPD ratings and are a safe bet if you have a write-heavy workload.

[–] [email protected] 1 points 4 months ago (1 children)

I thought cheap SSDs and ZFS didn't play well together?

[–] [email protected] 2 points 4 months ago* (last edited 4 months ago) (1 children)

Depends on the SSD, the one I linked is fine for casual home server use. You're unlikely to see enough of a write workload that endurance will be an issue. That's an enterprise drive btw, it certainly wasn't cheap when it was brand new and I doubt running a couple of VMs will wear it quickly. (I've had a few of those in service at home for 3-4y, no problems.)

Consumer drives have more issues, their write endurance is considerably lower than most enterprise parts. You can blow through a cheap consumer SSD's endurance in mere months with a hypervisor workload so I'd strongly recommend using enterprise drives where possible.

It's always worth taking a look at drive datasheets when you're considering them and comparing the warranty lifespan to your expected usage too. The drive linked above has an expected endurance of around 2PB (~3 DWPD, or 2TB/day, over 3y), so you shouldn't have any problems there. See https://www.sandisk.com/content/dam/sandisk-main/en_us/assets/resources/enterprise/data-sheets/cloudspeed-eco-genII-sata-ssd-datasheet.pdf
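
The endurance arithmetic above can be checked with a few lines of Python. The 2 TB/day over 3 years figure comes from the comment; the 50 GB/day home workload is a made-up number for illustration, so plug in your own.

```python
DAYS_PER_YEAR = 365


def total_writes_tb(tb_per_day: float, years: float) -> float:
    """Total data written, in TB, at a constant daily rate over a number of years."""
    return tb_per_day * DAYS_PER_YEAR * years


def years_until_worn(endurance_tb: float, tb_per_day: float) -> float:
    """Years needed to write endurance_tb at a constant daily rate."""
    return endurance_tb / (tb_per_day * DAYS_PER_YEAR)


if __name__ == "__main__":
    rated_tb = total_writes_tb(2, 3)  # 2 TB/day over 3 years, as quoted above
    print(f"Rated endurance: {rated_tb:.0f} TB (about {rated_tb / 1000:.2f} PB)")

    # Hypothetical home workload of 50 GB/day; an assumption, not a measurement.
    print(f"Years at 50 GB/day: {years_until_worn(rated_tb, 0.05):.0f}")
```

2 TB/day for 3 years works out to 2190 TB, which matches the "around 2PB" figure, and at tens of GB per day that budget would take far longer to use up than the machine is likely to stay in service.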

Older gen retired or old stock parts are basically the only way I buy home server storage now, the value for your money is tremendous and most drives are lightly used at most.

Edit: some select consumer SSDs can work fairly well with ZFS too, but they tend to be higher endurance parts with more baked in over provisioning. It was popular to use Samsung 850 or 860 Pros for a while due to their tremendous endurance (the 512GB 850s often had an endurance lifespan of like 10PB+ before failure thanks to good old high endurance MLC flash) but it's a lot safer to just buy retired enterprise parts now that they're available cheaply. There are some gotchas that come along with using high endurance consumer drives, like poor sync write performance due to lack of PLP, but you'll still see far better performance than an HDD.

[–] [email protected] 1 points 4 months ago

Thanks for all the info. I'll keep this in mind if I replace the drive. I am using refurb enterprise HDDs in my main server. Didn't think I'd need to go enterprise grade for this box but you make a lot of sense.

[–] [email protected] 1 points 4 months ago (2 children)

Quick and easy fix attempt would be to replace the HDD with an SSD. As others have said, the drive may just be failing. Replacing with an SSD would not only get rid of the suspect hardware, but would be an upgrade to boot. You can clone the drive, or just start fresh with the backups you have.

[–] [email protected] 1 points 4 months ago

That's what I'd do here, used enterprise SSDs are dirt cheap on fleaBay

[–] [email protected] 1 points 4 months ago

I may end up having to go that route. I'm no expert but aren't you supposed to use different parameters when using SSDs on ZFS vs an HDD?