
Having more than a certain amount of snapshots/filesystem brings down system #87

Open
alexanderharm opened this issue Nov 13, 2023 · 19 comments
Labels
enhancement future enhancement in a future version

Comments

@alexanderharm

First of all, thanks for this plugin. I really appreciate being able to use OMV together with ZFS.

Today I was bitten by the fact that the zpool overview is generated by running zfs list -t all, which takes longer the more datasets and snapshots you have. In my case there are a lot of snapshots, so the WebGUI simply times out. Since the call never finishes, it keeps spawning new zfs list -t all processes until the system runs out of memory and the kernel crashes.

I wonder whether one could take another approach and split the zpool view into the actual pool(s), filesystems, and volumes. Maybe the snapshot list could also be limited to a previously selected filesystem...
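As a rough illustration of limiting snapshots to one selected filesystem: zfs list can return only the snapshots of a single dataset instead of everything -t all returns. The following Python sketch only builds and parses that command; the helper names (snapshot_list_cmd, parse_names, list_snapshots) are hypothetical and not part of the plugin, which is written in PHP.

```python
import subprocess


def snapshot_list_cmd(filesystem: str) -> list[str]:
    # -H: scripted mode (no header, tab-separated columns)
    # -t snapshot: list snapshots only
    # -d 1: recurse one level, which with -t snapshot yields the
    #       filesystem's own snapshots but not those of child datasets
    # -o name: print only the snapshot name
    return ["zfs", "list", "-H", "-t", "snapshot", "-d", "1", "-o", "name", filesystem]


def parse_names(output: str) -> list[str]:
    # One snapshot name per line; skip blank lines.
    return [line for line in output.splitlines() if line.strip()]


def list_snapshots(filesystem: str) -> list[str]:
    # Runs the query for the selected filesystem only (requires ZFS tools).
    result = subprocess.run(snapshot_list_cmd(filesystem),
                            capture_output=True, text=True, check=True)
    return parse_names(result.stdout)
```

The cost of the query then scales with the snapshots of the selected filesystem rather than with every object in every pool.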

That still leaves the bug that the command is issued over and over again for as long as you stay on the zpool page in the WebGUI.

My PHP-fu is poor, but I would be willing to spend some time on this. However, I would not want to do so without being sure that this direction has some support among the creators of this plugin.

@ryecoaaron
Member

I’m not sure how this could be separated, but I'm more than willing to let you try. I just don’t want the changes to make the plugin harder to use for people who have few objects. Just out of curiosity, how many snapshots are we talking about?

@alexanderharm
Author

It depends... I also found it rather confusing that I couldn't set ZFS properties like xattr at the root level, since the root is listed only as a zpool (which doesn't have this property) and not as both a zpool and a filesystem. I guess there are thousands of snapshots, which is too many. I will clean them up and test again. But the WebGUI timeout is really low, so I believe it doesn't take much to exceed it and run into the issue above.

@alexanderharm
Author

I tried a quick fix for myself (in case someone else clicks that button) and removed -t all from the code in /usr/share/omvzfs/Utils.php, but the change is not picked up. Do I need to run a command similar to omv-workbench to commit the change?

@ryecoaaron
Member

It was hard porting the zfs plugin from OMV 5.x to 6.x, since the 5.x plugin used a custom treeview that isn't possible in 6.x. If a feature is missing, I either missed it or didn't know it existed. In the xattr case, I didn’t know root settings were a thing. That's another good reason I shouldn’t be maintaining the zfs plugin, but no one else will.

The PHP timeout is OMV-wide, but thousands of snapshots are hard to deal with no matter what.

If you change the PHP code, you have to restart omv-engined: sudo monit restart omv-engined

@alexanderharm
Author

I would start with some minor changes, like splitting the zpool overview into three tabs (zpools, filesystems, volumes) and adjusting the tables accordingly. Additionally, maybe filtering out docker/* so it doesn't clutter the overview...
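A sketch of how a single zfs list call could feed the three proposed tabs, assuming output from zfs list -H -o name,type -t filesystem,volume (tab-separated, no header). Treating names without a "/" as pool roots is an illustrative assumption; zpool list -H -o name would be the authoritative pool list. The root dataset is kept in both the pools and filesystems tabs, which would also let filesystem properties such as xattr be set at the root level, as raised above. The function name is hypothetical.

```python
def group_for_tabs(output: str) -> dict[str, list[str]]:
    """Group `zfs list -H -o name,type -t filesystem,volume` output into tabs."""
    tabs: dict[str, list[str]] = {"pools": [], "filesystems": [], "volumes": []}
    for line in output.splitlines():
        if not line.strip():
            continue
        # Dataset names cannot contain tabs, so one split is enough.
        name, kind = line.split("\t", 1)
        if kind == "volume":
            tabs["volumes"].append(name)
        elif kind == "filesystem":
            # A pool's root dataset has no "/" in its name. It is both the
            # pool and a filesystem, so it appears in both tabs.
            if "/" not in name:
                tabs["pools"].append(name)
            tabs["filesystems"].append(name)
    return tabs
```

Because snapshots are not requested here, the overview query stays small; snapshots could be loaded per filesystem on demand.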

Do you have an idea how to prevent the commands from being re-triggered when the WebGUI connection is re-established after a timeout? Work with a lock file?

@ryecoaaron
Member

Splitting the tab would make it more like the LVM plugin. I'm not a fan of filtering out anything because people are going to want to customize it.

The datatables can be set to manual refresh or to refresh every X seconds. A datatable doesn't know whether the previous run timed out. I would avoid lock files because of the repair work required if a lock file is stranded. All three tabs of the LVM plugin are manual refresh. Maybe that is the way to go.
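On the stranded-lock concern: an advisory flock(2) lock is released by the kernel when the holding process exits or crashes, so there is no lock state to repair afterwards (the file itself may stay on disk, but holds no lock). A minimal Python sketch of a "skip if a refresh is already running" guard, assuming a hypothetical lock path and that each refresh runs in its own process; this is not how the plugin currently works.

```python
import fcntl
import os

LOCK_PATH = "/run/omvzfs-list.lock"  # hypothetical path, for illustration only


def try_acquire(path: str = LOCK_PATH):
    """Return an fd holding an exclusive lock, or None if a refresh is already running."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # LOCK_NB: fail immediately instead of queueing another
        # zfs list behind the one that is still running.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def release(fd: int) -> None:
    # Closing the fd drops the lock; the kernel does the same if the process dies.
    os.close(fd)
```

A refresh request that gets None would simply return the last cached result (or an error) instead of spawning another zfs list.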

@ryecoaaron
Member

I think these changes should be implemented in the OMV 7.x version instead. This is a big change to make this late in the OMV 6.x release cycle.

@ryecoaaron ryecoaaron added enhancement future enhancement in a future version labels Feb 18, 2024
@GiorgioAresu

I don't think the issue is the zfs list -t all command itself. Running it over SSH on my ODROID-HC4:

# zfs list -t all | wc -l
352
# time zfs list -t all
[...]
real	0m0.795s
user	0m0.091s
sys	0m0.611s

Looking at htop while opening the web UI, though, I can see the CPU skyrocketing to 100% I/O wait until a few seconds after I leave the page (which I'd say is expected). But I also see a few zfs get omvfsplugin calls (I can't find the exact command now) and then, one per dataset, a continuous stream of zfs set omvzfsplugin:uuid={some-uuid} {dataset-name} with an ever-changing UUID. In fact, if I run watch zfs get omvzfsplugin:uuid {dataset-name}, I can see the value changing continuously. I think this is what's taking so long and generating the timeouts in the web UI.
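The behaviour described above (the UUID property being rewritten on every refresh) suggests the tag should only be written when it is missing. A minimal read-before-write sketch in Python, assuming that zfs get -H -o value prints "-" for a user property that has never been set; the function names are hypothetical, and this is not necessarily what the plugin's fix does.

```python
import subprocess
import uuid

PROP = "omvzfsplugin:uuid"  # user property name seen in the htop output above


def needs_uuid(current_value: str) -> bool:
    # `zfs get -H -o value` prints "-" for an unset user property.
    return current_value.strip() in ("", "-")


def get_cmd(dataset: str) -> list[str]:
    return ["zfs", "get", "-H", "-o", "value", PROP, dataset]


def set_cmd(dataset: str, value: str) -> list[str]:
    return ["zfs", "set", f"{PROP}={value}", dataset]


def ensure_uuid(dataset: str, run=subprocess.run) -> str:
    """Return the dataset's UUID, writing a new one only if none is set yet."""
    current = run(get_cmd(dataset), capture_output=True, text=True, check=True).stdout
    if not needs_uuid(current):
        return current.strip()  # already tagged: no zfs set, no extra pool writes
    new_value = str(uuid.uuid4())
    run(set_cmd(dataset, new_value), check=True)
    return new_value
```

The run parameter is only there so the logic can be exercised without ZFS installed; in normal use it defaults to subprocess.run.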

@ryecoaaron
Member

Would you care to test this build where I removed this line?

wget https://omv-extras.org/testing/openmediavault-zfs_7.0.6_amd64.deb -O openmediavault-zfs_7.0.6_amd64.deb
sudo dpkg -i openmediavault-zfs_7.0.6_amd64.deb

@GiorgioAresu

Sure, but I forgot to mention that it's arm64 (ARMv8; more detailed specs: https://wiki.odroid.com/odroid-hc4/hardware/hardware#specifications). I tried changing the link but got a 404. If you build it for that, I'll gladly test it and report back. Thanks!

@ryecoaaron
Member

You should be able to download the arm64 version now.

@GiorgioAresu

GiorgioAresu commented Jul 1, 2024

The pools and datasets pages now load and refresh consistently in about 1s. I was able to trigger other timeouts by creating a new dataset and then destroying it, which triggered another round of zfs set omvzfsplugin:uuid=[...] for, it seems, all datasets. However, I'm currently running a replication task from TrueNAS, so the system is already taxed by that, especially on this SBC hardware. Despite a handful of timeouts, it's still able to load the table, so I'd say that fixes it. Please let me know if you want me to run more tests (I'm keeping this version installed).

@ryecoaaron
Member

I didn't remove the line from the create or delete functions. I need to see if I can set up a test VM to reproduce this, which would make it easier for me to test changes.

@ryecoaaron
Member

After testing on a setup with 500 filesystems and 5000 snapshots, I found the problems. There was a major bug that caused many of them, but I also made a few more changes:

a43bd74
91955a5

Please test with this version:

wget https://omv-extras.org/testing/openmediavault-zfs_7.1_amd64.deb -O openmediavault-zfs_7.1_amd64.deb
sudo dpkg -i openmediavault-zfs_7.1_amd64.deb

@GiorgioAresu

GiorgioAresu commented Jul 3, 2024

The page is even better now, so just checking the status is a much nicer experience, but creating/destroying datasets seems a little flaky. When I delete a dataset, it doesn't disappear until the page is refreshed. If I create a dataset, it appears on the page, but unless I run a discover (even after refreshing), it can't be deleted. Instead, it throws an error; the actual HTTP response is:

{
    "response": null,
    "error": {
        "code": 0,
        "message": "No such Mntent exists",
        "trace": "OMVModuleZFSException: No such Mntent exists in \/usr\/share\/omvzfs\/Utils.php:84\nStack trace:\n#0 \/usr\/share\/openmediavault\/engined\/rpc\/zfs.inc(367): OMVModuleZFSUtil::deleteOMVMntEnt()\n#1 [internal function]: OMVRpcServiceZFS->deleteObject()\n#2 \/usr\/share\/php\/openmediavault\/rpc\/serviceabstract.inc(122): call_user_func_array()\n#3 \/usr\/share\/php\/openmediavault\/rpc\/rpc.inc(86): OMV\\Rpc\\ServiceAbstract->callMethod()\n#4 \/usr\/sbin\/omv-engined(544): OMV\\Rpc\\Rpc::call()\n#5 {main}"
    }
}

It's still an improvement, though.

@ryecoaaron
Member

I might've removed the fixOMVMnt function from too many places. I will do more testing.

@ryecoaaron
Member

I made a few improvements. You can download it from the same URL as last time.

@GiorgioAresu

I'm now also able to create and delete datasets without issues, thanks!

@ryecoaaron
Member

7.1 is in the repo.


3 participants