submitted 1 year ago by xylan@kbin.social to c/linux@kbin.social

I've been running an HPC system for a science group for a while now and have built a couple of different systems based on common HPC frameworks (ROCKS or OpenHPC). These have been built on top of the rebuilt RHEL distros (mostly CentOS), but I don't really need the level of stability that these provide and would actually like the sort of updates you get from something like CentOS Stream, so this seems like a good time to try it.

The problem is that I haven't found an HPC framework that natively supports this, so I'm potentially going to have to roll my own. I don't need anything fancy, just some way to automatically deploy nodes and set up Slurm to get jobs queued.

Any pointers to suitable frameworks or tools which would help with this and which aren't tied to older distros?

xylan@kbin.social 3 points 1 year ago

The lack of stability is actually quite attractive to me. In a scientific environment we're normally running fairly new, often unstable code, and we often hit problems caused by older versions of libraries / packages / compilers, so something which stays a bit more current would be good, and we can deal with breakage if it happens. The trouble is that the management systems around HPC assume you're working on enterprise systems, which isn't really true in our case.

I've looked at things like OpenHPC, but it's still on RHEL8 (RHEL9 support is in testing but not released yet), and even lower-level tools like Warewulf still only support RHEL8 at the moment, which is getting too old for me to want to build a new system on.

I've looked at more generic tools like Ansible and Chef / Puppet, but before I go down that rabbit hole I'd like a sanity check that there isn't something better suited that I'm missing.
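
To give a sense of what the roll-your-own route might involve on the Slurm side, here is a minimal Python sketch that renders the node and partition lines of a slurm.conf from a small inventory. The `Node` class, the `render_slurm_nodes` function, the node names, the core and memory figures, and the `batch` partition name are all hypothetical, chosen for illustration only; a real slurm.conf also needs cluster-wide settings (controller host, auth, and so on) that are left out here.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One compute node in a hypothetical inventory."""
    name: str
    cpus: int
    real_memory_mb: int


def render_slurm_nodes(nodes, partition="batch"):
    """Return NodeName and PartitionName lines for a slurm.conf fragment."""
    # One NodeName line per node, using standard slurm.conf node parameters.
    lines = [
        f"NodeName={n.name} CPUs={n.cpus} "
        f"RealMemory={n.real_memory_mb} State=UNKNOWN"
        for n in nodes
    ]
    # A single partition containing every node in the inventory.
    node_list = ",".join(n.name for n in nodes)
    lines.append(
        f"PartitionName={partition} Nodes={node_list} "
        "Default=YES MaxTime=INFINITE State=UP"
    )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical two-node cluster; the values are placeholders.
    inventory = [Node("node01", 32, 128000), Node("node02", 32, 128000)]
    print(render_slurm_nodes(inventory), end="")
```

The same inventory could feed whatever does the node deployment (an Ansible inventory, for example), so that the node list is only maintained in one place.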

this post was submitted on 27 Jun 2023