Job Opportunities


HPC Slurm Cluster Engineer

Job Type: Full Time

Location: Hickory, NC

Overview
We are seeking a highly skilled HPC/AI/ML Cluster Engineer to support the design, deployment, and ongoing operations of large-scale HPC environments powered by Slurm. The role centers on cluster engineering, administration, and performance optimization, with an emphasis on GPU-accelerated computing, advanced networking, and workload scheduling. You will work closely with our researchers, vendors, and partners to manage Slurm clusters used for AI/ML workloads, supporting in-house, partner, and customer infrastructure.

Responsibilities

Cluster Engineering & Deployment

  • Participate in the design and bring-up of bare-metal HPC/AI/ML environments.

  • Integrate heterogeneous hardware platforms into cohesive scheduling environments.

  • Develop provisioning and imaging workflows (Ansible, MAAS, cloud-init, CI/CD pipelines) for reproducible cluster build-out; a minimal inventory-generation sketch follows this list.

  • Coordinate communications between vendors, researchers, and other partners during cluster bring-up and operation.
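
As a flavor of the reproducible build-out work above, here is a minimal, purely illustrative Python sketch that renders a static Ansible-style INI inventory from a declarative node list. All host and group names are invented; a real deployment would more likely drive this from MAAS or an existing source of truth:

```python
#!/usr/bin/env python3
"""Render a static Ansible-style INI inventory from a declarative
node list. Host and group names here are hypothetical."""
from pathlib import Path

NODES = {
    "gpu_nodes": [f"gpu{n:02d}" for n in range(1, 5)],  # gpu01..gpu04
    "login_nodes": ["login01"],
}

def render_inventory(nodes: dict[str, list[str]]) -> str:
    lines: list[str] = []
    for group, hosts in nodes.items():
        lines.append(f"[{group}]")
        lines.extend(hosts)
        lines.append("")  # blank line between groups
    return "\n".join(lines)

if __name__ == "__main__":
    Path("inventory.ini").write_text(render_inventory(NODES))
```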

Slurm Management

  • Configure and operate the Slurm Workload Manager.

  • Build custom Slurm plugins and scripts (prolog/epilog hooks, pam_slurm_adopt) to extend functionality and integrate with authentication, health checking, and monitoring; a prolog-style sketch follows this list.

  • Manage federated Slurm setups across multi-site or hybrid cloud environments.
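
To illustrate the prolog/epilog work above, the following is a minimal sketch of a prolog-style GPU health check in Python. It assumes only that `nvidia-smi` is on the PATH; in a typical Slurm configuration, a non-zero Prolog exit code drains the node, which keeps new jobs off unhealthy GPUs. The check itself is deliberately simplistic:

```python
#!/usr/bin/env python3
"""Prolog-style GPU health check (illustrative only).

In a common Slurm setup, a non-zero exit from the Prolog drains the
node, so failing this check keeps new jobs off a broken GPU node.
"""
import shutil
import subprocess
import sys

def gpus_respond() -> bool:
    # Nodes without NVIDIA tooling are assumed healthy for this check.
    if shutil.which("nvidia-smi") is None:
        return True
    # A real check would also inspect ECC errors, retired pages, etc.;
    # here we only confirm every GPU enumerates and answers a query.
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=index", "--format=csv,noheader"],
        capture_output=True, text=True, timeout=30,
    )
    return result.returncode == 0 and bool(result.stdout.strip())

if __name__ == "__main__":
    sys.exit(0 if gpus_respond() else 1)
```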

System Administration & Monitoring

  • Administer Linux HPC environments, including network configuration, storage integration, and kernel tuning for HPC workloads.

  • Deploy and maintain observability stacks for system health, GPU metrics, and job monitoring.

  • Automate failure detection, node health checks, and job cleanup to ensure high uptime and reliability; a detection sketch follows this list.

  • Manage security and access control (LDAP/SSSD, VPN, PAM, SSH session auditing).
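
As a sketch of the failure-detection automation above, the snippet below polls `sinfo` for nodes in unhealthy states and prints an alert per node. The alert hook is a stand-in for whatever paging or dashboard integration is actually in place, and the exact set of states to treat as unhealthy is site policy:

```python
#!/usr/bin/env python3
"""Poll `sinfo` for unhealthy nodes (sketch; alerting is a stub)."""
import subprocess

UNHEALTHY = {"down", "drain", "drng", "fail", "failg", "maint"}

def unhealthy_nodes() -> list[tuple[str, str]]:
    # One line per node: "<name> <compact state>", e.g. "gpu01 drain".
    out = subprocess.run(
        ["sinfo", "-N", "-h", "-o", "%N %t"],
        capture_output=True, text=True, check=True,
    ).stdout
    flagged = []
    for line in out.splitlines():
        name, state = line.split()
        # States may carry flags such as "*" (node not responding).
        if state.strip("*~#%$@").lower() in UNHEALTHY:
            flagged.append((name, state))
    return flagged

if __name__ == "__main__":
    for name, state in unhealthy_nodes():
        print(f"ALERT: node {name} is in state {state}")  # stub hook
```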

User & Stakeholder Support

  • Assist cluster users with developing workflows that make efficient use of compute resources.

  • Containerize HPC applications with Docker, Podman, or Enroot/Pyxis and integrate GPU-aware runtimes into Slurm jobs.

  • Automate cost accounting and cluster usage reporting; a small accounting sketch follows this list.
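
As one example of usage reporting, the sketch below sums GPU-hours per user from Slurm's accounting records via `sacct`. For brevity it ignores typed GRES entries (e.g., `gres/gpu:a100=4`), and the reporting window is an arbitrary placeholder:

```python
#!/usr/bin/env python3
"""Sum GPU-hours per user from `sacct` output (simplified sketch)."""
import subprocess
from collections import defaultdict

def gpu_hours_by_user(start: str = "2025-01-01") -> dict[str, float]:
    # -X: allocations only; -n: no header; -P: pipe-delimited fields.
    out = subprocess.run(
        ["sacct", "-a", "-X", "-n", "-P", "-S", start,
         "-o", "User,ElapsedRaw,AllocTRES"],
        capture_output=True, text=True, check=True,
    ).stdout
    totals: dict[str, float] = defaultdict(float)
    for line in out.splitlines():
        user, elapsed, tres = line.split("|")
        gpus = 0
        # AllocTRES looks like "billing=8,cpu=8,gres/gpu=4,mem=64G,node=1".
        for field in tres.split(","):
            if field.startswith("gres/gpu="):
                gpus = int(field.split("=", 1)[1])
        if user and gpus:
            totals[user] += int(elapsed) * gpus / 3600.0
    return dict(totals)

if __name__ == "__main__":
    for user, hours in sorted(gpu_hours_by_user().items()):
        print(f"{user}: {hours:.1f} GPU-hours")
```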

Qualifications

  • Previous experience in HPC cluster administration and engineering, with deep knowledge of Slurm.

  • Expert in Slurm configuration, partition design, QoS/preemption policies, and GRES GPU scheduling.

  • Strong background in Linux system administration, networking, and performance tuning for HPC environments.

  • Hands-on experience with parallel file systems, advanced networking (InfiniBand, RoCE, 100/200 GbE), and monitoring stacks.

  • Proficient with automation tools (Ansible, Terraform, CI/CD pipelines) and version control.

  • Demonstrated ability to operate GPU-accelerated clusters at scale.

  • Exceptional candidates will also have familiarity with common AI/ML software package dependencies and researcher workflows.


Early Career CFD / Physical Oceanographer

Job Type: Full Time

Location: Hickory, NC

About the Role

We are seeking a motivated early career researcher with expertise in computational fluid dynamics (CFD) and physical oceanography to contribute to the development and application of advanced modeling tools. The position will focus on extending the capabilities of the OceanParcels framework to support unstructured grids and particle-in-cell (PIC) methods, as well as running high-resolution simulations with the MITgcm model.

This role is ideal for a candidate excited about developing new numerical methods for geophysical flows, working at the intersection of high-performance computing and ocean science, and applying state-of-the-art tools to cutting-edge problems in fluid dynamics and oceanography.

Responsibilities

  • Develop and extend Python-based modules within the OceanParcels framework, with emphasis on:

      • Supporting unstructured mesh representations.

      • Implementing and optimizing particle-in-cell methods.

  • Collaborate on design, testing, and benchmarking of new algorithms to ensure compatibility with existing OceanParcels classes (e.g., FieldSet, GridSet, Particle); a minimal advection sketch follows this list.

  • Run and analyze MITgcm simulations of ocean and climate processes, including pre- and post-processing of model data.

  • Assist with performance evaluation, debugging, and scaling of simulations on high-performance computing platforms.

  • Collaborate with team members to publish results, develop workflows, and contribute to open-source software development.
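
For orientation, here is a minimal Lagrangian advection sketch using a Parcels v2/v3-style API (FieldSet, ParticleSet, and the built-in AdvectionRK4 kernel). The flow field is synthetic stand-in data; real work would load MITgcm output instead, and the exact API surface varies between Parcels releases:

```python
"""Minimal Parcels advection sketch on a synthetic flat-mesh flow."""
from datetime import timedelta

import numpy as np
from parcels import AdvectionRK4, FieldSet, JITParticle, ParticleSet

# Uniform eastward flow on a small flat mesh (stand-in for real
# MITgcm output, which would normally be read from NetCDF files).
lon = np.linspace(0.0, 1.0, 32)
lat = np.linspace(0.0, 1.0, 32)
U = np.ones((32, 32))
V = np.zeros((32, 32))

fieldset = FieldSet.from_data(
    {"U": U, "V": V}, {"lon": lon, "lat": lat}, mesh="flat"
)

# Two particles seeded mid-domain, advected with 4th-order Runge-Kutta.
pset = ParticleSet(fieldset=fieldset, pclass=JITParticle,
                   lon=[0.1, 0.2], lat=[0.5, 0.5])
pset.execute(AdvectionRK4, runtime=timedelta(hours=6),
             dt=timedelta(minutes=5))
print(pset)
```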

Qualifications

  • Bachelor’s or Master’s degree in Physical Oceanography, Computational Science, Applied Mathematics, Mechanical Engineering, or a related field.

  • Strong programming skills in Python, including experience with numerical libraries (NumPy, SciPy, xarray).

  • Understanding of unstructured grids, finite element/spectral element approaches, or related CFD methods.

  • Familiarity with particle methods (e.g., PIC, Lagrangian particle tracking) in fluid dynamics or geophysical contexts.

  • Experience running or analyzing simulations with MITgcm or another ocean/climate circulation model.

  • Strong problem-solving skills and ability to work collaboratively in a research software development environment.

Preferred Qualifications

  • Experience with parallel programming or HPC environments (MPI, GPU acceleration, or containerized workflows).

  • Familiarity with modern software development practices (Git/GitHub, testing frameworks, continuous integration).

  • Background in turbulence, mixing processes, or large-scale circulation in the ocean or atmosphere.

  • Prior contributions to open-source scientific software.

What We Offer

  • An opportunity to work at the forefront of CFD and physical oceanography research.

  • A collaborative, interdisciplinary environment with exposure to both applied science and software development.

  • Opportunities for mentorship, conference presentations, and peer-reviewed publications.

  • Hands-on experience with high-performance computing systems and advanced numerical modeling frameworks.