Posted 10 months ago
HPC Tech Lead
Location: Cambridge, United Kingdom or United States
The Linaro Datacenter Cloud Group (LDCG) was established in November 2012 as the first segment-focused group within Linaro. The group was established to accelerate Linux ARM server ecosystem development, and it extended the list of Linaro members beyond ARM silicon vendors to server OEMs and commercial Linux providers. LDCG now includes an HPC Special Interest Group (SIG), established in August 2016 with the aim of driving adoption of ARM in HPC.
This joint collaboration focuses on identifying and addressing gaps and optimizations in the ARM Linux server software ecosystem, enabling SoC support upstream to meet HPC requirements, agreeing on HPC requirements for SoC software standardization, and upstreaming all relevant output.
The HPC SIG steering committee defines the high-level tasks – be they heavy-lifting “mighty” projects or quick “low-hanging fruit” optimizations – along with the corresponding priorities and timelines, as “cards”. The technical leaders in the engineering team then analyze each card and break it into multiple smaller, more detailed tasks.
The HPC SIG is newly established, and we expect the engineering team to grow. We are seeking a candidate to lead this highly technical effort, from architecting technical solutions through to coaching upstream development practices.
The candidate shall be experienced in the deployment and orchestration of HPC clusters, with a deep understanding of scientific computation. Central to the role will be the continuous integration efforts for compute and storage workloads as well as the underlying cloud infrastructure. In addition, the candidate should have knowledge of container technologies such as Docker and LXC, and of how they apply both to the infrastructure layer and to container workloads.
Performance and stability of HPC on the ARM architecture are the key tenets of this role. The candidate will drive the efforts to identify representative real-life use cases and test suites for HPC deployments at scale. The ultimate goals are to identify performance bottlenecks for optimization, as well as the competitive advantages of the ARM server scale-out architecture over more traditional scale-up approaches.
The desired individual is someone who will bring knowledge and creativity to the position and who has the discipline to drive execution and follow established fundamental engineering processes. The candidate will lead a team of about four to five engineers in addition to their own technical tasks.
Travel: International travel of approximately one week is required up to 3-4 times per year to attend Linaro Connect events, engineering sprints or public conferences.
- Demonstrated senior engineering management responsibility at technical leader level for a team working in Linux, operating system and/or open source software
- Experience in configuring and optimising HPC infrastructure, as well as compute and storage workloads
- Experience with scientific computational libraries such as MPICH, FFTW, BLAS, cuBLAS, LAPACK, OpenBLAS, MPI and ScaLAPACK
- Experience with compiler technology such as GCC, LLVM, OpenMP and Fortran
- Experience with DevOps tooling such as Docker, Heat, Ironic, Juju and Ansible
- Experience with hardware acceleration (CCIX, FPGAs, GPGPUs) is desirable
- Familiarity with server systems and cloud architecture
- Familiarity with hyper-scale computing and micro-servers, including scale-up vs scale-out approaches
- SUSE, CentOS, Canonical Ubuntu, Red Hat Enterprise Linux or Fedora Linux experience
- Familiarity with product delivery including Agile release methodology
- Familiarity with open source tools, culture and processes
- Very good written and verbal communication skills
- Comfortable with online communication and collaboration such as mailing lists, wiki and phone conferences
- Experience working with the ARM and open-source communities is highly desirable
- Active participation in the OpenHPC project is a major plus