responsible for administering diverse HPC clusters, the scientific applications that rely on the HPC infrastructure, and high-speed interconnects; for consulting and problem solving; for helping users optimize their jobs for HPC runs; for optimizing the current HPC job scheduling environment; and for designing new HPC clusters in data center facilities.
The position includes designing and building HPC infrastructure, both for the central facility and for niche HPC clusters, applying best practices for scheduling and job resource management, and creating benchmarking tools.
You will provide support to a broad spectrum of HPC users and assist in porting code to the HPC environment, including compiling and configuring applications for it.
You will have to analyze the mix of users' job requirements and implement job queues and scheduler configuration that maximize the use of computational resources while still satisfying job turnaround time requirements.
Track usage and provide reports on cluster usage.
Create and maintain user- and system-level documentation of the systems.
Keep users updated on software and system changes.
Manage user conventions.
Write online tutorials.
You will provide on-call support as part of a rotating schedule.
Requirements:
- Unix/Linux administration - 5 years' experience
- Fluent in scripting languages
- Software development experience
- Ability to plan, organize and prioritize many tasks and assume lead role in activity
- Excellent interpersonal, presentation and customer service skills
- SGE / PBS / LSF / Condor experience as a system administrator or support engineer
- Background in software development
- Scientific background
- Experience in implementing HPC storage subsystems (GPFS, Lustre)