Job title: Sr. Systems Engineer
Company: SMART Global Holdings
Job description: The Penguin Solutions™ portfolio, which includes Penguin Computing™ and Penguin Edge™, accelerates customers’ digital transformation with the power of emerging technologies in HPC, AI, and IoT with solutions and services that span the continuum of edge, core, and cloud. By designing highly advanced infrastructure, machines, and networked systems we enable the world’s most innovative enterprises and government institutions to build the autonomous future, drive discovery and amplify human potential.
Overview
Penguin Solutions™ is seeking a self-starter with a drive for high-performance computing (HPC) to serve as a Senior Systems Engineer. This engineer will provide dedicated, remote systems administration for complex, integrated environments involving high-performance computing, cloud, and enterprise systems. This position requires substantial technical skills and the ability to understand, document, configure, administer, troubleshoot, and resolve issues in Linux environments. This is a customer-facing position.
Responsibilities
- Support a Linux-based, high-performance computing (HPC) and artificial intelligence (AI) environment featuring a wide range of technologies.
- Design, implement and maintain systems automation using Ansible.
- Recommend design improvements for the fault tolerance, resilience, efficiency, and performance of cluster components.
- Install, configure, and manage compute, storage, and networking infrastructure of Linux servers and virtual machines.
- Deploy, manage and maintain Ethernet and InfiniBand networking systems.
- Render professional, timely, and expert customer support.
- Manage and maintain all system software, specifically Penguin’s Scyld Suite.
- Implement, manage and maintain containerized and virtual environments such as OpenStack, OpenShift, Kubernetes, Singularity, VMware and RedHat Virtualization.
- Fully document processes, procedures, and work performed.
- Establish and maintain configuration management best practices.
- Support other engineers in the delivery of overall service.
- Participate in growing the technical capabilities of Penguin Solutions™ through knowledge-sharing and team activities.
Qualifications
Minimum Qualifications
- Bachelor’s Degree in Information Technology, Computer Science, or a related field of study (or equivalent experience in systems administration/engineering).
- 8+ years of hands-on experience with Linux (RedHat distributions preferred) server environments.
- Practical knowledge of the administration of high-performance computing (HPC) technologies or similar clustered Linux environments.
- Demonstrated scripting ability to support automation and system administration activities (e.g., Ansible, Python, Bash, Perl).
- Familiarity with IPMI tools and BMC configuration and troubleshooting.
- Understanding of provisioning systems and the kickstart process.
- Ability to quickly learn new or unfamiliar technology and products independently using documentation and internet resources.
- Excellent analytical and problem-solving skills.
- Ability to communicate technical details clearly and effectively with team members and clients.
- Exceptional sense of accountability and ownership, with a history of delivering results.
- Superior interpersonal and communication skills, both verbal and written, with an aptitude for customer service.
- Track record of being a self-starter and effective independent contributor while being comfortable working in a team environment.
- Ability to think intuitively to design and implement new solutions to technical problems.
- Must be a US Citizen.
Preferred Qualifications
- Experience with Cumulus Linux and NVIDIA/Mellanox InfiniBand network operating systems.
- Knowledge of building and/or working with hypervisors and virtual environments, such as VMware, KVM, and RHHI-V.
- Proven expertise with containers and orchestration such as Singularity, Docker, Kubernetes, etc.
- 2+ years of hands-on experience configuring and supporting large-scale Ethernet and InfiniBand networks.
- Experience operating and maintaining secure operating system configurations including encrypted filesystems.
- Thorough understanding of HPC components, such as job schedulers, licensing services, node provisioning, GPUs, remote visualization, etc.
- Experience supporting parallel filesystems such as WEKA and object storage systems such as Ceph.
- Experience creating and managing technical documentation, including documenting existing environments.
- RedHat certifications and/or network-related certifications will be looked upon favorably.
Location
- This is a remote position in the United States.
Travel
- Minimal travel may be required.
Compensation & Benefits
The expected pay range for this position in the United States is $110,000 – $145,000; the compensation ultimately offered within the expected range may vary based on business considerations, including job-related knowledge, skills, experience, and education. The position is bonus eligible, and a range of medical, dental, and vision benefits is available, as well as a 401(k) savings plan and other benefits and plans, such as Paid Time Off, Life Insurance, and an Employee Assistance Plan.
Diversity and Inclusion Statement
SGH, together with its affiliates, is committed to creating a diverse environment that embraces differences and fosters inclusion.
Equal Opportunity Statement
We are an Affirmative Action/Equal Opportunity Employer and strongly committed to all policies which will afford equal opportunity employment to all qualified persons without regard to age, national origin, race, ethnicity, creed, gender, disability, veteran status, or any other characteristic protected by law.
Expected salary: $110,000 – $145,000 per year
Location: USA
Job date: Sat, 24 Aug 2024 01:40:00 GMT