Senior Compute SRE (GPU)

Apple Inc

Seattle, WA

Job posting number: #7291889 (Ref:apl-200576383)

Posted: October 30, 2024

Job Description

Summary

Imagine what you could do here. At Apple, great ideas have a way of becoming great products, services, and customer experiences very quickly. Bring passion and dedication to your job and there's no telling what you could accomplish. Join the Apple Services Engineering team as a Site Reliability Engineer to help support and scale cloud services for thousands of development and operations engineers. This is a hands-on role to maintain and improve SRE practices for a private cloud service to accelerate our ability to reliably and consistently deliver thousands of applications.

Description

As a Sr. Site Reliability Engineer, you will be responsible for providing the platform for mission-critical cloud systems to maintain constant uptime, scale seamlessly, and allow for new applications and services to flourish.

The successful candidate will be highly self-motivated, with a passion for excellence, quality, and detail. The SRE will not only support operations but also work closely with the developers and architects on the team, contributing to design and implementation to improve stability, security, and scalability.

AS AN SRE IN THIS TEAM, YOU WILL:
- Design and deploy GPU-accelerated VM and container infrastructure using platforms such as KVM, QEMU, AWS, or Google Cloud.

- Implement GPU-based Kubernetes clusters to support containerized applications and services.

- Work with data scientists, developers, and other stakeholders to understand requirements and provide solutions for GPU-accelerated tasks.

- Implement best practices for secure, scalable, and highly available environments.

- Monitor and optimize resource utilization to ensure performance and cost-efficiency.

- Actively participate in capacity planning, scale testing, and disaster recovery exercises.

- Troubleshoot issues across the entire infrastructure stack.

- Cultivate and maintain relationships with internal and external third-party vendors.

Minimum Qualifications

- 5+ years in a Site Reliability Engineering, DevOps, or infrastructure-focused role.
- Proven experience with GPU-based virtual machine infrastructure and cloud platforms (e.g., AWS, GCP).
- Experience with GPU hardware (e.g., NVIDIA, AMD) and associated software stack (e.g., CUDA, cuDNN).
- Experience with GitOps, deployment strategies, and CI/CD tools such as Spinnaker and Argo.
- Ability to implement and coordinate telemetry using monitoring and observability tools such as Splunk, Grafana, and Prometheus.
- Outstanding organizational and communication skills.

Preferred Qualifications

- Strong verbal and written communication skills.
- Knowledge of Kubernetes, including deployment, management, and optimization of clusters.
- Automation advocate - you truly believe in removing operational load via software.
- A strong sense of ownership. At the same time, you're a great teammate who communicates clearly and transparently.
- Self-motivated, inquisitive, and always looking to learn more.
- Experience managing, scaling, and troubleshooting Golang and GPU applications.
- Ability to work independently and manage multiple priorities effectively.
- CNCF Certified Kubernetes Administrator (CKA) certification.

Pay & Benefits

At Apple, base pay is one part of our total compensation package and is determined within a range. This provides the opportunity to progress as you grow and develop within a role. The base pay range for this role is between $166,600 and $296,300, and your base pay will depend on your skills, qualifications, experience, and location.
Apple employees also have the opportunity to become Apple shareholders through participation in Apple’s discretionary employee stock programs. Apple employees are eligible for discretionary restricted stock unit awards and can purchase Apple stock at a discount if voluntarily participating in Apple’s Employee Stock Purchase Plan. You’ll also receive benefits including comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and, for formal education related to advancing your career at Apple, reimbursement for certain educational expenses, including tuition. Additionally, this role might be eligible for discretionary bonuses or commission payments as well as relocation.
Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.

Application Deadline:	Open Until Filled
Employer Location:	Apple Inc, Jacksonville, Florida, United States