
Senior Principal Site Reliability Engineer

Build Akamai's AI reliability architecture, from GPU to global inference

Design and implement the reliability architecture for Akamai's AI services, including SLOs and automation. Focus on improving the operational efficiency of the SRE team.

Why It's Interesting

Gain direct influence on product engineering teams through deep technical expertise.

Required Skills

SRE · Reliability Architecture · Automation · Observability · Cloud Computing · Capacity Planning

Keywords

SRE AI · Reliability Engineer · GPU Compute · Akamai · Cloud Infrastructure · SLO

Original description from Jobicy

Do you want to shape the future of AI infrastructure?

Ready to define the reliability architecture for AI products, from GPU compute to globally distributed inference, ensuring performance and reliability at scale.

Join the Akamai AI Team

Akamai's Cloud Technology Group offers AI infrastructure globally. The GPU compute platform provides dedicated resources, from single GPUs to full clusters. These resources support training, simulation, inference, and various workloads. Site Reliability Engineering is integrated early to guarantee production-grade reliability and performance.

Partner with the best

As Senior Principal SRE for AI, this role involves setting technical direction for building, operating, and scaling AI services. Responsibilities include writing code, designing systems, and solving complex reliability issues. Additionally, mentoring team members, defining technical standards, and promoting engineering best practices are essential. Success depends on achieving influence with product engineering teams through exceptional technical expertise.

As a Principal Site Reliability Engineer, you will be responsible for:

Defining the reliability architecture for Akamai's AI compute and platform services, including SLO frameworks, fault tolerance patterns, and capacity




Source: Jobicy
Job Type: Full-time
Location: Regional Remote · Remote
Category: Engineering
Level: Senior
Posted: 29 Mar 2026