Senior Product Manager, Reliability Platform (Observability, SRE, AIM)
About the position
Responsibilities
• Building and scaling foundational developer platforms that serve as the backbone for our engineering organization.
• Defining and executing a clear product strategy for the Observability, Business Continuity and Disaster Recovery (BCDR), and Incident Management areas within our internal developer engineering team.
• Leading cross-functional teams to deliver high-impact, developer-facing products in an agile environment.
• Deeply understanding the entire developer workflow, from coding and testing to deployment and operations, and identifying opportunities to remove friction and improve efficiency.
• Owning and prioritizing the product roadmap for a suite of platform services, such as our metrics platform, logging pipelines, alerting systems, on-call and incident response tooling, and BCDR orchestration platform.
• Defining and tracking key operational health metrics, including system availability and Mean Time to Resolution (MTTR), and building tools to help teams manage their services effectively.
• Championing a culture of reliability and ownership by delivering tools that empower developers to build and operate resilient, highly available systems with confidence.
• Identifying and measuring key performance indicators (KPIs) that reflect developer productivity and system health, and using that data to refine the product roadmap.
Requirements
• Bachelor's degree
• At least 5 years of experience in a technical product management role, such as Developer Tools, Platform Engineering, SRE, Observability, or a related technical field.
• Experience with products in the developer tools, cloud infrastructure, or observability space.
• A proven track record of managing technical products through the full product lifecycle.
• Experience using data to inform product decisions and a strong understanding of how to measure the success of developer-facing tools.
Nice-to-haves
• MBA or equivalent experience
• Direct experience with modern platform engineering concepts, including internal developer platforms (IDPs), service catalogs, and 'Paved Road' engineering.
• Familiarity with modern observability tools (e.g., Grafana), cloud platforms (e.g., Azure, AWS), and container orchestration (e.g., Kubernetes).
• A strong understanding of Site Reliability Engineering (SRE) principles, including SLOs, error budgets, and effective incident management.
• Excellent communication skills and the ability to articulate complex technical concepts to both technical and non-technical audiences.
• A self-starter with a proven ability to operate in an ambiguous, fast-paced environment.
Benefits
• Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family's overall well-being.
• Financial benefits, including market-competitive compensation; a 401(k) savings plan, vested from day one, that offers a 6% match; performance- and recognition-based incentives; and tuition assistance.
• Access to additional benefits like mental healthcare as well as fertility and adoption assistance.
• Supports flexibility: we provide workplace flexibility as well as our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year.