Posted about 1 year ago
Products and Technology
The MuleSoft Site Reliability Engineering team is newly formed, and we are looking for an experienced Sr. Manager to help build the overall SRE team and program. This role will report to the Sr. Director, SRE and will be at the heart of Global Operations at MuleSoft. The team is progressively responsible for full stack observability, level one alerting, reliability engineering and incident response for products and services that are on-boarded into the team. We aim to ensure that the MuleSoft Cloud environment remains available to our customers and to proactively address reliability issues before they affect performance or availability.
- Act as 'gatekeeper' for the on-boarding of new services into the SRE team to ensure that services accepted into the SRE program have been adequately documented, level 1 response runbooks have been developed, sufficient monitoring has been put in place and escalation channels for the various service owner teams have been agreed
- Own end-to-end availability (SLO/SLA), reliability, and performance of MuleSoft's Cloud Platform by developing processes, metrics and engineering projects that ensure maximum reliability and uptime for our customers
- Hire exceptional SRE talent (in the USA at first) and help build a team with a focus on intrinsic curiosity, fearless assertiveness and exceptional ability at running large-scale, highly distributed production systems in the public cloud
- Collaborate with the existing Engineering team to understand deployment practices and processes and work towards iteratively improving the release pipeline to ensure a highly resilient deployment strategy, ideally with zero downtime
- Establish an on-call cadence with the team and ensure adequate coverage of key areas
- Foster a healthy and collaborative culture, in line with Salesforce's core values of Trust, Transparency, Equality, Collaboration, Philanthropy, and 'Ohana'
- Participate in 24x7 Site Reliability rotations and escalation workflows
- 5+ years managing an SRE (or related) team that operates in a highly distributed public cloud (AWS) environment
- Previous experience building a new SRE team from the ground up and progressively increasing the team's responsibilities by coaching the team through the process of on-boarding services into the team
- A passion for SRE/DevOps principles and a clear drive to remove toil and run highly resilient/automated systems
- Experience in managing a release pipeline, from planning to deployment to monitoring impact
- Previous experience working with , New Relic and Sumo Logic
- Strong technical fundamentals including Linux, TCP/IP, Kubernetes/Docker, Jenkins
- Bias towards data-driven decisions and ensuring key metrics are agreed on, visible and actionable
- Experience in software development in one or more of the following: Java, Python, Go
- Experience managing an engineering team on projects with technical deep-dives into code, networking, operating systems and/or storage
- BA/BS degree in Computer Science or related technical field, or equivalent practical experience.
Salesforce.com and Salesforce.org are Equal Employment Opportunity and Affirmative Action Employers. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender perception or identity, national origin, age, marital status, protected veteran status, or disability status. Headhunters and recruitment agencies may not submit resumes/CVs through this Web site or directly to managers. Salesforce.com and Salesforce.org do not accept unsolicited headhunter and agency resumes. Salesforce.com and Salesforce.org will not pay fees to any third-party agency or company that does not have a signed agreement with Salesforce.com or Salesforce.org.
Pursuant to the San Francisco Fair Chance Ordinance and the Los Angeles Fair Chance Initiative for Hiring, Salesforce will consider for employment qualified applicants with arrest and conviction records.