
Director - Cloud Site Reliability Engineering

General Information

Ref #:


Travel Amount Required:

Up to 25%

Job Type:

Regular-Full Time


Weston - Florida - USA

Description & Qualifications


In this compelling leadership position, you will grow and develop a team of technologists with a focus on transformation. You will be responsible for building and executing a vision to implement best practices and best-of-breed tooling for capacity management, incident management, availability, problem management, auto remediations and machine learning. You will be tasked with evangelizing and socializing the Site Reliability Engineering discipline across the Cloud organization, serving as a change agent for driving service prioritization, and helping promote a culture of continuous improvement measured by operational metrics and KPIs. You will apply your expertise in software and systems engineering to ensure that our mission-critical systems meet the performance needs of our users. In this role, you will be expected to strategize portfolio / program reliability by working with cross-functional IT organizations, build roadmaps to drive reliability into our systems, enable the enterprise to standardize and adopt application reliability metrics, and improve application health.

The Impact You Will Make:
Working with teams across the organization, you will constantly innovate to enhance our service delivery to all internal and external customers through both technical and operational improvements while delivering on the following responsibilities:
• Set strategy and collaborate with key stakeholders across Cloud, Product Engineering, Architecture and Security teams to develop a team roadmap aimed at reducing the operational overhead of keeping applications healthy, secure, and available for our customers
• Serve as a guide and mentor to members of the Cloud Platform SRE teams to aid in their growth and development
• Drive service reliability by developing tooling that enables metric visibility using SLIs, SLOs, and SLAs focusing on improving customer experience
• Advocate for and drive the implementation of reliable design patterns
• Promote simplicity in solving complex problems across our technology footprint
• Lead and focus teams on root cause analysis, pattern identification and continuous improvement in order to optimize application performance, resiliency and reliability
• Look for opportunities that will drive operational efficiencies while reducing costs


• 10+ years of relevant professional experience
• Bachelor's degree in related field
• Experience setting strategic vision for an enterprise-wide practice or capability, communicating and selling the vision to leadership, stakeholders and the team
• Exemplary leadership and communication abilities (both verbal and written) are a must; this role will partner closely with business and technology executives in a highly matrixed structure
• Experience working with hands-on Senior Systems Reliability Engineers and providing senior-level technical direction on enterprise-level projects
• Experience collaborating cross-functionally on availability and performance issues to identify root cause, determine areas for improvement, and drive those actions to closure through effective solutions
• Adept at managing project plans, resources, and people to ensure successful project completion in an Agile / Scrum environment
• Proven track record of improving reliability, availability, incident management and performance of cloud services
• Proven experience managing software development lifecycle platforms and tools and/or designing, building, servicing, and driving ongoing improvement of service infrastructure systems
• Experience designing and developing highly available systems that utilize load balancing and horizontal scalability
• Experience in Chaos Engineering concepts and practices
• Experience defining, measuring, and improving Reliability Metrics (SLO/SLI), Observability (Monitoring, Logging, and Tracing solutions), Operations Processes (Incident and Problem Management), and Operations Toil Reduction through Automation
• Experience driving the development of dashboards from application and infrastructure health perspectives

Company Overview

Here at UKG, Our Purpose Is People. UKG combines the strength and innovation of Ultimate Software and Kronos, uniting two award-winning, employee-centered cultures. Our employees are an extraordinary group of talented, energetic, and innovative people who care about more than just work. We strive to create a culture of belonging and an employee experience that empowers our people. UKG has more than 13,000 employees around the globe and is known for its inclusive workplace culture. Ready to be inspired? Learn more at

EEO Statement

Equal Opportunity Employer

Ultimate Kronos Group is proud to be an equal opportunity employer and is committed to maintaining a diverse and inclusive work environment. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, age, disability, marital status, familial status, sexual orientation, pregnancy, genetic information, gender identity, gender expression, national origin, ancestry, citizenship status, veteran status, or any other legally protected status under federal, state, or local anti-discrimination laws.

View The EEO is the Law poster and its supplement. 

View the Pay Transparency Nondiscrimination Provision

UKG participates in E-Verify. View the E-Verify posters here.

Disability Accommodation

For individuals with disabilities who need additional assistance at any point in the application and interview process, please email or please call 1 (978) 250 9800.