Savannah Informatics is a Kenyan e-Health software company founded by clinicians and finance specialists to deliver interoperable, connected solutions for healthcare facilities, organizations and regions.
Our vision is to enable a better healthcare future for Kenya through the pioneering use of information technology and knowledge creation.
We are looking for an experienced DevOps engineer to join our Infrastructure team and take ownership of Savannah’s infrastructure, streamlining processes across the product lifecycle, from planning and building to deploying and maintaining applications. This is a strategic and hands-on role that demands expertise in Orchestration, Provisioning, Observability, CI/CD, Security, and Connectivity (Networking), working in collaboration with the leading technical officers.
You will work closely with developers, product teams, and external stakeholders to ensure the scalability, security, and performance of Savannah’s infrastructure and systems.
Responsibilities
The ideal candidate for this position will be working on the following:
Site Reliability Engineering (SRE):
- Implement and maintain best practices for ensuring the reliability and availability of web applications and services.
- Implement observability tools, perform advanced debugging, and optimise multi-cloud infrastructures.
- Set up and maintain Service Level Objectives (SLOs) and Service Level Indicators (SLIs), working with cross-functional teams to develop the documentation.
- Lead incident management, post-incident reviews, and root cause analysis to continuously improve system reliability and develop strategies to prevent future occurrences.
- Monitor and update reliability processes throughout the entire lifecycle of systems and products to ensure adherence, drive improvement, and minimize waste.
Infrastructure:
- Build, implement and maintain scalable, resilient cloud-based infrastructure using tools like Terraform and Ansible to optimize deployment workflows.
- Scale and optimize resources across multi-cloud environments, ensuring cost-efficiency and performance.
- Design robust CI/CD pipelines using tools like GitLab CI/CD, GitHub Actions, or similar platforms.
- Participate in the design, implementation, and optimization of infrastructure, emphasizing scalability, security, and performance.
- Tackle complex system-level challenges, anticipate future ones, devise innovative solutions, and drive continuous improvement in infrastructure and processes.
Developer Experience:
- Support and enhance the development process by providing tools and practices that improve developer productivity.
- Collaborate with software development teams to set up and streamline the CI/CD (Continuous Integration/Continuous Deployment) pipeline.
- Create and maintain environments, including development, staging, and production.
- Assist developers with debugging, performance optimization, and troubleshooting issues across the development lifecycle.
- Implement advanced incident management, post-incident reviews, and proactive improvements to reduce downtime and enhance system reliability.
Tech Financial Operations:
- Manage and optimise technology-related financial aspects, including budgeting, cost tracking, and cost control.
- Implement and monitor cost-effective solutions for infrastructure and services, optimizing cloud resources.
- Work closely with finance and procurement teams to ensure efficient allocation of technology-related budgets.
- Implement cost allocation models to attribute technology expenses accurately.
- Manage periodic reporting on FinOps progress to management, using advanced cost allocation models and tools to monitor and control technology-related expenses.
- Own strategic initiatives, such as cost optimisation, system reliability, and fostering a culture of observability.
Leadership and Mentorship:
- Actively mentor junior engineers, providing guidance on best practices, technical challenges, and process improvements.
- Align technical operations with business goals by working with cross-functional teams to achieve overall success.
Skills
The ideal candidate for this position will have the following:
Deep knowledge of Linux systems
- The candidate must have strong skills in operating systems (Linux/Ubuntu/Debian), know their way around a UNIX shell, and believe that where there is a shell, there is a way.
- Good computer networking skills: the candidate understands how networks work, including the OSI model and protocols such as TCP/IP, UDP, ICMP, HTTP(S), DNS, DHCP, and SMTP.
Virtualization and Containerization technologies
- A comprehensive understanding of cloud platforms (e.g., AWS, GCP), Kubernetes, Infrastructure as Code (IaC) tools (e.g., Terraform), and monitoring systems (e.g., Prometheus, Grafana).
- Strong experience in running production applications on Kubernetes.
- Proficient in multiple backend languages (e.g., Python, Go) and frameworks for automating large-scale operations.
- Strong understanding of version control systems, i.e. Git with GitLab, GitHub, or Bitbucket.
- Experience using popular CI/CD pipeline tools such as GitLab CI/CD, GitHub Actions, and CircleCI.
- Strong knowledge of database management systems, mainly but not limited to PostgreSQL, is a must.
Cloud-first Mindset
- Proficient in cloud computing, specifically but not limited to Google Cloud Platform and Amazon Web Services. Most of our applications are served from the cloud, therefore it is important to understand how the cloud works, including products like GCE/EC2, Cloud Run/EBS, Cloud Functions/Lambda, GKE/EKS, S3/GCS, PubSub/SQS etc.
Automation Mastery
- To avoid the hassle of manual tasks, an automation mindset is a MUST. The main automation tool we use is Ansible, so strong knowledge of writing, modifying, and running Ansible playbooks is essential.
- Must be proficient in infrastructure as code tools such as Terraform, Pulumi, or CloudFormation.
- Proficiency in Kubernetes automation tools, e.g. Helm v3 (mostly) and Kustomize, is also required.
Coding Proficiency
- Experience working with a modern programming language, e.g. Python, Golang, or C#.
- Experience working with different API architectures such as REST, GraphQL, and RPC.
Observability
- We need someone with the ability to collect, analyze, and gain insights from data generated by software and infrastructure to ensure system reliability and performance. This skill includes data instrumentation, monitoring, diagnostics, automation, collaboration, and a commitment to continuous improvement. It's about understanding and improving what's happening within a system in real time to proactively address issues and enhance overall system health.
- Must have experience in running and integrating applications with observability tools such as Grafana, Prometheus, TICK stack, Google Cloud Monitoring/AWS CloudWatch, OpenTelemetry etc.
Detective Skills
- We need someone who can detect, analyze, debug, and follow up on issues end to end, while working to enhance the performance of our applications. They should be able to use existing tools and techniques, including our monitoring stack, Sentry, and other monitoring tools, to debug and resolve issues and write up RCAs on them.
Understand the full software stack – and go beyond
- It is important to understand the whole stack, in terms of how our apps are developed, deployed, and maintained, in order to reproduce and debug errors faster and take the necessary steps to resolve them. Candidates should therefore not limit their knowledge to one area. Knowing everything at the start is not a must, but the will to learn is important.