Safaricom is the leading provider of converged communication solutions in Kenya. In addition to providing a broad range of first-class products and services for Telephony, Broadband Internet and Financial services, Safaricom seeks to uplift the welfare of Kenyans through value-added services and support for community projects.
Proactively build and implement monitoring services, including end-to-end monitoring, scripting and automation, modern tooling and maintenance software.
Use AI and machine learning to perform log analysis and create predictive models that help identify potential failures.
Design, develop and support the in-house observability platform.
Design and maintain scalable, high-availability observability pipelines and dashboards for microservices and cloud infrastructure.
Define and enforce SLO, SLI, SLA and error budget standards, set actionable alerts, and drive continuous reliability improvements.
Partner with SRE, DevOps, development squads and security teams to instrument services using OpenTelemetry and related tooling.
Build custom agents, exporters, collectors or integrations where off-the-shelf solutions fall short.
Job Requirements:
Bachelor’s degree in Computer Science, Software Engineering, Business Information Technology, or another relevant field.
Domain knowledge in system administration, especially Linux and the Linux kernel.
Strong skills in Go, Rust and a scripting language like Python or Bash for building custom exporters, scripts and integrations.
Technical understanding of SRE practices for providing stable services to customers, adhering to availability KPIs, Service Level Objectives and Service Level Indicators, and conforming to the target monthly error budget.
Proven experience with multiple observability platforms (Prometheus/Grafana, ELK/Elastic, Dynatrace, etc.).
Deep knowledge of manual and auto-instrumentation using the OpenTelemetry SDK and Collector.
Hands-on experience with Kubernetes, especially the OpenShift distribution.
Proficiency with Ansible, Rundeck and Helm, and with integrating observability into build and deployment pipelines.
Conversant with both ITIL and Agile ways of working.