Staff Site Reliability Engineer at Wikimedia Foundation June, 2025

Posted: Jun 19, 2025

Deadline: Not specified

The Wikimedia Foundation is the nonprofit that hosts Wikipedia and our other free knowledge projects. We want to make it easier for everyone to share what they know. To do this, we keep Wikipedia and Wikimedia sites fast, reliable, and available to all. We protect the values and policies that allow free knowledge to thrive. We build new features and tools...

Staff Site Reliability Engineer
- Job Type: Remote
- Qualification: BA/BSc/HND
- Experience: 7 years
- Location: Nairobi
- Job Field: ICT / Computer
As a Staff SRE specializing in ML infrastructure, your primary responsibility is designing, developing, maintaining, and scaling the foundational infrastructure that enables Wikimedia's Machine Learning Engineers and Researchers to efficiently train, deploy, and monitor machine learning models in production.

You will be responsible for:
- Designing and implementing robust ML infrastructure used for training, deployment, monitoring, and scaling of machine learning models.
- Improving reliability, availability, and scalability of ML infrastructure, ensuring smooth and efficient workflows for internal ML engineers and researchers.
- Collaborating closely with ML engineers, product teams, researchers, SREs, and the Wikimedia volunteer community to identify infrastructure requirements, resolve operational issues, and streamline the ML lifecycle.
- Proactively monitoring and optimizing system performance, capacity, and security to maintain high service quality.
- Providing expert guidance and documentation to teams across Wikimedia to effectively utilize the ML infrastructure and best practices.
- Mentoring team members and sharing knowledge on infrastructure management, operational excellence, and reliability engineering.
Skills and Experience:
- 7+ years of experience in Site Reliability Engineering (SRE), DevOps, or infrastructure engineering roles, with substantial exposure to production-grade machine learning systems.
- Proven expertise with on-premises infrastructure for machine learning workloads (e.g., Kubernetes, Docker, GPU acceleration, distributed training systems).
- Strong proficiency with infrastructure automation and configuration management tools (e.g., Terraform, Ansible, Helm, Argo CD).
- Experience implementing observability, monitoring, and logging for ML systems (e.g., Prometheus, Grafana, ELK stack).
- Familiarity with popular Python-based ML frameworks (e.g., PyTorch, TensorFlow, scikit-learn).
- Strong English communication skills and comfort working asynchronously across global teams.

Method of Application

Interested and qualified? Go to Wikimedia Foundation on job-boards.greenhouse.io to apply
