About the Eclipse Foundation
The Eclipse Foundation is a globally recognized nonprofit organization that
supports a vibrant community of open source projects and contributors. With a
commitment to vendor neutrality and transparency, we provide a collaborative
environment for innovation across industries including cloud, edge, AI, and
developer tooling. Our team is remote-first, inclusive, and passionate about
open source.
Position Summary
We are seeking a Senior Manager, Site Reliability Engineer to lead and evolve
the infrastructure supporting critical services used by millions of community
members, including the Open VSX Registry. Reporting to the Director of IT, you
will lead the transformation of these services towards a 24/7, highly available
state with strong security practices, and take responsibility for planning,
uptime, incident response, roadmap execution, and long-term sustainability.
This role is central to our mission of empowering developers, enabling
collaboration, and ensuring user freedoms by delivering services that are
secure, resilient, and aligned with the strategic goals of the Foundation.
Location: Ottawa, Ontario. Must be able to travel to a data centre when needed
to assist with on-site work.
What You’ll Do
- Architect and manage Kubernetes deployments for Open VSX in production
environments
- Oversee PostgreSQL and Elasticsearch clusters, ensuring data integrity,
performance, and scalability
- Implement and refine monitoring, alerting, and incident response systems to
maintain high service reliability
- Collaborate with development teams to improve CI/CD pipelines and deployment
workflows
- Partner with the Security team to implement and uphold organisational
policies and secure-by-design practices
- Lead root cause analysis and postmortems for service disruptions, driving
continuous improvement
- Provide technical leadership and mentorship to junior operations staff
- Engage with the community and users to resolve support issues and gather
feedback
- Maintain documentation and contribute to operational playbooks
- Define and report on service KPIs, SLOs, and operational health indicators
- Provide strategic advice to leadership on platform operations and technology
decisions
- Contribute to annual planning cycles by informing resource needs, tooling
requirements, and infrastructure budgeting
What You’ll Bring
- 5+ years of experience in site reliability engineering, DevOps, or IT
operations
- Deep expertise in Kubernetes, Helm, and container orchestration
- Strong experience with PostgreSQL and Elasticsearch in production
environments
- Proficiency in monitoring and observability tools (e.g., Prometheus, Grafana,
ELK stack)
- Solid scripting and automation skills (e.g., Bash, Python, Ansible)
- Familiarity with GitHub Actions or similar CI/CD tools
- Excellent troubleshooting skills and a proactive mindset
- Ability to work independently in a remote, multicultural team
- Excellent communication skills
- Bonus: experience supporting open source infrastructure or registries
Why Join Us
- Competitive compensation and benefits
- Flexible work hours and remote-first culture
- “Corporate Recharge” days and right-to-disconnect policy
- Opportunity to shape the future of open source infrastructure
We offer competitive compensation along with a comprehensive benefits package.
We thank all applicants for their interest; however, only those selected for an
interview will be contacted. For more information about the Eclipse Foundation,
please visit our website at eclipse.org.
The Eclipse Foundation respects the dignity and independence of people with
disabilities and is committed to providing accommodation and support throughout
any recruitment process. If you require any special accommodation or support,
please let us know when applying.