GitLab is an open-core software company that develops the most comprehensive
AI-powered DevSecOps Platform
[https://about.gitlab.com/solutions/devops-platform], used by more than 100,000
organizations. Our mission [https://about.gitlab.com/company/mission] is to
enable everyone to contribute to and co-create the software that powers our
world. When everyone can contribute, consumers become contributors,
significantly accelerating human progress. Our platform unites teams and
organizations, breaking down barriers and redefining what's possible in software
development. Thanks to products like Duo Enterprise
[https://about.gitlab.com/gitlab-duo/] and Duo Agent Platform
[https://about.gitlab.com/blog/gitlab-duo-agent-platform-what-is-next-for-intelligent-devsecops/],
customers get AI benefits at every stage of the SDLC.
The same principles built into our products are reflected in how our team works:
we embrace AI as a core productivity multiplier, with all team members expected
to incorporate AI into their daily workflows to drive efficiency, innovation,
and impact. GitLab is where careers accelerate, innovation flourishes, and every
voice is valued. Our high-performance culture is driven by our values
[https://handbook.gitlab.com/handbook/values/] and continuous knowledge
exchange, enabling our team members to reach their full potential while
collaborating with industry leaders to solve complex problems. Co-create the
future with us [https://www.youtube.com/watch?v=OuZIb5zszQI] as we build
technology that transforms how the world develops software.
Site Reliability Engineers (SREs) are responsible for keeping all user-facing
services and other GitLab production systems running smoothly. SREs are a blend
of pragmatic operators and software craftspeople who apply sound engineering
principles, operational discipline, and mature automation to our environments
and the GitLab codebase. We specialize in systems, whether that means databases, the
Linux kernel, or some more specific interest in scaling, algorithms, or
distributed systems.
The mission of the Database Operations Team
[https://handbook.gitlab.com/handbook/engineering/infrastructure-platforms/data-access/database-operations]
is to build, run, own, and evolve the entire lifecycle of the PostgreSQL database
engine for GitLab.com [http://gitlab.com]. The team is focused on owning the
reliability, scalability, performance, and security of the database engine and its
supporting services. The team should build its services on top
of Runway
[https://handbook.gitlab.com/handbook/engineering/infrastructure/team/runway/] services
and cloud vendor managed products, where appropriate, to reduce complexity,
improve efficiency, and deliver new capabilities more quickly.
GitLab.com is a unique site and it brings unique challenges: it’s the biggest
GitLab instance in existence. In fact, it’s one of the largest single-tenancy
open-source SaaS sites in the world. The experience of our team feeds back into
other engineering groups within the company, as well as to GitLab customers
running Dedicated, Self-Managed and future Cells installations.
Responsibilities
- Automate every operational task, which is a core requirement for this role:
for example, package updates, configuration changes across all environments,
and tools for automatic provisioning of user-facing services.
- Respond to platform emergencies, alerts, and escalations from Customer
Support.
- Ensure systems exist to manage software life-cycles (e.g. Operating Systems)
with a minimum of manual effort.
- Develop a fully automated multi-environment observability stack based on the
existing SaaS system, and extend it to predict capacity needs based on the
usage patterns.
- Plan for new service roll-outs, expansion and capacity management of existing
services, and work with users to optimize their resource consumption.
As an SRE you will:
- Function as an SRE in building solutions in service of the Database
Operations team’s mission and goals.
- Work with large databases in a dynamic and growth-oriented environment. Many
interesting and challenging problems await solutions.
- Work on database reliability and performance aspects for GitLab.com from
within the Database Operations team as well as work on shipping solutions
with the product.
- Analyze solutions and implement best practices for our main PostgreSQL
database clusters and their components.
- Work on observability of relevant database metrics and make sure we reach our
database objectives.
- Work with partner DBREs and peer SREs to roll out changes to our production
environment and help mitigate database-related production incidents.
- Provide on-call support on rotation with the team.
- Provide database subject-matter expertise to engineering teams (for example
through reviews of database migrations, queries and performance
optimizations).
- Work on automation of database infrastructure and help engineering succeed by
providing self-service tools.
- Use the GitLab product to run GitLab.com as a first resort and improve the
product as much as possible.
- Plan the growth of GitLab's database infrastructure.
- Support and debug database production issues across services and levels of
the stack.
- Make sure monitoring and alerting trigger on symptoms and not on outages.
- Document every action so your learnings turn into repeatable actions and then
into automation.
You may be a fit for this role if you:
- Have extensive experience as an SRE supporting database operations teams.
- Have strong experience running PostgreSQL at scale in large production
environments.
- Have strong experience with infrastructure automation and configuration
management (Chef, Ansible, Puppet, Terraform…).
- Have a solid understanding of SQL and PL/pgSQL.
- Have significant experience working in a large SaaS distributed systems
production environment.
- Share our values [https://about.gitlab.com/handbook/values/], and work in
accordance with those values.
- Have an urge to document all the things so you don't need to learn the same
thing twice, and an urge for delivering quickly and iterating fast.
- Have a proactive, go-for-it attitude. When you see something broken, you
can't help but fix it.
- Have strong data modeling and data structure design skills.
- Bonus: Have strong programming skills as a (former) backend engineer,
preferably with Ruby and/or Go.
Projects you could work on:
- Review, analyze, and implement solutions regarding database administration
(e.g., backups, performance tuning).
- Work with Terraform, Chef, Ansible, and other tools to build mature automation
(e.g., automatic setup of new replicas or testing and monitoring of backups).
- Implement self-service tools for our engineers using GitLab ChatOps.
- Provide technical assistance and support to other teams on database and
database-related application design methodologies, system resources, and
application tuning.
- Review database-related changes from engineering teams (e.g., database
migrations).
- Recommend query and schema changes to optimize the performance of database
queries.
- Jump on a production incident to mitigate database-related issues on
GitLab.com.
- Participate actively in the infrastructure design and scalability
considerations focusing on data storage aspects.
- Make sure we know how to take the next step to scale the database.
- Design and develop specifications for future database requirements including
enhancements, upgrades, and capacity planning; evaluate alternatives; and
make appropriate recommendations.
Performance Indicators
Site Reliability Engineers have the following job-family performance indicators:
HOW GITLAB WILL SUPPORT YOU
Please note that we welcome interest from candidates with varying levels of
experience; many successful candidates do not meet every single requirement.
Additionally, studies have shown that people from underrepresented groups
[https://about.gitlab.com/company/culture/inclusion/#examples-of-select-underrepresented-groups]
are less likely to apply to a job unless they meet every single qualification.
If you're excited about this role, please apply and allow our recruiters to
assess your application.
Country Hiring Guidelines: GitLab hires new team members in countries around the
world. All of our roles are remote; however, some roles may carry specific
location-based eligibility requirements. Our Talent Acquisition team can help
answer any questions about location after starting the recruiting process.
Privacy Policy: Please review our Recruitment Privacy Policy
[https://handbook.gitlab.com/handbook/hiring/candidate-faq/recruitment-privacy-policy/].
Your privacy is important to us.
GitLab is proud to be an equal opportunity workplace and is an affirmative
action employer. GitLab’s policies and practices relating to recruitment,
employment, career development and advancement, promotion, and retirement are
based solely on merit, regardless of race, color, religion, ancestry, sex
(including pregnancy, lactation, sexual orientation, gender identity, or gender
expression), national origin, age, citizenship, marital status, mental or
physical disability, genetic information (including family medical history),
discharge status from the military, protected veteran status (which includes
disabled veterans, recently separated veterans, active duty wartime or campaign
badge veterans, and Armed Forces service medal veterans), or any other basis
protected by law. GitLab will not tolerate discrimination or harassment based on
any of these characteristics. See also GitLab’s EEO Policy
[https://about.gitlab.com/handbook/people-policies/inc-usa/#equal-employment-opportunity-policy] and EEO
is the Law
[https://about.gitlab.com/handbook/labor-and-employment-notices/#eeoc-us-equal-employment-opportunity-commission-notices].
If you have a disability or special need that requires accommodation
[https://about.gitlab.com/handbook/people-policies/inc-usa/#reasonable-accommodation],
please let us know during the recruiting process
[https://about.gitlab.com/handbook/hiring/interviewing/#adjustments-to-our-interview-process].