We are seeking a Senior DevOps Engineer to assist in building risk management solutions as part of a Cybersecurity Risk Platform. This role will work closely with the Platform Team, other members of the Engineering Operations Teams, and customers to efficiently deploy scalable and reliable software systems. Members of the team enjoy a collaborative work environment focused on delivering value to large-scale enterprise customers.
In this role, you will assist the Development and Quality Assurance Teams in optimizing the SDLC, as well as work with Cloud Operations and Site Reliability Engineering Teams in the deployment, monitoring, and ongoing management of customer systems. You must love efficiently delivering, managing, and optimizing reliable cloud infrastructure and services by using DevOps principles applied to cloud-based architectures.
The ideal candidate must have a background in improving developer productivity and managing the operations of large-scale SaaS software solutions. You will need to build and manage CI/CD pipelines, use Infrastructure as Code tooling to deploy infrastructure, use monitoring tools to understand the operating environment, and respond to service interruptions. This position reports to a Manager of Engineering Operations or above.
Key Responsibilities:
- Collaborate with the development teams on how we develop, test, and deploy the platforms.
- Maintain CI/CD pipelines so that builds and deployments are automated, removing inefficiencies, toil, risk, and unnecessary cost.
- Automate mundane tasks to empower teams to focus on strategic development and driving business value.
- Work with developers to build highly observable systems that proactively report on system performance and reliability.
- Resolve critical customer issues to improve customer satisfaction and renewal rates.
- Work with the security and compliance teams to identify and appropriately mitigate risks.
- Participate as a First Responder in on-call, incident response, and incident management.
- Champion processes that support team-led work planning and value delivery.
- Participate in planning sessions that ensure objectives are well-understood so that standards and metrics can be established.
- Assist in building a highly capable team based on great talent identification and recruiting.
- Help ensure that the engineering team is happy, prolific, and autonomous.
- Maximize the productivity of our technologies by assisting in the development of technical documentation.
- Participate in incident postmortems to analyze the root causes of incidents and assess responses.
Requirements
Qualifications & Experience:
- 5 years of experience with Linux systems administration and scripting languages like JavaScript, Python, Bash, etc.
- 5 years of experience designing and maintaining highly available, secure cloud infrastructure, including application/web firewalls, routing, VPCs, load balancers, auto-scaling, IDS, etc.
- 3 years of experience with IaC and configuration management tools like Terraform and Ansible.
- 3 years of experience with monitoring software such as Datadog, Elastic Stack, and New Relic, including implementation of observability, monitoring, and reporting solutions.
- 3 years of experience building CI/CD pipelines using tools like GitHub, Bitbucket, Jenkins, etc.
- Experience maintaining cloud platforms that comply with industry standards and best practices for security and privacy (e.g., SOC2, PCI-DSS, HIPAA, GDPR).
- Experience with containers and orchestration via Docker and Kubernetes in a public cloud environment.
- Excellent verbal and written communication skills.
Great to have:
- BS in Computer Science, Computer Engineering, or Electrical Engineering.
- Experience working with Neo4j, Yugabyte, Spring Boot, Kubernetes, and GCP.
- Experience developing cybersecurity or IT systems management applications.
- Experience as a hands-on software engineer.
Please note that this role may require on-site work (the office is located in Halifax, NS).