Petal is a leading Canadian healthcare orchestration and billing company that
revolutionizes healthcare systems, making them agile, efficient, and resilient.
We do this by enabling the forecasting and shaping of world-class healthcare
through healthcare BI, advanced analytics, and informed insights.
Our commitment to fostering an exceptional workplace culture has earned us
notable recognitions, including being listed as a Great Place to Work in both
the technology and healthcare sectors. Join us in our mission to empower
healthcare innovators and improve healthcare differently.
What you can expect when joining the team
As a Staff SRE Specialist, you will play a crucial role in ensuring the
reliability, performance, and scalability of our services. You will be
responsible for improving and maintaining the resilience of our infrastructure
through automation, monitoring, and incident management. In this role, you will
lead the charge in bridging the gap between software development and operations
to ensure efficient delivery and high availability of our applications.
Your daily life
In your day-to-day work, you will:
- Define and drive SRE best practices (SLIs/SLOs, error budgets, blameless
  post-mortems) to ensure availability, reliability, and scalability of
  critical systems, working closely with the Principal Developer on technical
  vision and architecture;
- Establish and maintain reliability metrics (SLIs, SLOs, recovery time) and
  design robust monitoring, alerting, and observability systems while
  optimizing infrastructure costs;
- Eliminate toil through automation of critical operations such as incident
  response, auto-scaling, and CI/CD pipelines to improve service reliability
  and team productivity;
- Architect technical implementation plans and support their delivery,
  establishing partnerships with teams to meet reliability, performance, and
  security standards;
- Contribute to internal tooling and platform development (deployment tools,
  dashboards, monitoring frameworks) to improve operational efficiency and
  developer experience while maintaining security standards in systems and
  processes;
- Lead resilience improvement efforts including capacity planning, disaster
  recovery, and system optimization through load balancing, failovers, and
  other high availability strategies;
- Manage critical incident response by minimizing MTTR (Mean Time To Recovery),
  including intervention, resolution, and post-incident analysis with
  documentation and recommendations to prevent recurrence;
- Proactively identify optimization opportunities for system performance and
  cost-effectiveness in cloud environments while contributing to strategic
  infrastructure planning;
- Provide 24/7 production support through on-call rotation, maintain system
  availability, and manage internal communications during major incidents;
- Mentor and coach team members and contribute to in-depth technical analyses
  to address strategic business needs.
Your profile
Are you a proactive technical leader with deep expertise in site reliability?
Are you passionate about building resilient, high-performing systems and
guiding teams toward excellence? The sky is the limit if you have:
- College diploma (DEC) or bachelor's degree in computer science or related
  field;
- More than 10 years of relevant professional experience, with at least 5 years
  focused on SRE or similar roles;
- Deep knowledge of cloud infrastructure (AWS, GCP, or Azure), system
  architecture, orchestration tools, and automation frameworks;
- Advanced knowledge of SRE tools and practices (monitoring, alerting, incident
  response, capacity planning). Proficiency with tools like Prometheus,
  Grafana, Kubernetes, and Terraform;
- Strong experience with infrastructure automation tools, scripting (Python,
  Go, or Bash), and CI/CD pipelines;
- Proven ability to guide cross-functional teams, mentor junior engineers, and
  lead reliability initiatives that align with business objectives;
- Strong problem-solving and analytical skills with the ability to handle
  complex technical issues;
- Excellent verbal and written communication skills, with the ability to
  document and explain complex concepts to both technical and non-technical
  stakeholders;
- Proficiency in English and French is preferred, as you will work with diverse
  teams and stakeholders.
Petal's position on remote working
In our opinion, a company cannot claim to be modern and innovative, or to have
the well-being of its team at heart, without integrating remote work to the
extent that its business model allows. Petal employees continue to benefit from
the option of teleworking, with the maximum flexibility permitted by the nature
of the position and the smooth running of operations.
Our benefits
- A signing bonus of $1,000 for your remote work set-up;
- Compensation that recognizes your contribution;
- 4 to 6 weeks of paid vacation per year;
- 5 paid personal days per year;
- A group RRSP / DPSP plan with employer contribution;
- A complete group insurance plan, from day 1;
- An annual wellness allowance;
- Access to the Lumino Health™ telehealth application;
- Flexible work hours and more.
Petal is an active participant in the equal opportunity employment program, and
members of the following target groups are encouraged to apply: women, people
with disabilities, Aboriginal peoples, and visible minorities. If you are a
person with a disability, assistance with the screening and selection process is
available on request.
A quick but important note: We've noticed that some external websites are posting
our job openings under incorrect job titles. To find our real opportunities and
join our team, please make sure to apply through our official careers page or
our trusted partners. We can't wait to hear from you!