There has never been a better time to join Extreme. With several acquisitions
extending our portfolio and go-to-market strategy, we have seen enormous
opportunity and growth within the region.
Aside from being a Technology Leader in the Gartner Magic Quadrant, we also
actively promote an internal culture that truly embraces diversity, inclusion,
and equality in the workplace. With Diversity and Inclusion as part of our
core values and beliefs, we’re proud to foster an environment where every
Extreme employee can thrive because of their differences, not despite them.
Cloud Operations Engineer – Monitoring Lead (Thornhill, Toronto - Hybrid)
We are seeking a highly skilled and experienced Cloud Operations Engineer –
Monitoring Lead to join our growing Cloud Operations team. In this critical
role, you will be responsible for designing, implementing, and optimizing our
comprehensive monitoring and alerting strategy across our cloud infrastructure
and applications. You will drive proactive identification of issues, ensure
system health, and contribute significantly to our operational excellence and
reliability goals. We're looking for the best and the brightest 'A' players who
want to make a difference doing a job they love.
Responsibilities
- Lead the design, implementation, and continuous improvement of our end-to-end
monitoring and alerting framework for cloud infrastructure (AWS, Azure, GCP),
applications, and services.
- Define key performance indicators (KPIs), service level indicators (SLIs),
and service level objectives (SLOs) for critical systems.
- Evaluate, select, and integrate monitoring tools (e.g., Prometheus, Grafana,
Datadog, Splunk, CloudWatch, Azure Monitor, GCP Operations Suite) to meet
evolving needs.
- Develop and implement automation scripts and tools (e.g., Python, Bash,
PowerShell) to streamline monitoring deployment, configuration, and incident
remediation.
- Build and maintain dashboards, alerts, and reports that provide actionable
insights into system performance, health, and availability.
- Analyze monitoring data to identify performance bottlenecks, resource
inefficiencies, and potential cost optimization opportunities.
- Collaborate with engineering teams to implement performance improvements and
cost-saving measures.
- Create and maintain comprehensive documentation for monitoring systems,
procedures, and best practices.
- Proactively identify areas for improvement in our cloud operations and
monitoring capabilities.
- Provide 24/7 support for cloud services.
- Participate in cloud security and compliance implementation.
Ideal Qualifications
- BS-level technical degree required; Computer Science or Engineering
background preferred.
- 8+ years of progressive experience in Cloud Operations, DevOps, or Site
Reliability Engineering roles, with a strong focus on monitoring.
- Deep expertise with at least one major public cloud platform (AWS, Azure, or
Google Cloud Platform).
- Proven experience as a technical lead or senior contributor in a
monitoring-focused role.
- Working knowledge of container-based architecture and deployment (Docker,
Kubernetes).
- Extensive experience with various monitoring and observability tools (e.g.,
Prometheus, Grafana, Datadog, Splunk, ELK Stack, vendor-specific monitoring
solutions).
- Excellent problem-solving, analytical, and troubleshooting skills.
- Working knowledge of Elasticsearch, PostgreSQL, Redis, Ignite, Kafka, and
RabbitMQ.
- Comfortable working within a distributed team located in multiple time zones.
$120,000 - $130,000 a year