Job Description:
On behalf of our public sector client, Affinity is looking for a Senior Site Reliability Engineer who will develop robust observability solutions using Dynatrace and automate key monitoring processes through Terraform and PowerShell for Azure-based services. The role aims to design and implement custom solutions that provide comprehensive, end-to-end monitoring of applications built on Azure services, ensuring optimized performance, reliability, and scalability. This is a hands-on role requiring strong expertise in scripting and automation using Terraform, PowerShell, and Ansible to streamline infrastructure operations.
Responsibilities:
• Serve as the subject matter expert (SME) for Dynatrace, responsible for configuring, optimizing, and managing Dynatrace monitoring solutions.
• Design and implement monitoring strategies using Dynatrace to ensure comprehensive visibility into system performance, availability, and reliability.
• Collaborate with our Engineering & Platform teams to ensure our services, platforms, and infrastructure are emitting the right metrics.
• Lead the rollout and adoption of Observability practices, tools, and frameworks across teams and projects.
• Collaborate with Incident Management teams to resolve critical incidents, conduct post-incident reviews, and implement preventive measures.
• Communicate complex business and technical information clearly and concisely.
• Proactively identify and mitigate potential issues, bottlenecks, and performance degradation to ensure system reliability and uptime.
• Drive automation initiatives using tools like Ansible, Terraform, or Kubernetes to streamline deployment, configuration, and management of infrastructure.
• Conduct capacity planning assessments, analyze resource utilization trends, and forecast capacity requirements to support business growth and scalability.
Qualifications:
• Bachelor's degree in Computer Science, Engineering, or a related field; Master's degree preferred.
• Extensive and recent experience as a Site Reliability Engineer (SRE) with a focus on Dynatrace and Observability practices.
• Strong proficiency in Dynatrace monitoring solutions, including configuration, customization, and optimization.
• Hands-on experience with Observability tools and practices such as distributed tracing, logging, metrics collection, and anomaly detection.
• Experience with automation tools (Ansible, Terraform, Kubernetes) and Infrastructure as Code (IaC) principles.
• Solid understanding of cloud platforms (AWS, Azure, GCP) and containerization technologies (Docker, Kubernetes).
• Excellent problem-solving skills, analytical thinking, and the ability to troubleshoot complex technical issues.
• Strong communication and collaboration skills, with the ability to work effectively in cross-functional teams and drive initiatives to completion.
• Relevant certifications (Dynatrace, AWS, Kubernetes, etc.) are a plus.
Affinity Earn:
Know someone who's great for this, or any of our open roles? Earn up to $4,000/year for each successful referral through Affinity Earn. You can also earn up to $50,000 for helping us find new clients. Learn about our referral program at https://affinity-group.ca/earn/ or browse our jobs & follow us at https://www.linkedin.com/company/affinity-staffing/jobs/
About Affinity:
Affinity Group is a technology and business consulting and services company. We believe in creating long-term relationships between clients and consultants that foster mutually beneficial partnerships. Affinity is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. All employment is decided on the basis of qualifications, merit, and business need.
For more information on Affinity, please visit www.affinity-group.ca
Job Number: 12041