Job Requisition ID #
25WD85835
Job Title: Principal Site Reliability Engineer
Position Overview
We are seeking a highly motivated and experienced Principal Site Reliability Engineer (SRE) to manage critical cloud infrastructure and site reliability operations for Autodesk's global Product Access journey. This pivotal role focuses on ensuring the highest reliability, availability, and performance of our AWS-hosted cloud infrastructure. Reporting to the Engineering Manager, you will lead the design and development of resilient, scalable architecture and innovative solutions for the platform. You will independently manage and deliver end-to-end solutions while engaging with key stakeholders and partners.
Responsibilities
- Lead the architecture, solution design, development, and maintenance of cloud infrastructure tailored for a microservices architecture, ensuring alignment with SRE principles.
- Independently manage requirement analysis, solution design, implementation, and release planning, with a focus on SRE best practices.
- Ensure stringent adherence to trust and security compliance, guidelines, and SRE standards.
- Streamline CI/CD processes, enhance system reliability, and guarantee infrastructure scalability and security.
- Automate infrastructure deployment, scaling, and management using modern tools and advanced SRE practices.
- Implement and maintain configuration management and infrastructure as code (IaC) using Terraform, promoting SRE methodologies.
- Lead Disaster Recovery (DR) strategies, conduct failover exercises and gamedays, and perform periodic maintenance activities to ensure system resilience.
- Contribute actively to critical vulnerability (CVE) remediation efforts, prioritizing system integrity and security.
- Promote, document, and enforce security and best practices across all pillars of SRE throughout system design and operations.
- Provide real-time operational support and collaborate across functions to resolve system, infrastructure, and CI/CD issues efficiently.
- Participate in on-call rotations, delivering crucial 24x7 support for production systems, ensuring high availability and reliability.
Minimum Qualifications
- Bachelor's degree or higher in Computer Science, Engineering, or a related field.
- 8 years of progressive experience in Site Reliability Engineering (SRE), Site Operations, or a similar field.
- Proficiency in managing AWS resources and an understanding of networking and security protocols.
- Expertise in infrastructure as code (IaC) and cloud automation tools such as Terraform, Serverless, and CloudFormation.
- Expertise in defining and building CI/CD processes with tools like Jenkins, GitHub, and Artifactory.
- Experience with container-based technologies like Docker and AWS ECS.
- Experience with monitoring and logging tools such as Dynatrace, Grafana, DataDog, ELK Stack, and CloudWatch.
- Experience in Linux Systems Administration, scripting, and troubleshooting in a production environment.
- Proficiency in programming and scripting in a UNIX environment with languages such as Python, Go, Bash, Groovy, and Node.js.
- Technology Stack: Java/Spring Boot, AWS (ECS Fargate, ElastiCache, Lambda, Kinesis, DynamoDB, VPC, IAM policies, API Gateway, NLB/ALB, Route 53, CloudWatch, Kibana, OpenSearch), Kafka, GoLang, Node.js, Groovy, Python, Jenkins, GitHub, Jira, ServiceNow, and Splunk.
Preferred Qualifications
- Knowledge of applying AI and ML solutions to engineering processes and/or DevOps automation.
- Knowledge of standardized observability frameworks such as OpenTelemetry.
- Relevant certifications (e.g., AWS Certified DevOps Engineer, AWS Site Reliability Engineer).
- Broad knowledge of AWS, Redis, server programming, databases, and cloud architectures.
- Broad knowledge of data streaming pipelines such as Kinesis, Firehose, and Kafka.
- Knowledge of core Java and Spring Boot concepts, including JVM optimization.
- Knowledge of build tools such as Gradle.
- Strong interpersonal and communication skills to effectively collaborate in an Agile/Scrum-oriented environment.
- Self-directed team player and independent contributor, demonstrating accountability and end-to-end ownership.
Learn More
About Autodesk
Welcome to Autodesk! Amazing things are created every day with our software -- from the greenest buildings and cleanest cars to the smartest factories and biggest hit movies. We help innovators turn their ideas into reality, transforming not only how things are made, but what can be made.
We take great pride in our culture here at Autodesk -- our Culture Code is at the core of everything we do. Our values and ways of working help our people thrive and realize their potential, which leads to even better outcomes for our customers.
When you're an Autodesker, you can be your whole, authentic self and do meaningful work that helps build a better future for all. Ready to shape the world and your future? Join us!
Salary transparency
Salary is one part of Autodesk's competitive compensation package. Offers are based on the candidate's experience and geographic location. In addition to base salaries, we also have a significant emphasis on discretionary annual cash bonuses, commissions for sales roles, stock or long-term incentive cash grants, and a comprehensive benefits package.
Diversity & Belonging
We take pride in cultivating a culture of belonging and an equitable workplace where everyone can thrive. Learn more here: https://www.autodesk.com/company/diversity-and-belonging
Are you an existing contractor or consultant with Autodesk?
Please search for open jobs and apply internally (not on this external site).