At Wayve we're committed to creating a diverse, fair and respectful culture that is inclusive of everyone based on their unique skills and perspectives, and regardless of sex, race, religion or belief, ethnic or national origin, disability, age, citizenship, marital, domestic or civil partnership status, sexual orientation, gender identity, veteran status, pregnancy or related condition (including breastfeeding) or any other basis as protected by applicable law.
About us
Founded in 2017, Wayve is the leading developer of Embodied AI technology. Our advanced AI software and foundation models enable vehicles to perceive, understand, and navigate any complex environment, enhancing the usability and safety of automated driving systems.
Our vision is to create autonomy that propels the world forward. Our intelligent, mapless, and hardware-agnostic AI products are designed for automakers, accelerating the transition from assisted to automated driving.
In our fast-paced environment, big problems ignite us. We embrace uncertainty, leaning into complex challenges to unlock groundbreaking solutions. We aim high and stay humble in our pursuit of excellence, constantly learning and evolving as we pave the way for a smarter, safer future.
At Wayve, your contributions matter. We value diversity, embrace new perspectives, and foster an inclusive work environment; we back each other to deliver impact.
Make Wayve the experience that defines your career!
The role
We’re looking for a Machine Learning Engineer with strong experience in reinforcement learning (RL), reward modelling, and large-scale ML systems to advance how we train, evaluate, and deploy embodied AI behaviours. This role sits at the intersection of ML engineering, applied RL research, and ML systems, working on the frameworks that guide how our autonomous agents learn from data, simulation, and real-world experience.
As an MLE on the Accelerated Learning Loop team, you will:
Design and optimise end-to-end pipelines for training reward models and RL agents, ensuring they are reproducible and high-throughput.
Develop tooling for data processing, annotation, and inference within RL workflows.
Build, refine, and deploy reward models that encode safe, interpretable, and effective driving behaviours.
Integrate reward models with diverse data sources: real-world trajectories, simulation, and synthetic datasets.
Conduct ablations, hyperparameter explorations, and controlled studies to analyse how reward structures, data composition, and training dynamics affect policy performance.
Diagnose failure modes, investigate emergent behaviours, and iterate on reward objectives to improve reliability.
Work closely with RL scientists to translate research ideas into scalable engineering solutions.
Partner with evaluation teams to integrate reward and RL models into offline/online testing suites and simulation frameworks.
Establish best practices around code quality, reproducibility, and deployment readiness.
Build internal tools and visualisations that enable faster debugging, deeper insights, and more efficient iteration across the RL and reward modelling stack.
This role is ideal for someone who enjoys building systems and running fast, grounded experiments, and who is motivated by delivering real impact on the behaviour of embodied AI systems in the real world.
Must-haves
Experience applying reinforcement learning techniques, including offline RL, reward modelling, RLHF-style approaches, or similar
Proficiency in Python and modern ML frameworks (e.g., PyTorch, JAX, Ray, or equivalent)
Experience building ML pipelines or large-scale training workflows in production or research environments
Strong understanding of simulation environments and/or real-world behavioural data
Ability to design and run experiments, analyse results, and turn insights into actionable improvements
Strong problem-solving skills and the ability to work effectively in cross-functional teams
Nice-to-haves
Experience contributing to research (e.g., publications at NeurIPS, ICLR, CoRL, CVPR)
Understanding of self-driving technologies, sensor data, or real-time decision-making algorithms
Experience with distributed training systems and cloud compute environments (Azure, AWS, GCP)
Exposure to large-scale simulation, embodied AI, or robotics systems
What we offer you
Attractive compensation with salary and equity
Immersion in a team of world-class researchers, engineers and entrepreneurs
A unique position to shape the future of autonomy and tackle the biggest challenge of our time
Bespoke learning and development opportunities
Relocation support with visa sponsorship
Flexible working hours - we trust you to do your job well, at times that suit you and your team
Benefits such as an onsite chef, workplace nursery scheme, private health insurance, therapy, daily yoga, onsite bar, large social budgets, unlimited L&D requests, enhanced parental leave, and more!
This is a full-time role based in our office in Vancouver. At Wayve, we want the best of all worlds, so we operate a hybrid working policy that combines time together in our offices and workshops, to fuel innovation, culture, relationships and learning, with time spent working from home.
We understand that everyone has a unique set of skills and experiences and that not everyone will meet all of the requirements listed above. If you’re passionate about self-driving cars and think you have what it takes to make a positive impact on the world, we encourage you to apply.
For more information visit Careers at Wayve.
To learn more about what drives us, visit Values at Wayve.
DISCLAIMER: We will not ask about marriage, pregnancy, care responsibilities or disabilities in any of our job adverts or interviews. However, we do look to capture information about care responsibilities and disabilities, among other diversity information, as part of an optional DEI Monitoring form. This helps us identify areas of improvement in our hiring process and ensure that the process is inclusive and non-discriminatory.