Huawei Canada has an immediate permanent opening for a Senior Principal
Engineer.
About the team:
The Distributed Data Storage and Management Lab leads research in distributed
data systems, with the goal of developing next-generation cloud serverless
products spanning core infrastructure and databases. The lab tackles a range of
data challenges, including cloud-native disaggregated databases, pay-by-query
usage models, and low-level data transfer optimization via RDMA. Its teams
build advanced cloud serverless data infrastructure and apply cutting-edge
networking technologies to Huawei's global AI infrastructure.
About the job:
- Responsible for system-level optimization of open-source frameworks in the
  fields of large language models (LLMs) or reinforcement learning, with a
  focus on large-scale training and inference.
- Apply deep theoretical knowledge of reinforcement learning or LLM algorithms
  to lead the design and implementation of advanced system optimization
  solutions across the AI software stack.
- Proactively identify and drive systematic innovations in LLMs, reinforcement
  learning algorithms, and system optimization, continuously enhancing the core
  competitiveness of AI frameworks.
- Collaborate closely with cross-functional teams, actively foster the growth
  of the technology ecosystem, and take a leading role in advancing AI
  framework technologies through personal technical expertise.
About the ideal candidate:
- Ph.D. in Mathematics, Computer Science, AI, or a related field, or equivalent
  research experience and technical proficiency.
- At least 3 years of experience in research and development of reinforcement
  learning, large language models (LLMs), or related algorithms. Familiarity
  with cutting-edge technologies such as LLMs, multimodal models, and novel
  reinforcement learning algorithms. Ability to optimize AI systems based on
  the latest research advances, improving usability and enabling more efficient
  adaptation to emerging algorithms.
- Proficiency with mainstream deep learning frameworks (e.g., PyTorch,
  TensorFlow), with hands-on experience in large-scale model training.
- Strong cross-functional collaboration and communication skills, with the
  ability to work closely with customers and internal teams to drive successful
  project delivery.
- Candidates with a track record of core contributions to AI infrastructure
  communities (e.g., PyTorch, vLLM, LangChain) or significant influence in
  relevant open-source ecosystems are preferred. Original research publications
  in top-tier conferences (e.g., ICML, NeurIPS) or journals are considered a
  strong plus.