Huawei Canada has an immediate permanent opening for a Senior Principal
Engineer.
About the team:
The Distributed Data Storage and Management Lab leads research in distributed
data systems, with the goal of developing next-generation cloud serverless
products spanning core infrastructure and databases. The lab tackles a range of
data challenges, including cloud-native disaggregated databases, pay-by-query
usage models, and low-level data transfer optimization via RDMA. Its teams
build advanced cloud serverless data infrastructure and apply cutting-edge
networking technologies to Huawei's global AI infrastructure.
About the job:
- Responsible for system-level optimization of open-source frameworks in the
  fields of large language models (LLMs) or reinforcement learning, with a
  focus on large-scale training and inference.
- Apply deep theoretical knowledge of reinforcement learning or LLM algorithms
  to lead the design and implementation of advanced system optimization
  solutions across the AI software stack.
- Proactively identify and drive systematic innovations in LLMs, reinforcement
  learning algorithms, and system optimization, continuously enhancing the core
  competitiveness of AI frameworks.
- Collaborate closely with cross-functional teams, actively foster the growth
  of the technology ecosystem, and take a leading role in advancing AI
  framework technologies through personal technical expertise.
About the ideal candidate:
- Ph.D. in Mathematics, Computer Science, AI, or a related field, or equivalent
  research experience and technical proficiency.
- At least 3 years of experience in research and development of reinforcement
  learning, large language models (LLMs), or related algorithms. Familiarity
  with cutting-edge technologies such as LLMs, multimodal models, and novel
  reinforcement learning algorithms. Ability to optimize AI systems based on
  the latest research advances, improving usability and enabling more efficient
  adaptation to emerging algorithms.
- Proficiency with mainstream deep learning frameworks (e.g., PyTorch,
  TensorFlow), with hands-on experience in large-scale model training.
- Strong cross-functional collaboration and communication skills, with the
  ability to work closely with customers and internal teams to drive successful
  project delivery.
- Candidates with a track record of core contributions to AI infrastructure
  communities (e.g., PyTorch, vLLM, LangChain) or significant influence in
  relevant open-source ecosystems are preferred. Original research publications
  in top-tier conferences (e.g., ICML, NeurIPS) or journals are considered a
  strong plus.