Huawei Canada has an immediate permanent opening for a Principal Engineer.
About the team:
Established in 2014, the Distributed Scheduling and Data Engine Lab is Huawei
Cloud's technical innovation center in Canada. The lab researches and develops
advanced cloud technologies and supports the productization and iterative
optimization of its research results. Current research areas include
cloud-native databases, infrastructure resource scheduling and prediction,
cloud-native middleware, media engines, and user experience studies. The lab
fosters a robust technical environment in which engineers collaborate with
industry experts to build a highly competitive cloud platform. Our team has an
immediate permanent opening for a Principal Software Engineer.
About the job:
- Integrate AI frameworks with cloud infrastructure to optimize the end-to-end
  architecture for AI inference and fine-tuning scenarios. Focus on improving
  the observability, reliability, and performance of AI services.
- Collaborate with team members to design and develop proof-of-concept
  prototypes, and validate optimization strategies to confirm their
  effectiveness.
- Work closely with the product team to support prototype development, taking
  into account the product's current constraints and requirements.
About the ideal candidate:
- 5 years of software development experience, including at least 2 years of
  R&D on AI infrastructure platforms for fine-tuning or inference, including
  but not limited to AI workload profiling tool development, vLLM or SGLang
  development, and infrastructure-level troubleshooting and root cause
  analysis.
- Proficiency in Golang or Rust. Must be able to write clean, efficient, and
  high-quality code from scratch.
- In-depth understanding of AI technologies and familiarity with how the
  modules of AI model training, inference frameworks, and storage systems
  interact.
- Proficiency in Kubernetes or Ray, with practical experience developing
  services on these platforms.
- Strong understanding of cloud services and platforms such as AWS and Azure.
- Highly analytical, with strong problem-solving skills and the ability to
  address complex technical challenges effectively.
- Self-driven, with a proven ability to learn quickly and take initiative.
- Master's or Ph.D. degree in Computer Science, Engineering, or a related
  field, or equivalent practical experience.