Job Description

KEY RESPONSIBILITIES:
- Optimize open-source deep learning training libraries, such as Megatron and Transformer Engine, for enhanced performance on AMD GPUs.
- Analyze and optimize key deep learning models for performance on AMD GPUs in a distributed computing environment, targeting both scale-up (multi-GPU) and scale-out (multi-node) architectures.
- Apply software engineering best practices while staying informed of trends and innovations in software, hardware, algorithms and architecture.
- Contribute to the development and bring-up of new ASIC and hardware.
- Apply a data-driven approach to optimization efforts and design groundbreaking AMD technologies.
- Debug and resolve existing issues while researching more efficient alternatives to achieve the same objectives.
- Collaborate with internal GPU library teams and develop technical relationships with peers and partners to optimize deep learning training.
PREFERRED EXPERIENCE:
Programming & Development:
- Expertise in C/C++ and Python, with strong skills in object-oriented programming, debugging, performance optimization, and concurrent programming.
- Familiarity with source control (GitHub), CI/CD, and Linux debugging/profiling tools.
GPU Kernel Development:
- Experienced in GPU kernel optimization for deep learning using HIP and CUDA on AMD GPUs (GCN, RDNA).
- Skilled in programming and performance optimization with tools like Composable Kernel (CK), CUTLASS, Triton, and assembly (ASM).
Deep Learning & Optimization:
- Expertise in integrating GPU performance optimizations into TensorFlow and PyTorch for model training and inference.
- Experience analyzing and optimizing deep learning workloads with a focus on scaling and throughput.
Collaboration & Communication:
- Strong problem-solving and communication skills, with proven success in team collaboration.
ACADEMIC CREDENTIALS:
- Bachelor's or Master's degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent.
Advanced Micro Devices