Our team has an immediate 12-month contract opening for a researcher.
About the team:
The Cloud Native Data Engine team, part of the Distributed Scheduling and Data
Engine Lab, is led by esteemed technical experts with extensive industry and
academic experience and merges software development with cutting-edge
industrial research in the cloud database area. Our research currently focuses
on cloud-native database architecture (TaurusDB) and high-performance query and
transaction processing (SQL Engine) in next-generation cloud infrastructure.
The team publishes innovative research at leading conferences such as SIGMOD,
VLDB, and ICDE, and is recognized as a key technology contributor in the
industry.
About the job:
-
This unique role combines software development with cutting-edge industrial
research in databases, encompassing cloud-native database architecture
(TaurusDB) and high-performance query and transaction processing (GaussDB SQL
Engine) within next-generation cloud infrastructure.
-
Design, implement, and maintain database architectures for machine learning
workloads, ensuring efficient data management and optimized performance.
-
Stay up to date on emerging trends in database technology and machine
learning, and propose innovative solutions that improve system efficiency and
capability.
-
Investigate and summarize state-of-the-art database technologies by reviewing
the latest conference papers, attending workshops, and following industry
trends.
-
Assist in implementing AI-driven analytics and advanced features such as
vector search, similarity matching, and recommendation systems.
-
Actively pursue opportunities to invent and file patents, and to write papers
for leading academic and industrial conferences.
About the ideal candidate:
-
1-3 years of experience with strong C/C++ programming skills, including
expertise in systems-level programming and debugging.
-
Deep understanding of cloud computing technologies, including cloud storage,
distributed systems, parallel computing, and consistency protocols.
-
Experience working with machine learning frameworks (e.g., TensorFlow,
PyTorch, scikit-learn) and an understanding of how they can be applied within
database contexts.
-
Familiarity with MySQL, PostgreSQL, or other open-source databases, including
knowledge of their internal mechanisms such as transaction management, storage
engines, MVCC, SQL optimization, query execution, and vectorized execution, is
considered an asset.
-
Familiarity with AI agents and practical experience deploying them, or
experience integrating ML models into production databases or data pipelines,
is considered an asset.
-
Experience with database extensions or ML-related plugins (e.g., pgvector for
PostgreSQL), preferably involving modern AI accelerators such as GPUs, NPUs,
or TPUs.
-
Proven ability to conduct research and quickly learn new technologies and
products.
-
A master’s or Ph.D. in Computer Science, Computer Engineering, Mathematics,
or a related field is an asset.