This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Member of Technical Staff, Data Engineering in the United States, United Kingdom, France, or Canada.
This role offers the opportunity to shape the foundation of cutting-edge AI systems by managing and optimizing the data pipelines that power advanced language models. You will design and build scalable pipelines, curate high-quality datasets, and ensure data is structured for optimal training efficiency. Working with diverse sources like web data, code repositories, and multilingual corpora, you will bridge research and engineering, enabling faster, more reliable model training. This position operates in a collaborative, fast-paced environment where your contributions directly influence AI model performance and innovation. Flexible remote options are available, and you will interact closely with researchers, engineers, and cross-functional teams globally.
Accountabilities:
Design, develop, and maintain scalable data pipelines for ingestion, parsing, filtering, and optimization of diverse datasets.
Conduct data ablations and experiments to assess quality and improve model performance.
Implement robust data modeling techniques to structure and format datasets for efficient training.
Research and apply innovative data curation strategies to support advancements in natural language processing.
Collaborate with researchers, engineers, and cross-functional teams to meet the evolving needs of AI models.
Ensure datasets are diverse, reliable, and optimized for throughput and accelerator utilization.
Requirements:
Strong software engineering skills, particularly in Python.
Experience building and maintaining large-scale data pipelines.
Familiarity with data processing frameworks such as Apache Spark, Apache Beam, Pandas, or equivalent.
Experience working with large-scale web datasets (e.g., CommonCrawl).
Passion for combining research and engineering to solve complex data challenges in AI.
Excellent collaboration and communication skills to work effectively across global teams.
Nice to Have:
Publications at top-tier AI and ML venues (NeurIPS, ICML, ICLR, AISTATS, MLSys, JMLR, AAAI, COLING, ACL, EMNLP).
Experience with multilingual corpora and diverse data sources.
Background in NLP or generative AI research.
Benefits:
Open and inclusive work culture with global collaboration opportunities.
Weekly lunch stipends, in-office meals, and snacks.
Comprehensive health, dental, and mental health benefits.
100% parental leave top-up for up to six months.
Personal enrichment budget for arts, culture, fitness, well-being, and workspace improvements.
Remote-flexible work options with offices in Toronto, New York, San Francisco, London, and Paris, including co-working stipends.
6 weeks (30 working days) of vacation.
Jobgether is a Talent Matching Platform that partners with companies worldwide to efficiently connect top talent with the right opportunities through AI-driven job matching.
When you apply, your profile goes through our AI-powered screening process designed to identify top talent efficiently and fairly.
🔍 Our AI evaluates your CV and LinkedIn profile thoroughly, analyzing your skills, experience, and achievements.
📊 It compares your profile to the job’s core requirements and past success factors to determine your match score.
🎯 Based on this analysis, we automatically shortlist the three candidates with the highest match to the role.
🧠 When necessary, our human team may perform an additional manual review to ensure no strong profile is missed.
The process is transparent, skills-based, and free of bias — focusing solely on your fit for the role. Once the shortlist is completed, we share it directly with the company that owns the job opening. The final decision and next steps (such as interviews or additional assessments) are then made by their internal hiring team.
Thank you for your interest!