<h3 id="about-toptal">About Toptal</h3>
<p>Toptal is a global network of top talent in business, design, and technology that enables companies to scale their teams on demand. With $200+ million in annual revenue <strong>and team members based around the globe</strong>, Toptal is the <a data-faitracker-click-bind="true" href="https://www.toptal.com/remote-work-playbook">world’s largest fully remote workforce</a>.</p>
<p>We take the best elements of virtual teams and combine them with a support structure that encourages innovation, social interaction, and fun. We see no borders, move at a fast pace, and are never afraid to break the mold.</p>
<h3 id="job-summary">Job Summary</h3>
<p>We are looking for a Senior Data Scientist to join us as the first Data Scientist on a new product we are building. This is a founding role: you will shape the data science function from the ground up, set technical direction, and own the end-to-end delivery of intelligent systems that define how our product creates value. You will tackle open-ended problems involving Task Mining, Process Mining, behavioral workflow analysis, pattern discovery, predictive modeling, and applied GenAI/ML systems. The goal is not just to build models, but to turn raw interaction data into measurable product and business impact: discovered workflows, bottlenecks, optimization opportunities, and scalable foundations for future DS/ML work.</p>
<p>This is a remote position. We do not offer visa sponsorship or assistance. Resumes and communication must be submitted in English.</p>
<h3 id="responsibilities">Responsibilities</h3>
<ul style="list-style-type: disc;">
<li>Act as the founding Data Scientist on the product: define the DS strategy, choose the right tools and frameworks, and establish best practices.</li>
<li>Design and build Task Mining and Process Mining solutions that transform raw interaction data into discovered workflows, patterns, bottlenecks, and optimization opportunities.</li>
<li>Design, develop, and deploy ML systems and data pipelines for large-scale structured, unstructured, and event/interaction data.</li>
<li>Build predictive and pattern-discovery solutions using supervised and unsupervised learning, representation learning, sequence modeling, and LLM/GenAI approaches where appropriate.</li>
<li>Establish practical foundations for dataset construction, labeling strategy, offline/online evaluation, monitoring, feedback loops, and human-in-the-loop review where needed.</li>
<li>Own projects end-to-end, from problem framing and experimentation through production deployment and iteration.</li>
<li>Collaborate closely with engineering on data instrumentation, pipeline design, deployment, and integration of production-ready services.</li>
<li>Communicate findings, tradeoffs, and technical concepts effectively to both technical and business stakeholders.</li>
</ul>
<h3 id="qualifications-and-requirements">Qualifications and Requirements</h3>
<ul style="list-style-type: disc;">
<li>5+ years of professional experience in Data Science, Machine Learning, or Applied ML roles.</li>
<li>Demonstrated experience operating as the sole or lead Data Scientist on a product or team — owning problems end-to-end without senior DS supervision.</li>
<li>Strong experience with supervised and unsupervised ML, modern ML/data tooling, and the judgment to select the right approach for the problem.</li>
<li>Practical familiarity with representation learning, sequence modeling, Transformers, LLMs, or GenAI systems where relevant to product use cases.</li>
<li>Experience handling large-scale structured, unstructured, event, or interaction datasets.</li>
<li>Advanced proficiency in Python and SQL, with hands-on experience using tools such as PyTorch, scikit-learn, pandas/Polars, experiment tracking, and production ML workflows.</li>
<li>Experience deploying ML models, data pipelines, or intelligent systems into production.</li>
<li>Familiarity with Task Mining, Process Mining, event-log analysis, behavioral analytics, workflow automation, or adjacent domains.</li>
<li>Advanced degree in Computer Science, Data Science, AI, Statistics, Mathematics, or a related field is a plus; equivalent practical experience is strongly valued.</li>
</ul>
<h3 id="what-we-are-looking-for">What We Are Looking For</h3>
<ul style="list-style-type: disc;">
<li>A founder’s mindset: full responsibility for outcomes, not just deliverables.</li>
<li>Comfort operating in high ambiguity: able to turn unclear product goals, noisy data, and incomplete requirements into an executable roadmap.</li>
<li>Strong business sense — connects technical work to commercial impact and measurable product value.</li>
<li>Pragmatic technical judgment — knows when to use advanced ML, when to simplify, and when better data, labeling, or evaluation is the real bottleneck.</li>
<li>Ability to build foundations for rapid scaling: reusable datasets, pipelines, metrics, evaluation frameworks, and modeling patterns future DS/ML hires can build on.</li>
<li>Highly proactive problem solver who acts without waiting for detailed instructions.</li>
<li>Excellent communication skills, with the confidence to push back constructively and propose direction.</li>
</ul>
<h3 id="nice-to-have">Nice to Have</h3>
<ul style="list-style-type: disc;">
<li>Previous experience as a first or early Data Scientist at a startup or new product line.</li>
<li>Direct experience with Task Mining, Process Mining, workflow intelligence, RPA, or productivity analytics.</li>
<li>Experience with LLMs and Generative AI applications, especially evaluation, structured outputs, semantic labeling, summarization, or human-in-the-loop workflows.</li>
<li>Experience working with privacy-sensitive behavioral, productivity, or user-interaction data.</li>
<li>Experience with product experimentation, causal inference, or measuring the impact of workflow/process interventions.</li>
<li>Knowledge of MLOps and distributed processing frameworks, such as Spark.</li>
<li>Experience with cloud environments, especially GCP.</li>
</ul>