Senior Research Scientist with 14+ years of experience developing and deploying software solutions for complex, novel, data-intensive problems in the HEP domain. Seeking to leverage my expertise in scientific computing, applied deep learning and HPC for challenging and impactful research directions. Proven ability to prioritize and execute, lead projects, iterate fast, deliver results and share knowledge in diverse teams.
Professional Experience
Senior Research Scientist, AI/ML NICPB/KBFI 07/2020 – Present
High-Energy Physics and Computation Group Tallinn, Estonia
- ML Model Development for Data Reconstruction: Developed novel GNN/Transformer models and data pipelines using PyTorch/TensorFlow on multi-GPU HPC systems (AMD and Nvidia). Deployed with ONNX to production-ready systems. Improved critical accuracy metrics by up to 30%. Led work on multiple peer-reviewed publications, including Nature Communications Physics.
- Software & Performance Optimization: Co-led the reconstruction software group for the CERN CMS collaboration. Optimized the CMSSW C++ codebase and data processing workflows, reducing CPU time for data-taking by 40% in key workflows. Developed the use of CI/CD, modern debugging and regression analysis best practices.
- Project Lead & PI - Applied ML Research: Led applied ML research initiatives across different domains, primarily perception in point cloud & image data. Developed ML and data architecture & code, directed research strategy, managed project timelines, reported on deliverables. Focus on CMS and FCC applications.
- Mentorship & Team Leadership: Mentored and supervised three PhD researchers, one MSc researcher, and two BSc researchers. Provided technical guidance and support for ML, simulation, and data engineering tasks contributing to >8 joint peer-reviewed publications.
- Computing Operations: Managed the planning, funding acquisition, procurement, and deployment for a renewal of on-premises HPC infrastructure. DevOps and user support responsibilities for 24/7 operations.
Postdoctoral Researcher Caltech 07/2018 – 06/2020
Experimental High-Energy Physics Group Pasadena, CA, USA
- Data Analysis Pipeline Optimization: Re-engineered data analysis pipelines using CUDA, Python and C++, reducing time-to-insight by 10x for large-scale columnar datasets.
- HPC DevOps & Reliability: Led DevOps for Caltech’s HPC center supporting critical CERN workloads. Ensured 24/7 system reliability and efficient operation of the batch queues and distributed storage (Ceph/Hadoop).
PhD Researcher ETH Zürich 09/2014 – 06/2018
Experimental High-Energy Physics Group Zürich, Switzerland
- ML for Particle Identification: Implemented and deployed improved ML methods using xgboost for particle identification within the CERN production software environment.
- Data Analysis & Discovery Contribution: Developed data analysis software (C++/Python/numpy) for CERN, contributing to the first observation of the ttH process and heavy-flavour jet identification. Managed research goals and deliverables.
Internship Lingvist Technologies 05/2017 – 07/2017
Data Science Team Tallinn, Estonia
- Predictive Modeling: Developed an LSTM model to meet business requirements, significantly improving language learning recall analysis in open-ended vocabulary data streams.
Research Engineer NICPB/KBFI 01/2012 – 08/2014
High-Energy Physics and Computation Group Tallinn, Estonia
- CERN Data Analysis: Developed data analysis software (C++/Python/numpy) for the CMS experiment at CERN.
Education
- PhD, Experimental Particle Physics (ETH Medal), ETH Zürich (Thesis) 09/2014 – 07/2018
- M.Sc., Fundamental Physics (cum laude), University of Tartu, Estonia 09/2012 – 06/2014
- B.Sc., Physics, University of Tartu, Estonia 09/2008 – 06/2012
Technical skills
- Software Engineering: CI/CD (GitHub/GitLab), debugging, optimization, regression analysis 14 years
- HPC: distributed storage (Ceph, Hadoop) and processing (Slurm), NoSQL datasets (ROOT, Parquet), DevOps (Ansible) 12 years
- Software Development: Python, C/C++ in production, legacy code maintenance and modernization 10 years
- ML: PyTorch & TensorFlow deployment, use of multi-GPU systems including HPC 7 years
Core competencies
- Scientific Computing & Software Engineering: familiarity with the CERN stack (C++, Python, libraries & distribution) 14 years
- Data Analysis and Engineering: contribution to multiple key measurements at CERN through large-scale data analysis 11 years
- Physics R&D: 10+ peer-reviewed research results 11 years
- Applied ML/AI R&D: 8+ peer-reviewed papers on applied AI methods 7 years
- Project Management: research funding acquisition/reporting, computing software and infrastructure 6 years
- Technical Leadership: successfully led PhD and MSc projects, co-led the CERN CMS reconstruction team 5 years
Industry projects
- Mu-Ray Tech: advisor 2025 – Present
- Taara Robotics: real-time multi-task segmentation and object identification networks for Jetson Orin NX DLA 2025
- GScan: accurate tomography reconstruction using 3D-CNNs for construction safety 2023
Language skills
- English full proficiency
- Russian, French limited working
- Korean elementary
- Estonian native language
Scientific publications and reports
- Seeba, N.-N. et al, “ParticleTransformer is all you need for reconstructing hadronic tau leptons”, arXiv:2606.18460 (2026)
- Mokhtar, F. et al, “Machine-learned particle flow as a foundation model for collider physics”, arXiv:2606.14373 (2026)
- CMS collaboration, “Full event interpretation with machine-learning-based particle-flow reconstruction in the CMS detector”, Eur. Phys. J. C (2026), 10.48550/arXiv.2601.17554
- Tani, L. et al, “Reconstructing hadronically decaying tau leptons with a jet foundation model”, SciPost Physics Core (2025), 10.21468/SciPostPhysCore.8.3.046
- Mokhtar, F. et al, “Fine-tuning machine-learned particle-flow reconstruction for new detector geometries in future colliders”, PRD (2025), 10.1103/PhysRevD.111.092015
- Põder, S. et al, “On the detection of stellar wakes in the Milky Way: a deep learning approach”, Astronomy and Astrophysics (2025), 10.1051/0004-6361/202451480
- Tani, L. et al, “A unified machine learning approach for reconstructing hadronically decaying tau leptons”, Computer Physics Communications 307 (2025), 10.1016/j.cpc.2024.109399
- Pata, J. et al, “Improved particle-flow event reconstruction with scalable neural networks for current and future particle detectors”, Nature Communications Physics 7 (2024), 10.1038/s42005-024-01599-5
- Lange, T. et al, “Tau lepton identification and reconstruction: a new frontier for jet-tagging ML algorithms”, Computer Physics Communications 298 (2024), 10.1016/j.cpc.2024.109095
- CMS Collaboration, “Progress towards an improved particle flow algorithm at CMS with machine learning”, ACAT (2022), CERN-CMS-DP-2022-061
- Lewicki, M. et al, “Dynamics of false vacuum bubbles with trapped particles”, Phys.Rev.D 108 (2023), 10.1103/PhysRevD.108.036023
- Põder, S. et al, “A Bayesian estimation of the Milky Way’s circular velocity curve using Gaia DR3”, Astronomy and Astrophysics 676 (2023), 10.1051/0004-6361/202346474
- Wulff, E. et al, “Hyperparameter optimization of data-driven AI models on HPC systems”, J.Phys.Conf.Ser. 2438 (2023), 10.1088/1742-6596/2438/1/012092
- Bazarov, A. et al, “Sensitivity Estimation for Dark Matter Subhalos in Synthetic Gaia DR2 using Deep Learning”, Astronomy and Computing (2022), 10.1016/j.ascom.2022.100667
- Pata, J. for the CMS Collaboration, “Machine Learning for Particle Flow Reconstruction at CMS”, ACAT (2022), 10.1088/1742-6596/2438/1/012100
- Pata, J. et al, “MLPF: Efficient machine-learned particle-flow reconstruction using graph neural networks”, EPJC (2021), 10.1140/epjc/s10052-021-09158-w
- Pata, J. et al, “Data Analysis with GPU-Accelerated Kernels”, Proceedings of Science, ICHEP (2020), 10.22323/1.390.0908
- CMS Collaboration, “Observation of ttH production”, PRL (2018), 10.1103/PhysRevLett.120.231801
- CMS Collaboration, “Identification of heavy-flavour jets with the CMS detector in pp collisions at 13 TeV”, JINST (2018), 10.1088/1748-0221/13/05/P05011
Key research talks
- Panel discussion: “Cutting Through the Hype – Quantum and AI Technology Limits, Lessons, and Next Moves”, sTARTUp Day 2026 (Tartu, Estonia) 2026
- Particle flow reconstruction with a learnable, differentiable, efficient ML model, ZPW2026 (Zurich, Switzerland) 2026
- Invited talk on science and society, TeadusEST 2025 (Tartu, Estonia) 2025
- Invited talk, Taltech AI Retreat (Estonia) 2025
- CERN OpenLab workshop, invited talk on machine learning for data reconstruction 2025
- Invited talk, Estonian Academy of Sciences (Tallinn, Estonia) 2025
- Scalable neural networks for event reconstruction, ACAT (Stony Brook, NY, USA) 2024
- Neural networks and terascale datasets for particle-flow reconstruction, ML4Jets (Hamburg, Germany) 2023
- Overview of machine learning for calorimeter clustering and particle flow, Learning To Discover (Paris) 2022
- Machine learning for data reconstruction at the LHC, LIP seminar (Portugal), invited, virtual 2022
- Graph neural networks, QU Data Science Basics (Hamburg), invited 2021
- Machine learning for particle flow reconstruction at CMS, ACAT (Daejeon, South Korea), virtual 2021
- Measurements of ttH at CMS, Lake Louise (Canada) 2019