Founding ML Engineer (CUDA, ROCm, C++)
City:
Odesa
Company:
DevsData LLC
Salary:
Found:
18 days ago
Description
Founding ML Engineer

- Salary: $100,000 USD + equity (highly negotiable for the right candidate)
- Hybrid role: 2-3 days per week at an office in Warsaw/Gdansk
- Full-time position
- B2B contract or Contract of Employment, negotiable
- Home office budget and relocation/travel costs included

A rapidly scaling startup, recently emerged from stealth mode and backed by a top-tier venture capital fund, is on a mission to democratize AI across any hardware platform.

Our client's R&D team is building a highly efficient engine for deploying genAI models. The work spans a wide range of tasks, from fine-tuning GPU kernels to optimizing system performance. The Founding ML Engineer will play a pivotal role in driving significant improvements in GPU performance while leading AI and machine learning initiatives.

To tackle this mission, they are seeking an expert-level engineer in Kernel, Compiler, or Runtime Optimization with a strong background in CUDA, ROCm, or Triton kernel optimization. The role is an opportunity to shape the technical direction of the company.

Requirements:
- Deep understanding of and experience in GPU performance optimization.
- Proven track record of kernel optimization on CUDA, ROCm, or other accelerators.
- Proficiency in programming languages such as C/C++ and Python.
- Experience with the training and deployment of ML models.
- Familiarity with distributed systems development or distributed ML workloads.
- Bachelor's, Master's, or PhD degree in Computer Science, Electrical Engineering, or a related field.
- Strong command of English, with strong communication and collaboration skills.

An exceptional candidate will also have:
- Familiarity with OSS projects like FlashAttention, mlc-llm, vLLM.
- Experience with machine learning compilers or frameworks such as TVM, MLIR, PyTorch, TensorFlow, ONNX Runtime, TensorRT.

You would be:
- Analyzing bottlenecks in ML training and inference
- Developing and optimizing compute kernels in CUDA, Triton, or ROCm
- Working on GPU optimizations to maximize performance

Get to know DevsData

We are a technology consulting company and a recruitment agency, delivering software solutions to clients from Europe and the US. We work 100% remotely in an international team that includes people from Asia, London, and San Francisco. We employ people with experience in international corporations as well as students of the best technical and business universities.

Find out more: https://devsdata.com