864 IT & Software Developer jobs in the US


Lead Software Engineer, AI/ML Model Inference

$151,300 - 261,500
Annapurna Labs (U.S.) Inc.
10201 Torre Avenue, Cupertino
Company Size: 200-500
Company Type: Product
Exp Level: Senior
Job Type: Full-Time
Language: English
Visa sponsorship: No

Requirements

Must:
- Bachelor’s degree in computer science or a related field
- 5+ years of non-internship professional experience in software development
- 5+ years of experience designing or architecting new and existing systems, with a focus on design patterns, reliability, and scalability
- Solid understanding of machine learning fundamentals, particularly large language models (LLMs), including architecture, training, and inference lifecycles, with hands-on experience in model optimization
- Proficiency in software development in C++ and Python, with experience in at least one of these languages required
- Strong grasp of system performance, memory management, and principles of parallel computing
- Expertise in debugging, profiling, and applying software engineering best practices in large-scale systems

Technologies

AI
Backbone
CUDA
GitHub
LLM
Machine Learning
PyTorch

Responsibilities

In this pivotal role, you will lead efforts to develop distributed inference support for PyTorch within the Neuron SDK, and you will optimize large models to run efficiently on AWS Trainium and Inferentia silicon and servers. Your responsibilities include:
- Designing, developing, and fine-tuning machine learning models and frameworks for deployment on custom ML hardware accelerators
- Participating in all phases of the ML system development lifecycle, including architecture design, implementation, performance profiling, hardware-specific optimizations, testing, and production deployment
- Creating infrastructure for systematic analysis and onboarding of models with diverse architectures
- Designing and implementing high-performance kernels and features for ML operations, leveraging the Neuron architecture and programming models
- Analyzing and optimizing system-level performance across multiple generations of Neuron hardware
- Conducting detailed performance analysis using profiling tools to identify and address bottlenecks
- Implementing optimizations such as fusion, sharding, tiling, and scheduling (see the sketch after this list)
- Conducting comprehensive testing, including unit and end-to-end testing with continuous deployment through pipelines
- Collaborating directly with customers to enable and optimize their ML models on AWS accelerators
- Innovating optimization techniques in collaboration with cross-functional teams
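For a flavor of the sharding work mentioned above, here is a minimal, hypothetical sketch of column-parallel sharding of a linear layer in plain PyTorch. It simulates workers with tensor chunks on a single device and does not use the Neuron SDK or any Annapurna-specific APIs.

```python
import torch

torch.manual_seed(0)

hidden, out_features, world_size = 16, 32, 4

weight = torch.randn(out_features, hidden)   # full, unsharded weight
x = torch.randn(2, hidden)                   # a small batch of activations
reference = x @ weight.t()                   # unsharded output, shape (2, 32)

# Column parallelism: split the output dimension across `world_size` workers,
# so each worker holds out_features // world_size rows of the weight matrix.
shards = torch.chunk(weight, world_size, dim=0)

# Each worker computes its partial output; an all-gather (simulated here with
# torch.cat) reassembles the full activation.
partials = [x @ w.t() for w in shards]
sharded_out = torch.cat(partials, dim=-1)

assert torch.allclose(reference, sharded_out, atol=1e-6)
print("column-parallel output matches the unsharded reference")
```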

Description


As part of the Inference Enablement and Acceleration team, you will contribute to pioneering efforts that enhance inference capabilities for Generative AI applications. Working with a cross-functional team of applied scientists, system engineers, and product managers, you will debug performance issues, optimize memory usage, and influence the future of Neuron's inference stack across Amazon and the open-source community. You will be expected to build impactful solutions for our extensive customer base and to participate actively in design discussions, code reviews, and communication with both internal and external stakeholders. You should thrive in a startup-like environment focused on innovation and on prioritizing the most important initiatives. Our team promotes a builder culture that emphasizes collaboration, technical ownership, and continuous learning, while ensuring that new members are supported. We value knowledge-sharing and mentorship, aiming to foster an environment for career growth and technical excellence. Join us to tackle some of the most fascinating and influential challenges in AI/ML infrastructure today.

How many Machine Learning Engineer jobs are in the United States?

Currently, there are 864 ML/AI openings. See also TensorFlow jobs, Python jobs, and Computer Vision jobs, all with salary brackets.

Is the US a good place for Machine Learning Engineers?

The US is one of the best countries in which to work as a Machine Learning Engineer. It has a vibrant startup community, growing tech hubs, and, most importantly, plenty of interesting jobs for people who work in tech.

Which companies are hiring for Machine Learning Engineer jobs in the United States?

Sperasoft, bunny.net, Giesecke+Devrient, Webistry, WWC Professional Corporation, Allied Technical Services Inc, and Diploma Healthcare Group, among others, are currently hiring for ML/AI roles in the United States.

The company with the most openings is Leidos, which is hiring for 137 different Machine Learning Engineer jobs in the United States. They appear quite committed to finding good Machine Learning Engineers.