864 IT & Software Developer jobs in the US


Lead Software Engineer, AI/ML Model Inference

$151,300 - 261,500
Annapurna Labs (U.S.) Inc.
10201 Torre Avenue, Cupertino
Company Size: 200-500
Company Type: Product
Exp Level: Senior
Job Type: Full-Time
Language: English
Visa sponsorship: No

Requirements

Must:
- Bachelor’s degree in computer science or a related field
- 5+ years of non-internship professional experience in software development
- 5+ years of experience designing or architecting new and existing systems, with a focus on design patterns, reliability, and scalability
- Solid understanding of machine learning fundamentals, particularly large language models (LLMs), including architecture, training, and inference lifecycles, with hands-on experience in model optimization
- Proficiency in software development in C++ and Python, with experience in at least one of these languages required
- Strong grasp of system performance, memory management, and principles of parallel computing
- Expertise in debugging, profiling, and applying software engineering best practices in large-scale systems

Technologies

AI
Backbone
CUDA
GitHub
LLM
Machine Learning
PyTorch

Responsibilities

In this pivotal role, you will lead efforts to develop distributed inference support for PyTorch within the Neuron SDK, and you will optimize large models to run efficiently on AWS Trainium and Inferentia silicon and servers. Your responsibilities include:
- Designing, developing, and fine-tuning machine learning models and frameworks for deployment on custom ML hardware accelerators
- Participating in all phases of the ML system development lifecycle, including architecture design, implementation, performance profiling, hardware-specific optimizations, testing, and production deployment
- Creating infrastructure for systematic analysis and onboarding of models with diverse architectures
- Designing and implementing high-performance kernels and features for ML operations, leveraging the Neuron architecture and programming models
- Analyzing and optimizing system-level performance across multiple generations of Neuron hardware
- Conducting detailed performance analysis using profiling tools to identify and address bottlenecks
- Implementing optimizations such as fusion, sharding, tiling, and scheduling (see the sketch after this list)
- Conducting comprehensive testing, including unit and end-to-end testing with continuous deployment through pipelines
- Collaborating directly with customers to enable and optimize their ML models on AWS accelerators
- Innovating optimization techniques in collaboration with cross-functional teams
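For a flavor of the sharding work mentioned above, here is a minimal, hypothetical sketch of column-parallel sharding of a linear layer in plain PyTorch. It simulates workers with tensor chunks on a single device and does not use the Neuron SDK or any Annapurna-specific APIs.

```python
import torch

torch.manual_seed(0)

hidden, out_features, world_size = 16, 32, 4

weight = torch.randn(out_features, hidden)   # full, unsharded weight
x = torch.randn(2, hidden)                   # a small batch of activations
reference = x @ weight.t()                   # unsharded output, shape (2, 32)

# Column parallelism: split the output dimension across `world_size` workers,
# so each worker holds out_features // world_size rows of the weight matrix.
shards = torch.chunk(weight, world_size, dim=0)

# Each worker computes its partial output; an all-gather (simulated here with
# torch.cat) reassembles the full activation.
partials = [x @ w.t() for w in shards]
sharded_out = torch.cat(partials, dim=-1)

assert torch.allclose(reference, sharded_out, atol=1e-6)
print("column-parallel output matches the unsharded reference")
```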

Description


As part of the Inference Enablement and Acceleration team, you will contribute to pioneering efforts that enhance inference capabilities for Generative AI applications. Working with a cross-functional team of applied scientists, system engineers, and product managers, you will debug performance issues, optimize memory usage, and influence the future of Neuron's inference stack across Amazon and the open-source community. You will be expected to build impactful solutions for our extensive customer base and to participate actively in design discussions, code reviews, and communication with both internal and external stakeholders. You should thrive in a startup-like environment focused on innovation and on prioritizing the most important initiatives. Our team promotes a builder culture that emphasizes collaboration, technical ownership, and continuous learning, while ensuring that new members are supported. We value knowledge-sharing and mentorship, aiming to foster an environment for career growth and technical excellence. Join us to tackle some of the most fascinating and influential challenges in AI/ML infrastructure today.

How many Machine Learning Engineer jobs are in the United States?

Currently, there are 864 ML/AI openings. See also TensorFlow jobs, Python jobs, and Computer Vision jobs, all with salary brackets.

Is the US a good place for Machine Learning Engineers?

The US is one of the best countries in which to work as a Machine Learning Engineer. It has a vibrant startup community, growing tech hubs, and, most importantly, plenty of interesting jobs for people who work in tech.

Which companies are hiring for Machine Learning Engineer jobs in the United States?

Sperasoft, bunny.net, Giesecke+Devrient, Webistry, WWC Professional Corporation, Allied Technical Services Inc, and Diploma Healthcare Group, among others, are currently hiring for ML/AI roles in the United States.

The company with the most openings is Leidos, which is hiring for 137 different Machine Learning Engineer jobs in the United States. They appear quite committed to finding good Machine Learning Engineers.