Senior GPU Server Validation Engineer

$82,000 - 122,000
Celestica International LP
Yonge Street 5140, Austin
Company Size: 5k+
Company Type: Product
Exp Level: Lead
Job Type: Full-Time
Language: English
Visa sponsorship: No

Requirements

Must:
- Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related technical field.
- Over 7 years of experience in hardware and/or software testing, with a minimum of 5 years focused on enterprise storage and server systems.
- At least 3 years in a senior or lead technical position, providing mentorship to junior engineers or overseeing testing projects.
- Proven leadership experience in guiding other engineers.
- Deep knowledge of various storage technologies including NVMe, SAS/SATA SSDs/HDDs, RAID, and distributed file systems (e.g., Ceph, Lustre, GPFS).
- Strong understanding of server architectures (x86, ARM, GPU servers), CPU/memory subsystems, PCIe, and power management.
- Proficient in scripting languages (e.g., Python, Bash) for automation and data analysis.
- Familiar with Linux operating systems (e.g., Ubuntu, CentOS, RHEL) and command-line tools.
- Understanding of networking concepts (Ethernet, TCP/IP, InfiniBand) and network testing methods.
- Experience with performance testing, reliability testing, stress testing, and fault injection methodologies.
- Excellent problem-solving, analytical, and debugging skills.
- Strong communication and interpersonal abilities for effective collaboration across diverse teams.

Technologies

AI
ARM
Ceph
Ethernet
Firmware
InfiniBand

Responsibilities

- Develop and execute comprehensive test plans and strategies for storage and server hardware, firmware, and software in the AI data center environment.
- Lead the team in formulating, executing, and analyzing complex test cases, including functional, performance, reliability, stress, and endurance tests.
- Mentor junior test engineers, promoting a culture of technical excellence and ongoing improvement.
- Create and implement automated testing frameworks and scripts using languages like Python or Go to enhance testing efficiency and coverage.
- Perform detailed performance analysis and identify bottlenecks in storage systems (e.g., NVMe, SSD, HDD arrays) and server platforms (e.g., CPU, GPU, PCIe).
- Debug issues related to BMC functionality and its interactions with server hardware.
- Build and maintain robust testbeds and infrastructure for continuous integration and validation.
- Utilize open-source and commercial testing tools relevant to storage, server, and OpenBMC validation.
- Collaborate closely with hardware design, software development, and AI engineering teams to ensure integrated testing throughout the product lifecycle.
- Communicate testing progress, results, and critical issues effectively to stakeholders, including executive leadership.
- Develop specialized testing methods to validate performance and reliability under demanding AI/ML workloads (e.g., large model training and inference at scale).

Description


At Celestica, we enable some of the world's leading brands by partnering with them in various sectors including Aerospace, Defense, Communications, and HealthTech, among others. Our customer-focused approach ensures we address our clients' most complex challenges effectively. As a key player in design, manufacturing, and supply chain solutions, we leverage our global expertise throughout the product development cycle, from initial design to full-scale production and after-market services. Our headquarters is located in Toronto, with skilled teams across over 40 locations in 13 countries, and we are dedicated to imagining, developing, and delivering a better future in collaboration with our customers.

How many Machine Learning Engineer jobs are in the United States?

Currently, there are 1872 ML and AI openings. See also: TensorFlow jobs, Python jobs, Computer-Vision jobs, all with salary brackets.

Is the US a good place for Machine Learning Engineers?

The US is one of the best countries to work in as a Machine Learning Engineer. It has a vibrant startup community, growing tech hubs and, most importantly, lots of interesting jobs for people who work in tech.

Which companies are hiring for Machine Learning Engineer jobs in the United States?

Valnet Inc., Levven Electronics Ltd., Brain Box, Destination Algarve, Snaplii, Evolution United States, and DataAnnotation, among others, are currently hiring for ML and AI roles in the United States.

The company with the most openings is Jobot, which is hiring for 170 different Machine Learning Engineer jobs in the United States. They are probably quite committed to finding good Machine Learning Engineers.