What You Will Work On:
As an intern on the Performance and Quality (PAQ) team, you will analyze, optimize, and safeguard the performance of large language models and generative AI systems. This includes identifying performance bottlenecks in LLM inference pipelines and developing automation frameworks to streamline performance testing. A key focus will be designing and implementing performance analysis workflows, including regression detection pipelines, speed-of-light analyses, and benchmarking across multiple inference frameworks. You may also contribute to optimizing model serving infrastructure, investigating memory and compute efficiency, or establishing performance baselines and alerting systems.
LOCATION: Candidates based in the United States are welcome to apply. To support growth and collaboration, all interns will work in a hybrid capacity at our Los Altos, CA office (minimum 2 days per week on-site) with relocation assistance provided for out-of-state candidates.
What You Will Learn:
- Hands-on experience profiling and optimizing LLM inference workloads at scale.
- How to design performance regression detection systems and integrate them into CI/CD workflows.
- Techniques for building performance analysis tooling, automated benchmarking pipelines, and observability infrastructure for AI systems.
- Hands-on experience with GPU/accelerator performance analysis, model inference optimization, and systems-level bottlenecks.
- Mentorship from experienced engineers working at the intersection of ML and systems performance.
What You Bring to the Table:
- Currently pursuing a Bachelor's, Master's, or PhD degree in Computer Science, Computer Engineering, or a related field, with graduation expected by Spring 2027 at the latest.
- Proficiency in Python; experience with systems performance analysis, C++, or systems-level programming is a strong plus.
- Familiarity with profiling tools, benchmarking methodologies, CI/CD systems, or performance optimization techniques. Experience with tools such as NVIDIA Nsight Systems (nsys), Nsight Compute, PyTorch Profiler, Linux perf, or Intel VTune is a plus.
- Experience and interest in designing and building automated performance analysis workflows.
- Strong problem-solving skills and a passion for building tools and robust workflows to improve system reliability and developer productivity.
What Modular Brings to the Table:
- Amazing Team. We are a progressive and agile team with some of the industry’s best engineering and product leaders.
- Competitive Compensation. We offer very strong compensation packages, including stock options. We want people to be focused on their best work and believe in tailoring compensation plans to meet the needs of our workforce.
- Team Building Events. We organize regular team onsites and local meetups in Los Altos, CA.
Working at Modular will enable you to grow quickly as you work alongside incredibly motivated and talented people who have high standards, a growth mindset, and a purpose to truly change the world.
The estimated base hourly range for this role is $47.00 - $65.00 USD.
The hourly rate for the successful applicant will depend on a variety of permissible, non-discriminatory, job-related factors, which include, but are not limited to, education, training, work experience, business needs, and market demands. This range may be modified in the future.
If you fall outside of the listed requirements, we nevertheless encourage you to apply, as we may have openings at a lower or higher level than the ones advertised.