Performance Architect
Job description
In this position, you will develop advanced system architectures and complex simulation models for Sandisk's next-generation AI Storage Solutions-based products. You will initiate and analyze changes to the product architecture. Typical activities include designing, programming, debugging, and modifying simulation models to evaluate these changes and assess the product's performance, power, and endurance. You will work closely with excellent fellow engineers, tackle complex challenges, innovate, and develop products that will change the data-centric architecture paradigm.
- Build SystemC performance models for AI Storage Solutions-based products, covering the end-to-end path from GPU/TPU/NPU/xPU through the host interface, memory hierarchy, and base die controller to AI Storage Solutions built on various packaging technologies
- Responsible for improving the AI/ML ASIC Architecture performance through hardware & software co-optimization, post-silicon performance analysis, and influencing the strategic product roadmap.
- Workload analysis and characterization of ASIC and competitive datacenter and AI solutions to identify opportunities for performance improvement in our products.
- Collaboration with the Architecture team to resolve performance issues and optimize the performance and TCO of their AI Storage Solutions-based datacenter technologies.
- Experience modeling one or more components of AI/ML accelerator ASICs, such as AI Storage Solutions, PCIe/UCIe/CXL, NoC, DMA, firmware interactions, NAND, xPU, fabrics, etc.
- Performance modeling and optimization for multi-trillion parameter LLM training/inference including Dense, Mixture of Experts (MoE) with multiple modalities (text, vision, speech)
- Model/optimize novel parallelization strategies across tensor, pipeline, context, expert and data parallel dimensions
- Architect memory-efficient training systems utilizing techniques like structured pruning, quantization (MX formats), continuous batching/chunked prefill, speculative decoding
- Incorporate and extend SOTA models such as GPT-4, reasoning models such as DeepSeek-R1, and multi-modal architectures
- Collaborate with internal and external stakeholders and ML researchers to disseminate results and iterate at a rapid pace
In the AI Storage Solutions Performance Architecture Group, we build on our deep expertise in microarchitecture and simulation to analyze and optimize high-performance ASIC designs for critical areas such as AI/ML accelerators, cloud computing, and high-performance computing.
Requirements
- Bachelor's, Master's, or PhD in Computer/Electrical Engineering with 5+ years of relevant experience in performance modeling, simulation, and analysis using SystemC
- 5+ years of experience with SystemC modeling
- Good understanding of computer/graphics architecture, ML, and LLMs
- Experience with simulation using SystemC and TLM, behavioral modeling, and performance analysis
Preferred
- Previous experience with storage systems, protocols, and NAND flash is an advantage
- Deep experience optimizing large-scale ML systems and GPU architectures
- Strong track record of technical leadership in GPU performance and workload analysis
- Expert knowledge of transformer architectures, attention mechanisms, and model parallelism techniques
- Experience with GPU or TPU and system microarchitecture
- Proficiency in the principles and methods of microarchitecture, software, and hardware relevant to performance engineering
- Ability to develop a wide system view of complex AI/ML accelerator ASIC systems
- Proficiency with SoC and system performance analysis fundamentals, tools, and techniques, including hardware performance monitors and perf profiling
- Familiarity with IO subsystem microarchitecture performance modeling; a background in NVMe/PCIe/UCIe/CXL/NVLink microarchitecture and protocols is a plus
- Multidisciplinary experience, including familiarity with firmware and ASIC design
- PyTorch, CUDA, TensorRT, OpenAI Triton, and ONNX
- Distributed systems: Ray, Megatron-LM
- Performance analysis tools: Nsight Compute, nvprof, PyTorch Profiler
- KV cache optimization, FlashAttention, Mixture of Experts
- High-speed networking: InfiniBand, RDMA, NVLink
- Expertise in CUDA programming, GPU memory hierarchies, and hardware-specific optimizations
- Proven track record architecting large-scale distributed training systems
- Experience with datacenter and AI workload analysis and optimization
- Experience with multi-core systems and multi-threaded interactions
- Experience analyzing and optimizing workloads
Benefits & conditions
- An employee's pay position within the salary range may be based on several factors including but not limited to (1) relevant education; qualifications; certifications; and experience; (2) skills, ability, knowledge of the job; (3) performance, contribution and results; (4) geographic location; (5) shift; (6) internal and external equity; and (7) business and organizational needs.
- The salary range is what we believe to be the range of possible compensation for this role at the time of this posting. We may ultimately pay more or less than the posted range and this range is only applicable for jobs to be performed in California, Colorado, New York or remote jobs that can be performed in California, Colorado and New York. This range may be modified in the future.
- You will be eligible to participate in Sandisk's Short-Term Incentive (STI) Plan, which provides incentive awards based on Company and individual performance. Depending on your role and your performance, you may be eligible to participate in our annual Long-Term Incentive (LTI) program, which consists of restricted stock units (RSUs) or cash equivalents, pursuant to the terms of the LTI plan. Please note that not all roles are eligible to participate in the LTI program, and not all roles are eligible for equity under the LTI plan. RSU awards are also available to eligible new hires, subject to Sandisk's Standard Terms and Conditions for Restricted Stock Unit Awards.
- We offer a comprehensive package of benefits including paid vacation time; paid sick leave; medical/dental/vision insurance; life, accident and disability insurance; tax-advantaged flexible spending and health savings accounts; employee assistance program; other voluntary benefit programs such as supplemental life and AD&D, legal plan, pet insurance, critical illness, accident and hospital indemnity; tuition reimbursement; transit; the Applause Program; employee stock purchase plan; and Sandisk's Savings 401(k) Plan.
- Note: No amount of pay is considered to be wages or compensation until such amount is earned, vested, and determinable. The amount and availability of any bonus, commission, benefits, or any other form of compensation and benefits that are allocable to a particular employee remains in the Company's sole discretion unless and until paid and may be modified at the Company's sole discretion, consistent with the law.