GPU Systems Engineer (CUDA)
Bright Vision Technologies is a forward-thinking software development company dedicated to building innovative solutions that help businesses automate and optimize their operations. We leverage cutting-edge technologies to create scalable, secure, and user-friendly applications.
As we continue to grow, we're looking for a skilled GPU Systems Engineer (CUDA) to join our dynamic team and contribute to our mission of transforming business processes through technology.
This is a fantastic opportunity to join an established and well-respected organization offering tremendous career growth potential.
Position Details
- Location: 100% Remote (Continental United States)
- Position Type: In-house Bright Vision Technologies SOW engagement (no third-party client or vendor)
- Experience: 6+ years
- Salary: 100k - 150k
- Employment Type: Full-time, direct W2 (no C2C, no 1099, no third-party)
- Sponsorship: No new H1B sponsorship available. H1B transfers welcomed for qualified candidates.
Employment Terms & Visa Policy
This is a 100% remote, full-time, direct W2 position with Bright Vision Technologies. This role is part of our in-house Statement of Work (SOW) engagement. We do not engage in C2C, 1099, or third-party arrangements.
Candidates must be willing to work directly as a full-time W2 employee. While no new H1B sponsorship is available, we support H1B transfers for qualified candidates. A technical coding assessment is mandatory.
Job Summary
We are seeking a GPU Systems Engineer with deep expertise in CUDA programming, GPU architecture, and high-performance computing to design and optimize compute-intensive workloads. This role focuses on extracting maximum performance from GPU platforms for AI training, inference, scientific computing, and high-throughput data processing.
Key Responsibilities
- Design and implement high-performance CUDA kernels for compute-intensive workloads.
- Profile and optimize GPU code using Nsight Systems, Nsight Compute, and other CUDA profiling tools.
- Tune memory access patterns, occupancy, register usage, and shared memory utilization.
- Develop highly optimized libraries for linear algebra, attention, and other ML primitives.
- Optimize multi-GPU and multi-node training using NCCL, RDMA, and high-performance networking.
- Implement custom operators and fused kernels in PyTorch, JAX, or Triton.
- Collaborate with ML engineers to identify performance bottlenecks.
- Develop benchmarks and regression tests to safeguard performance.
- Evaluate new GPU architectures and advise on adoption strategy.
- Implement mixed-precision and quantized compute paths.
Required Qualifications
- Bachelor's or Master's degree in Computer Science, Computer Engineering, or a related field.
- 6+ years of experience in GPU programming and performance engineering.
- Deep expertise in CUDA C/C++ and GPU programming models.
- Strong understanding of modern GPU architectures, memory hierarchies, and execution models.
- Hands-on experience profiling and optimizing GPU workloads in production.
- Familiarity with NCCL, MPI, and high-performance interconnect technologies.
- Experience integrating custom kernels into ML frameworks.
- Strong C++ skills and familiarity with modern systems programming practices.
- Solid grounding in linear algebra and numerical methods.
Preferred Qualifications
- Experience with Triton, CUTLASS, or other GPU kernel authoring frameworks.
- Familiarity with TensorRT, FasterTransformer, or vLLM internals.
- Exposure to compiler infrastructure such as LLVM or MLIR.
- Open-source contributions to GPU or ML performance libraries.
- Experience with large-scale distributed training infrastructure.
How To Apply
For immediate consideration, please send your resume to [email protected] or contact us at (908) 676-4399. Learn more at www.bvteck.com.
Bright Vision Technologies is an equal opportunity employer. We do not discriminate on the basis of any protected attribute.