GPU Systems Engineer (CUDA)

Job summary

  • Plano
  • Software Developer
  • Work model: Fully remote, US only
  • Posted 5 days ago

Job description

Bright Vision Technologies is a forward-thinking software development company dedicated to building innovative solutions that help businesses automate and optimize their operations. We leverage cutting-edge technologies to create scalable, secure, and user-friendly applications.

As we continue to grow, we're looking for a skilled GPU Systems Engineer (CUDA) to join our dynamic team and contribute to our mission of transforming business processes through technology.

This is a fantastic opportunity to join an established and well-respected organization offering tremendous career growth potential.

Position Details

  • Location: 100% Remote (Continental United States)
  • Position Type: In-house Bright Vision Technologies SOW engagement (no third-party client or vendor)
  • Experience: 6+ years
  • Salary: 100k - 150k
  • Employment Type: Full-time, direct W-2 (no C2C, no 1099, no third-party)
  • Sponsorship: No new H-1B sponsorship is available. H-1B transfers are welcome for qualified candidates.

Employment Terms & Visa Policy

This is a 100% remote, full-time, direct W-2 position with Bright Vision Technologies and is part of our in-house Statement of Work (SOW) engagement. We do not engage in C2C, 1099, or third-party arrangements.

Candidates must be willing to work directly for Bright Vision Technologies as full-time W-2 employees. No new H-1B sponsorship is available, but we support H-1B transfers for qualified candidates. A technical coding assessment is mandatory.

Job Summary

We are seeking a GPU Systems Engineer with deep expertise in CUDA programming, GPU architecture, and high-performance computing to design and optimize compute-intensive workloads. This role focuses on extracting maximum performance from GPU platforms for AI training, inference, scientific computing, and high-throughput data processing.

Key Responsibilities

  • Design and implement high-performance CUDA kernels for compute-intensive workloads.
  • Profile and optimize GPU code using Nsight Systems, Nsight Compute, and CUDA profilers.
  • Tune memory access patterns, occupancy, register usage, and shared memory utilization.
  • Develop highly optimized libraries for linear algebra, attention, and other ML primitives.
  • Optimize multi-GPU and multi-node training using NCCL, RDMA, and high-performance networking.
  • Implement custom operators and fused kernels in PyTorch, JAX, or Triton.
  • Collaborate with ML engineers to identify performance bottlenecks.
  • Develop benchmarks and regression tests to safeguard performance.
  • Evaluate new GPU architectures and advise on adoption strategy.
  • Implement mixed-precision and quantized compute paths.

Required Qualifications

  • Bachelor's or Master's degree in Computer Science, Computer Engineering, or a related field.
  • 6+ years of experience in GPU programming and performance engineering.
  • Deep expertise in CUDA C/C++ and GPU programming models.
  • Strong understanding of modern GPU architectures, memory hierarchies, and execution models.
  • Hands-on experience profiling and optimizing GPU workloads in production.
  • Familiarity with NCCL, MPI, and high-performance interconnect technologies.
  • Experience integrating custom kernels into ML frameworks.
  • Strong C++ skills and familiarity with modern systems programming practices.
  • Solid grounding in linear algebra and numerical methods.

Preferred Qualifications

  • Experience with Triton, CUTLASS, or other GPU kernel authoring frameworks.
  • Familiarity with TensorRT, FasterTransformer, or vLLM internals.
  • Exposure to compiler infrastructure such as LLVM or MLIR.
  • Open-source contributions to GPU or ML performance libraries.
  • Experience with large-scale distributed training infrastructure.

How To Apply

For immediate consideration, please send your resume to [email protected] or contact us at (908) 676-4399. Learn more at www.bvteck.com.

Bright Vision Technologies is an equal opportunity employer. We do not discriminate on the basis of any protected attribute.