About Us
At Valiance, we are building next-generation AI solutions to solve high-impact business problems. As part of our AI/ML team, you'll work on deploying cutting-edge Gen AI models, optimizing performance, and enabling scalable experimentation.

Role Overview
We are looking for a skilled MLOps Engineer with hands-on experience deploying open-source Generative AI models in cloud and on-prem environments. The ideal candidate is adept at setting up scalable infrastructure, observability, and experimentation stacks while optimizing for performance and cost.

Responsibilities
- Deploy and manage open-source Gen AI models (e.g., LLaMA, Mistral, Stable Diffusion) in cloud and on-prem environments
- Set up and maintain observability stacks (e.g., Prometheus, Grafana, OpenTelemetry) for monitoring Gen AI model health and performance
- Optimize infrastructure for latency, throughput, and cost-efficiency in GPU/CPU-intensive environments
- Build and manage an experimentation stack to enable rapid testing of various open-source Gen AI models
- Work closely with ML scientists and data teams to streamline model deployment pipelines
- Maintain CI/CD workflows and automate key stages of the model lifecycle
- Leverage NVIDIA tools (Triton Inference Server, TensorRT, CUDA, etc.) to improve model serving performance (preferred)

Required Skills & Qualifications
- Strong experience deploying ML/Gen AI models using Kubernetes, Docker, and CI/CD tools
- Proficiency in Python, Bash scripting, and infrastructure-as-code tools (e.g., Terraform, Helm)
- Experience with ML observability and monitoring stacks
- Familiarity with cloud services (GCP, AWS, or Azure) and/or on-prem environments
- Exposure to model tracking tools such as MLflow, Weights & Biases, or similar
- Bachelor's/Master's degree in Computer Science, Engineering, or a related field

Nice to Have
- Hands-on experience with the NVIDIA ecosystem (Triton, CUDA, TensorRT, NGC)
- Familiarity with serving frameworks such as vLLM, DeepSpeed, or Hugging Face Transformers
Job Title
Machine Learning Engineer