Site Reliability Engineer

Job Type - Permanent
Salary/Rate - Based on demonstrated experience and qualifications

*All applicants must be legally eligible to work in Canada*

The Site Reliability Engineering team is responsible for building and maintaining the company's cloud-based infrastructure. This involves using modern tools and techniques to provision, deploy, configure, and monitor a wide array of technologies, from serverless web applications to database clusters, and from traditional web-server farms to AKS and more.

As a member of the SRE team, you will help lead the planning, design, and implementation of many such services. You will work closely with various business units throughout the organization, primarily the development and networking teams, to whom the team provides infrastructure as a rigorously defined set of services.

The successful candidate must possess and be able to demonstrate a solid foundation of technical knowledge, meaning a full understanding of the foundational concepts of Windows, cloud infrastructure, Infrastructure as Code (IaC), general networking (TCP/IP), and programming/scripting.

What we are looking for:
- 3+ years of experience as an SRE supporting production infrastructure.
- 5+ years of overall software engineering experience in a development environment.
- Bachelor's degree in computer science and/or a wide range of relevant work experience.
- Extensive experience with Azure DevOps and Windows systems.
- Experience with container orchestration platforms such as Kubernetes.
- Experience using IaC tools such as Terraform, Docker, Helm, Packer, Ansible, and ARM.
- Experience with configuration management tools such as Ansible, YAML, and Terraform.
- Experience managing observability tools such as Grafana, Kibana, and Prometheus.
- Experience with enterprise-grade software.
- Experience with software development.
- Experience with microservices architecture.
- At least two years of experience managing Kubernetes production systems.
- Experience with PowerShell and shell scripting.

Key Responsibilities
- Design and implement Kubernetes clusters according to business requirements, including scalability and security.
- Build and maintain Docker containers for use in the AKS environment.
- Develop and maintain monitoring systems to ensure the health and availability of SQL databases, AKS clusters, file shares, Service Bus, web apps, etc. across production, dev, and staging environments.
- Build and own infrastructure through code, and work closely with the development, systems, and networking teams to automate CI/CD pipelines, removing repetitive manual processes and simplifying operations.
- Manage and optimize existing CI/CD pipelines.
- Design, architect, and develop cloud-native solutions on the Azure cloud platform using services such as AKS, Azure SQL, Azure Functions, Service Bus, and Data Factory.
- AWS experience (S3).
- Create and maintain technical documentation and build books.
- Deploy application packages and new workloads to the production environment.
- Streamline and maintain QA and DEV environments that allow our developers and quality assurance teams to work more effectively and efficiently.
- Perform regular DR drills and maintain the DRP in collaboration with the systems and development teams.
- Identify and diagnose deficiencies in existing systems, frameworks, tools, and processes, and recommend creative solutions based on best practices and industry standards.
- Create dashboards that provide visibility into production metrics.
Job Title
Lead DevOps Engineer