Job Title


Site Reliability Engineer (Azure)


Company : Optym


Location : Bengaluru, Karnataka


Created : 2025-03-22


Job Type : Full Time


Job Description

Company Overview: Founded in 2000, Optym builds SaaS solutions that make the transportation and logistics industry more efficient. Optym’s software solutions are used by leading railroads, airlines and trucking companies, and have created a cumulative business value of over $1 billion for its clients. With headquarters in Dallas, Texas, and centers of excellence in India, Optym’s team consists of 250+ professionals. Optym has about 50 highly specialized professionals in the US and expects major growth over the next five years. Optym offers competitive wages, excellent benefits, a great working environment, and a culture of entrepreneurship and ownership. Optym also offers a generous profit and equity sharing plan with the potential to increase your compensation substantially based on the success of Optym.

Responsibilities

Monitoring Systems : Continuously monitoring the health and performance of software systems using various monitoring tools. Proactively monitoring infrastructure and reporting anomalies in a timely manner. Documenting observations and providing recommendations to optimize the infrastructure for performance, reliability, and cost efficiency.

Alerts : Responding to alerts about system issues and taking the necessary actions to resolve them. Defining new alerts that enable proactive action and reduce response time.

Incident Management : Coordinating with different team members and stakeholders to handle and resolve incidents related to production software systems. Performing incident root cause analysis (RCA) and providing documented corrective actions. Understanding server and service configuration best practices; debugging and resolving issues.

Automation : Assisting in implementing automation to reduce manual intervention and improve system reliability.

System Changes Assistance : Collaborating with development teams to design systems that are reliable, scalable, and maintainable. Assisting in Azure cloud administration activities globally by coordinating with developers, peers, and other stakeholders in the organization. Providing deployment support and troubleshooting application infrastructure.

Documentation : Maintaining and improving documentation for systems and processes.

Requirements

B.Tech/B.E. in Computer Science, BCA, IT, or a related field
2-4 years of overall experience, with a minimum of 1 year of hands-on experience with Azure
Minimum two years’ experience in production support
Ability to work independently at a fast pace, as well as in a team environment, across a variety of project settings
Willingness to continuously learn new skills where required
Effective communication skills, as this role requires extensive communication across different domains
Ability to work flexible hours and with globally distributed teams
On-Call Support: Addressing critical incidents and maintenance activity outside of regular business hours

Mandatory Skills

Hands-on experience with Azure cloud infrastructure administration & support
Kubernetes, Docker, PowerShell, Python, Azure ARM templates
Proactive infrastructure monitoring via tools like Azure Monitor, Datadog, etc.
Proactive application monitoring via Application Insights, Datadog, etc.
Triaging issues with Azure Support and the application team
Ability to work in Linux/Unix environments
Experience maintaining Azure DevOps CI/CD pipelines

Preferred Certifications

Microsoft Azure Administrator (Exam AZ-103/104)
Certified Kubernetes Administrator

Desirable Skills

Server Environment: Windows Server 2012 and above, CentOS 7 and above
Virtualization Environment: Basic knowledge of VMware vSphere and vCenter
Experience in managing full three-tier application stacks from the OS up through custom applications
Database administration and troubleshooting for databases like MS SQL and PostgreSQL