Description

The Site Reliability Engineer (SRE) is responsible for maintaining high standards of quality customer service and support. In this role, you will be providing front-line customer support for our flagship product, Metworx. The Metworx product is delivered as a Platform-as-a-Service to our clients and provides a stable, scalable, and reproducible computing environment which abstracts away many of the technical and regulatory complexities involved in running systems for high performance computing and scientific analysis. This allows our users to focus on science and not infrastructure. The platform leverages many AWS services under the hood, including EC2 and ParallelCluster.

● Serve as the front-line technical resource for troubleshooting and resolving customer issues related to the Company's Linux-based AWS platform.
● Provide exceptional technical support to internal and external stakeholders, ensuring timely resolution of issues within established SLAs.
● Document and escalate complex issues to senior technical resources as needed while striving to independently resolve more advanced issues over time.
● Monitor and respond to technical incidents, identify root causes, and collaborate with internal teams to implement long-term solutions.
● Write and maintain knowledge base articles and training materials for end users and internal teams.
● Collaborate closely with the Product, Quality Assurance, Engineering, and Operations teams to ensure alignment and a seamless user experience.
● Document product use cases, enhancements, and bug fixes; advocate for product improvement based on user feedback.
● Participate in on-call rotations to provide 24/7 operational support.
● Maintain strong relationships with customers and stakeholders, striving for exceptional satisfaction and engagement.
● Become familiar with the secure use and management of the AWS control plane to ensure compliance with security and data privacy standards as expertise develops.
● Oversee the availability and performance of production and development environments, ensuring alignment with SLAs and industry best practices.

Requirements

● 3+ years of experience in technical customer support or service desk environments, with a focus on technical product support.
● 5+ years of experience in cloud computing and infrastructure management.
● Strong knowledge of Amazon Web Services (AWS), including containerized applications (EKS, ECS, ECR, Elastic Beanstalk).
● Proficiency in Linux administration, including user management, software installation, and file system management.
● Familiarity with networking concepts and DNS.
● Proficiency with versioning tools (Git, svn).
● Excellent oral and written English communication skills with a customer-centric perspective.
● Strong troubleshooting and critical thinking skills.
● Ability to work both independently and collaboratively in a team environment.
● High attention to detail and organizational skills.

Preferred Qualifications:

● AWS certification (any level).
● 4-year college degree in a technical or quantitative science field, or equivalent work experience.
● Experience supporting end users in a service desk or technical customer support environment.
● Familiarity with virtualized infrastructure management and security best practices.

Education and Experience:

● Bachelor’s degree in a technical or quantitative science field or equivalent work experience.
● Additional training or certifications in virtualized infrastructure-related tasks are preferred.
● Evidence of proficiency in SaaS or PaaS product support, programming languages, and virtual infrastructure management.

Key Attributes:

● Strong customer service orientation and the ability to advocate for end-user satisfaction.
● Disposition to engage directly with customers in a professional and empathetic manner.
● Technical acumen to troubleshoot and resolve complex issues independently while escalating appropriately when necessary.
● Passion for continuous learning and staying updated on the latest industry practices and technologies.
● Eagerness to grow into advanced responsibilities.

Benefits

This is a client-facing support role for a highly technical product offered as a Platform-as-a-Service (PaaS) to our clients. Successful applicants will possess a combination of technical skills and communication skills to support our international clients using English as the common language. There are multiple rotating shifts available, as we will be supporting international customers between the general hours of 5 am-9 pm Indian Standard Time (IST).

This role is responsible for supporting a proprietary, Linux-based and AWS-hosted Platform-as-a-Service offering, ensuring both operational excellence and outstanding customer satisfaction. Initially focused on front-line technical product support, the SRE will have opportunities to grow into more advanced positions and responsibilities. The ideal candidate will possess the technical expertise and interpersonal skills necessary to engage directly with end users, resolving complex issues while continuously improving platform reliability and performance.

Site Reliability Engineer I focuses on providing front-line technical product support and troubleshooting customer issues. Site Reliability Engineer II demonstrates the same technical capability while also taking on advanced infrastructure responsibilities, leading infrastructure initiatives, and mentoring junior team members.

Location: Kolkata Onsite
Salary Range: 8 LPA to 10 LPA
Rotating shifts: 5 am-9 pm Indian Standard Time (IST)

Job Title
Site Reliability Engineer