Possess a strong coding background with expertise in at least one language such as Python, Go, or Java.
Troubleshoot code errors efficiently, identifying root causes and communicating them clearly to development teams.
Advocate for and implement best practices in code development, prioritizing maintainability, scalability, and reliability.
Keep current with industry trends and technologies, introducing innovative solutions to enhance system reliability and performance.
Leverage proven design patterns to connect databases, middleware, and other components, ensuring robust and fault-tolerant system interactions.
Actively contribute to the evolution of the organization's design patterns by researching and proposing enhancements based on industry trends.
Gather and analyze metrics from operating systems as well as critical application services to assist in quick identification of issues and faults.
Partner with development to improve the reliability of application services and release procedures.
Participate in system design consulting, platform management and capacity planning.
Deep understanding of observability enablement across different application stacks.
Complete control over and understanding of production environments to speed up issue identification and mitigation.
Define SLAs, SLOs, and SLIs for the product, aligned with the organization's requirements.
Administration of Linux servers (Ubuntu and Amazon Linux).
Sound knowledge of incident management, on-call processes, and the SDLC.
Expertise in toil reduction through automation with tools and scripting languages such as Terraform, Bash, and Python.
Knowledge of RDBMS and NoSQL observability, high availability, and scalability.
Administration of AWS and Azure cloud services such as EC2, RDS, S3, ECS, SNS, SES, CloudWatch, CDN, WAF, CloudFront, CloudTrail, Route 53, VPC, routing, API Gateway, Lambda, IAM roles, security groups, ElastiCache, Memcached, DynamoDB, CodeDeploy, CodeBuild, and serverless.
Experience with container scheduling and orchestration, including implementation, configuration, and administration of Kubernetes, Docker Swarm, OpenShift, or AWS EKS/ECS.
Sound knowledge of microservice architecture and patterns.
Experience with IaC tools such as Terraform, Pulumi, or CloudFormation for automated provisioning of stacks in the cloud and on-premises.
Experience implementing and driving monitoring and observability platforms (Elastic Stack, Grafana, Prometheus, Graphite) and APM tools (New Relic, AppDynamics, Dynatrace).
Good knowledge of impact assessment, release strategies, deployment methodologies, incident management, and change management.
Demonstrated written and verbal communication skills, as well as the ability to work with multiple teams and stakeholders.