System Analyst
Job Description
- Must have experience in:
  - Network monitoring
  - DB and cache monitoring
  - Kubernetes monitoring
  - Security event monitoring
  - Web server monitoring
  - Critical server-level monitoring (golden metrics) for distributed systems
- Hands-on experience with the following monitoring tools:
  - ELK
  - Grafana
  - Nagios
  - Cacti
  - RKE/Rancher
  - Splunk
  - CloudTrail
- Hands-on experience with the following APM tools:
  - New Relic
  - AppDynamics
  - Datadog
- Experienced with the concept of Continuous Monitoring (CM).
- Subject-matter expert (SME) in combining multiple data sources to get a clear picture of production systems.
- SME in creating platform security alerts in the tools listed above.
- Strong knowledge of Linux and Windows environments.
- Strong knowledge of cloud environments.
Good to Have (Technical)
- Scripting for automation
  - Python
  - Bash
- Containerization
  - Docker (basic knowledge)
- Container orchestration
  - Kubernetes (basic knowledge)
- Infrastructure as Code
  - Terraform (basic knowledge)
Non-Functional Requirements
- Experienced in designing, implementing, and deploying monitoring and alert management stacks.
- Experienced in designing incident response playbooks with clear escalation paths.
- Able to learn baseline patterns from various sources to build better alert thresholds.
- Able to work as an individual contributor driving continuous monitoring for customers.
- Good at documenting the underlying monitoring stack and any incident response playbooks.
- Good at collaborating with teams throughout IT to ensure they understand how to use the monitoring system properly.