Site Reliability Engineer (Automation and DevOps)
Location: Dublin, Ireland
Key Responsibilities
· Plan, manage, and oversee all aspects of a production environment
· Define strategies for application performance monitoring and optimisation in a production environment
· Respond to incidents
· Improve the platform based on feedback and measure the reduction of incidents over time
· Support deployment of code into multiple lower environments
· Support current processes with an emphasis on automating everything as soon as possible
· Design, develop and standardise a monitoring and alerting mechanism for the supported applications
· Take a holistic approach to problem-solving by connecting the dots, during a production event, across the various technology stacks that make up the platform, in order to optimise mean time to recover
· Engage in and improve the whole lifecycle of services - from inception and design, through deployment, operation and refinement
· Analyse ITSM activities of the platform and provide a feedback loop to Development teams on operational gaps or resiliency concerns
· Support services before they go live through activities such as system design consulting, capacity planning and launch reviews
· Support the application CI/CD pipeline for promoting software into higher environments through validation and operational gating, and lead in DevOps automation and best practices
· Maintain services once they are live by measuring and monitoring availability, latency and overall system health
· Scale systems sustainably through mechanisms like automation and evolving systems by pushing for changes that improve reliability and velocity
· Work with a global team spread across tech hubs in multiple geographies and time zones
· Ability to share knowledge and explain processes and procedures to others
· Share knowledge and mentor junior team members
· Ability to perform on-call duties on a rotational basis
· Occasional off-hours work required
Skills Required
Must have:
· Linux
· Mainframe
· Shell scripting
· ITIL / ITSM
· Application troubleshooting
· SQL
· Any monitoring tool (Splunk / Dynatrace preferred)
· Jenkins - CI/CD
· Groovy scripting / YAML (basic)
· Git (basic) / Bitbucket (basic)
Good to have:
· Ansible / Chef
· Event framework architecture