Site Reliability Engineer


GSS - Global Screening Services

About GSS

Hello. Welcome to GSS! We're transforming the global financial system with cutting-edge technology, including artificial intelligence and collaboration with top financial institutions. Our platform sets new standards in compliance screening for sanctions, making international payments faster, smoother, and friction-free. Join us in revolutionising the industry and making a real impact!

About The Role

This is an exciting opportunity to join our growing Operations team managing Kubernetes clusters in Production and, through a DevOps culture, empower development teams with observability insights they can use to innovate faster.

We are looking for a Site Reliability Engineer, or production-experienced DevOps Engineer, who has working experience building observability for cloud-native SaaS products and driving operational excellence.

You will be responsible for delivering our monitoring infrastructure, shaping observability, and responding to incidents as well as ensuring the platform is performant and reliable. You will be a key member of the team, liaising with product teams, embedding SRE principles and building the observability platform for the next stage of growth at GSS. You will have direct input into the direction of Technical Operations, solving problems, supporting developers and optimising the platform through code.

Plus, enjoy a collaborative, flexible, and innovative work culture where your ideas are valued.

What You’ll Do

Key responsibilities in this role will include (but not be limited to):

Applying core SRE values: measuring (SLI/SLO/SLA), testing, and eliminating toil via automation, with appropriate Disaster Recovery planning
Refining KPIs to enable data-driven decision making for availability and reliability
Proactively analysing monitoring data to ensure production services are running optimally and cost-efficiently
Proactively tracking capacity, quotas and other performance indicators to plan for growth
Working with development teams to ensure new features are maintainable, have well-defined SLIs and achievable SLOs, are properly monitored, and are evaluated for failure scenarios
Enabling development teams through DevOps culture and the effective use of observability tools: promoting best practice, presenting knowledge transfer (KT) sessions, and helping troubleshoot and resolve business-affecting issues
Building on our existing monitoring tools to deliver a comprehensive, optimised observability platform for logging, metrics and tracing to ensure suitable alerting scope
Writing maintainable code to augment operations, scaling, resilience and observability
Debugging production issues, mitigating them swiftly and preventing recurrence
Maintaining runbooks for manual tasks and replacing those runbooks with automation wherever viable
Supporting junior members of the team to adopt best practice
Participating in 24x7 on-call rotation, incident response, escalation, RCA and blameless post-mortems

Ideal Experience

What you’ll need

At least 3 years’ experience in a production environment at a SaaS company (preferably event-driven)
A self-starter who relishes responsibility, takes strategic direction and owns end-to-end delivery of solutions
Expert knowledge of SRE fundamentals and a commitment to best practice
Fluency with common observability tooling like Prometheus, Grafana, OpenTelemetry (OTel) and CloudWatch
Experience analysing and building data telemetry, querying (PromQL), modelling, pipelines and dashboards to provide concise, focused insights and alerts for distributed systems
Strong experience with Python and/or Go
Java (Spring Boot and Micrometer) useful
Demonstrable experience working with AWS services like SQS, EKS, RDS, VPC, EC2, CloudWatch (X-Ray, Metrics and Logs), Lambda
Solid knowledge of Linux systems and Bash scripting
Strong knowledge of networking and common protocols (TCP, DNS, TLS, HTTP)
Experience with DevOps principles and tooling such as Infrastructure as Code (Terraform) and CI/CD (GitHub Actions, Jenkins)
Knowledge of stream processing technologies like Kafka would be useful
Experience working with ITSM systems like JSM, Zendesk or ServiceNow
Experience building/maintaining automated incident management workflows
Experience developing with containers and container orchestration (Docker & Kubernetes)
Working knowledge and experience with Agile software development practices
Strong communication, collaboration and documentation skills with proven experience working cross-functionally
Ability to think about distributed systems in terms of failure modes and bottlenecks
BSc/MSc in Computer Science, a related technical discipline, or equivalent experience
Financial Services experience (or similar regulated industry) a bonus, but not essential
Experience participating in Incident Response

What You Get In Return

Impactful Work: Be part of a growing startup where your contributions make a real difference.

Generous Leave: Enjoy 30 days of holiday (plus bank holidays).

Comprehensive Benefits: Including a generous pension scheme, private medical insurance, and life assurance.

Wellbeing Perks: Access to EAP, YuLife, holistic wellbeing programs, and a Virtual GP for your health and happiness.

Flexibility: Hybrid working environment (we are open to remote working for some roles; please check with us at application) with a ‘work abroad’ policy for up to 4 weeks a year.

Learning: Access to Udemy, a learning platform with thousands of top-rated courses to develop both tech and business skills.

Ready to revolutionise finance and have fun doing it? Join GSS where we live by our values: Be Respectful, Be Bold and Take Ownership. Come join us and take your career to new heights!

Diversity statement

We are an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to, among other things, race, religion, gender, sexual orientation, gender identity, national origin, age or disability.

Seniority level
Mid-Senior level

Employment type
Full-time

Job function
Engineering and Information Technology

Industries
Software Development

Location: London, England, United Kingdom
Salary: £150,000 - £200,000
Category: Engineering
