Site Reliability Engineer
20 Days Old
Join to apply for the Site Reliability Engineer role at GSS - Global Screening Services
About GSS
Hello. Welcome to GSS! We're transforming the global financial system with cutting-edge technology, including artificial intelligence and collaboration with top financial institutions. Our platform sets new standards in compliance screening for sanctions, making international payments faster, smoother, and friction-free. Join us in revolutionising the industry and making a real impact!
About The Role
This is an exciting opportunity to join our growing Operations team managing Kubernetes clusters in Production and, through a DevOps culture, empower development teams with observability insights they can use to innovate faster.
We are looking for a Site Reliability Engineer, or a DevOps Engineer with production experience, who has built observability for cloud-native SaaS products and driven operational excellence.
You will be responsible for delivering our monitoring infrastructure, shaping observability, and responding to incidents as well as ensuring the platform is performant and reliable. You will be a key member of the team, liaising with product teams, embedding SRE principles and building the observability platform for the next stage of growth at GSS. You will have direct input into the direction of Technical Operations, solving problems, supporting developers and optimising the platform through code.
Plus, enjoy a collaborative, flexible, and innovative work culture where your ideas are valued.
What You’ll Do
Key responsibilities in this role will include (but not be limited to):
- Leveraging core SRE values - measuring (SLI/SLO/SLA), testing, and eliminating toil via automation with appropriate Disaster Recovery planning
- Refining KPIs to enable data-driven decision making for availability and reliability
- Proactively analysing monitoring data to ensure production services are running optimally and cost-efficiently
- Proactively tracking capacity, quotas and other performance indicators to plan for growth
- Working with development teams to ensure new features are maintainable, have well defined SLIs, achievable SLOs, are properly monitored, and evaluated for failure scenarios
- Enabling development teams through DevOps culture and the effective use of observability tools: promoting best practice, presenting knowledge transfer (KT) sessions, and helping troubleshoot and resolve business-affecting issues
- Building on our existing monitoring tools to deliver a comprehensive, optimised observability platform for logging, metrics and tracing to ensure suitable alerting scope
- Writing maintainable code to augment operations, scaling, resilience and observability
- Debugging production issues, mitigating them swiftly and preventing recurrence
- Maintaining runbooks for manual tasks and replacing those runbooks with automation wherever viable
- Supporting junior members of the team to adopt best practice
- Participating in 24x7 on-call rotation, incident response, escalation, RCA and blameless post-mortems
What You’ll Need
- At least 3 years’ experience within a production, SaaS company (preferably event-driven)
- A self-starter who relishes responsibility, takes strategic direction and owns end-to-end delivery of solutions
- Expert knowledge of SRE fundamentals and a commitment to best practice
- Fluency with common observability tooling like Prometheus, Grafana, OTEL and CloudWatch
- Experience analysing and building data telemetry, querying (PromQL), modelling, pipelines and dashboards to provide concise, focused insights and alerts for distributed systems
- Strong experience with Python and/or Go
- Java (Spring Boot and Micrometer) useful
- Demonstrable experience working with AWS services like SQS, EKS, RDS, VPC, EC2, CloudWatch (X-Ray, Metrics and Logs), Lambda
- Solid knowledge of Linux systems and bash scripting
- Strong knowledge of networking and common protocols (TCP, DNS, TLS, HTTP)
- Experience with DevOps principles and tooling such as Infrastructure as Code (Terraform) and CI/CD (GitHub Actions, Jenkins)
- Knowledge of stream processing technologies like Kafka would be useful
- Experience working with ITSM systems like JSM, Zendesk or ServiceNow
- Experience building/maintaining automated incident management workflows
- Experience developing with containers and container orchestration (Docker & Kubernetes)
- Working knowledge and experience with Agile software development practices
- Strong communication, collaboration and documentation skills with proven experience working cross-functionally
- Ability to think about distributed systems in terms of failure modes and bottlenecks
- BSc/MSc in Computer Science, a related technical discipline, or equivalent experience
- Financial Services experience (or similar regulated industry) a bonus, but not essential
- Experience participating in Incident Response
Impactful Work: Be part of a growing startup where your contributions make a real difference.
Generous Leave: Enjoy 30 days of holiday (plus bank holidays).
Comprehensive Benefits: Including a generous pension scheme, private medical insurance, and life assurance.
Wellbeing Perks: Access to EAP, YuLife, holistic wellbeing programs, and a Virtual GP for your health and happiness.
Flexibility: Hybrid working environment (we are open to remote working for some roles; please check with us at application) with a ‘work abroad’ policy for up to 4 weeks a year.
Learning: Access to Udemy, a learning platform with thousands of top-rated courses to develop both tech and business skills.
Ready to revolutionise finance and have fun doing it? Join GSS where we live by our values: Be Respectful, Be Bold and Take Ownership. Come join us and take your career to new heights!
Diversity statement
We are an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to, among other things, race, religion, gender, sexual orientation, gender identity, national origin, age or disability.
Seniority level
Mid-Senior level
Employment type
Full-time
Job function
Engineering and Information Technology
Industries
Software Development
- Location: London, England, United Kingdom
- Salary: £150,000 - £200,000
- Category: Engineering