Site Reliability Engineer
New Yesterday
Overview
Gizmo is an AI startup on a mission to make learning so easy that anyone can learn anything. We're building Duolingo for anything - a platform that uses gamification and social mechanics to make learning fun.
With over 1 million monthly active users and $4M in annual recurring revenue, we're already one of the fastest-growing startups in the UK. Backed by leading investors, we recently raised $22M in Series A funding to accelerate our vision of helping 1 billion people learn.
Role Overview
Reporting to the CTO, you will own capacity, performance and reliability for Gizmo's full-stack platform as daily traffic climbs from hundreds of thousands to millions of users. You'll write code across the stack, but your charter is classic SRE: defend SLOs, eliminate toil, and raise the ceiling on scale before it becomes a hard limit.
Responsibilities
- Define SLIs/SLOs for latency, availability and error rate; codify error budgets and partner with product teams on trade-offs.
- Perform load-testing, capacity modelling and up-front scalability design for PostgreSQL, OpenSearch, Redis, Hasura and CF Workers; produce data-driven scaling plans.
- Extend metrics, structured logging and tracing; establish alert rules that page only on user-visible impact; build actionable runbooks.
- Join the on-call rotation, lead blameless post-mortems, drive remediation work to closure and track MTTR/MTBF improvements.
- Automate repetitive ops on Kubernetes and CI/CD; keep "toil" below 50% of your time by pushing fixes into code.
- Coach full-stack engineers on query optimisation, schema design and back-pressure techniques; document patterns and anti-patterns in an SRE playbook.
Qualifications
- Hands-on scale experience: you have run relational stores at 100 k+ TPS or 1 M+ concurrent users (e.g., multi-tenant PostgreSQL, sharded MySQL).
- Strong backend fundamentals around concurrency, caching, indexing and distributed systems trade-offs.
- Proven track record of setting SLOs, building dashboards (Prometheus/Grafana, OpenTelemetry, etc.) and tuning alerts.
- Comfort with Kubernetes, IaC and cloud-native patterns; can debug from network to application layer.
- Self-starter with a maker mindset. We're looking for ex-founders or individuals with start-up experience.
- Start-up bias for action: you prioritise high-leverage fixes, ship iteratively and own outcomes end-to-end.
- Collaborative and feedback-driven; you welcome post-mortem culture and continuous improvement.
- Driven by impact - you prioritise work that moves the needle!
- Nice-to-haves: experience with Hasura internals, Cloudflare Workers edge optimisation, or operating OpenSearch at scale.
Benefits
- Highly competitive salary.
- You'll own a piece of what you're building - equity included.
- Hybrid working model with 4 days in our East London office, ideally located between Shoreditch High Street, Old Street, and Liverpool Street stations.
- The opportunity to become one of the earliest employees in one of the UK’s fastest-growing startups.
- Private health insurance.
- Location: City of London, England, United Kingdom
- Salary: £100,000 - £125,000
- Job Type: Full-time
- Category: Engineering