Staff DevOps Engineer | Research Infrastructure Operations
Meet DeepL
DeepL is a global communications platform powered by Language AI. Since 2017, we’ve been on a mission to break down language barriers. Our human-sounding translations and intelligent writing suggestions are designed with enterprise security in mind. Today, they enable over 100,000 businesses to transform communications, reach new markets, and improve productivity, and they empower millions of individuals worldwide to make sense of the world and express their ideas.
Our goal is to become the global leader in Language AI, building products that drive better communication, foster connections, and make a real-life impact. To achieve this, we need talented individuals like you to join our exciting journey. If you’re ready to work with a dynamic team and build your career in the fast-moving AI space, DeepL is your next destination.
What sets us apart
What sets us apart is our blend of modern technology, competitive benefits, and an open, welcoming work culture that enables our people to thrive. When we share what it’s like to work at DeepL, the reactions are overwhelmingly positive. This may be because of our products that have helped countless people worldwide or our shared mission to improve communication for individuals and businesses, bringing cultures closer together. What we know for sure is this: being part of DeepL means joining a team dedicated to innovation and employee well-being. Discover what our teams have to say about life at DeepL on LinkedIn, Instagram and our Blog.
Meet the team behind this journey
Within the Infrastructure Operations and Security (IOPS) department, our data center unit manages all infrastructure systems across our remote sites. As a key member of the Research Infrastructure Operations (RIO) team, you will architect, design and operate our High-Performance Computing (HPC) infrastructure, making a fundamental contribution to our AI development.
You will work hands-on with our various Nvidia clusters, comprising thousands of GPUs. Given the scale and complexity of our workloads, it’s not just about maintaining our systems; it’s about elevating them. You will use your expertise in tooling and automation to improve the efficiency, reliability and performance of our infrastructure, taking our operations to the next level.
In this role, you will also coordinate with on-site staff and work closely with various teams within our organization. Joining our team means becoming part of a skilled group of engineers ready to support and kick-start your journey with us.
Your responsibilities
Design, plan, and implement automation for the maintenance and troubleshooting of our bare-metal GPU infrastructure
Benchmark and optimize the performance of our GPU infrastructure systems
Team up with researchers and developers to troubleshoot and fine-tune applications for HPC environments
Work on various projects and help keep our sites in a consistent, up-to-date and optimized state, in all aspects from firmware to architectural deployment plans
Support the team in case of unexpected issues and coordinate escalation to specialized teams when needed
Make your job easier by automating as much as possible using our advanced toolchain
Develop and implement custom monitoring checks to gain insights and respond to technical issues
Work with different hardware vendors in a top-notch, high-performance environment
About you
Extensive experience in management and troubleshooting of GPU compute clusters at scale
Proficiency in containerization and container orchestration technologies such as Docker and Kubernetes
Software engineering expertise and fluency in at least one programming language, preferably Go
Expertise in patch and OS management at scale
Experienced in Linux performance benchmarking, tuning, and troubleshooting
Familiarity with distributed storage solutions like Lustre and Ceph
Knowledgeable in networking technologies and protocols, including Ethernet and ideally InfiniBand
Proactive and solution-oriented mindset
Excellent problem-solving skills
Initiative-driven and able to take ownership
What we offer
Diverse and internationally distributed team: joining our team means becoming part of a large, global community with people of more than 90 nationalities. We’re more than just colleagues; we’re a group of professionals with a shared mission to connect diverse cultures. Our global presence is growing: we’ve doubled in size nearly every year, with our employees based in the UK, Germany, the Netherlands, Poland, the US, and Japan, and we continue to expand our network.
Open communication, regular feedback: as a language-focused company, we value the importance of clear, honest communication. We value smooth collaboration, direct and actionable feedback, and believe that leading with empathy and a growth mindset makes us better together.
Hybrid work, flexible hours: we offer a hybrid work schedule, with team members coming into the office twice a week. This allows you to engage directly with your team and experience the unique energy of our workspace, while still enjoying the flexibility and comfort of working from home. With flexible working hours and trust in your productivity, we are in sync with your team’s general locations and time zones to foster effective and seamless collaboration.
Regular in-person team events: we bond over vibrant events that are as unique as our team, from local team and business unit gatherings, to new-joiner onboardings, to company-wide events that bring us all together, literally.
Monthly full-day hacking sessions: every month, we have Hack Fridays, where you can spend your time diving into a project you’re passionate about and get the opportunity to work with other teams. We value your initiatives, impact, and creativity.
30 days of annual leave: we value your peace of mind. With 30 days off (excluding public holidays) and access to mental health resources, we make sure you’re as strong mentally as you are professionally.
Virtual Shares: An ownership mindset in every role. We believe everyone should share in our success, and that’s why every employee receives Virtual Shares, linking your contribution directly to DeepL’s growth and rewarding you with a stake in our future.
Competitive benefits: just as our team spans the globe, so does our benefits package. We’ve crafted it to reflect the diversity of our team and tailored it to align with your unique location, to ensure you feel supported every step of the way.
If this role and our mission resonate with you, but you’re hesitant because you don’t check all the boxes, don’t let that hold you back. At DeepL, it’s all about the value you bring and the growth we can foster together. Go ahead and apply; let’s discover your potential together. We can’t wait to meet you!
We are an equal opportunity employer
You are welcome at DeepL for who you are; we appreciate authenticity here. Our product is for everyone, and so is our workplace. The more voices we have represented and amplified in our business, the more we will all succeed, contribute, and think forward! So bring us your personal experience, your perspectives, and your background. It’s in our diversity that we will find the power to break down language barriers in the world.
- Location: City of London, England, United Kingdom
- Salary: £80,000 - £100,000
- Job Type: Full-time
- Category: IT & Technology