Overview

Our customer is a unicorn and one of the leading global payment service providers (PSPs), based in Amsterdam, NL. It processes hundreds of millions of payment transactions a day across all continents, with near-real-time requirements.

They serve the world’s brightest companies like Meta, Uber, eBay, H&M, Microsoft and Spotify.
As they reshape the payments landscape, we’re looking for Senior Site Reliability Engineers!

About the company & role:

You will be building the rails of a self-service data platform, creating an ecosystem that is bigger than the sum of its parts. By blending Site Reliability Engineering, Software Engineering, Systems Engineering, and Data Engineering, you will power the many data, machine learning, and GenAI products running across the company.

You’ll be joining a dedicated team of 9 engineers, split between Kubernetes cluster management and the core services running on top of the clusters. The team works in a flexible, Kanban-style environment, sitting right in the middle of its users. This proximity gives the team a direct feedback loop, allowing it to build impactful solutions for both the “happy flow” and the “sad flow.” Beyond operations, you’ll have the opportunity to design, build, and scale infrastructure from the ground up in their on-premise environments, solving problems yourself that managed cloud providers would typically handle. If you thrive on tackling real-life challenges and reducing manual toil through automation, and want unparalleled growth opportunities in SWE, Systems, or Data Engineering, this is your team.

What you’ll do:

● Design & Build On-Premise (Kubernetes) Infrastructure: Architect and scale modern, cloud-like services from the ground up on our on-premise infrastructure, managing core foundational layers including DNS, TLS, certificate management, and load balancers, and performing deep troubleshooting.

● Cluster Provisioning & Reliability: Build, maintain, and scale new Kubernetes clusters and Big Data services. You will maintain agreed SLOs, ensure high availability, and support end-users by keeping them unblocked.

● Mixed Workload Balancing: Prevent resource starvation by ensuring massive batch compute and ML training jobs do not consume resources required by critical, user-facing GenAI inference services and API gateways.

● Advanced Scheduling & Hardware Management: Enforce strict priority, preemption, and specialized scheduling policies (such as gang scheduling). Orchestrate diverse hardware profiles, managing GPU node pools, drivers, device plugins, and resource slicing to support intensive ML/AI processing.

● Storage & Network Optimization: Scale stateful workloads, Persistent Volumes (PVs), and high-throughput networking interfaces to handle massive data gravity and mitigate I/O bottlenecks.

● FinOps & Security: Implement intelligent autoscaling and interruptible instance management to control bursty infrastructure costs. Apply strict resource quotas, RBAC, and network policies to prevent “noisy neighbor” disruptions and guarantee secure isolation across different tenant teams.

● Automation & Operations: Dedicate time to developing new features, applying releases, and building automation that eliminates unacceptable toil. Participate in an expanding 24×7 on-call roster to support the platform.

Who you are:

● Experienced Platform/SRE Professional: You have a strong background in System Administration and Kubernetes management, with proven experience building and operating distributed systems.

● Technical Expertise: You have hands-on experience with K8s, Linux, and foundational networking (DNS, TLS, load balancing), as well as GitOps tooling such as ArgoCD.

● Tooling & Ecosystems: You are highly proficient with configuration management and/or networking tools (Ansible, Puppet, Cilium, HAProxy, Nginx) and/or distributed storage and data systems (Hadoop, MinIO, Ozone, Ceph, Mayastor).

● Observability Mindset: You have experience implementing and managing alerting and monitoring to keep complex systems healthy.

● Good to have: A background in Software Engineering, specialized networking, or GPU management. Familiarity with data ecosystem tools like Airflow and HDFS is highly appreciated.

● Ambitious & Collaborative: You are eager to grow (whether on an IC, Tech Leadership, or People Leadership track) and enjoy working closely with your users and team members to solve complex, scale-driven problems.

Salary and benefits:

The salary is amongst the best in Amsterdam but depends on what you bring to the table in terms of skills, personality and experience.

Some of the secondary benefits: a permanent contract, 20% guaranteed bonus in stock for 4 years, 12K annual Adyen+ allowance (can be used as extra salary), healthcare, holiday allowance, holidays, laptop, travel allowance, relocation package for internationals, training & conferences, 30% ruling, visa sponsorship, luxury in-office lunch, snacks/drinks, work-from-home setup, etc.

sHR.

Strategic Human Resources

Senior Site Reliability Engineer/Platform Engineer – Data Platforms Amsterdam, hybrid (3 days in office) (Relocation and Visa sponsorship included)

Full Time

sHR Consultancy 47

Netherlands

Posted 1 month ago

Overview

For application send your CV to “info@shr-consulting.com”
