Senior Site Reliability Engineer
Eto Labs
Location
HQ
Employment Type
Full time
Location Type
Remote
Department
LanceDB Team
Compensation
$160K – $220K • Offers Equity
Compensation & Benefits
LanceDB is committed to equitable and competitive compensation. Actual compensation is determined based on factors such as skills, experience, location, and the interview process.
In addition to base salary, new hires receive compelling equity grants and access to a comprehensive benefits package including:
Medical, dental, vision, and life insurance
401(k) retirement plan
Flexible Spending Accounts (FSA) and Health Savings Accounts (HSA)
Commuter benefits
Generous paid time off
About LanceDB
LanceDB is a developer-friendly, open-source data lake for multimodal AI. From hyper-scalable vector search to advanced retrieval for RAG, from streaming training data to interactive exploration of large-scale AI datasets, LanceDB is the best foundation for your AI application, and powers some of the most groundbreaking applications and challenging requirements today.
About the Role
We’re seeking a Senior Site Reliability Engineer to play a major role in the operation and reliability of LanceDB’s cloud infrastructure. You’ll help run deployments, upgrades, and maintenance across our cloud environments while building the automation and tooling that let these processes scale.
In this role, you’ll drive key parts of our operational excellence, performing hands-on operations early on while steadily increasing automation and reliability coverage across our systems. You’ll collaborate closely with the broader engineering team to ensure LanceDB’s services stay fast, reliable, and secure as we grow.
Responsibilities
Operate and help maintain production cloud infrastructure (AWS, Azure, GCP).
Lead and participate in deployments, upgrades, monitoring, and incident response.
Automate operational workflows using Terraform, Ansible, and CloudFormation.
Build and manage observability systems (Prometheus, Grafana, Datadog).
Collaborate with engineering teams to embed SRE and reliability best practices.
Improve and document operational processes as we scale globally.
Requirements
10+ years of experience in SRE, DevOps, or Cloud Infrastructure roles.
Strong experience operating and supporting production systems in the cloud (AWS, Azure, GCP).
Deep knowledge of infrastructure-as-code and automation tools.
Familiarity with Kubernetes, Helm, and CI/CD pipelines.
Hands-on engineering style and enjoyment of improving operations and building reliability tooling.
Passion for observability, system performance, and continuous improvement.
Why Join Us
You’ll join a world-class team of open-source builders (co-authors of pandas, and contributors to HDFS, Arrow, Iceberg, and HBase) working on cutting-edge AI infrastructure. You’ll collaborate on systems that power next-generation AI workloads while shaping how LanceDB operates and scales production environments.