We’re looking for an experienced Platform/Site Reliability Engineer to help evolve and expand our engineering foundation. In this role, you’ll ensure our systems remain robust, scalable, and efficient, while creating the tooling and automation that empower our development teams to move faster and more effectively.
This position is central to shaping our platform roadmap, driving best practices, and implementing solutions that support both developer experience and operational excellence.
Key Responsibilities
Infrastructure & DevOps
Architect, build, and maintain resilient infrastructure that supports diverse engineering initiatives.
Guide adoption of scalable patterns to improve reliability and cost efficiency.
Deployment & Release Management
Refine CI/CD pipelines with AWS CDK to accelerate safe and automated delivery.
Develop tooling for deployments and database migrations to reduce friction in release processes.
Enhance visibility into delivery cycles and streamline rollout workflows.
Reliability & Observability
Design and support monitoring frameworks, log aggregation, and alerting systems.
Proactively identify and resolve issues to maintain uptime and service quality.
Internal Developer Experience
Build productivity tools that shorten feedback loops and automate repetitive tasks.
Champion practices that improve engineering velocity across teams.
Security & Governance
Embed strong security practices into infrastructure and operational processes.
Support compliance initiatives across standards and regulations such as SOC, ISO, and GDPR.
Requirements
We’re Looking For
7+ years of total professional experience, including 5+ years in reliability, infrastructure, or platform roles. Experience in startup environments is a plus.
Strong background in AWS, with deep knowledge of container platforms and orchestration (Fargate, Kubernetes).
Proven success improving CI/CD workflows with AWS CDK, including automation for deployments and migrations.
Familiarity with modern observability platforms (e.g. Datadog, Prometheus, Grafana).
Solid expertise in designing systems for high availability and horizontal scalability.
Strong coding and scripting skills in languages such as Python, Bash, or TypeScript.
Understanding of infrastructure security best practices and regulatory compliance requirements.
Collaborative mindset, able to partner effectively across engineering teams.
Our Technology Environment
Infrastructure: AWS (Fargate, Redis, PostgreSQL, SQS, CDK), GitHub, Retool
Backend: Django REST Framework, Celery
Frontend: Next.js, Tailwind CSS
AI/LLM Tools: OpenAI, Claude, AWS Bedrock