Site Reliability Engineer
Keep KALO IQ up. 100M creator records, 200K active brands, the data 200+ paying customers depend on every day.
Role details
We publish the band up front. We do not negotiate down for asking nicely. We do not pay less because you live in a cheaper city. Same role, same pay.
About KALO IQ
KALO IQ is the influencer marketing platform US brands use to find, vet and run campaigns with 100 million hand-verified American creators. Zero bots. Zero fake followers. Verified by humans, not scrapers.
We were founded in December 2015 in Beverly Hills. Today we serve 200+ paying brands across DTC, eCommerce and SaaS. The team is 50 people, remote first, with one in-person retreat a year.
Our values: US first always. Ship the week not the quarter. Verify by human. Write things down. No-ego doers. Numbers over opinions. Read more on our careers page.
About the role
This is a senior individual contributor role. You will own outcomes, not tasks. You will work async with a small team that ships every week.
Who you will work with
This role reports directly to Michael Thompson, our CTO. Day to day, you will work with the two senior backend engineers and the data engineer. You will own the on-call rotation and the incident review process.
What you will do
- Own uptime. Our SLO is 99.92% for the API, 99.95% for the dashboard. We are at 99.87% right now. Get us to target by Q3.
- Rebuild the backup pipeline. Today: nightly snapshots to S3 with 30-day retention. Target: hourly snapshots, cross-region replication, automated 4-hour RTO test every month.
- Run the on-call rotation. 4 engineers, one week each. You will set the runbooks, and you will be the one we call when the on-call engineer gets stuck.
- Lead incident reviews. We do one a month minimum. Blameless, timeboxed to 60 minutes, ends with 3 action items that ship within 14 days.
- Cut infrastructure cost. We spend $28,400 a month on AWS today. There is fat. We have 4 unused RDS instances and 1 oversized ElastiCache cluster. Find the rest.
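To put the uptime numbers in the first item above into minutes, here is a minimal sketch of the arithmetic. It assumes a 30-day measurement window, which this posting does not specify. The function name is hypothetical. The posting also does not say whether 99.87% refers to the API or the dashboard, so the sketch compares it against the API target only as an illustration.

```python
# Convert an availability percentage into allowed downtime per window.
# ASSUMPTION: a 30-day rolling window; the posting does not state the window.
MINUTES_PER_WINDOW = 30 * 24 * 60  # 43,200 minutes


def allowed_downtime_minutes(availability_pct: float,
                             window_minutes: int = MINUTES_PER_WINDOW) -> float:
    """Minutes of downtime that a given availability permits in the window."""
    return window_minutes * (100.0 - availability_pct) / 100.0


api_budget = allowed_downtime_minutes(99.92)        # API SLO
dashboard_budget = allowed_downtime_minutes(99.95)  # dashboard SLO
current_downtime = allowed_downtime_minutes(99.87)  # where we are today

print(f"API budget:       {api_budget:.2f} min")        # 34.56 min
print(f"Dashboard budget: {dashboard_budget:.2f} min")  # 21.60 min
print(f"Current downtime: {current_downtime:.2f} min")  # 56.16 min
print(f"Gap vs API SLO:   {current_downtime - api_budget:.2f} min")  # 21.60 min
```

Under those assumptions, getting from 99.87% to the 99.92% API target means removing roughly 21.6 minutes of downtime per 30 days.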
Skills and experience we look for
- 6+ years running production infrastructure at a SaaS company with paying customers. Not a pre-revenue startup. A real product with real downtime cost.
- Deep AWS. EC2, RDS, S3, CloudFront, Route 53. Bonus if you have done one full migration to or from another cloud and lived to tell the story.
- Real incident response experience. You have led an incident at 2am. You know what to do when the dashboards are also down.
- You can write a runbook a teammate will follow under stress. Short sentences. One action per step. You have rewritten the runbook of someone you wanted to fire.
- You think about backups the way pilots think about checklists. Boring, methodical, never skipped. You have done at least one real restore from cold storage.
Our tech stack
- AWS (EC2, RDS Postgres 15, S3, CloudFront, Route 53, ECS Fargate)
- Terraform for infrastructure, GitHub Actions for CI/CD
- Datadog for monitoring, PagerDuty for on-call, Sentry for application errors
- Postgres logical replication for the read replica fleet, plus pgBackRest for backups
- Cloudflare in front of CloudFront for DDoS plus WAF
How we hire
- Application. Tell us why this role and KALO IQ specifically. Skip the template cover letter. We read every application. Two paragraphs is plenty.
- Recruiter screen. 30-minute call with the hiring manager. We talk through your background, the role and what you want next.
- Take-home task. A small paid take-home, scoped to 4 hours of work. You keep the IP. We pay $400 USD whether or not you advance.
- Working session. 90-minute live session with 2 teammates. We work through a real problem we are currently solving. No whiteboard puzzles.
- Founder chat. 45 minutes with James Carter (CEO) or another member of the leadership team depending on the role. Two-way conversation about KALO IQ direction.
- References and offer. Two reference calls. Offer within 5 business days of the working session.
Total elapsed time: 14 to 21 days. We respect your time. We will tell you no fast if we are not advancing you.
Apply for Site Reliability Engineer
Send your application to [email protected]. Include:
- Two short paragraphs on why this role and KALO IQ specifically
- Resume or LinkedIn link, your call
- One link to work you are proud of (writing, code, campaign or analysis depending on the role)
We read every application. We respond within 5 business days, every time.
Heads up. KALO IQ only posts open roles at /careers/. We never ask for payment or personal data over Telegram or WhatsApp. If a recruiter contacts you from a non-kaloiq.com address claiming to represent us, it is a scam.
Perks and benefits
Competitive salary
Open salary bands. We publish the range on every job. We do not negotiate down.
Work remote
Live where you do your best work. We do not track hours. We track shipped work.
4-day work weeks
The fifth day is yours. Catch up, rest or take a long weekend.
Health insurance
Full medical, dental and vision for the US team. International stipend matches local cost.
Home office stipend
$1,200 on signup. $400 a year after. Pick your chair, desk or monitor.
Learning fund
$1,800 a year for courses, conferences or coaching. No approval under $300.
New laptop
MacBook Pro or equivalent on day one. Refreshed every 3 years.
AI tools stipend
$84 a month for any AI tools you use. ChatGPT, Claude, Cursor, Lovable.
Flexible time off
No set limit. Minimum 22 days required. We mean it.
Profit sharing
When we hit our annual revenue target, every full-timer gets a share.
Parental leave
16 weeks fully paid for all new parents. Plus 4 weeks ramp back at 80%.
Annual team retreat
One in-person retreat a year, 6 days, all expenses covered.