Hanifur Rahman

Senior Backend Engineer / 7+ years / Rails / Go / Postgres / AWS

I build systems that hold when production gets loud.

High data volumes, unreliable APIs, memory constraints, concurrent writes on the same rows. I am a systems-focused full-stack engineer based in Cumilla, Bangladesh. What follows is a working log of systems shipped and the failures that taught me to ship them.

Field Notes

Seven years writing production systems. Four of them at an e-commerce intelligence platform where I built distributed collectors in Go, Ruby, and Python. Event pipelines into Snowflake. Terraform over ECS Fargate. A Chrome extension that talked to a serverless backend.

The last three have been different. I was brought on as the sole engineer for a production advertising SaaS platform. The platform now comprises a 239 GB Postgres database, a near-real-time ingestion pipeline off Amazon Marketing Stream, an automation engine that replaced manual bid work, and an AI analyst with multi-agent guardrails.

What I think about most: the quiet failure modes. Race conditions that leave no trace. Memory walls on Heroku workers. API rate limits that arrive without warning. Materialized views that take longer to refresh than the data stays fresh. I care about whether a system will still work at 3am on the six hundredth day.

Currently: Lead dev, advertising SaaS
Based: Cumilla, Bangladesh
Work: Remote, worldwide
Stack lean: Rails, Go, Python, AWS

Case Briefs

2023 / present · Lead engineer · Rails / Postgres / Sidekiq / Claude

Advertising Automation at Scale

A production multi-tenant SaaS platform for Amazon advertisers.

Owned the architecture end to end: rules engine, near-real-time ingestion, an automation system that translates a business philosophy into tiered bid rules, and an AI analyst that generates reports without leaking data across tenants.

clients ──▶ web (Rails)
                │
       ┌────────┴────────┐
       ▼                 ▼
Sidekiq (200+ jobs)   ActionCable (real-time UI)
       │
  ┌────┴────────┬──────────────┐
  ▼             ▼              ▼
Postgres     SQS events     Claude agents
239 GB       43M entries    Guard / Router
25 MVs       hourly rollup  Responder
239 GB Postgres · 120 M bid changes · 43 M SQS events · 104 K rule executions

Key Decisions

  • Replaced manual bid work with a rule engine using custom formulas, scheduling, and dayparting. The system now runs 4,300 rules on its own.
  • Cut dashboard load from timeout to sub-second using 25 materialized views refreshed via shadow-swap (build temp view, atomic RENAME) with only a brief lock window during the swap; a sketch follows this list.
  • Traded 24-to-48-hour batch reports for near-real-time SQS ingestion with hourly aggregation. Same-day bid optimization became possible.
  • Built a two-column bid state machine with row-level locking to stop silent drift between dayparting and rule execution.
  • Shipped an AI analyst on a Guard / Router / Responder pipeline (input filter, intent classifier, generator) with per-tenant data scoping, streaming HTML, PPTX, and XLSX.
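
Roughly what the shadow-swap refresh looks like, assuming it reduces to plain SQL through ActiveRecord; view_sql is a hypothetical helper standing in for each view's defining query, not the production code.

  # Build the replacement off to the side, then swap names atomically.
  # view_sql(view) is a hypothetical helper returning the defining query.
  def shadow_swap_refresh(view)
    conn = ActiveRecord::Base.connection
    conn.execute("DROP MATERIALIZED VIEW IF EXISTS #{view}_new")

    # Readers keep hitting #{view} while the replacement builds.
    conn.execute("CREATE MATERIALIZED VIEW #{view}_new AS #{view_sql(view)}")

    # The only lock window is this pair of renames, which is near-instant.
    conn.transaction do
      conn.execute("ALTER MATERIALIZED VIEW #{view} RENAME TO #{view}_old")
      conn.execute("ALTER MATERIALIZED VIEW #{view}_new RENAME TO #{view}")
    end
    conn.execute("DROP MATERIALIZED VIEW #{view}_old")
  end

A plain REFRESH MATERIALIZED VIEW holds its exclusive lock for the whole rebuild; the rename swap shrinks that window to the two renames.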

Outcome

A two-person team now operates a multi-tenant platform that previously required manual bid management across thousands of campaigns. Campaign teams shifted from spreadsheet ops to strategy work. Reporting decisions that used to wait two days happen the same hour.

2019 / 2023 · Software engineer · Go / Ruby / Python / Terraform

E-Commerce Intelligence Pipelines

Distributed data collection and analytics across Amazon, Walmart, and Shopify.

Four years writing collectors that crawled marketplaces without falling over. The kind of systems where half the work is handling what the upstream fails to tell you: rate limits, proxy exhaustion, malformed HTML, banned IPs.

collectors (Go, Ruby, Python)
              │
              ▼
      SQS (FIFO) / SNS
              │
        ┌─────┴────────┐
        ▼              ▼
Kinesis Firehose       S3
        │
        ▼
Snowflake (market share)
10 collectors · 5 pipelines · 4 IaC projects · 3 languages

Key Decisions

  • Built 10 data collectors for buybox, SERP, inventory, reviews, and pricing. Each handled proxy rotation, retry with backoff, and partial failure (see the sketch after this list).
  • Designed event-driven ingestion with SQS FIFO, SNS, and Kinesis Firehose into S3 and Snowflake.
  • Wrote a market share analytics system. Five pipelines, schedules from every five minutes to weekly.
  • Shipped four Terraform projects covering ECS Fargate blue-green deploys, Lambda, RDS Aurora, ElastiCache, and DynamoDB.
  • Built a Chrome extension (Manifest v3) with a serverless backend for Amazon Seller Central opportunity discovery.
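
The shared retry-and-rotation loop, sketched in Ruby for illustration (the production collectors were split across Go, Ruby, and Python); the PROXY_LIST handling and the error set are assumptions.

  require "net/http"
  require "uri"

  PROXIES = ENV.fetch("PROXY_LIST", "").split(",") # assumed non-empty

  def fetch_page(url, attempts: 5)
    uri = URI(url)
    attempts.times do |i|
      proxy = URI("http://#{PROXIES.sample}")   # rotate on every attempt
      http  = Net::HTTP.new(uri.host, uri.port, proxy.host, proxy.port)
      http.use_ssl = uri.scheme == "https"
      res = http.get(uri.request_uri)
      return res.body if res.is_a?(Net::HTTPSuccess)
      sleep 2**i                                # backoff on 429s, bans, 5xx
    rescue Net::OpenTimeout, Errno::ECONNRESET, SocketError
      sleep 2**i                                # network-level partial failure
    end
    nil # exhausted: record the miss and move on instead of crashing the run
  end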

Outcome

Daily marketplace data flowed into Snowflake market-share analytics consumed by enterprise e-commerce clients. The collectors held against rate limits, IP bans, and malformed pages without manual babysitting.

Problem Log

Short case notes on production problems I have solved. The kind that are never in a blog post because they are only visible when you have both the trace and the table.

2024.11

The Dayparting / Rule Race

Two systems, both editing bid values. Dayparting lowers them at night. Rules adjust them during the day. When they ran at the same second, rules read the temporarily lowered bid as baseline, recalculated wrong, and dayparting restored it without knowing. The fix: a two-column state machine with row-level locking and a foreign key tracking temporary ownership. Silent drift became detectable and reversible.
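
A minimal sketch of that state machine, with illustrative column names (bid, stashed_bid, and bid_owner_id stand in for the production schema):

  class Target < ApplicationRecord
    # Dayparting takes temporary ownership: stash the baseline, lower the
    # live bid, record the owner through a foreign key.
    def start_daypart!(owner, factor)
      with_lock do # row-level lock: rule execution blocks until commit
        update!(stashed_bid: bid, bid: (bid * factor).round(2),
                bid_owner_id: owner.id)
      end
    end

    # Rules read the owned baseline, never the temporarily lowered value.
    def baseline_bid
      bid_owner_id ? stashed_bid : bid
    end

    # Restore checks ownership, so a rule write that landed in between is
    # detected instead of silently overwritten.
    def end_daypart!(owner)
      with_lock do
        next unless bid_owner_id == owner.id
        update!(bid: stashed_bid, stashed_bid: nil, bid_owner_id: nil)
      end
    end
  end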

2024.12

Streaming 239 GB Through a 512 MB Worker

Amazon reports come as gzipped JSON, often hundreds of megabytes per file. Loading them into Ruby objects blew memory. Traditional ActiveRecord patterns were not survivable on a Heroku dyno. The combination that worked: chunked downloads to disk, streaming decompression, Oj SAJ event parsing, change-detection upserts, and explicit GC between batches. Batch sizes tuned from 100 K down to 10 K through real OOMs.
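
A minimal sketch of that pipeline, assuming the report is a gzipped JSON array of flat row objects; RowBatch.upsert_changed is a hypothetical stand-in for the change-detection upsert.

  require "oj"
  require "zlib"

  class RowHandler # callbacks for Oj's SAJ (SAX-style) parser
    BATCH = 10_000 # tuned down from 100 K through real OOMs

    def initialize(&sink)
      @depth, @row, @rows, @sink = 0, nil, [], sink
    end

    def hash_start(_key)
      @depth += 1
      @row = {} if @depth == 1 # each object in the top-level array is a row
    end

    def add_value(value, key)
      @row[key] = value if @row && key
    end

    def hash_end(_key)
      @depth -= 1
      return unless @depth.zero?
      @rows << @row
      flush if @rows.size >= BATCH
    end

    def array_start(_key); end

    def array_end(_key)
      flush # final partial batch
    end

    def error(message, line, column)
      raise "#{message} at #{line}:#{column}"
    end

    private

    def flush
      @sink.call(@rows) unless @rows.empty?
      @rows.clear
      GC.start # explicit GC between batches keeps the 512 MB dyno alive
    end
  end

  # Chunked download to disk happens before this point; nothing below
  # holds the whole report in memory.
  Zlib::GzipReader.open("report.json.gz") do |gz|
    Oj.saj_parse(RowHandler.new { |rows| RowBatch.upsert_changed(rows) }, gz)
  end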

2026.03

When Claude's Bash Tool Hangs on Heroku

An undocumented failure mode. The Claude agent I built ran fine locally but hung indefinitely inside Heroku worker dynos. The cause was the absence of a TTY. After several attempts to patch around it, I enforced a total ban on the Bash tool via prompt engineering and rebuilt the execution path around an MCP rails-runner tool. PPTX and XLSX generation, file uploads, and code execution all routed through the new path.
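
Stripped of the MCP plumbing, the replacement path reduces to running a snippet inside the booted app and capturing stdout, which needs no TTY. A hedged sketch, protocol handling omitted:

  require "open3"

  # Core of the rails-runner tool. `bin/rails runner` executes the snippet
  # inside the loaded app and returns its output; no TTY required.
  def rails_runner_tool(code)
    stdout, stderr, status = Open3.capture3("bin/rails", "runner", code)
    { ok: status.success?, stdout: stdout, stderr: stderr }
  end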

2025.01

The Deadlock That Would Not Die

Large upsert batches kept hitting Postgres deadlocks under concurrent export and report ingestion, and naive immediate retries only made the contention worse. I added an exponential backoff loop (1, 2, 4, 8, 16 seconds) with five retries, combined with row-level locking and a fallback that sets the batch status back to queued for Sidekiq to pick up. The backoff did not prevent the deadlocks. It made them not matter.
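
A minimal sketch of that loop, with illustrative model and batch names:

  def upsert_with_backoff(rows, batch)
    attempts = 0
    begin
      ReportRow.transaction do
        # Row-level locks taken in a stable order, so concurrent batches
        # queue behind each other instead of crossing.
        ReportRow.where(id: rows.map { |r| r[:id] }).order(:id).lock.load
        ReportRow.upsert_all(rows) # unique index assumed for conflict target
      end
    rescue ActiveRecord::Deadlocked
      attempts += 1
      if attempts <= 5
        sleep 2**(attempts - 1) # 1, 2, 4, 8, 16 seconds
        retry
      end
      batch.update!(status: :queued) # hand the work back to Sidekiq
    end
  end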

Working Stack

Backbone is Ruby on Rails with Postgres at the center. Sidekiq for background work with distributed locking, rate limiting, and deadlock-aware retries. Real-time through ActionCable and Turbo Streams; frontend layer in Stimulus and Tailwind.

For data-heavy services I reach for Go (collectors, high-throughput workers) and Python when the ecosystem wins (LLM pipelines, scraping).

Infrastructure runs on AWS. Comfortable across ECS Fargate, Lambda, SQS, SNS, Kinesis Firehose, RDS, DynamoDB, OpenSearch, and S3. All managed with Terraform, shipped through Docker.

Warehouse work in Snowflake. Cache and queues in Redis. Legacy projects in MongoDB and PHP.

AI work is currently Claude-heavy (multi-agent pipelines, guardrails, streaming) with OpenAI for earlier projects. I have built routing layers, classifier caches, and report generators with tight tenant isolation.

Education

B.Sc. in Computer Science and Engineering
Bangladesh University of Business and Technology
2015 / 2019

Hiring for a system that has to hold?

Open to senior, staff, and lead engineering roles. Also consulting on production systems at scale. Replies within 24 hours.

hanifurrahmansabbir@gmail.com