The shortlist

When we set up the annotation practice, we evaluated four platforms seriously: Labelbox, Scale AI, Roboflow, and CVAT. Each has obvious strengths. We picked CVAT, self-hosted on our own infrastructure. Here's the reasoning, in case you're making the same decision.

Cost: not the headline number, the long tail

Labelbox and Scale are priced per-seat or per-feature. For a small team labeling 10,000 frames a year, the math works out. For our project mix — 8-12 annotators in rotation, 2-5 million features a year, multiple custom schemas — the cumulative cost over a 3-year horizon got into six figures. CVAT self-hosted is free software. The cost is the engineering time to deploy and maintain it, which for us is fractional because we already run Docker, Kubernetes, and PostgreSQL infrastructure for other client work.

More importantly: managed platforms include features we don't need (their built-in workforce, billing systems, marketing analytics) and don't include features we do need (custom QA logic against client GIS layers, integration with our PostGIS pipelines). For our workload, we'd be paying for the wrong half.

Control: where the data lives

Several of our clients have data residency requirements that can't be met by sending imagery to a third-party SaaS. Defense-adjacent work, utility infrastructure photos that include security-sensitive locations, environmental data under chain-of-custody requirements for litigation. These projects need the annotation environment to live inside our infrastructure (or theirs), not in someone else's cloud.

Self-hosted CVAT lets us deploy per-project: one CVAT instance per client, isolated network, client-controlled access. For some sensitive projects we deploy CVAT directly into the client's VPC and operate it from there. Managed platforms can sometimes accommodate this, but it's far more complex than spinning up a Docker Compose stack with CVAT and PostgreSQL.
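A per-client stack looks roughly like this. This is a simplified sketch, not CVAT's official compose file (the real one includes Redis, OPA, the UI container, and more); the image tag and volume names here are illustrative.

```yaml
# Simplified per-client CVAT stack (illustrative; see CVAT's
# official docker-compose.yml for the complete service list).
services:
  cvat_db:
    image: postgres:15-alpine
    environment:
      POSTGRES_DB: cvat
      POSTGRES_USER: root
      POSTGRES_HOST_AUTH_METHOD: trust
    volumes:
      - cvat_db:/var/lib/postgresql/data

  cvat_server:
    image: cvat/server:v2.11.0   # pin a version per client
    depends_on: [cvat_db]
    environment:
      CVAT_POSTGRES_HOST: cvat_db
    volumes:
      - cvat_data:/home/django/data

volumes:
  cvat_db:
  cvat_data:
```

One compose file per client keeps the isolation story auditable: the database volume, the network, and the access controls all live in one place.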

Schema flexibility: custom validation

CVAT's data model lets us extend the schema with custom attributes. A road sign label isn't just a class and a box — it can carry MUTCD code, visibility score, occlusion percentage, sign-face material, condition rating, and any other attribute we need for the project. We define the attribute schema once and CVAT enforces it on every annotation.
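For concreteness, here is a sketch of the kind of label spec we define once per project. The field names (`name`, `attributes`, `input_type`, `mutable`, `values`) follow the JSON shape CVAT accepts when creating a project; the road-sign attributes themselves are project-specific examples, not part of CVAT.

```python
# One label with a custom attribute schema, in the shape CVAT's
# project API accepts. The specific attributes are our own.
ROAD_SIGN_LABEL = {
    "name": "road_sign",
    "attributes": [
        {"name": "mutcd_code", "input_type": "text",
         "mutable": False, "default_value": "", "values": [""]},
        {"name": "visibility", "input_type": "select", "mutable": True,
         "default_value": "good", "values": ["good", "fair", "poor"]},
        {"name": "occlusion_pct", "input_type": "number", "mutable": True,
         "default_value": "0", "values": ["0", "100", "5"]},  # min, max, step
        {"name": "condition", "input_type": "select", "mutable": True,
         "default_value": "ok", "values": ["ok", "damaged", "faded"]},
    ],
}

def attribute_names(label):
    """The attribute names downstream QA and export code should expect."""
    return [a["name"] for a in label["attributes"]]
```

Because the schema is declared up front, every annotation carries the same fields, which is what makes the downstream validation and export steps reliable.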

More importantly, we can wire custom validation hooks into CVAT's workflow. When an annotator submits a label, our hook can check it against an authoritative GIS layer in PostGIS and flag the label for senior review if it disagrees. That's the spatial QA we talk about elsewhere — it works because CVAT is open enough to let us insert that logic.
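The core of such a hook is a disagreement check. A minimal sketch, with assumptions labeled: in production the comparison runs as a PostGIS query (`ST_DWithin`) against the client's authoritative layer, and the hook is triggered from CVAT's workflow; here the reference layer is an in-memory list of projected points so the logic is self-contained.

```python
import math

def flag_for_review(annotation, reference_points, tolerance_m=5.0):
    """Return True if the annotated point disagrees with the
    authoritative GIS layer by more than `tolerance_m` meters.

    `annotation` and `reference_points` carry projected (x, y)
    coordinates in meters. In production this check is a PostGIS
    ST_DWithin query against the client's layer, not a Python loop.
    """
    ax, ay = annotation["x"], annotation["y"]
    nearest = min(
        (math.hypot(ax - rx, ay - ry) for rx, ry in reference_points),
        default=float("inf"),  # no reference feature nearby: always flag
    )
    return nearest > tolerance_m
```

A flagged label isn't rejected automatically; it's routed to senior review, because sometimes the annotator is right and the GIS layer is stale.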

Workflow: review states that match how we actually work

CVAT has explicit job stages: annotation, validation, acceptance. Each annotator has a job; each job has a reviewer; review can send a job back with notes. That maps directly onto how a multi-person annotation pipeline actually runs.

Some managed platforms have richer workflow features (parallel review pipelines, dynamic routing, escalation chains). For most of our projects, the simple state model is the right amount of complexity. When we need more, we add it as scripts that call CVAT's REST API.
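Those scripts are small. This sketch builds the request that moves a job into review and assigns a reviewer; the endpoint shape (`PATCH /api/jobs/{id}` with `stage` and `assignee` fields, token auth) follows CVAT's REST API, while the instance URL is a hypothetical example.

```python
import json
import urllib.request

CVAT_URL = "https://cvat.example.internal"  # hypothetical per-client instance

def review_request(job_id, reviewer_id, token):
    """Build the PATCH /api/jobs/{id} request that moves a job to
    the validation stage and hands it to a reviewer."""
    payload = {"stage": "validation", "assignee": reviewer_id}
    return urllib.request.Request(
        f"{CVAT_URL}/api/jobs/{job_id}",
        data=json.dumps(payload).encode(),
        method="PATCH",
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "application/json",
        },
    )

# urllib.request.urlopen(review_request(42, 7, token)) would send it.
```

Escalation chains and dynamic routing are just loops over this kind of call, driven by whatever assignment logic the project needs.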

Output: formats that don't lose information

CVAT exports in COCO, YOLO, Pascal VOC, CVAT XML, KITTI, LFW, Datumaro, Mapillary, ICDAR, MOTS — and we've added GeoJSON export through a custom plugin. Most exports preserve the full attribute set, including custom attributes. When we deliver to a client, we deliver in their format with no information loss.
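The GeoJSON plugin's core transform is simple: every box becomes a Feature, and every custom attribute lands in `properties`. A minimal sketch, assuming a `to_world` callable that maps pixel coordinates to georeferenced ones (in our pipeline that comes from the image's affine transform; here it's a parameter so the function stays self-contained).

```python
def box_to_geojson_feature(label, xtl, ytl, xbr, ybr, attributes, to_world):
    """Convert one CVAT box to a GeoJSON Feature, keeping every
    custom attribute in `properties` so nothing is lost on export.

    `to_world` maps pixel (x, y) to world coordinates; the ring is
    closed (first point repeated) per the GeoJSON spec.
    """
    corners = [(xtl, ytl), (xbr, ytl), (xbr, ybr), (xtl, ybr), (xtl, ytl)]
    ring = [list(to_world(x, y)) for x, y in corners]
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {"label": label, **attributes},
    }
```

Carrying `attributes` straight through to `properties` is the whole point: the MUTCD code or condition rating an annotator entered survives into the client's GIS untouched.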

Conversion between formats is one of the silent ways annotation projects degrade: each hop tends to drop attributes the target format can't represent. CVAT's broad native support means we usually don't need to convert at all.

What CVAT does poorly

Being honest: there are things CVAT does worse than managed platforms.

Onboarding new annotators is harder. Managed platforms have polished tutorial systems and certification workflows. CVAT has documentation. We make up for this by running our own onboarding curriculum — about 8 hours of training per new annotator before they touch a client project.

Built-in workforce management is minimal. Managed platforms know how to assign work, track productivity, and bill annotators. CVAT doesn't try to do this. We use it alongside other tools (we use ClickUp for project management) instead of expecting CVAT to be everything.

The UI for very large polygon annotation is clunky. For dense semantic segmentation at megapixel scale, CVAT can lag. We work around this by chunking large frames into tiles, and for the rare projects that need heavy segmentation, we'll evaluate alternatives per-project.
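The tiling itself is a few lines. A sketch of the grid computation we use, with the tile size and overlap as tunable parameters (the values below are typical for our projects, not fixed constants):

```python
def tile_grid(width, height, tile=2048, overlap=128):
    """Split a large frame into overlapping tiles so each annotation
    job stays small enough for CVAT's canvas to handle smoothly.

    Yields (x0, y0, x1, y1) pixel boxes; the overlap lets labels
    near tile edges be stitched back together after annotation.
    """
    step = tile - overlap
    for y0 in range(0, max(height - overlap, 1), step):
        for x0 in range(0, max(width - overlap, 1), step):
            yield (x0, y0, min(x0 + tile, width), min(y0 + tile, height))
```

The overlap matters: without it, a sign sitting on a tile boundary gets half-labeled twice and merged badly, so we dedupe in the overlap band when reassembling.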

When we'd recommend something else

If you're a small team labeling a few thousand frames in mostly-standard classes with no integration requirements, a managed platform (Labelbox or Roboflow) is the right answer. The TCO is lower than self-hosting CVAT once you account for engineering time, and you get a polished UX.

If you're doing one-off research projects or academic work, Roboflow's free tier is hard to beat.

If you're doing a high-volume autonomous-vehicle-style perception project where Scale AI's specialized workforce is the moat, Scale is the right answer.

For everything else in the GIS / infrastructure / regulatory annotation world we operate in, CVAT self-hosted has been the right call for us. Five years in, no regrets.


Want to see CVAT running on your data? We can spin up a pilot instance, with your imagery and your schema, in under a week for a fixed pilot fee. You see the platform, the workflow, and the output before deciding whether to scale.