Tator Data Concepts

Tator is, at its heart, a data management platform. The UI and the REST API form the outermost layer of the onion, the layer users work with to complete their tasks. At the core of the onion is the database, along with the architectural decisions behind how data is stored. In any software project the architecture is a journey: some things were intentionally laid out a certain way, while others merely are. This article describes how things stand in Tator at this juncture, to help people who want to understand Tator more fully cut the onion without causing tears.

Definitions

Almost like a breakfast hash, the data in Tator is a mix of many different ingredients. To help illustrate the flavor profile of these elements, let us first define the terms:

Localization is the term we use for metadata that contains frame geography, that is, a shape placed on a frame. Examples include bounding boxes, lines, dots, and polygons. A Localization can only exist on a single frame of a single media.

States are a more complex metadata type because of the number of associations they allow. A State can associate multiple Localizations; this is commonly referred to as a Track. Tracks are often used to count occurrences of physical objects in a video, where the same object is followed across multiple frames and Localizations. Another type of State is associated with frames. Because each frame corresponds to a point in time, this type of State can represent sensor readings that change over time, such as GPS location, depth, or temperature. The last type of State is associated with one or more media. It can be used to collect media into a group or to hold longer-lasting sensor data.
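The three kinds of State described above can be sketched as a single record whose association fields differ. This is a minimal illustration only: the `State` class, its field names, and the `association_kind` helper are hypothetical and are not taken from Tator's schema or API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class State:
    """Hypothetical stand-in for a State; field names are illustrative."""

    media_ids: List[int]                       # media the state belongs to
    localization_ids: List[int] = field(default_factory=list)
    frame: Optional[int] = None                # set only for frame states
    attributes: Dict[str, Any] = field(default_factory=dict)


def association_kind(state: State) -> str:
    """Classify a state by what it is associated with."""
    if state.localization_ids:
        return "localization"  # a Track: several localizations of one object
    if state.frame is not None:
        return "frame"         # a reading tied to a moment in the video
    return "media"             # spans one or more whole media


track = State(media_ids=[42], localization_ids=[1, 2, 3],
              attributes={"Label": "fish"})
gps_fix = State(media_ids=[42], frame=130,
                attributes={"Latitude": 41.5, "Longitude": -70.7})
group = State(media_ids=[42, 43], attributes={"Deployment": "dive 3"})
```

Each of the three example objects exercises one branch of `association_kind`, mirroring the Track, frame, and media cases in the paragraph above.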

Both types of metadata have a mixture of system-defined columns and user-defined columns, which Tator refers to as attributes. System-defined columns are stored as primary columns in the SQL database. Many of them are foreign-key relationships, which benefit from primary column status. Others, such as created time or modified time, are generic enough to be handled as primary columns for almost all data. User-defined attributes all live in a JSONB object. This lets users themselves define the attributes associated with each Localization or State. An example might be a Species field on a bounding box, or a notes attribute on a media.
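To make the split between primary columns and attributes concrete, here is a small sketch that separates a flat record into the two parts. The column names in `SYSTEM_COLUMNS` and the `split_record` helper are assumptions made for illustration; they are not Tator's actual column list or code.

```python
import json
from typing import Any, Dict, Tuple

# Assumed set of system-defined (primary) columns; illustrative only.
SYSTEM_COLUMNS = {"id", "project", "media", "frame",
                  "created_datetime", "modified_datetime"}


def split_record(record: Dict[str, Any]) -> Tuple[Dict[str, Any], str]:
    """Split a flat record into primary columns and a JSON attribute object."""
    primary = {k: v for k, v in record.items() if k in SYSTEM_COLUMNS}
    attributes = {k: v for k, v in record.items() if k not in SYSTEM_COLUMNS}
    # sort_keys makes the serialized form deterministic for this example.
    return primary, json.dumps(attributes, sort_keys=True)


primary, attributes_json = split_record({
    "id": 1,
    "project": 7,
    "media": 42,
    "frame": 130,
    "Species": "lobster",
    "Notes": "partially occluded",
})
```

Everything the user defined ends up in one JSON document, so a new attribute changes the contents of that document rather than the shape of the table.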

Implementation Details

Users can modify the available attribute columns via the rest/AttributeType endpoint. Because all user-defined attributes live in JSONB fields, columns can be added to and removed from data without a system-wide SQL migration.
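The following sketch shows why no migration is needed, using plain Python dictionaries in place of database rows. It does not call the rest/AttributeType endpoint. The definition format, the use of a default value for rows that lack a key, and the helper names are all assumptions made for illustration.

```python
from typing import Any, Dict, List

# Hypothetical attribute definitions for one localization type.
attribute_types: List[Dict[str, Any]] = [
    {"name": "Species", "dtype": "string", "default": "unknown"},
]

# Existing rows; each carries its user-defined attributes as one JSON object.
rows: List[Dict[str, Any]] = [
    {"id": 1, "attributes": {"Species": "lobster"}},
    {"id": 2, "attributes": {}},
]


def read_attribute(row: Dict[str, Any], name: str) -> Any:
    """Read an attribute, falling back to the definition's default."""
    for definition in attribute_types:
        if definition["name"] == name:
            return row["attributes"].get(name, definition["default"])
    raise KeyError(name)


def remove_attribute(name: str) -> None:
    """Drop a definition and strip the key from every row's JSON object."""
    attribute_types[:] = [d for d in attribute_types if d["name"] != name]
    for row in rows:
        row["attributes"].pop(name, None)


# Adding a column is a metadata change; no existing row is rewritten.
attribute_types.append({"name": "Confidence", "dtype": "float", "default": 0.0})
confidence_before_edit = read_attribute(rows[0], "Confidence")

# Removing a column only edits JSON contents, never the table layout.
remove_attribute("Species")
```

In both directions the table's set of primary columns stays fixed, which is the property the paragraph above relies on.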

For searchability, Tator generates table indices for user-defined attributes. These indices are scoped with WHERE clauses to the appropriate project and data type, which keeps each index reasonably small and results in fast per-project performance. Index management routines live in api/main/search.py. Indices for integers and floats use built-in PostgreSQL BTREE indices. String attributes use a trigram extension library to store GIST indices in both original and lower-case forms. Geoposition attributes are indexed using types made available via PostGIS. Lastly, vector data types are indexed using the vector extension available for PostgreSQL.
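As a rough sketch of what a scoped index looks like, the function below builds PostgreSQL partial-index statements for numeric and string attributes. It is not the code in api/main/search.py. The table name, the `project` and `type` column names, the index naming scheme, and the assumption that the trigram library is pg_trgm (with its `gist_trgm_ops` operator class) are all illustrative.

```python
from typing import List

# Assumed mapping from attribute dtype to a PostgreSQL cast; illustrative only.
NUMERIC_CASTS = {"int": "bigint", "float": "double precision"}


def build_index_sql(table: str, project: int, type_id: int,
                    attribute: str, dtype: str) -> List[str]:
    """Return CREATE INDEX statements for one user-defined attribute.

    Identifiers are interpolated directly to keep the sketch short; real
    code should quote or validate them before building SQL.
    """
    base = f"{table}_{project}_{type_id}_{attribute.lower()}"
    value = f"attributes->>'{attribute}'"
    # Partial-index predicate: only rows of this project and type are indexed.
    scope = f"WHERE project = {project} AND type = {type_id}"

    if dtype in NUMERIC_CASTS:
        expr = f"(({value})::{NUMERIC_CASTS[dtype]})"
        return [f"CREATE INDEX {base}_btree ON {table} "
                f"USING btree ({expr}) {scope};"]
    if dtype == "string":
        # One index on the original text and one on its lower-case form.
        return [
            f"CREATE INDEX {base}_trgm ON {table} "
            f"USING gist (({value}) gist_trgm_ops) {scope};",
            f"CREATE INDEX {base}_trgm_lower ON {table} "
            f"USING gist (lower({value}) gist_trgm_ops) {scope};",
        ]
    raise ValueError(f"unsupported dtype: {dtype}")


string_indices = build_index_sql("main_localization", 7, 3, "Species", "string")
float_indices = build_index_sql("main_localization", 7, 3, "Depth", "float")
```

Because every statement carries the same WHERE predicate, each index covers only one project and type, which is what keeps the indices small. Geoposition and vector attributes are left out of the sketch.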