Migrate data

One of the overarching goals of Tator is to make it easy to move data around. This supports workflows such as acquiring and annotating data in the field, then uploading it to a cloud or on-premise deployment of Tator upon return to the office. A series of cloning utilities in tator-py facilitates this: they can copy any Tator object not only between projects or sections, but also between different hosts.

In this tutorial we will use a tator-py example script called migrate.py to clone an entire project on the same host, then we will repeat the process for different hosts.

Using migrate.py

migrate.py was written specifically for cloning entire projects and then transferring data between those projects as new data is collected. It includes command line flags for disabling migration of certain objects, such as memberships and sections, but the script is meant to be used with projects that have nearly identical metadata configurations. For cloning at the object level, the cloning utilities that migrate.py uses are suitable on their own.

Clone to a new project

Suppose we have a project in Tator with generically useful data, such as an open source machine learning dataset. Such a project could serve as a template for other projects, allowing experimentation with algorithms, dataset augmentation, etc. With migrate.py, we can create a new project that clones the project media without copying the underlying files and copies all of the annotations. Project metadata configuration, sections, and even memberships can be cloned to the new project. The script can be invoked as follows from the tator-py root directory:

python3 examples/migrate.py \
--host https://cloud.tator.io \
--token $MY_TOKEN \
--project 1 \
--new_project_name "My Cloned Project" \
--dest_organization 1

This will clone all objects from project 1 to a new project named My Cloned Project. The script will first compute all data that will be cloned using idempotency checks, then ask for confirmation to proceed.
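The idea behind an idempotency check can be pictured with a small sketch. This is not the logic from migrate.py; it assumes, purely for illustration, that objects are matched by name, and the function name plan_clone and the dictionary layout are hypothetical.

```python
def plan_clone(source_objects, dest_objects, key="name"):
    """Return the source objects that have no match in the destination.

    Assumption for illustration only: two objects are "the same" when
    their `key` field is equal. The real script may match differently.
    """
    existing = {obj[key] for obj in dest_objects}
    return [obj for obj in source_objects if obj[key] not in existing]


# Hypothetical section lists for an origin and a destination project.
source = [{"name": "Section 1"}, {"name": "Section 2"}, {"name": "Section 3"}]
dest = [{"name": "Section 1"}]

# First pass: only the objects missing from the destination are planned.
to_create = plan_clone(source, dest)
print([obj["name"] for obj in to_create])  # ['Section 2', 'Section 3']

# Second pass after the clone: nothing is left to do, so rerunning is safe.
dest_after = dest + to_create
print(plan_clone(source, dest_after))  # []
```

Computing the plan before changing anything is what allows a summary to be shown and confirmed before any data is written.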

Clone to an existing project

When the origin project contains new data that should also appear in the cloned project, we can specify a destination project ID with --dest_project. Idempotency checks ensure that data already present in the destination project is not duplicated. We can optionally limit the media sections to be cloned using --sections, and skip certain object types using the --skip_* flags.

python3 examples/migrate.py \
--host https://cloud.tator.io \
--token $MY_TOKEN \
--project 1 \
--dest_project 2 \
--sections "Section 1" "Section 2" \
--skip_memberships \
--skip_leaves

Clone to a different host

For cases in which data is contained on a different host, such as an edge deployment that was used to temporarily host data at the site of data acquisition, we can add flags for destination credentials.

python3 examples/migrate.py \
--host https://edge.tator.io \
--token $MY_EDGE_TOKEN \
--project 1 \
--dest_host https://cloud.tator.io \
--dest_token $MY_TOKEN \
--dest_project 2 \
--sections "Section 1" "Section 2" \
--skip_memberships \
--skip_leaves

Idempotency checks work in the same way for different hosts as they do on the same host.
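Because the same invocation is typically repeated as new data arrives, it may be convenient to assemble the command programmatically. The following is a minimal sketch that uses only the flags shown in this tutorial; the helper name build_migrate_command, its parameters, and the placeholder tokens are hypothetical and are not part of tator-py.

```python
import shlex


def build_migrate_command(host, token, project, dest_project,
                          dest_host=None, dest_token=None,
                          sections=(), skip=()):
    """Build an argument list for examples/migrate.py (hypothetical helper)."""
    cmd = ["python3", "examples/migrate.py",
           "--host", host,
           "--token", token,
           "--project", str(project)]
    # Destination credentials are only added for a different host.
    if dest_host is not None:
        if dest_token is None:
            raise ValueError("dest_token is required when dest_host is set")
        cmd += ["--dest_host", dest_host, "--dest_token", dest_token]
    cmd += ["--dest_project", str(dest_project)]
    if sections:
        cmd += ["--sections", *sections]
    # Each entry becomes a --skip_<name> flag, e.g. --skip_memberships.
    cmd += [f"--skip_{name}" for name in skip]
    return cmd


cmd = build_migrate_command(
    host="https://edge.tator.io",
    token="EDGE_TOKEN_PLACEHOLDER",
    project=1,
    dest_host="https://cloud.tator.io",
    dest_token="CLOUD_TOKEN_PLACEHOLDER",
    dest_project=2,
    sections=["Section 1", "Section 2"],
    skip=["memberships", "leaves"],
)

# Print a shell-quoted form of the command for inspection.
print(shlex.join(cmd))
```

Passing the arguments as a list, for example to subprocess.run(cmd, check=True) from the tator-py root directory, avoids the shell line-continuation and quoting pitfalls of a hand-typed command. The script still asks for confirmation before proceeding.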