Installation

Linux

With Anaconda

We recommend using Anaconda or another virtual environment manager to handle dependencies. To install Conda, see here.

After installing conda, create and activate an environment using:

conda create -n <name of your environment> python=3.11

conda activate <name of your environment>

With your environment active [you'll see (<name of your environment>) at the left of your terminal], run the pip command:

pip install zympy
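
To confirm that the package landed in the active environment, a quick check from Python can help. This is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and is not part of zympy.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the module can be found in the current environment."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Reports whether zympy is importable from the active environment.
    print("zympy installed:", is_installed("zympy"))
```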

Getting Started

Directory Structure

Zympy datasets have three primary folders: images, labels, and meta. Each instance is identified by universally unique identifier (UUID) values; in our case these are 8-character alphanumeric strings. An instance name consists of the UUID of the dataset it was generated with (the first 8 characters), followed by the individual instance UUID (the final 8 characters), separated by a dash ('-'), e.g. aA76li11-u163t8F0.

1a36ecdd <-- The Dataset UUID
├── images
│   ├── 1a36ecdd-3df894c1.png <-- The Instance UUID
│   ├── 1a36ecdd-98dddb4c.png
│   └── 1a36ecdd-381a6aac.png
├── labels
│   ├── bounding_box
│   │   ├── 1a36ecdd-3df894c1
│   │   │    ├── 2D
│   │   │    │   └── data.json
│   │   │    └── 3D
│   │   │        └── data.json
│   │   ├── 1a36ecdd-98dddb4c
│   │   └── 1a36ecdd-381a6aac
│   ├── contour
│   │   ├── 1a36ecdd-3df894c1
│   │   │   └── data.json
│   │   ├── 1a36ecdd-98dddb4c
│   │   └── 1a36ecdd-381a6aac
│   ├── pose
│   │   ├── 1a36ecdd-3df894c1
│   │   │   └── data.json
│   │   ├── 1a36ecdd-98dddb4c
│   │   └── 1a36ecdd-381a6aac
│   └── segmentation
│       ├── 1a36ecdd-3df894c1
│       │   ├── 1a36ecdd-3df894c1.png
│       │   └── data.json
│       ├── 1a36ecdd-98dddb4c
│       └── 1a36ecdd-381a6aac
└── meta
    ├── 1a36ecdd-3df894c1
    │   └── data.json
    ├── 1a36ecdd-98dddb4c
    └── 1a36ecdd-381a6aac
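
The layout above can be navigated with plain `pathlib`. The following is a minimal sketch, assuming the naming rule described in this section (two 8-character alphanumeric parts joined by a dash). The helpers `split_instance_name` and `instance_paths` are hypothetical names for illustration and are not part of the zympy API.

```python
from pathlib import Path
from typing import Dict, Tuple


def split_instance_name(instance_name: str) -> Tuple[str, str]:
    """Split '<dataset uuid>-<instance uuid>' into its two 8-character parts."""
    dataset_uuid, sep, instance_uuid = instance_name.partition("-")
    for part in (dataset_uuid, instance_uuid):
        if not sep or len(part) != 8 or not part.isalnum():
            raise ValueError(f"Unexpected instance name: {instance_name!r}")
    return dataset_uuid, instance_uuid


def instance_paths(dataset_root: str, instance_name: str) -> Dict[str, Path]:
    """Build the file paths shown in the directory tree for one instance."""
    root = Path(dataset_root)
    labels = root / "labels"
    return {
        "image": root / "images" / f"{instance_name}.png",
        "meta": root / "meta" / instance_name / "data.json",
        "bounding_box_2D": labels / "bounding_box" / instance_name / "2D" / "data.json",
        "bounding_box_3D": labels / "bounding_box" / instance_name / "3D" / "data.json",
        "contour": labels / "contour" / instance_name / "data.json",
        "pose": labels / "pose" / instance_name / "data.json",
        "segmentation": labels / "segmentation" / instance_name / "data.json",
        "segmentation_png": labels / "segmentation" / instance_name / f"{instance_name}.png",
    }


if __name__ == "__main__":
    print(split_instance_name("1a36ecdd-3df894c1"))
    for key, path in instance_paths("/path/to/1a36ecdd", "1a36ecdd-3df894c1").items():
        print(key, path)
```

In practice the loader functions in `zympy.zympy_io` are the intended way to read these files; the sketch only makes the naming convention concrete.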

Zympy API

The public Python module has several sub-modules available to help you get to training as fast as possible. They are organized as follows:

zympy.zympy_io

Contains helper functions to load data into memory, for example:

Retrieve all the instance names contained in a dataset
Load dataset-level or instance-level meta data
Load images by instance name
Load labels by instance name

zympy.mask

Contains helper functions to create masks from the labels, all of which are composable, for example:

Bounding box masks
Contour masks
Segmentation Masks
Pose Masks

zympy.filter

Contains helper functions to filter the dataset for instances that meet some criteria. This is aimed at enabling curriculum learning in vision model training. For example, you may filter a dataset by the % occlusion of a certain object of interest, exposing the network to instances with low or no occlusion early on and gradually increasing the difficulty over multiple epochs. You can filter by:

Object UUID presence within the instance
Object position or orientation
Camera position or orientation
Total lighting energy in the image
Object occlusion %
...
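
The curriculum idea described above can be sketched in plain Python. The `zympy.filter` functions are not documented in this section, so the example below does not call them: it assumes you already have a hypothetical dictionary mapping instance names to an occlusion fraction, and the helper `curriculum_stages` is an illustrative name only.

```python
from typing import Dict, List


def curriculum_stages(occlusion: Dict[str, float], thresholds: List[float]) -> List[List[str]]:
    """Return one list of instance names per stage.

    Each stage admits every instance whose occlusion is at or below
    that stage's threshold, so later stages are supersets of earlier ones.
    """
    stages = []
    for limit in sorted(thresholds):
        stage = sorted(name for name, value in occlusion.items() if value <= limit)
        stages.append(stage)
    return stages


# Hypothetical occlusion fractions (0.0 = fully visible, 1.0 = fully hidden).
occlusion_by_instance = {
    "1a36ecdd-3df894c1": 0.05,
    "1a36ecdd-98dddb4c": 0.40,
    "1a36ecdd-381a6aac": 0.85,
}

if __name__ == "__main__":
    stages = curriculum_stages(occlusion_by_instance, [0.1, 0.5, 1.0])
    for stage_index, names in enumerate(stages):
        print(stage_index, names)
```

Each stage could then be used for a block of epochs, starting with the least occluded instances.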

zympy.format

Contains helper functions to convert zympy datasets into common external formats, for example:

Convert the dataset to YOLO conventions (v5, v8, v11)

zympy.analyze

Contains helper functions to compute statistics about the dataset, for example:

Object pose distributions
Camera pose distributions

get_instance_names

Get a list of all instance names in a dataset directory. Returns validated and invalid instance name lists.

get_instance_names(dataset_path: str, verbose: bool = True) -> Tuple[List[str], List[str]]

Parameters

dataset_path (str): Absolute path to the dataset directory
verbose (bool): If True, logs validation errors for instances that fail validation checks

Returns

Tuple[List[str], List[str]]: First list contains validated instances, second list contains invalid instances

Example

from zympy.zympy_io import get_instance_names
validated, invalid = get_instance_names('/path/to/dataset', verbose=True)
print(validated)
print(invalid)

load_dataset_meta

Load and validate a DataSetMeta pydantic model from the dataset directory.

load_dataset_meta(dataset_path: str) -> DataSetMeta

Parameters

dataset_path (str): Root path to the dataset directory.

Returns

DataSetMeta: class containing metadata and pointers to validated InstanceMeta pydantic subclasses.

Example

from zympy.zympy_io import load_dataset_meta
dataset_meta = load_dataset_meta('/path/to/dataset')
print(dir(dataset_meta)) # print out all attributes and methods

load_instance_meta

Load the metadata JSON for a specific instance or instances from a dataset into a validated pydantic model.

load_instance_meta(instance_name: List[str], dataset_path: str) -> List[InstanceMeta]

Parameters

instance_name (List[str]): The name(s) or ID(s) of the instance(s) to load.
dataset_path (str): Root path to the dataset directory.

Returns

List[InstanceMeta]: Validated metadata models containing object descriptors, camera, lights, etc.

Example

from zympy.zympy_io import load_instance_meta
instances = load_instance_meta(['abcd1234-efgh5678'], '/path/to/dataset')
print(dir(instances[0])) # print out all attributes and methods of the first InstanceMeta

load_instance_image

Load an image for a given instance from the dataset.

load_instance_image(instance_name: str, dataset_path: str) -> np.ndarray

Parameters

instance_name (str): The name or ID of the instance to load.
dataset_path (str): Root path to the dataset directory.

Returns

np.ndarray: Image array loaded from .png file.

Example

from zympy.zympy_io import load_instance_image
image = load_instance_image('abcd1234-efgh5678', '/path/to/dataset')
print(image.shape)

load_instance_labels

Load all labels associated with the instance from the dataset.

load_instance_labels(instance_name: str, dataset_path: str) -> Dict

Parameters

instance_name (str): The name or ID of the instance to load.
dataset_path (str): Root path to the dataset directory.

Returns

Dict: A dictionary mapping of UUID values to labels, e.g.

{
	str(uuid): {
		'bounding_box': [[x0, y0], [x1, y1]],
		'contour': [[x0, y0], [x1, y1], ..., [xn, yn]],
		'segmentation':{
			'array': numpy.ndarray,
			'index_map': Dict
		},
		'pose': Dict
	},
	str(uuid): {...},
	...
}

Example

from zympy.zympy_io import load_instance_labels
labels = load_instance_labels('abcd1234-efgh5678', '/path/to/dataset')
print(labels)
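
Once loaded, the nested dictionary can be walked per object UUID. The sketch below uses a hand-written stand-in for the return value (the UUID keys and coordinates are made up, and the units of the coordinates are not specified in this section), and the helper `box_sizes` is a hypothetical name, not a zympy function.

```python
from typing import Dict, Tuple


def box_sizes(labels: Dict[str, dict]) -> Dict[str, Tuple[float, float]]:
    """Return (width, height) per object UUID from its two-corner bounding box."""
    sizes = {}
    for object_uuid, object_labels in labels.items():
        (x0, y0), (x1, y1) = object_labels["bounding_box"]
        sizes[object_uuid] = (abs(x1 - x0), abs(y1 - y0))
    return sizes


# Stand-in for the dictionary returned by load_instance_labels (values are made up).
example_labels = {
    "0f3c9a1e": {"bounding_box": [[10, 20], [110, 70]]},
    "7b21d4c8": {"bounding_box": [[200, 40], [180, 100]]},
}

if __name__ == "__main__":
    for object_uuid, (width, height) in box_sizes(example_labels).items():
        print(object_uuid, width, height)
```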

load_bounding_box_label

Load the bounding box label for a specific instance from the dataset, supporting 2D or 3D boxes.

load_bounding_box_label(instance_name: str, dataset_path: str, box_type: Literal['2D','3D']='2D') -> Dict

Parameters

instance_name (str): The name or ID of the instance to load.
dataset_path (str): Root path to the dataset directory.
box_type (Literal['2D','3D']): Define the bounding box type to load ('2D' or '3D').

Returns

Dict: A dictionary mapping UUIDs to bounding box coordinates, e.g. `{str(uuid): [[x0, y0], [x1, y1]]}`.

Example

from zympy.zympy_io import load_bounding_box_label
bbox = load_bounding_box_label('abcd1234-efgh5678', '/path/to/dataset', box_type='2D')
print(bbox)

load_contour_label

Load the contour label for a specific instance from the dataset.

load_contour_label(instance_name: str, dataset_path: str) -> Dict

Parameters

instance_name (str): The name or ID of the instance to load.
dataset_path (str): Root path to the dataset directory.

Returns

Dict: A dictionary mapping UUIDs to contour coordinates, e.g. `{str(uuid): [[[x0, y0], [x1, y1], [x2, y2], ...], [[x0, y0], [x1, y1], [x2, y2], ...]]}`.

Example

from zympy.zympy_io import load_contour_label
contour_label = load_contour_label('abcd1234-efgh5678', '/path/to/dataset')
print(contour_label)
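
A common use of contour points is computing the enclosed area. The following is a minimal sketch using the shoelace formula, assuming each UUID maps to a list of closed contours, each a list of `[x, y]` points as in the format shown under Returns. The sample data and the helper names are hypothetical.

```python
from typing import Dict, List, Sequence


def polygon_area(points: Sequence[Sequence[float]]) -> float:
    """Area of a simple closed polygon via the shoelace formula."""
    total = 0.0
    count = len(points)
    for i in range(count):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % count]  # wrap around to close the polygon
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def contour_areas(contour_label: Dict[str, List[list]]) -> Dict[str, float]:
    """Sum the areas of all contours belonging to each UUID."""
    return {
        object_uuid: sum(polygon_area(contour) for contour in contours)
        for object_uuid, contours in contour_label.items()
    }


# Stand-in for a loaded contour label (values are made up).
example_contours = {
    "0f3c9a1e": [
        [[0, 0], [4, 0], [4, 3], [0, 3]],  # 4 x 3 rectangle
        [[0, 0], [2, 0], [0, 2]],          # right triangle with legs of length 2
    ],
}

if __name__ == "__main__":
    print(contour_areas(example_contours))
```

Note that this simply sums areas; if some contours describe holes, they would need to be subtracted instead.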

load_pose_label

Load the pose label for a specific instance from a dataset. The first vector in the pose label is the translation vector to the origin of the object; the second vector is the quaternion describing its orientation.

load_pose_label(instance_name: str, dataset_path: str) -> Dict

Parameters

instance_name (str): The name or ID of the instance to load.
dataset_path (str): Root path to the dataset directory.

Returns

Dict: A dictionary mapping UUIDs to pose data with translation vector and orientation quaternion, e.g.
```
{str(uuid): [[x, y, z], [w, x, y, z]], str(uuid): ...,}
```

Example

from zympy.zympy_io import load_pose_label
pose_label = load_pose_label('abcd1234-efgh5678', '/path/to/dataset')
print(pose_label)
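
Many downstream tools expect a rotation matrix rather than a quaternion. The sketch below converts the `[w, x, y, z]` ordering shown above into a 3x3 matrix using the standard unit-quaternion formula. It assumes the usual Hamilton convention, which this section does not state explicitly, and the helper names are hypothetical rather than part of zympy.

```python
import math
from typing import List, Sequence


def quaternion_to_matrix(quaternion: Sequence[float]) -> List[List[float]]:
    """Convert a [w, x, y, z] quaternion into a 3x3 rotation matrix."""
    w, x, y, z = quaternion
    norm = math.sqrt(w * w + x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("Zero-length quaternion has no orientation")
    # Normalize so the formula below yields a proper rotation.
    w, x, y, z = w / norm, x / norm, y / norm, z / norm
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def rotate(matrix: List[List[float]], vector: Sequence[float]) -> List[float]:
    """Apply a 3x3 matrix to a 3-vector."""
    return [sum(row[i] * vector[i] for i in range(3)) for row in matrix]


if __name__ == "__main__":
    # Stand-in pose entry: translation, then a 90 degree rotation about z.
    half = math.sqrt(0.5)
    translation, orientation = [0.0, 0.0, 1.0], [half, 0.0, 0.0, half]
    print(rotate(quaternion_to_matrix(orientation), [1.0, 0.0, 0.0]))
```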

load_segmentation_label

Load the segmentation label for a specific instance from a dataset.

load_segmentation_label(instance_name: str, dataset_path: str) -> Tuple[Dict, np.ndarray]

Parameters

instance_name (str): The name or ID of the instance to load.
dataset_path (str): Root path to the dataset directory.

Returns

Dict: A dictionary mapping part UUID values to segmentation array values.
np.ndarray: The segmentation array containing pixel data.

Example

from zympy.zympy_io import load_segmentation_label
import cv2
seg_map, seg_array = load_segmentation_label('abcd1234-efgh5678', '/path/to/dataset')
print(seg_map)
cv2.imshow('segmentation', seg_array)
cv2.waitKey(0)
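
The index map and the array are meant to be used together. As a minimal sketch, the function below counts how many pixels belong to each part, assuming the map goes from UUID to an integer value stored in a single-channel array. A small nested list stands in for the array here so the example runs without extra dependencies; with a real 2D array, `seg_array.tolist()` would give the same shape of input. The helper `pixel_counts` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, List


def pixel_counts(seg_map: Dict[str, int], seg_rows: List[List[int]]) -> Dict[str, int]:
    """Count the pixels assigned to each UUID in a single-channel segmentation."""
    counts = Counter(value for row in seg_rows for value in row)
    return {object_uuid: counts.get(value, 0) for object_uuid, value in seg_map.items()}


# Stand-ins for the two return values (made-up data; 0 is treated as background).
example_map = {"0f3c9a1e": 1, "7b21d4c8": 2}
example_rows = [
    [0, 1, 1],
    [2, 1, 0],
]

if __name__ == "__main__":
    print(pixel_counts(example_map, example_rows))
```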

create_empty_rgba

Create an RGBA array populated with `(0, 0, 0, 0)`. Uses a reference image to match its width and height, or specify dimensions manually.

create_empty_rgba(reference_image: np.ndarray = None, mask_dimensions: Tuple = None) -> np.ndarray

Parameters

reference_image (numpy.ndarray): Optional reference image whose dimensions will be used for the mask.
mask_dimensions (Tuple[int, int]): Desired output dimensions if no reference image is provided.

Returns

empty_rgba (numpy.ndarray): An empty transparent RGBA array of the specified dimensions.

Example

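from zympy.zympy_io import load_instance_image
from zympy.mask import create_empty_rgba
image = load_instance_image('abcd1234-efgh5678', '/path/to/dataset')  # reference image
mask = create_empty_rgba(reference_image=image)
print(mask.shape)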

create_bounding_box_2D_mask

Construct a transparent mask with bounding boxes annotated.

create_bounding_box_2D_mask(bounding_boxes: Dict, mask_dimensions: tuple[int, int], active_uuids: set[str] = None, line_thickness: int = DEFAULT_LINE_THICKNESS, color: Tuple = None, color_seed: int = COLOR_SEED) -> np.ndarray

Parameters

bounding_boxes (Dict): Dictionary containing bounding box labels. Format:

{
  str(uuid): [[x0, y0], [x1, y1]],
  str(uuid): [[x0, y0], [x1, y1]],
  ...
}

active_uuids (set[str], optional): If provided, only annotate bounding boxes whose UUIDs are in this set.
mask_dimensions (tuple[int, int]): Pixel dimensions `(height, width)` of the output mask.
line_thickness (int): Bounding box edge thickness in pixels.
color (tuple[int, int, int, int], optional): RGBA color. If None, random colors will be generated based on UUID and `color_seed`.
color_seed (int): Seed shift value for random color generation.

Returns

mask (numpy.ndarray): Generated transparent mask with bounding boxes drawn.

Example

from zympy.mask import create_bounding_box_2D_mask
mask = create_bounding_box_2D_mask(bounding_boxes, mask_dimensions=(256, 256))
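
The `color` parameter falls back to colors derived from the UUID and `color_seed`. How zympy derives them is not described here; the sketch below shows one possible way to get a stable per-UUID color, by hashing the seed and UUID together. It is an illustration under that assumption, and `uuid_to_rgba` is a hypothetical helper, not the library's implementation.

```python
import hashlib
from typing import Tuple


def uuid_to_rgba(object_uuid: str, color_seed: int = 0, alpha: int = 255) -> Tuple[int, int, int, int]:
    """Derive a repeatable RGBA color from a UUID string and a seed."""
    digest = hashlib.sha256(f"{color_seed}:{object_uuid}".encode("utf-8")).digest()
    # The first three digest bytes are already in the 0-255 range.
    return digest[0], digest[1], digest[2], alpha


if __name__ == "__main__":
    # The same UUID and seed always give the same color.
    print(uuid_to_rgba("0f3c9a1e", color_seed=0))
    print(uuid_to_rgba("0f3c9a1e", color_seed=1))
```

A scheme like this keeps each object's color consistent across images, which makes overlaid masks easier to compare.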

draw_bounding_box_2D

Draw bounding boxes directly onto an image using normalized coordinates.

draw_bounding_box_2D(image: np.ndarray, bbox_normalized_corners: List[Tuple[float, float, float, float]], color: tuple[int, int, int, int] = None, line_thickness: int = DEFAULT_LINE_THICKNESS) -> np.ndarray

Parameters

image (numpy.ndarray): The input RGBA image `(H x W x 4)`.
bbox_normalized_corners (List[Tuple[float, float, float, float]]): Bounding boxes in normalized `[0,1]` coordinates, formatted either as `(x0, y0, x1, y1)` or `((x0, y0), (x1, y1))`.
color (tuple[int, int, int, int], optional): RGBA color for the box edges.
line_thickness (int): Thickness of bounding box edges in pixels.

Returns

image (numpy.ndarray): Image with drawn bounding boxes.

Example

# Draw a box with a random color

from zympy.mask import draw_bounding_box_2D, create_empty_rgba
import cv2
image = create_empty_rgba(mask_dimensions=(400, 400))
drawn = draw_bounding_box_2D(image, [(0.1, 0.1, 0.4, 0.4)])
cv2.imshow('bounding box', drawn)
cv2.waitKey(0)
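
Normalized coordinates have to be scaled by the image size before anything can be drawn. The sketch below shows that conversion for both accepted input layouts, assuming x is scaled by the image width and y by the image height. The helper `to_pixel_corners` is a hypothetical name and does not reflect how zympy performs the conversion internally.

```python
from typing import Sequence, Tuple

PixelCorners = Tuple[Tuple[int, int], Tuple[int, int]]


def to_pixel_corners(box: Sequence, width: int, height: int) -> PixelCorners:
    """Convert a normalized box to integer pixel corners.

    Accepts either (x0, y0, x1, y1) or ((x0, y0), (x1, y1)).
    """
    if len(box) == 4:
        x0, y0, x1, y1 = box
    else:
        (x0, y0), (x1, y1) = box
    return (
        (round(x0 * width), round(y0 * height)),
        (round(x1 * width), round(y1 * height)),
    )


if __name__ == "__main__":
    print(to_pixel_corners((0.1, 0.1, 0.4, 0.4), width=400, height=400))
    print(to_pixel_corners(((0.25, 0.5), (0.75, 1.0)), width=200, height=100))
```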

to_yolo

Converts a ZymPy-format dataset to the directory and label structure required by YOLO (v5, v8, v11) for object detection training. Partitions image instances into training/validation sets, generates label files, builds the YOLO directory structure, and writes a `data.yaml` configuration mapping class indices to UUID-style names.

to_yolo(yolo_version: Literal[5, 8, 11], zympy_dataset_path: str, yolo_dataset_path: str, yolo_dataset_name: str = None, instance_names: List[str] = None, train_val_split: float = 0.8, training_instance_names: List[str] = None, validation_instance_names: List[str] = None, label_type: Literal['bounding_box', 'segmentation'] = 'bounding_box', class_list: List[str] = None, suppress_memory_prompts: bool = False) -> bool

Parameters

yolo_version (Literal[5, 8, 11]): Target YOLO version to format the dataset for.
zympy_dataset_path (str): Path to the root of the ZymPy-format dataset containing images and metadata.
yolo_dataset_path (str): Path to the root output directory where the YOLO-formatted dataset will be written.
yolo_dataset_name (str, optional): Name to assign to the YOLO dataset folder. Defaults to the ZymPy dataset name if None.
instance_names (List[str], optional): List of instance UUIDs to include; will be partitioned via `train_val_split` if provided.
train_val_split (float, optional): Fraction of instances used for training; remainder goes to validation.
training_instance_names (List[str], optional): Explicit list of training instance names (overrides `train_val_split`).
validation_instance_names (List[str], optional): Explicit list of validation instance names (overrides `train_val_split`).
label_type (Literal['bounding_box', 'segmentation'], optional): Type of label to format. Defaults to 'bounding_box'.
class_list (List[str], optional): Specific UUID strings representing object classes to include. If None, includes all classes.
suppress_memory_prompts (bool, optional): Suppresses prompts related to memory usage when copying large files.

Returns

success (bool): Returns True if the YOLO dataset was created successfully.

Example

from zympy.format import to_yolo

success = to_yolo(
    yolo_version=5,
    zympy_dataset_path='/data/zympy_dataset',
    yolo_dataset_path='/data/yolo_dataset',
    train_val_split=0.8
)
print(success)  # True if successful
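
To make the conversion more concrete, the sketch below illustrates two of the steps described above: partitioning instance names by a training fraction, and writing one line of a YOLO detection label file (`class x_center y_center width height`, all normalized to the image size). It assumes bounding boxes are given as pixel corners and uses a seeded shuffle for the split. Both helpers are hypothetical names and are not taken from the `to_yolo` implementation.

```python
import random
from typing import List, Sequence, Tuple


def split_instances(names: Sequence[str], train_fraction: float = 0.8, seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle names reproducibly and split them into training and validation lists."""
    shuffled = sorted(names)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def yolo_detection_line(class_index: int, box: Sequence, image_width: int, image_height: int) -> str:
    """Format one pixel-space box [[x0, y0], [x1, y1]] as a YOLO label line."""
    (x0, y0), (x1, y1) = box
    x_center = (x0 + x1) / 2 / image_width
    y_center = (y0 + y1) / 2 / image_height
    width = abs(x1 - x0) / image_width
    height = abs(y1 - y0) / image_height
    return f"{class_index} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


if __name__ == "__main__":
    names = [f"1a36ecdd-{index:08d}" for index in range(10)]
    train, val = split_instances(names, train_fraction=0.8)
    print(len(train), len(val))
    print(yolo_detection_line(0, [[100, 50], [300, 150]], image_width=400, image_height=200))
```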

Documentation

NOTE: The public zympy module is scheduled for initial release August 1, 2025
