Documentation
NOTE: The public zympy module is scheduled for initial release on August 1, 2025.
Installation
Linux
With Anaconda
We recommend using Anaconda or another virtual environment manager to handle dependencies. To install Conda, see the official Conda installation guide.
After installing conda, create and activate an environment using:
conda create -n <name of your environment> python=3.11
conda activate <name of your environment>
With your environment active [you'll see (<name of your environment>) at the start of your terminal prompt], run the pip command:
pip install zympy
Getting Started
Directory Structure
Zympy datasets have three primary folders: images, labels, and meta. Each instance is identified using universally unique identifier (UUID) values; in our case these are 8-character alphanumeric strings. Every instance name consists of the UUID of the dataset it was generated with (the first 8 characters), followed by the individual instance UUID (the final 8 characters), separated by a dash ('-'), e.g. aA76li11-u163t8F0.
1a36ecdd                                  <-- The Dataset UUID
├── images
│   ├── 1a36ecdd-3df894c1.png             <-- The Instance UUID
│   ├── 1a36ecdd-98dddb4c.png
│   └── 1a36ecdd-381a6aac.png
├── labels
│   ├── bounding_box
│   │   ├── 1a36ecdd-3df894c1
│   │   │   ├── 2D
│   │   │   │   └── data.json
│   │   │   └── 3D
│   │   │       └── data.json
│   │   ├── 1a36ecdd-98dddb4c
│   │   └── 1a36ecdd-381a6aac
│   ├── contour
│   │   ├── 1a36ecdd-3df894c1
│   │   │   └── data.json
│   │   ├── 1a36ecdd-98dddb4c
│   │   └── 1a36ecdd-381a6aac
│   ├── pose
│   │   ├── 1a36ecdd-3df894c1
│   │   │   └── data.json
│   │   ├── 1a36ecdd-98dddb4c
│   │   └── 1a36ecdd-381a6aac
│   └── segmentation
│       ├── 1a36ecdd-3df894c1
│       │   ├── 1a36ecdd-3df894c1.png
│       │   └── data.json
│       ├── 1a36ecdd-98dddb4c
│       └── 1a36ecdd-381a6aac
└── meta
    ├── 1a36ecdd-3df894c1
    │   └── data.json
    ├── 1a36ecdd-98dddb4c
    └── 1a36ecdd-381a6aac
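The naming convention and folder layout can be sketched in Python as follows. The helpers `parse_instance_name` and `instance_paths` are hypothetical, not part of the zympy API; the paths simply mirror the example tree, and the dataset root is assumed to be the folder named after the dataset UUID. In practice the zympy.zympy_io loaders handle this for you.

```python
from pathlib import Path
from typing import Dict, NamedTuple


class InstanceName(NamedTuple):
    dataset_uuid: str
    instance_uuid: str


def parse_instance_name(name: str) -> InstanceName:
    """Split an instance name such as '1a36ecdd-3df894c1' into its two 8-character UUIDs."""
    dataset_uuid, sep, instance_uuid = name.partition("-")
    if not sep or len(dataset_uuid) != 8 or len(instance_uuid) != 8:
        raise ValueError(f"not a valid instance name: {name!r}")
    if not (dataset_uuid.isalnum() and instance_uuid.isalnum()):
        raise ValueError(f"UUID parts must be alphanumeric: {name!r}")
    return InstanceName(dataset_uuid, instance_uuid)


def instance_paths(dataset_root: str, name: str) -> Dict[str, Path]:
    """Build the expected file paths for one instance (hypothetical helper mirroring the tree)."""
    parse_instance_name(name)  # validate the name before building paths
    root = Path(dataset_root)
    labels = root / "labels"
    return {
        "image": root / "images" / f"{name}.png",
        "bounding_box_2D": labels / "bounding_box" / name / "2D" / "data.json",
        "bounding_box_3D": labels / "bounding_box" / name / "3D" / "data.json",
        "contour": labels / "contour" / name / "data.json",
        "pose": labels / "pose" / name / "data.json",
        "segmentation_png": labels / "segmentation" / name / f"{name}.png",
        "segmentation_json": labels / "segmentation" / name / "data.json",
        "meta": root / "meta" / name / "data.json",
    }
```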
Zympy API
The public Python module provides several submodules to help you get to training as fast as possible. They are organized as follows:
zympy.zympy_io
Contains helper functions to load data into memory, e.g.:
- Retrieve all the instance names contained in a dataset
- Load dataset-level or instance-level metadata
- Load images by instance name
- Load labels by instance name
zympy.mask
Contains helper functions to create masks from the labels. The masks are composable, e.g.:
- Bounding box masks
- Contour masks
- Segmentation masks
- Pose masks
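Because the mask helpers return transparent RGBA arrays, masks of different label types can be stacked. As a hedged sketch, assuming every mask is an `(H, W, 4)` array of the same shape, layering could look like this. `compose_masks` is a hypothetical helper, not a documented zympy function, and it uses a simple "opaque pixels win" rule rather than full alpha blending.

```python
import numpy as np


def compose_masks(*masks: np.ndarray) -> np.ndarray:
    """Layer RGBA masks in order; later masks are drawn on top of earlier ones.

    Wherever an upper mask has non-zero alpha, its pixel replaces the one below.
    This is simple layering, not full alpha blending.
    """
    if not masks:
        raise ValueError("at least one mask is required")
    out = masks[0].copy()  # never modify the caller's arrays
    for mask in masks[1:]:
        if mask.shape != out.shape:
            raise ValueError(f"shape mismatch: {mask.shape} vs {out.shape}")
        visible = mask[..., 3] > 0   # (H, W) boolean: non-transparent pixels of this layer
        out[visible] = mask[visible]  # copy all four channels where visible
    return out
```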
zympy.filter
Contains helper functions to filter the dataset for instances that meet given criteria. This is aimed at enabling curriculum learning in vision model training: for example, you may filter a dataset by the percentage occlusion of an object of interest, exposing the network to instances with little or no occlusion early on and gradually increasing the difficulty over multiple epochs. Filter by, e.g.:
- Object UUID presence within the instance
- Object position or orientation
- Camera position or orientation
- Total lighting energy in the image
- Object occlusion %
- ...
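The curriculum idea above can be sketched independently of the zympy.filter API, which is not documented on this page. The sketch assumes you already have a per-instance occlusion fraction in `[0, 1]` for the object of interest; `curriculum_subset` and its linear schedule are hypothetical choices for illustration.

```python
from typing import Dict, List


def curriculum_subset(occlusion: Dict[str, float], epoch: int, total_epochs: int,
                      start_max: float = 0.1, end_max: float = 1.0) -> List[str]:
    """Return instance names whose occlusion fraction is at most an epoch-dependent ceiling.

    The ceiling grows linearly from start_max (first epoch) to end_max (last epoch),
    so early epochs see only lightly occluded instances.
    """
    if total_epochs < 1:
        raise ValueError("total_epochs must be >= 1")
    progress = epoch / (total_epochs - 1) if total_epochs > 1 else 1.0
    progress = min(max(progress, 0.0), 1.0)  # clamp for out-of-range epochs
    ceiling = start_max + (end_max - start_max) * progress
    return sorted(name for name, occ in occlusion.items() if occ <= ceiling)
```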
zympy.format
Contains helper functions to convert zympy datasets into common external formats, e.g.:
- Convert the dataset to YOLO conventions (v5, v8, v11)
zympy.analyze
Contains helper functions to compute statistics about the dataset, e.g.:
- Object pose distributions
- Camera pose distributions
zympy.zympy_io
get_instance_names
Get a list of all instance names in a dataset directory. Returns validated and invalid instance name lists.
get_instance_names(dataset_path: str, verbose: bool = True) -> Tuple[List[str], List[str]]
Parameters
- dataset_path (str): Absolute path to the dataset directory
- verbose (bool): If True, logs validation errors for instances that fail validation checks
Returns
- Tuple[List[str], List[str]]: First list contains validated instances, second list contains invalid instances
Example
from zympy.zympy_io import get_instance_names
validated, invalid = get_instance_names('/path/to/dataset', verbose=True)
print(validated)
print(invalid)
load_dataset_meta
Load and validate a DataSetMeta pydantic model from the dataset directory.
load_dataset_meta(dataset_path: str) -> DataSetMeta
Parameters
- dataset_path (str): Root path to the dataset directory.
Returns
- DataSetMeta: class containing metadata and pointers to validated InstanceMeta pydantic subclasses.
Example
from zympy.zympy_io import load_dataset_meta
dataset_meta = load_dataset_meta('/path/to/dataset')
print(dir(dataset_meta)) # print out all attributes and methods
load_instance_meta
Load the metadata JSON for one or more instances from a dataset into validated pydantic models.
load_instance_meta(instance_name: List[str], dataset_path: str) -> List[InstanceMeta]
Parameters
- instance_name (List[str]): The name(s) or ID(s) of the instance(s) to load.
- dataset_path (str): Root path to the dataset directory.
Returns
- List[InstanceMeta]: Validated metadata models containing object descriptors, camera, lights, etc.
Example
from zympy.zympy_io import load_instance_meta
instances = load_instance_meta(['abcd1234-efgh5678'], '/path/to/dataset')
print(len(instances))  # one InstanceMeta per requested instance name
print(dir(instances[0]))  # print out all attributes and methods of the first InstanceMeta
load_instance_image
Load an image for a given instance from the dataset.
load_instance_image(instance_name: str, dataset_path: str) -> np.ndarray
Parameters
- instance_name (str): The name or ID of the instance to load.
- dataset_path (str): Root path to the dataset directory.
Returns
- np.ndarray: Image array loaded from .png file.
Example
from zympy.zympy_io import load_instance_image
image = load_instance_image('abcd1234-efgh5678', '/path/to/dataset')
print(image.shape)
load_instance_labels
Load all labels associated with the instance from the dataset.
load_instance_labels(instance_name: str, dataset_path: str) -> Dict
Parameters
- instance_name (str): The name or ID of the instance to load.
- dataset_path (str): Root path to the dataset directory.
Returns
- Dict: A dictionary mapping of UUID values to labels, e.g.
{
    str(uuid): {
        'bounding_box': [[x0, y0], [x1, y1]],
        'contour': [[x0, y0], [x1, y1], ..., [xn, yn]],
        'segmentation': {
            'array': numpy.ndarray,
            'index_map': Dict
        },
        'pose': Dict
    },
    str(uuid): {...},
    ...
}
Example
from zympy.zympy_io import load_instance_labels
labels = load_instance_labels('abcd1234-efgh5678', '/path/to/dataset')
print(labels)
load_bounding_box_label
Load the bounding box label for a specific instance from the dataset, supporting 2D or 3D boxes.
load_bounding_box_label(instance_name: str, dataset_path: str, box_type: Literal['2D','3D']='2D') -> Dict
Parameters
- instance_name (str): The name or ID of the instance to load.
- dataset_path (str): Root path to the dataset directory.
- box_type (Literal['2D','3D']): The bounding box type to load, either '2D' or '3D'. Defaults to '2D'.
Returns
- Dict: A dictionary mapping UUIDs to bounding box coordinates, e.g. `{str(uuid): [[x0, y0], [x1, y1]], ...}`.
Example
from zympy.zympy_io import load_bounding_box_label
bbox = load_bounding_box_label('abcd1234-efgh5678', '/path/to/dataset', box_type='2D')
print(bbox)
load_contour_label
Load the contour label for a specific instance from the dataset.
load_contour_label(instance_name: str, dataset_path: str) -> Dict
Parameters
- instance_name (str): The name or ID of the instance to load.
- dataset_path (str): Root path to the dataset directory.
Returns
- Dict: A dictionary mapping UUIDs to lists of contours, each contour being a list of [x, y] points, e.g. `{str(uuid): [[[x0, y0], [x1, y1], [x2, y2], ...], [[x0, y0], [x1, y1], [x2, y2], ...]], ...}`.
Example
from zympy.zympy_io import load_contour_label
contour_label = load_contour_label('abcd1234-efgh5678', '/path/to/dataset')
print(contour_label)
load_pose_label
Load the pose label for a specific instance from a dataset. The first vector in the pose label is the translation vector to the origin of the object, and the second is the quaternion describing its orientation.
load_pose_label(instance_name: str, dataset_path: str) -> Dict
Parameters
- instance_name (str): The name or ID of the instance to load.
- dataset_path (str): Root path to the dataset directory.
Returns
- Dict: A dictionary mapping UUIDs to pose data with translation vector and orientation quaternion, e.g.
{str(uuid): [[x, y, z], [w, x, y, z]], str(uuid): ..., ...}
Example
from zympy.zympy_io import load_pose_label
pose_label = load_pose_label('abcd1234-efgh5678', '/path/to/dataset')
print(pose_label)
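To use a pose entry geometrically, the quaternion is usually converted to a rotation matrix and combined with the translation into a 4x4 homogeneous transform. A minimal sketch, assuming the scalar-first `[w, x, y, z]` order shown in the return format above; the helper names are hypothetical, and since the reference frame of the pose is not specified here, check it against your dataset's metadata.

```python
import numpy as np


def quaternion_to_rotation_matrix(q) -> np.ndarray:
    """Convert a scalar-first quaternion [w, x, y, z] into a 3x3 rotation matrix."""
    q = np.asarray(q, dtype=float)
    w, x, y, z = q / np.linalg.norm(q)  # normalize defensively to a unit quaternion
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])


def pose_to_matrix(pose) -> np.ndarray:
    """Build a 4x4 homogeneous transform from [[tx, ty, tz], [w, x, y, z]]."""
    translation, quaternion = pose
    transform = np.eye(4)
    transform[:3, :3] = quaternion_to_rotation_matrix(quaternion)
    transform[:3, 3] = translation
    return transform
```

For example, `pose_to_matrix(pose_label[uuid])` would give the object's transform for one UUID from the example above.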
load_segmentation_label
Load the segmentation label for a specific instance from a dataset.
load_segmentation_label(instance_name: str, dataset_path: str) -> Tuple[Dict, np.ndarray]
Parameters
- instance_name (str): The name or ID of the instance to load.
- dataset_path (str): Root path to the dataset directory.
Returns
- Dict: A dictionary mapping part UUID values to segmentation array values.
- np.ndarray: The segmentation array containing pixel data.
Example
from zympy.zympy_io import load_segmentation_label
import cv2
seg_map, seg_array = load_segmentation_label('abcd1234-efgh5678', '/path/to/dataset')
print(seg_map)
cv2.imshow('segmentation', seg_array)
cv2.waitKey(0)
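Since the returned dictionary maps part UUIDs to values in the segmentation array, a per-part binary mask can be extracted by comparison. This is a hedged sketch: `binary_mask_for` is a hypothetical helper, and it assumes the mapped value is either a single integer (for a 2D array) or a per-channel color (for an `(H, W, C)` array), since the exact encoding is not documented here.

```python
from typing import Any, Dict

import numpy as np


def binary_mask_for(seg_map: Dict[str, Any], seg_array: np.ndarray, uuid: str) -> np.ndarray:
    """Return a boolean (H, W) mask that is True where seg_array holds the value mapped to uuid."""
    value = np.asarray(seg_map[uuid])
    if seg_array.ndim == 3 and value.ndim == 1:
        # Assumed color-coded encoding: a pixel matches only if every channel matches.
        return np.all(seg_array == value, axis=-1)
    # Assumed single-channel encoding: one scalar value per part.
    return seg_array == value
```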
zympy.mask
create_empty_rgba
Create an RGBA array filled with `(0, 0, 0, 0)`. Its width and height are taken from a reference image or, if none is provided, from manually specified dimensions.
create_empty_rgba(reference_image: np.ndarray = None, mask_dimensions: Tuple = None) -> np.ndarray
Parameters
- reference_image (numpy.ndarray): Optional reference image whose dimensions will be used for the mask.
- mask_dimensions (Tuple[int, int]): Desired output dimensions if no reference image is provided.
Returns
- empty_rgba (numpy.ndarray): An empty transparent RGBA array of the specified dimensions.
Example
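from zympy.mask import create_empty_rgba
from zympy.zympy_io import load_instance_image
image = load_instance_image('abcd1234-efgh5678', '/path/to/dataset')
mask = create_empty_rgba(reference_image=image)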
print(mask.shape)
create_bounding_box_2D_mask
Construct a transparent mask with bounding boxes annotated.
create_bounding_box_2D_mask(bounding_boxes: Dict, mask_dimensions: tuple[int, int], active_uuids: set[str] = None, line_thickness: int = DEFAULT_LINE_THICKNESS, color: Tuple = None, color_seed: int = COLOR_SEED) -> np.ndarray
Parameters
- bounding_boxes (Dict): Dictionary containing bounding box labels. Format:
{ str(uuid): [[x0, y0], [x1, y1]], str(uuid): [[x0, y0], [x1, y1]], ... }
- mask_dimensions (tuple[int, int]): Pixel dimensions `(height, width)` of the output mask.
- active_uuids (set[str], optional): If provided, only annotate bounding boxes whose UUIDs are in this set.
- line_thickness (int): Bounding box edge thickness in pixels.
- color (tuple[int, int, int, int], optional): RGBA color. If None, random colors will be generated based on UUID and `color_seed`.
- color_seed (int): Seed shift value for random color generation.
Returns
- mask (numpy.ndarray): Generated transparent mask with bounding boxes drawn.
Example
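from zympy.mask import create_bounding_box_2D_mask
from zympy.zympy_io import load_bounding_box_label
bounding_boxes = load_bounding_box_label('abcd1234-efgh5678', '/path/to/dataset', box_type='2D')
mask = create_bounding_box_2D_mask(bounding_boxes, mask_dimensions=(256, 256))  # (height, width); should match the source image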
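The `color_seed` parameter implies that, when no color is given, each UUID is mapped deterministically to a color. zympy's actual scheme is not documented here; one way to do it, sketched under that assumption, is to hash the seed and UUID with SHA-256 and use the hash to pick an HSV hue. `uuid_color` is a hypothetical helper.

```python
import colorsys
import hashlib
from typing import Tuple


def uuid_color(uuid: str, color_seed: int = 0, alpha: int = 255) -> Tuple[int, int, int, int]:
    """Map a UUID string to a stable RGBA color; changing color_seed shifts the palette."""
    digest = hashlib.sha256(f"{color_seed}:{uuid}".encode("utf-8")).digest()
    hue = int.from_bytes(digest[:2], "big") / 65536.0  # hue in [0, 1)
    r, g, b = colorsys.hsv_to_rgb(hue, 0.8, 1.0)  # fixed saturation/value keep colors vivid
    return (round(r * 255), round(g * 255), round(b * 255), alpha)
```

Hashing (rather than Python's built-in `hash`, which is randomized per process for strings) keeps the colors identical across runs.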
draw_bounding_box_2D
Draw bounding boxes directly onto an image using normalized coordinates.
draw_bounding_box_2D(image: np.ndarray, bbox_normalized_corners: List[Tuple[float, float, float, float]], color: tuple[int, int, int, int] = None, line_thickness: int = DEFAULT_LINE_THICKNESS) -> np.ndarray
Parameters
- image (numpy.ndarray): The input RGBA image `(H x W x 4)`.
- bbox_normalized_corners (List[Tuple[float, float, float, float]]): Bounding boxes in normalized `[0,1]` coordinates, formatted either as `(x0, y0, x1, y1)` or `((x0, y0), (x1, y1))`.
- color (tuple[int, int, int, int], optional): RGBA color for the box edges.
- line_thickness (int): Thickness of bounding box edges in pixels.
Returns
- image (numpy.ndarray): Image with drawn bounding boxes.
Example
# Draw a box with a random color
from zympy.mask import draw_bounding_box_2D, create_empty_rgba
import cv2
image = create_empty_rgba(mask_dimensions=(400, 400))
drawn = draw_bounding_box_2D(image, [(0.1, 0.1, 0.4, 0.4)])
cv2.imshow('bounding box', drawn)
cv2.waitKey(0)
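The two accepted box layouts and the normalized-to-pixel mapping can be sketched as follows. `normalized_to_pixels` is a hypothetical helper, and its rounding and clamping convention is an assumption; zympy may map coordinates slightly differently.

```python
from typing import Tuple, Union

Corners = Union[
    Tuple[float, float, float, float],                # (x0, y0, x1, y1)
    Tuple[Tuple[float, float], Tuple[float, float]],  # ((x0, y0), (x1, y1))
]


def normalized_to_pixels(box: Corners, height: int, width: int) -> Tuple[int, int, int, int]:
    """Convert a normalized [0, 1] box in either accepted layout to pixel corners."""
    if len(box) == 2:
        (x0, y0), (x1, y1) = box
    elif len(box) == 4:
        x0, y0, x1, y1 = box
    else:
        raise ValueError(f"unexpected box layout: {box!r}")

    def to_px(value: float, size: int) -> int:
        # Scale to the image size, round, and clamp so 1.0 stays inside the image.
        return min(max(round(value * size), 0), size - 1)

    # x coordinates scale with width (columns), y coordinates with height (rows).
    return to_px(x0, width), to_px(y0, height), to_px(x1, width), to_px(y1, height)
```

Applied to the box in the example above, `normalized_to_pixels((0.1, 0.1, 0.4, 0.4), 400, 400)` gives `(40, 40, 160, 160)`.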
zympy.filter
zympy.format
to_yolo
Converts a ZymPy-format dataset to the directory and label structure required by YOLO (v5, v8, v11) for object detection training. Partitions image instances into training/validation sets, generates label files, builds the YOLO directory structure, and writes a `data.yaml` configuration mapping class indices to UUID-style names.
to_yolo(yolo_version: Literal[5, 8, 11], zympy_dataset_path: str, yolo_dataset_path: str, yolo_dataset_name: str = None, instance_names: List[str] = None, train_val_split: float = 0.8, training_instance_names: List[str] = None, validation_instance_names: List[str] = None, label_type: Literal['bounding_box', 'segmentation'] = 'bounding_box', class_list: List[str] = None, suppress_memory_prompts: bool = False) -> bool
Parameters
- yolo_version (Literal[5, 8, 11]): Target YOLO version to format the dataset for.
- zympy_dataset_path (str): Path to the root of the ZymPy-format dataset containing images and metadata.
- yolo_dataset_path (str): Path to the root output directory where the YOLO-formatted dataset will be written.
- yolo_dataset_name (str, optional): Name to assign to the YOLO dataset folder. Defaults to the ZymPy dataset name if None.
- instance_names (List[str], optional): List of instance UUIDs to include; will be partitioned via `train_val_split` if provided.
- train_val_split (float, optional): Fraction of instances used for training; remainder goes to validation.
- training_instance_names (List[str], optional): Explicit list of training instance names (overrides `train_val_split`).
- validation_instance_names (List[str], optional): Explicit list of validation instance names (overrides `train_val_split`).
- label_type (Literal['bounding_box', 'segmentation'], optional): Type of label to format. Defaults to 'bounding_box'.
- class_list (List[str], optional): Specific UUID strings representing object classes to include. If None, includes all classes.
- suppress_memory_prompts (bool, optional): Suppresses prompts related to memory usage when copying large files.
Returns
- success (bool): Returns True if the YOLO dataset was created successfully.
Example
from zympy.format import to_yolo
success = to_yolo(
    yolo_version=5,
    zympy_dataset_path='/data/zympy_dataset',
    yolo_dataset_path='/data/yolo_dataset',
    train_val_split=0.8,
)
print(success) # True if successful
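`to_yolo` handles partitioning and label writing for you. To illustrate what those two steps involve, here is a hedged sketch: `split_instances` and `yolo_box_line` are hypothetical helpers, the seeded shuffle is one way to make a split reproducible (not necessarily how `to_yolo` partitions), and the box is assumed to be in pixel coordinates `[[x0, y0], [x1, y1]]`. The output line follows the standard YOLO detection label layout: class index, then box center x, center y, width, and height, each normalized by the image size.

```python
import random
from typing import List, Sequence, Tuple


def split_instances(names: Sequence[str], train_fraction: float = 0.8,
                    seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle instance names reproducibly and split them into train/validation lists."""
    if not 0.0 <= train_fraction <= 1.0:
        raise ValueError("train_fraction must be in [0, 1]")
    shuffled = sorted(names)  # sort first so the result does not depend on input order
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


def yolo_box_line(class_index: int, box: Sequence[Sequence[float]],
                  image_width: int, image_height: int) -> str:
    """Format a pixel box [[x0, y0], [x1, y1]] as a YOLO detection label line."""
    (x0, y0), (x1, y1) = box
    x_min, x_max = sorted((x0, x1))
    y_min, y_max = sorted((y0, y1))
    x_center = (x_min + x_max) / 2 / image_width
    y_center = (y_min + y_max) / 2 / image_height
    box_width = (x_max - x_min) / image_width
    box_height = (y_max - y_min) / image_height
    return f"{class_index} {x_center:.6f} {y_center:.6f} {box_width:.6f} {box_height:.6f}"
```

In a real conversion, the class index would come from the position of each object's UUID in the class list written to `data.yaml`.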
