dgs.models.dataset.dataset.VideoDataset.transform_crop_resize

static VideoDataset.transform_crop_resize() → torchvision.transforms.v2.Compose

Given a single image with its bounding boxes and key-points, obtain one cropped image per bounding box, with the key-points expressed in the local coordinates of each crop.

This transform expects its input as a dict with the following structure.

>>> structured_input: dict[str, Any] = {
    "image": tv_tensors.Image,
    "box": tv_tensors.BoundingBoxes,
    "keypoints": torch.Tensor,
    "output_size": ImgShape,
    "mode": str,
}

Returns:

A composed torchvision function that accepts a dict as input.

After calling this transform function, some values will have different shapes:

image: Now contains the image crops as tensor of shape [N x C x H x W].
bboxes: Zero, one, or multiple bounding boxes for this image as tensor of shape [N x 4]. The bounding boxes are converted to the XYWH format.
coordinates: Now contains the joint coordinates of every detection, in local (per-crop) coordinates, as tensor of shape [N x J x 2|3].
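
The box conversion and key-point localization described above can be illustrated with a small pure-Python sketch. It is not the library's implementation: the helper names `xyxy_to_xywh` and `localize_keypoints` are hypothetical, the crop is assumed to be stretched to the output size without padding (the real behaviour may depend on "mode"), and the output size is assumed to be ordered as (height, width).

```python
def xyxy_to_xywh(box):
    """Convert one box from (x1, y1, x2, y2) to (left, top, width, height)."""
    x1, y1, x2, y2 = box
    return (x1, y1, x2 - x1, y2 - y1)


def localize_keypoints(keypoints, box_xywh, output_size):
    """Map global key-points into the coordinates of a resized crop.

    keypoints:   list of (x, y) or (x, y, visibility) tuples, global pixels
    box_xywh:    (left, top, width, height) of the crop in the full image
    output_size: (height, width) of the resized crop (assumed ordering)

    Assumption: a plain stretch-resize with no padding.
    """
    left, top, width, height = box_xywh
    out_h, out_w = output_size
    local = []
    for x, y, *rest in keypoints:
        # Shift to the crop origin, then scale to the output resolution.
        # Any third value (e.g. visibility) is passed through unchanged.
        local.append(((x - left) * out_w / width,
                      (y - top) * out_h / height,
                      *rest))
    return local


box = xyxy_to_xywh((100, 50, 300, 450))
joints = localize_keypoints([(150, 250, 1.0), (300, 450, 0.0)], box, (256, 128))
```

For a single detection this yields J local key-points; stacking the results of all N detections gives the [N x J x 2|3] layout listed above.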
