dgs.models.dataset.posetrack21.PoseTrack21_Image.transform_crop_resize
- static PoseTrack21_Image.transform_crop_resize() -> torchvision.transforms.v2.Compose
Given a single image with its corresponding bounding boxes and key-points, obtain a cropped image for every bounding box, with the key-points localized to each crop.
This transform expects a custom structured input as a dict.
>>> structured_input: dict[str, Any] = {
...     "image": tv_tensors.Image,
...     "box": tv_tensors.BoundingBoxes,
...     "keypoints": torch.Tensor,
...     "output_size": ImgShape,
...     "mode": str,
... }
- Returns:
A composed torchvision function that accepts a dict as input.
After calling this transform function, some values will have different shapes:
- image
Now contains the image crops as a tensor of shape [N x C x H x W].
- bboxes
Zero, one, or multiple bounding boxes for this image as a tensor of shape [N x 4]. The bounding boxes are converted to the XYWH format.
- coordinates
Now contains the joint coordinates of every detection in local coordinates, as a tensor of shape [N x J x 2|3].
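To make "local coordinates" concrete, here is a minimal pure-Python sketch of one way key-points could be mapped into the frame of a resized crop. It is not the dgs implementation: the helper `localize_keypoints` is hypothetical, and it assumes that localization means shifting each point by the top-left corner of its XYWH box and scaling by the resize factors implied by `output_size`, leaving an optional third value (such as visibility) unchanged. The actual transform may behave differently depending on `mode`.

```python
# Hypothetical helper for illustration only; not part of the dgs package.
def localize_keypoints(boxes_xywh, keypoints, output_size):
    """Map global key-points into the frame of each resized crop.

    boxes_xywh:  N boxes as (x, y, w, h) in pixels of the full image.
    keypoints:   N lists of J points, each (x, y) or (x, y, visibility).
    output_size: (height, width) of every crop after resizing.
    """
    out_h, out_w = output_size
    local = []
    for (bx, by, bw, bh), joints in zip(boxes_xywh, keypoints):
        # Assumed resize factors: the crop is stretched to output_size.
        sx, sy = out_w / bw, out_h / bh
        local.append([
            # Shift to the crop origin, then scale; pass extra values through.
            ((x - bx) * sx, (y - by) * sy, *rest)
            for x, y, *rest in joints
        ])
    return local


boxes = [(100.0, 50.0, 64.0, 128.0)]                      # N = 1, XYWH
kps = [[(100.0, 50.0), (132.0, 114.0), (164.0, 178.0)]]   # J = 3, global
local_kps = localize_keypoints(boxes, kps, output_size=(256, 128))
print(local_kps)  # [[(0.0, 0.0), (64.0, 128.0), (128.0, 256.0)]]
```

The result keeps the [N x J x 2|3] layout described above: one list of J points per bounding box, with each point expressed relative to its own crop.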