dgs.utils.image.CustomToAspect

class dgs.utils.image.CustomToAspect(*args: Any, **kwargs: Any)[source]

Custom torchvision Transform that modifies the image, bboxes, and coordinates simultaneously to match a target aspect ratio.

Notes

Resize() is expected to be called after this transform, so that the output matches not only the target aspect ratio but also the target overall size.

This transform's default mode is zero-padding.

The following modes are available for resizing:

distort

Skips CustomToAspect entirely and therefore does not change the original aspect ratio at all. This will result in a distorted image when using Resize(), iff the old and new aspect ratios aren't close.

edge-pad

Uses Pad() to extend the image to the correct aspect ratio. The value used for padding_mode of Pad() is edge.

inside-crop

Uses the target aspect ratio to extract a sub-image out of the original. This is essentially a center crop with one dimension kept as large as possible while maintaining the aspect ratio.

outside-crop

Only available for the CustomCrop() model; elsewhere it is passed through unchanged. Instead of cropping at the exact bounding box, the aspect ratio is matched by widening one of the two dimensions.

fill-pad

Uses Pad() to extend the image to the correct aspect ratio. The value used for padding_mode of Pad() is constant and the fill value has to be provided within the kwargs.

mean-pad

Uses Pad() to extend the image to the correct aspect ratio. The value used for padding_mode of Pad() is constant with a fill value as the RGB mean of the image.

reflect-pad

Uses Pad() to extend the image to the correct aspect ratio. The value used for padding_mode of Pad() is reflect.

symmetric-pad

Uses Pad() to extend the image to the correct aspect ratio. The value used for padding_mode of Pad() is symmetric.

zero-pad

Uses Pad() to extend the image to the correct aspect ratio. The value used for padding_mode of Pad() is constant with a value of zero.
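The padding modes above differ only in how the new border pixels are filled; the amount of padding required is the same in every case. A minimal sketch of that computation follows — the helper name padding_for_aspect is hypothetical and not part of the library:

```python
def padding_for_aspect(H: int, W: int, target_aspect: float) -> tuple[int, int, int, int]:
    """Return (left, top, right, bottom) padding so that the padded image
    has the given aspect ratio (width / height).

    The padding is split evenly between the two sides of the short dimension.
    """
    if W / H < target_aspect:
        # Image is too narrow for the target: pad left and right.
        pad = round(H * target_aspect) - W
        return pad // 2, 0, pad - pad // 2, 0
    # Image is too wide (or already matches): pad top and bottom.
    pad = round(W / target_aspect) - H
    return 0, pad // 2, 0, pad - pad // 2


# e.g. a square 100x100 image padded to a 2:1 target aspect ratio
print(padding_for_aspect(100, 100, 2.0))  # → (50, 0, 50, 0)
```

The resulting tuple corresponds to the padding argument of torchvision.transforms.v2.Pad(), combined with the padding_mode listed for the respective mode.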

Methods

__init__(*args: Any, **kwargs: Any) None
forward(*args, **kwargs) dict[str, any][source]

Modify the image, bboxes, and coordinates to have a given aspect ratio (shape).

Use this module within Compose and pass a structured dict as the argument. forward() will then receive that dictionary as its first, and most likely only, argument.

Keyword Arguments:
  • image – One single image as tv_tensors.Image of shape [B x C x H x W]

  • box – Zero, one, or multiple bounding boxes per image. With N detections and a batch size of B, the bounding boxes have a shape of [B*N x 4]. Keep in mind that the bboxes have to be a two-dimensional tensor, because every image in this batch can have a different number of detections. The ordering of the bounding boxes stays the same.

  • keypoints – Joint-coordinates as key-points with coordinates in relation to the original image. With N detections per image and a batch size of B, the coordinates have a maximum shape of [B*N x J x 2|3]. Either the batch and detection dimensions are stacked into one, because every image in this batch can have a different number of detections, or there is no batch dimension at all. The ordering of the coordinates stays the same.

  • output_size – (h, w) as target height and width of the image

  • mode – See class description.

  • aspect_round_decimals – (int, optional) Before comparing them, round the aspect ratios to the number of decimals. Default DEF_VAL.images.aspect_round_decimals.

  • fill – (Union[int, float, List[float]], optional) See parameter fill of torchvision.transforms.v2.Pad(). Only applicable if mode is ‘fill-pad’; in that case, fill has to be set and is no longer optional. For all other modes it is ignored.

Returns:

Structured dict with updated and overwritten image(s), bboxes, and coordinates. All additional input values are passed through unchanged.
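The aspect_round_decimals argument controls when the transform considers any work necessary at all: the old and new aspect ratios are rounded before comparison, and if they are equal the image can pass through untouched. A minimal sketch of that comparison — the helper name aspects_match and the default of 2 decimals are assumptions for illustration, not the library's actual values:

```python
def aspects_match(old_wh: tuple[int, int], new_wh: tuple[int, int], decimals: int = 2) -> bool:
    """Compare two aspect ratios (width / height) after rounding to the
    given number of decimals, mirroring the role of aspect_round_decimals."""
    old = round(old_wh[0] / old_wh[1], decimals)
    new = round(new_wh[0] / new_wh[1], decimals)
    return old == new


# Full HD and HD both have a 16:9 ratio, so no padding or cropping is needed.
print(aspects_match((1920, 1080), (1280, 720)))  # → True
print(aspects_match((1920, 1080), (1080, 1080)))  # → False
```

With a small decimals value, near-identical ratios (e.g. ones differing only due to integer pixel sizes) are treated as equal, avoiding one-pixel padding operations.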

Attributes

modes

validators

H

Original height of the image

W

Original width of the image

original_aspect

Original aspect ratio of the image as width / height

h

New height of the image

w

New width of the image

target_aspect

Target aspect ratio of the image as width / height