dgs.utils.image.CustomCropResize.forward

CustomCropResize.forward(*args, **kwargs) → dict[str, any]

Extract the regions given by the bounding boxes from one or multiple images and resize the resulting crops to the target shape.

N, the number of bounding boxes and of key-point sets, has to be at least 1.

Either there is exactly one image or exactly as many stacked images as there are bounding boxes. If there is one image, there can be an arbitrary number (N) of bboxes and key-point sets, which are all extracted from this single source image. If there are exactly N equally sized images, with N bounding boxes and N key-point sets, every box is extracted from exactly one image.
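The pairing rule above can be sketched as a small validation step. The helper `pair_images_with_boxes` is hypothetical (it is not part of dgs) and works on plain shape tuples; it also assumes that, in the N-image case, box i is cropped from image i.

```python
def pair_images_with_boxes(image_shape: tuple[int, ...], n_boxes: int) -> list[int]:
    """Return, for every box, the index of the image it is cropped from.

    Hypothetical helper that mirrors the one-image / N-images rule;
    image_shape is the [B x C x H x W] shape of the (stacked) images.
    """
    if n_boxes < 1:
        raise ValueError("N has to be at least 1.")
    n_images = image_shape[0]
    if n_images == 1:
        # One source image: every box is cropped from image 0.
        return [0] * n_boxes
    if n_images == n_boxes:
        # N stacked images: assume box i is cropped from image i.
        return list(range(n_boxes))
    raise ValueError(f"Got {n_images} images for {n_boxes} boxes, expected 1 or {n_boxes}.")


print(pair_images_with_boxes((1, 3, 256, 256), 3))  # [0, 0, 0]
print(pair_images_with_boxes((3, 3, 256, 256), 3))  # [0, 1, 2]
```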

Note

If you want to extract 3 bounding boxes from img1 and 2 from img2, either call this method twice, or create a single image tensor by stacking or expanding img1 and img2 so that it contains one image per bounding box. The second approach only works if img1 and img2 have the same shape!
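A minimal sketch of the second option, assuming plain `torch.Tensor` images of shape [1 x C x H x W] (the sizes are made up). `expand` repeats the size-1 batch dimension as a view, and `torch.cat` then copies the repeated views into one [5 x C x H x W] tensor. The five boxes would have to be ordered the same way: the first three belong to img1, the last two to img2.

```python
import torch

# Made-up source images; both must have the same [1 x C x H x W] shape.
img1 = torch.rand(1, 3, 480, 640)
img2 = torch.rand(1, 3, 480, 640)

# expand() views the size-1 batch dimension as 3 (resp. 2) images without copying;
# torch.cat() then materializes one stacked tensor with one image per box.
images = torch.cat([img1.expand(3, -1, -1, -1), img2.expand(2, -1, -1, -1)], dim=0)
print(images.shape)  # torch.Size([5, 3, 480, 640])
```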

Note

The bboxes have to be a single BoundingBoxes object; therefore, all boxes must share the same format and canvas size.

Keyword Arguments:
  • images – A list of torchvision images, either as byte or float images. All images have a shape of [1 x C x H x W].

  • box – A tv_tensors.BoundingBoxes object in XYWH box_format with shape [N x 4], containing N detections.

  • keypoints – The joint coordinates in the global frame, with shape [N x J x 2|3], where J is the number of joints.

  • mode – The mode for resizing. Similar to the modes of CustomToAspect, with one additional mode, ‘outside-crop’. Instead of padding the crop with zeros, ‘outside-crop’ fills the extra area with data from the surrounding original image, so it extracts more of the image than the bounding box itself.

  • output_size – The target height and width of the image as tuple (height, width).

  • aspect_mode (str, optional) – If mode is not ‘outside-crop’, this transformation mode is used to resize the intermediate images so that they can be stacked. Defaults to DEF_VAL.images.aspect_mode.
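The difference between zero padding and ‘outside-crop’ can be illustrated by the region that is read from the original image. The sketch below is an assumption about how such a mode could work, not the dgs implementation: the hypothetical helper `outside_crop_region` grows an XYWH box around its center until it matches the aspect ratio of output_size, so the extra area holds real image content. A zero-padding mode would instead crop exactly (x, y, w, h) and pad that crop with zeros to the same aspect ratio.

```python
def outside_crop_region(
    x: float, y: float, w: float, h: float, out_h: int, out_w: int
) -> tuple[float, float, float, float]:
    """Grow an XYWH box around its center to the aspect ratio out_w / out_h.

    Hypothetical sketch, not part of dgs: the returned region would be read
    from the original image instead of zero-padding the tight crop.
    """
    w, h = float(w), float(h)
    target_ratio = out_w / out_h
    cx, cy = x + w / 2, y + h / 2  # the box center stays fixed
    if w / h < target_ratio:
        w = h * target_ratio  # box too narrow: widen it, keep the height
    else:
        h = w / target_ratio  # box too wide (or matching): grow the height
    return cx - w / 2, cy - h / 2, w, h


# A tall box becomes a square region for a square output.
print(outside_crop_region(100, 50, 50, 100, 256, 256))  # (75.0, 50.0, 100.0, 100.0)
# A wide box at the image border: the region starts above the image (y < 0).
print(outside_crop_region(0, 0, 200, 100, 100, 100))  # (0.0, -50.0, 200.0, 200.0)
```

Where the grown region leaves the image, as in the second call, a real implementation would still need some padding for the part outside the image.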

Returns:

A dictionary in which the contents of the ‘image’ and ‘keypoints’ keys are overwritten with the newly computed cropped image and the key points in local coordinates.

The returned image is a single stacked image with a shape of [N x C x h x w], where (h, w) is the given output_size.

The shape of the coordinates will stay the same.

The bounding boxes are not changed and therefore remain in global coordinates.
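To make "local coordinates" concrete, here is a sketch of the global-to-local mapping under one simplifying assumption: every XYWH box is stretched directly onto output_size without preserving its aspect ratio. The other resize modes would add their own offsets and scales, so this is not the dgs implementation, and `global_to_local` is a hypothetical name. Boxes are passed as a plain [N x 4] tensor.

```python
import torch


def global_to_local(keypoints: torch.Tensor, boxes_xywh: torch.Tensor,
                    output_size: tuple[int, int]) -> torch.Tensor:
    """Map [N x J x 2|3] global key points into the coordinates of each crop.

    Hypothetical sketch assuming a plain stretch of every XYWH box onto
    output_size (h, w). A third coordinate (e.g. visibility) is kept as-is,
    so the shape of the key points stays the same.
    """
    out_h, out_w = output_size
    local = keypoints.clone().float()
    top_left = boxes_xywh[:, None, :2].float()  # [N x 1 x 2] as (x, y)
    box_wh = boxes_xywh[:, None, 2:4].float()   # [N x 1 x 2] as (w, h)
    scale = torch.tensor([out_w, out_h], dtype=torch.float32) / box_wh
    # Shift so the box's top-left corner becomes (0, 0), then scale to the crop size.
    local[..., :2] = (local[..., :2] - top_left) * scale
    return local


kp = torch.tensor([[[132.0, 114.0, 1.0], [100.0, 50.0, 0.5]]])  # [1 x 2 x 3], global
box = torch.tensor([[100.0, 50.0, 64.0, 128.0]])                # XYWH, [1 x 4]
# The box center (132, 114) maps to the crop center (64, 128); the top-left maps to (0, 0).
print(global_to_local(kp, box, output_size=(256, 128)))
```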