

Dynamically Gated Similarities

You can find the extended documentation here.


A visual overview of the pipeline is available on LucidChart or as a downloadable PDF.

Folder Structure

└─── configs
│        Multiple configuration.yaml files for running DGS or different submodules.
└─── docs
│        Source files for the documentation via sphinx and autodoc.
└─── data
│        Folder containing the datasets; see './data/dataset.rst' for the expected structure.
└─── dependencies
│        References to git submodules, e.g. torchreid and my custom AlphaPose fork.
└─── dgs
│    │    The source code of the algorithm.
│    │
│    └
│    │        Some default configuration if not overridden by config.yaml
│    │        This file will soon be replaced by 'dgs_values.yaml'.
│    └    dgs_values.yaml
│    │        Some default values if not overridden by config.yaml
│    │
│    └─── models
│    │        The building blocks for the DGS algorithm. Most models should be fairly
│    │        straightforward to extend when implementing custom sub-modules.
│    │
│    └─── utils
│             File handling, IO, classes for State and Track handling, constants,
│             functions for torch module handling, visualization, and overall image handling.
└─── pre_trained_models
│        Storage for downloaded or custom pre-trained models.
└─── tests
│        Tests for the dgs module.
└─── .gitmodules      - The project uses git submodules to include different libraries.
└─── .pylintrc        - Settings for the pylint linter.
└─── LICENSE          - MIT License
└─── pyproject.toml   - Information about this project and additional build parameters.
└─── requirements.txt - Use pip to install the requirements,
│                       see './docs/installation.rst' for more information.
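
The tree above mentions default values that apply "if not overridden by config.yaml". How the project actually loads and merges these files is not shown here, so the following is only a minimal sketch of one common approach: a recursive dictionary merge in which user-supplied values take precedence. The function name `merge_config` and all keys are hypothetical and stand in for whatever 'dgs_values.yaml' and a user configuration really contain.

```python
from copy import deepcopy


def merge_config(defaults: dict, overrides: dict) -> dict:
    """Return a copy of `defaults` in which values from `overrides` take precedence."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = merge_config(merged[key], value)
        else:
            # Scalars, lists, and new keys simply replace the default.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical contents; the real keys in the project's YAML files may differ.
default_values = {"device": "cpu", "visual": {"batch_size": 16, "image_size": [256, 192]}}
user_config = {"device": "cuda", "visual": {"batch_size": 4}}

config = merge_config(default_values, user_config)
print(config)
```

Keys that the user configuration does not mention keep their default value, and the defaults themselves are left unmodified.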

Abbreviations and Definitions

All joints are expected to have 2D coordinates, but extending the code to 3D should be possible with minor adjustments. Where joints have three values per key point in the given code, the third value is expected to be the joint visibility.
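
As an illustration of this convention, the sketch below splits a key-point tensor into coordinates and visibility. The helper `split_key_points` is hypothetical and not part of the dgs package, and treating missing visibility as all ones is an assumption made only for this example.

```python
import torch


def split_key_points(key_points: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Split key points of shape [... x J x 2] or [... x J x 3] into (coords, visibility)."""
    last_dim = key_points.shape[-1]
    if last_dim == 3:
        # The third value is interpreted as joint visibility, not as a z-coordinate.
        return key_points[..., :2], key_points[..., 2]
    if last_dim == 2:
        # Assumption for this sketch: without visibility data, every joint counts as visible.
        return key_points, torch.ones(key_points.shape[:-1], dtype=key_points.dtype)
    raise ValueError(f"Expected 2 or 3 values per joint, got {last_dim}.")


# Hypothetical input: 2 detections with 17 joints each (e.g. COCO).
key_points = torch.rand(2, 17, 3)
coords, visibility = split_key_points(key_points)
print(coords.shape, visibility.shape)
```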

PyTorch and torchvision expect image dimensions in the order [B x C x H x W], whereas Matplotlib and PIL use [B x H x W x C]. Which format an image tensor is in depends on the location in the code. Most general functions in torchvision expect uint8 (byte) tensors, while torch Modules expect float (float32) images so that gradients can be computed over them. Some single images might lack the leading batch dimension and have shape [C x H x W], even though most parts of the code expect a batch dimension.
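
The conversions described above can be sketched with plain torch operations. The helper names `to_float_image` and `to_channels_last` are hypothetical, and scaling to the range [0, 1] is a common convention assumed here rather than something the project prescribes.

```python
import torch


def to_float_image(img: torch.Tensor) -> torch.Tensor:
    """Convert a uint8 image to float32 and make sure it has a batch dimension."""
    if img.ndim == 3:
        # [C x H x W] -> [1 x C x H x W]
        img = img.unsqueeze(0)
    # Assumption: scale byte values from [0, 255] to [0, 1].
    return img.to(torch.float32) / 255.0


def to_channels_last(img: torch.Tensor) -> torch.Tensor:
    """Reorder [B x C x H x W] to [B x H x W x C], e.g. for plotting."""
    return img.permute(0, 2, 3, 1)


# Hypothetical single RGB image without a batch dimension.
single = torch.randint(0, 256, (3, 64, 48), dtype=torch.uint8)
batch = to_float_image(single)
plot_ready = to_channels_last(batch)
print(batch.shape, batch.dtype, plot_ready.shape)
```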

The State object is a general class for passing data between modules. Modules whose child modules might produce different outputs therefore generally use a State object instead of returning possibly non-descriptive tensors. This can be seen in the SimilarityModule class and its children. Similarity modules can differ considerably: the pose similarity (e.g. ObjectKeypointSimilarity) needs the key-point coordinates to compute the OKS, while the visual similarity (e.g. TorchreidVisualSimilarity) needs the image crops to compute embeddings.
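
To illustrate the idea, here is a simplified sketch in plain Python. None of these classes are the real dgs API: `SketchState` and the two similarity classes are hypothetical, the pose score is a simple distance-based stand-in rather than the actual OKS, and the visual score assumes an embedding has already been computed from the image crop.

```python
import math
from collections import UserDict


class SketchState(UserDict):
    """Hypothetical stand-in for a State: a dict-like container passed between modules."""


class SketchPoseSimilarity:
    """Reads only the 'keypoints' entry. Uses a mean-distance score, not the real OKS."""

    def forward(self, state: SketchState, target: SketchState) -> float:
        dists = [math.dist(a, b) for a, b in zip(state["keypoints"], target["keypoints"])]
        return 1.0 / (1.0 + sum(dists) / len(dists))


class SketchVisualSimilarity:
    """Reads only the 'embedding' entry and returns the cosine similarity."""

    def forward(self, state: SketchState, target: SketchState) -> float:
        a, b = state["embedding"], target["embedding"]
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm


# Both modules receive the same container and pick the entries they need.
detection = SketchState(keypoints=[(0.0, 0.0), (1.0, 1.0)], embedding=[1.0, 0.0])
track = SketchState(keypoints=[(0.0, 0.0), (1.0, 1.0)], embedding=[1.0, 0.0])

pose_score = SketchPoseSimilarity().forward(detection, track)
visual_score = SketchVisualSimilarity().forward(detection, track)
print(pose_score, visual_score)
```

Because every module takes the same container, adding a new similarity only requires adding the entries it needs to the state; the interface between modules stays unchanged.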




Number of joint key points in the given model (e.g. COCO = 17)

\(C\): Number of channels of the current image (e.g. RGB = 3)

\(B\): Current batch size; can be 0 in some cases

Number of detections in the current frame

Number of tracks at the current time

\(L\): Number of "historical" frames in a dataset. The dataset has length \(L+1\)

\(H\), \(W\): Height and width of the current image, as image shape \((H, W)\)

\(h\), \(w\): Specific given height or width, as image shape \((h, w)\)

Size of the heatmap; equals the size of the cropped and resized image

Embedding size, denoted for the visual or pose-based shape

Keep on reading

Class and Method definitions


Tracking via Dynamically Gated Similarities
