Lifting API Reference

The lifting module exposes tasks that convert 2D image data into 3D representations — depth maps, point clouds, and meshes.

All point clouds emitted by lifting tasks use OpenGL/viewer camera space: X+ right, Y+ up, and Z- forward into the scene.
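To make the convention concrete, here is a small sketch (the `GL_TO_CV` matrix and `opengl_to_opencv` helper are illustrative, not part of the library) converting points from this OpenGL/viewer space to the OpenCV convention (X+ right, Y+ down, Z+ forward), which only requires negating the Y and Z axes:

```python
import numpy as np

# OpenGL/viewer camera space: X+ right, Y+ up, Z- forward.
# OpenCV camera space:        X+ right, Y+ down, Z+ forward.
# Converting between the two negates the Y and Z axes.
GL_TO_CV = np.diag([1.0, -1.0, -1.0])

def opengl_to_opencv(points: np.ndarray) -> np.ndarray:
    """Convert an (N, 3) array of OpenGL-convention points to OpenCV convention."""
    return points @ GL_TO_CV.T

# A point one metre in front of the camera sits at Z = -1 in OpenGL space
# and at Z = +1 after conversion.
p = np.array([[0.0, 0.0, -1.0]])
converted = opengl_to_opencv(p)
```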


DepthEstimation

The primary entry point for the depth estimation task. Instantiate once and call .run() with a DepthEstimationCommand.

vizion3d.lifting.DepthEstimation

Facade for the Depth Estimation task.

This class serves as the primary entry point for triggering monocular depth estimation inference via direct Python import.

Example
from vizion3d.lifting import (
    DepthEstimation,
    DepthEstimationAdvanceConfig,
    DepthEstimationCommand,
)

cmd = DepthEstimationCommand(
    image_input=b"...",
    return_point_cloud=True,
    advanced_config=DepthEstimationAdvanceConfig(
        fx=615.0, fy=615.0, cx=320.0, cy=240.0
    ),
)
result = DepthEstimation().run(cmd)
Source code in vizion3d/lifting/__init__.py
class DepthEstimation:
    """
    Facade for the Depth Estimation task.

    This class serves as the primary entry point for triggering monocular depth
    estimation inference via direct Python import.

    Example:
        ```python
        from vizion3d.lifting import (
            DepthEstimation,
            DepthEstimationAdvanceConfig,
            DepthEstimationCommand,
        )

        cmd = DepthEstimationCommand(
            image_input=b"...",
            return_point_cloud=True,
            advanced_config=DepthEstimationAdvanceConfig(
                fx=615.0, fy=615.0, cx=320.0, cy=240.0
            ),
        )
        result = DepthEstimation().run(cmd)
        ```
    """

    experimental: bool = False

    def run(self, command: DepthEstimationCommand) -> DepthEstimationResult:
        """
        Dispatches the provided command through the CQRS bus to the registered handler.

        Args:
            command (DepthEstimationCommand): The inference parameters and flags.

        Returns:
            DepthEstimationResult: The resultant depth map and optional generated files.
        """
        return command_bus.dispatch(command)

run(command)

Dispatches the provided command through the CQRS bus to the registered handler.

Parameters:

command (DepthEstimationCommand), required

The inference parameters and flags.

Returns:

DepthEstimationResult

The resultant depth map and optional generated files.

Source code in vizion3d/lifting/__init__.py
def run(self, command: DepthEstimationCommand) -> DepthEstimationResult:
    """
    Dispatches the provided command through the CQRS bus to the registered handler.

    Args:
        command (DepthEstimationCommand): The inference parameters and flags.

    Returns:
        DepthEstimationResult: The resultant depth map and optional generated files.
    """
    return command_bus.dispatch(command)

DepthEstimationCommand

Input contract for the depth estimation task. All inference parameters are declared here.

vizion3d.lifting.commands.DepthEstimationCommand dataclass

Bases: Command[DepthEstimationResult]

Command payload to trigger a depth estimation inference task.

Attributes:

Name Type Description
image_input str | bytes

The input image. Pass a file-path string or raw image bytes. The handler auto-detects which form is supplied.

model_backend str

Depth Anything V2 checkpoint URL or local path. Defaults to the vizion3D release checkpoint (depth_anything_v2_vitb.pth), downloaded on first use and cached under ~/.cache/vizion3d/models/. Set VIZION3D_MODEL_CACHE to override the cache directory.

return_depth_image bool

When True, the result includes a 16-bit grayscale open3d.geometry.Image (dtype uint16). Depth Anything V2 outputs inverse relative depth (higher = closer), so higher uint16 values correspond to closer pixels — closer = brighter. Requires Open3D (Python 3.12).

return_raw_depth bool

When True, the result includes the raw depth array as a float32 numpy array of shape (H, W). Values are relative (not metric) for monocular depth — unmodified output from the model.

return_point_cloud bool

When True, the result includes an open3d.geometry.PointCloud unprojected from the RGB-D image using the camera intrinsics in advanced_config. Point coordinates are in metres. Requires Open3D (Python 3.12).

advanced_config DepthEstimationAdvanceConfig

Camera intrinsics for point cloud unprojection. All fields auto-derive from image dimensions when left as None — only supply values for a real calibrated camera, e.g. advanced_config=DepthEstimationAdvanceConfig(fx=615.0, fy=615.0).

Source code in vizion3d/lifting/commands.py
@dataclass
class DepthEstimationCommand(Command[DepthEstimationResult]):
    """
    Command payload to trigger a depth estimation inference task.

    Attributes:
        image_input: The input image. Pass a file-path string or raw image bytes.
            The handler auto-detects which form is supplied.
        model_backend: Depth Anything V2 checkpoint URL or local path.  Defaults
            to the vizion3D release checkpoint (`depth_anything_v2_vitb.pth`),
            downloaded on first use and cached under `~/.cache/vizion3d/models/`.
            Set `VIZION3D_MODEL_CACHE` to override the cache directory.

        return_depth_image: When `True`, the result includes a 16-bit grayscale
            `open3d.geometry.Image` (dtype `uint16`).  Depth Anything V2 outputs
            inverse relative depth (higher = closer), so higher uint16 values
            correspond to closer pixels — closer = brighter.
            Requires Open3D (Python 3.12).
        return_raw_depth: When `True`, the result includes the raw depth array
            as a float32 numpy array of shape `(H, W)`.  Values are relative
            (not metric) for monocular depth — unmodified output from the model.
        return_point_cloud: When `True`, the result includes an
            `open3d.geometry.PointCloud` unprojected from the RGB-D image using
            the camera intrinsics in `advanced_config`. Point coordinates are in metres.
            Requires Open3D (Python 3.12).
        advanced_config: Camera intrinsics for point cloud unprojection. All fields
            auto-derive from image dimensions when left as ``None`` — only supply
            values for a real calibrated camera, e.g.
            ``advanced_config=DepthEstimationAdvanceConfig(fx=615.0, fy=615.0)``.
    """

    image_input: str | bytes
    model_backend: str = DEFAULT_DEPTH_MODEL_URL
    return_depth_image: bool = True
    return_raw_depth: bool = True
    return_point_cloud: bool = False
    advanced_config: DepthEstimationAdvanceConfig = field(
        default_factory=DepthEstimationAdvanceConfig
    )

DepthEstimationAdvanceConfig

Camera intrinsics and depth range settings. Pass an instance of this model as advanced_config on DepthEstimationCommand to override the PrimeSense defaults used for point cloud unprojection. See Camera Intrinsics Matrix for a full explanation of fx, fy, cx, and cy.

vizion3d.lifting.models.DepthEstimationAdvanceConfig

Bases: BaseModel

Camera intrinsics for depth estimation.

All fields default to None, which causes the handler to auto-derive them from the input image dimensions (fx = fy ≈ 0.85 × width for ~63° FOV; cx/cy at image centre). Supply explicit values for real calibrated cameras.
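The auto-derivation follows the standard pinhole relation between focal length and horizontal field of view. As a sketch (the `focal_from_fov` helper below is illustrative, not a library function), you can compute an explicit `fx` for a camera whose FOV you know, instead of relying on the default:

```python
import math

def focal_from_fov(width_px: float, hfov_deg: float) -> float:
    """Pinhole model: fx = width / (2 * tan(hfov / 2))."""
    return width_px / (2.0 * math.tan(math.radians(hfov_deg) / 2.0))

# A 640 px wide image captured with a 90-degree horizontal FOV
# gives fx = 320.0, which could then be passed as
# DepthEstimationAdvanceConfig(fx=..., fy=...).
fx = focal_from_fov(640, 90.0)
```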

Attributes:

Name Type Description
fx float | None

Horizontal focal length in pixels. None = auto-derived from image width. A larger value means a narrower FOV and more perspective compression.

fy float | None

Vertical focal length in pixels. None = auto-derived (same as fx). Usually equal to fx for square pixels.

cx float | None

Principal point x — the pixel column of the optical axis. None = image width / 2.

cy float | None

Principal point y — the pixel row of the optical axis. None = image height / 2.
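Together these four values define the standard pinhole back-projection used during point cloud unprojection. A minimal sketch under pinhole assumptions (the `unproject` helper is illustrative; the actual handler delegates to Open3D, and the returned clouds additionally use the OpenGL convention described above, i.e. Y and Z flipped relative to this raw form):

```python
import numpy as np

def unproject(u: float, v: float, z: float,
              fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Back-project pixel (u, v) at depth z (metres) into camera space."""
    return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])

# The principal point unprojects straight down the optical axis:
point = unproject(320.0, 240.0, 2.0, fx=615.0, fy=615.0, cx=320.0, cy=240.0)
# point is [0.0, 0.0, 2.0]
```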

Source code in vizion3d/lifting/models.py
class DepthEstimationAdvanceConfig(BaseModel):
    """
    Camera intrinsics for depth estimation.

    All fields default to ``None``, which causes the handler to auto-derive them
    from the input image dimensions (fx = fy ≈ 0.85 × width for ~63° FOV;
    cx/cy at image centre). Supply explicit values for real calibrated cameras.

    Attributes:
        fx: Horizontal focal length in pixels. ``None`` = auto-derived from image
            width. A larger value means a narrower FOV and more perspective compression.
        fy: Vertical focal length in pixels. ``None`` = auto-derived (same as fx).
            Usually equal to ``fx`` for square pixels.
        cx: Principal point x — the pixel column of the optical axis. ``None`` =
            image width / 2.
        cy: Principal point y — the pixel row of the optical axis. ``None`` =
            image height / 2.
    """

    fx: float | None = None
    fy: float | None = None
    cx: float | None = None
    cy: float | None = None

DepthEstimationResult

Output contract returned by DepthEstimation.run(). All fields are always present; optional geometry fields are None when the corresponding return_* flag was not set. Returned point clouds use OpenGL/viewer camera space: X+ right, Y+ up, Z- forward.

vizion3d.lifting.models.DepthEstimationResult

Bases: BaseModel

Result payload returned after a depth estimation inference task.

Attributes:

Name Type Description
depth_map list[list[float]]

Raw floating-point depth array, shape [H][W]. Values are relative (not metric) for monocular models — closer objects have higher values for inverse-depth outputs.

min_depth float

Minimum value in depth_map.

max_depth float

Maximum value in depth_map. Guaranteed max_depth >= min_depth.

backend_used str

Resolved model identifier that processed the request (local file path).

depth_image Image | None

16-bit grayscale open3d.geometry.Image (dtype uint16), present by default (suppress with return_depth_image=False). Depth Anything V2 outputs inverse relative depth, so higher uint16 values correspond to closer pixels — closer = brighter.

raw_depth ndarray | None

Raw float32 depth array, shape (H, W), present by default (suppress with return_raw_depth=False). Values are relative — not metric — for monocular depth estimation.

point_cloud PointCloud | None

Coloured open3d.geometry.PointCloud unprojected from the RGB-D image, present when return_point_cloud=True. Coordinates use the OpenGL/viewer convention (X+ right, Y+ up, Z- forward) and are in metres — multiply distances by point_cloud_scale (always 1.0) to confirm the unit.

point_cloud_scale float

Scale factor for the point cloud coordinate space. Multiply any distance measured between two points in the returned point cloud by this value to get the equivalent distance in metres. Always 1.0 — Open3D produces point cloud coordinates directly in metres.
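Since depth_map values are relative, min_depth and max_depth are mainly useful for normalization. A small sketch (`depth_to_uint8` is an illustrative helper, not part of the library) that turns raw_depth into an 8-bit preview where, for inverse-depth output, brighter means closer:

```python
import numpy as np

def depth_to_uint8(raw_depth: np.ndarray, min_depth: float, max_depth: float) -> np.ndarray:
    """Normalize a relative depth map to [0, 255] for visualization.
    With inverse-depth output, brighter pixels are closer."""
    span = max(max_depth - min_depth, 1e-8)  # guard against a flat map
    norm = (raw_depth - min_depth) / span
    return (norm * 255.0).astype(np.uint8)

# Example: a 2x2 relative depth map spanning [0, 4].
preview = depth_to_uint8(np.array([[0.0, 1.0], [2.0, 4.0]]), 0.0, 4.0)
```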

Source code in vizion3d/lifting/models.py
class DepthEstimationResult(BaseModel):
    """
    Result payload returned after a depth estimation inference task.

    Attributes:
        depth_map: Raw floating-point depth array, shape `[H][W]`. Values are
            relative (not metric) for monocular models — closer objects have
            higher values for inverse-depth outputs.
        min_depth: Minimum value in `depth_map`.
        max_depth: Maximum value in `depth_map`. Guaranteed `max_depth >= min_depth`.
        backend_used: Resolved model identifier that processed the request
            (local file path).
        depth_image: 16-bit grayscale `open3d.geometry.Image` (dtype `uint16`),
            present by default (suppress with `return_depth_image=False`).
            Depth Anything V2 outputs inverse relative depth, so higher uint16 values
            correspond to closer pixels — closer = brighter.
        raw_depth: Raw float32 depth array, shape `(H, W)`, present by default
            (suppress with `return_raw_depth=False`).  Values are relative —
            not metric — for monocular depth estimation.
        point_cloud: Coloured `open3d.geometry.PointCloud` unprojected from the
            RGB-D image, present when `return_point_cloud=True`. Coordinates use
            the OpenGL/viewer convention (X+ right, Y+ up, Z- forward) and are in
            metres — multiply distances by `point_cloud_scale` (always `1.0`) to
            confirm the unit.
        point_cloud_scale: Scale factor for the point cloud coordinate space.
            Multiply any distance measured between two points in the returned
            point cloud by this value to get the equivalent distance in metres.
            Always `1.0` — Open3D produces point cloud coordinates directly in metres.
    """

    depth_map: list[list[float]]
    min_depth: float
    max_depth: float
    backend_used: str
    depth_image: O3dImage | None = None
    raw_depth: np.ndarray | None = None
    point_cloud: O3dPointCloud | None = None
    point_cloud_scale: float = 1.0

    model_config = ConfigDict(arbitrary_types_allowed=True)