Lifting API Reference

The lifting module exposes tasks that convert 2D image data into 3D representations — depth maps, point clouds, and meshes.

All point clouds emitted by lifting tasks use OpenGL/viewer camera space: X+ right, Y+ up, and Z- forward into the scene.
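To make the convention concrete, here is a small sketch (the `GL_TO_CV` matrix and `opengl_to_opencv` helper are illustrative, not part of the library) converting points from this OpenGL/viewer space to the OpenCV convention (X+ right, Y+ down, Z+ forward), which only requires negating the Y and Z axes:

```python
import numpy as np

# OpenGL/viewer camera space: X+ right, Y+ up, Z- forward.
# OpenCV camera space:        X+ right, Y+ down, Z+ forward.
# Converting between the two negates the Y and Z axes.
GL_TO_CV = np.diag([1.0, -1.0, -1.0])

def opengl_to_opencv(points: np.ndarray) -> np.ndarray:
    """Convert an (N, 3) array of OpenGL-convention points to OpenCV convention."""
    return points @ GL_TO_CV.T

# A point one metre in front of the camera sits at Z = -1 in OpenGL space
# and at Z = +1 after conversion.
p = np.array([[0.0, 0.0, -1.0]])
converted = opengl_to_opencv(p)
```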


DepthEstimation

The primary entry point for the depth estimation task. Instantiate once and call .run() with a DepthEstimationCommand.

vizion3d.lifting.DepthEstimation

Facade for the Depth Estimation task.

This class serves as the primary entry point for triggering monocular depth estimation inference via direct Python import.

Example
from vizion3d.lifting import (
    DepthEstimation,
    DepthEstimationAdvanceConfig,
    DepthEstimationCommand,
)

cmd = DepthEstimationCommand(
    image_input=b"...",
    return_point_cloud=True,
    advanced_config=DepthEstimationAdvanceConfig(
        fx=615.0, fy=615.0, cx=320.0, cy=240.0
    ),
)
result = DepthEstimation().run(cmd)
Source code in vizion3d/lifting/__init__.py
class DepthEstimation:
    """
    Facade for the Depth Estimation task.

    This class serves as the primary entry point for triggering monocular depth
    estimation inference via direct Python import.

    Example:
        ```python
        from vizion3d.lifting import (
            DepthEstimation,
            DepthEstimationAdvanceConfig,
            DepthEstimationCommand,
        )

        cmd = DepthEstimationCommand(
            image_input=b"...",
            return_point_cloud=True,
            advanced_config=DepthEstimationAdvanceConfig(
                fx=615.0, fy=615.0, cx=320.0, cy=240.0
            ),
        )
        result = DepthEstimation().run(cmd)
        ```
    """

    experimental: bool = False

    def run(self, command: DepthEstimationCommand) -> DepthEstimationResult:
        """
        Dispatches the provided command through the CQRS bus to the registered handler.

        Args:
            command (DepthEstimationCommand): The inference parameters and flags.

        Returns:
            DepthEstimationResult: The resultant depth map and optional generated files.
        """
        return command_bus.dispatch(command)

run(command)

Dispatches the provided command through the CQRS bus to the registered handler.

Parameters:

command (DepthEstimationCommand), required

The inference parameters and flags.

Returns:

DepthEstimationResult

The resultant depth map and optional generated files.

Source code in vizion3d/lifting/__init__.py
def run(self, command: DepthEstimationCommand) -> DepthEstimationResult:
    """
    Dispatches the provided command through the CQRS bus to the registered handler.

    Args:
        command (DepthEstimationCommand): The inference parameters and flags.

    Returns:
        DepthEstimationResult: The resultant depth map and optional generated files.
    """
    return command_bus.dispatch(command)

DepthEstimationCommand

Input contract for the depth estimation task. All inference parameters are declared here.

vizion3d.lifting.commands.DepthEstimationCommand dataclass

Bases: Command[DepthEstimationResult]

Command payload to trigger a depth estimation inference task.

Attributes:

Name Type Description
image_input str | bytes

The input image. Pass a file-path string or raw image bytes. The handler auto-detects which form is supplied.

model_backend str

Depth Anything V2 checkpoint URL or local path. Defaults to the vizion3D release checkpoint (depth_anything_v2_vitb.pth), downloaded on first use and cached under ~/.cache/vizion3d/models/. Set VIZION3D_MODEL_CACHE to override the cache directory.

return_depth_image bool

When True, the result includes a 16-bit grayscale open3d.geometry.Image (dtype uint16). Depth Anything V2 outputs inverse relative depth (higher = closer), so higher uint16 values correspond to closer pixels — closer = brighter. Requires Open3D (Python 3.12).

return_raw_depth bool

When True, the result includes the raw depth array as a float32 numpy array of shape (H, W). Values are relative (not metric) for monocular depth — unmodified output from the model.

return_point_cloud bool

When True, the result includes an open3d.geometry.PointCloud unprojected from the RGB-D image using the camera intrinsics in advanced_config. Point coordinates are in metres. Requires Open3D (Python 3.12).

advanced_config DepthEstimationAdvanceConfig

Camera intrinsics for point cloud unprojection. All fields auto-derive from image dimensions when left as None — only supply values for a real calibrated camera, e.g. advanced_config=DepthEstimationAdvanceConfig(fx=615.0, fy=615.0).

Source code in vizion3d/lifting/commands.py
@dataclass
class DepthEstimationCommand(Command[DepthEstimationResult]):
    """
    Command payload to trigger a depth estimation inference task.

    Attributes:
        image_input: The input image. Pass a file-path string or raw image bytes.
            The handler auto-detects which form is supplied.
        model_backend: Depth Anything V2 checkpoint URL or local path.  Defaults
            to the vizion3D release checkpoint (`depth_anything_v2_vitb.pth`),
            downloaded on first use and cached under `~/.cache/vizion3d/models/`.
            Set `VIZION3D_MODEL_CACHE` to override the cache directory.

        return_depth_image: When `True`, the result includes a 16-bit grayscale
            `open3d.geometry.Image` (dtype `uint16`).  Depth Anything V2 outputs
            inverse relative depth (higher = closer), so higher uint16 values
            correspond to closer pixels — closer = brighter.
            Requires Open3D (Python 3.12).
        return_raw_depth: When `True`, the result includes the raw depth array
            as a float32 numpy array of shape `(H, W)`.  Values are relative
            (not metric) for monocular depth — unmodified output from the model.
        return_point_cloud: When `True`, the result includes an
            `open3d.geometry.PointCloud` unprojected from the RGB-D image using
            the camera intrinsics in `advanced_config`. Point coordinates are in metres.
            Requires Open3D (Python 3.12).
        advanced_config: Camera intrinsics for point cloud unprojection. All fields
            auto-derive from image dimensions when left as ``None`` — only supply
            values for a real calibrated camera, e.g.
            ``advanced_config=DepthEstimationAdvanceConfig(fx=615.0, fy=615.0)``.
    """

    image_input: str | bytes
    model_backend: str = DEFAULT_DEPTH_MODEL_URL
    return_depth_image: bool = True
    return_raw_depth: bool = True
    return_point_cloud: bool = False
    advanced_config: DepthEstimationAdvanceConfig = field(
        default_factory=DepthEstimationAdvanceConfig
    )

DepthEstimationAdvanceConfig

Camera intrinsics and depth range settings. Pass an instance of this model as advanced_config on DepthEstimationCommand to override the PrimeSense defaults used for point cloud unprojection. See Camera Intrinsics Matrix for a full explanation of fx, fy, cx, and cy.

vizion3d.lifting.models.DepthEstimationAdvanceConfig

Bases: BaseModel

Camera intrinsics for depth estimation.

All fields default to None, which causes the handler to auto-derive them from the input image dimensions (fx = fy ≈ 0.85 × width for ~63° FOV; cx/cy at image centre). Supply explicit values for real calibrated cameras.
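The auto-derivation follows the standard pinhole relation between focal length and horizontal field of view. As a sketch (the `focal_from_fov` helper below is illustrative, not a library function), you can compute an explicit `fx` for a camera whose FOV you know, instead of relying on the default:

```python
import math

def focal_from_fov(width_px: float, hfov_deg: float) -> float:
    """Pinhole model: fx = width / (2 * tan(hfov / 2))."""
    return width_px / (2.0 * math.tan(math.radians(hfov_deg) / 2.0))

# A 640 px wide image captured with a 90-degree horizontal FOV
# gives fx = 320.0, which could then be passed as
# DepthEstimationAdvanceConfig(fx=..., fy=...).
fx = focal_from_fov(640, 90.0)
```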

Attributes:

Name Type Description
fx float | None

Horizontal focal length in pixels. None = auto-derived from image width. A larger value means a narrower FOV and more perspective compression.

fy float | None

Vertical focal length in pixels. None = auto-derived (same as fx). Usually equal to fx for square pixels.

cx float | None

Principal point x — the pixel column of the optical axis. None = image width / 2.

cy float | None

Principal point y — the pixel row of the optical axis. None = image height / 2.
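Together these four values define the standard pinhole back-projection used during point cloud unprojection. A minimal sketch under pinhole assumptions (the `unproject` helper is illustrative; the actual handler delegates to Open3D, and the returned clouds additionally use the OpenGL convention described above, i.e. Y and Z flipped relative to this raw form):

```python
import numpy as np

def unproject(u: float, v: float, z: float,
              fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Back-project pixel (u, v) at depth z (metres) into camera space."""
    return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])

# The principal point unprojects straight down the optical axis:
point = unproject(320.0, 240.0, 2.0, fx=615.0, fy=615.0, cx=320.0, cy=240.0)
# point is [0.0, 0.0, 2.0]
```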

Source code in vizion3d/lifting/models.py
class DepthEstimationAdvanceConfig(BaseModel):
    """
    Camera intrinsics for depth estimation.

    All fields default to ``None``, which causes the handler to auto-derive them
    from the input image dimensions (fx = fy ≈ 0.85 × width for ~63° FOV;
    cx/cy at image centre). Supply explicit values for real calibrated cameras.

    Attributes:
        fx: Horizontal focal length in pixels. ``None`` = auto-derived from image
            width. A larger value means a narrower FOV and more perspective compression.
        fy: Vertical focal length in pixels. ``None`` = auto-derived (same as fx).
            Usually equal to ``fx`` for square pixels.
        cx: Principal point x — the pixel column of the optical axis. ``None`` =
            image width / 2.
        cy: Principal point y — the pixel row of the optical axis. ``None`` =
            image height / 2.
    """

    fx: float | None = None
    fy: float | None = None
    cx: float | None = None
    cy: float | None = None

DepthEstimationResult

Output contract returned by DepthEstimation.run(). All fields are always present; optional geometry fields are None when the corresponding return_* flag was not set. Returned point clouds use OpenGL/viewer camera space: X+ right, Y+ up, Z- forward.

vizion3d.lifting.models.DepthEstimationResult

Bases: BaseModel

Result payload returned after a depth estimation inference task.

Attributes:

Name Type Description
depth_map list[list[float]]

Raw floating-point depth array, shape [H][W]. Values are relative (not metric) for monocular models — closer objects have higher values for inverse-depth outputs.

min_depth float

Minimum value in depth_map.

max_depth float

Maximum value in depth_map. Guaranteed max_depth >= min_depth.

backend_used str

Resolved model identifier that processed the request (local file path).

depth_image Image | None

16-bit grayscale open3d.geometry.Image (dtype uint16), present by default (suppress with return_depth_image=False). Depth Anything V2 outputs inverse relative depth, so higher uint16 values correspond to closer pixels — closer = brighter.

raw_depth ndarray | None

Raw float32 depth array, shape (H, W), present by default (suppress with return_raw_depth=False). Values are relative — not metric — for monocular depth estimation.

point_cloud PointCloud | None

Coloured open3d.geometry.PointCloud unprojected from the RGB-D image, present when return_point_cloud=True. Coordinates use the OpenGL/viewer convention (X+ right, Y+ up, Z- forward) and are in metres — multiply distances by point_cloud_scale (always 1.0) to confirm the unit.

point_cloud_scale float

Scale factor for the point cloud coordinate space. Multiply any distance measured between two points in the returned point cloud by this value to get the equivalent distance in metres. Always 1.0 — Open3D produces point cloud coordinates directly in metres.
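Since depth_map values are relative, min_depth and max_depth are mainly useful for normalization. A small sketch (`depth_to_uint8` is an illustrative helper, not part of the library) that turns raw_depth into an 8-bit preview where, for inverse-depth output, brighter means closer:

```python
import numpy as np

def depth_to_uint8(raw_depth: np.ndarray, min_depth: float, max_depth: float) -> np.ndarray:
    """Normalize a relative depth map to [0, 255] for visualization.
    With inverse-depth output, brighter pixels are closer."""
    span = max(max_depth - min_depth, 1e-8)  # guard against a flat map
    norm = (raw_depth - min_depth) / span
    return (norm * 255.0).astype(np.uint8)

# Example: a 2x2 relative depth map spanning [0, 4].
preview = depth_to_uint8(np.array([[0.0, 1.0], [2.0, 4.0]]), 0.0, 4.0)
```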

Source code in vizion3d/lifting/models.py
class DepthEstimationResult(BaseModel):
    """
    Result payload returned after a depth estimation inference task.

    Attributes:
        depth_map: Raw floating-point depth array, shape `[H][W]`. Values are
            relative (not metric) for monocular models — closer objects have
            higher values for inverse-depth outputs.
        min_depth: Minimum value in `depth_map`.
        max_depth: Maximum value in `depth_map`. Guaranteed `max_depth >= min_depth`.
        backend_used: Resolved model identifier that processed the request
            (local file path).
        depth_image: 16-bit grayscale `open3d.geometry.Image` (dtype `uint16`),
            present by default (suppress with `return_depth_image=False`).
            Depth Anything V2 outputs inverse relative depth, so higher uint16 values
            correspond to closer pixels — closer = brighter.
        raw_depth: Raw float32 depth array, shape `(H, W)`, present by default
            (suppress with `return_raw_depth=False`).  Values are relative —
            not metric — for monocular depth estimation.
        point_cloud: Coloured `open3d.geometry.PointCloud` unprojected from the
            RGB-D image, present when `return_point_cloud=True`. Coordinates use
            the OpenGL/viewer convention (X+ right, Y+ up, Z- forward) and are in
            metres — multiply distances by `point_cloud_scale` (always `1.0`) to
            confirm the unit.
        point_cloud_scale: Scale factor for the point cloud coordinate space.
            Multiply any distance measured between two points in the returned
            point cloud by this value to get the equivalent distance in metres.
            Always `1.0` — Open3D produces point cloud coordinates directly in metres.
    """

    depth_map: list[list[float]]
    min_depth: float
    max_depth: float
    backend_used: str
    depth_image: O3dImage | None = None
    raw_depth: np.ndarray | None = None
    point_cloud: O3dPointCloud | None = None
    point_cloud_scale: float = 1.0

    model_config = ConfigDict(arbitrary_types_allowed=True)