Advanced Config: Camera Intrinsics
DepthEstimationAdvanceConfig lets you supply the camera intrinsics that control how a raw depth map is lifted into a 3D point cloud. All four fields default to None, which causes the handler to auto-derive reasonable values from the input image dimensions. For accurate metric geometry, supply intrinsics from your actual camera calibration.
Background: the pinhole camera model
Not sure what fx, fy, cx, cy are? See the Camera Intrinsics Matrix reference for a full explanation of the K matrix and how to read it for your camera.
Every point in a point cloud is computed by inverting the pinhole camera projection. vizion3d emits OpenGL/viewer camera coordinates: X+ right, Y+ up, and Z- forward into the scene. Given a pixel at image coordinates (u, v) with a positive depth value d (in metres), its 3D position (X, Y, Z) is:
```
X = (u - cx) * d / fx
Y = (cy - v) * d / fy
Z = -d
```
All four intrinsic parameters — fx, fy, cx, cy — appear in this formula. Values that do not match your camera produce a point cloud that is geometrically distorted: correct topology but skewed angles, compressed shapes, or stretched geometry.
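The inversion above can be sketched in a few lines of NumPy. This is an illustrative re-implementation of the formula, not the handler's actual code; `depth` is assumed to be a 2D array of positive metric depths, and the intrinsics passed in the example are arbitrary values.

```python
import numpy as np

def depth_to_points(depth: np.ndarray, fx: float, fy: float,
                    cx: float, cy: float) -> np.ndarray:
    """Unproject a depth map into an (N, 3) point cloud in viewer coordinates."""
    h, w = depth.shape
    # Pixel coordinate grids: u runs along columns, v along rows
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx   # X+ right
    y = (cy - v) * depth / fy   # Y+ up (image v grows downward, hence cy - v)
    z = -depth                  # Z- forward into the scene (OpenGL convention)
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

pts = depth_to_points(np.ones((4, 4)), fx=3.4, fy=3.4, cx=2.0, cy=2.0)
print(pts.shape)  # (16, 3)
```

Note the `cy - v` term: image rows grow downward while Y+ points up, so the vertical axis is flipped during unprojection.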
Config fields
fx — horizontal focal length (pixels)
Default: None (auto-derived as image.width × 0.85, ~61° horizontal FOV)
The horizontal focal length of the camera in pixels. A larger fx means the camera has a narrower horizontal field of view; the same scene width maps to fewer pixels.
Effect on the point cloud: Controls the horizontal spread of 3D points. If fx is too small, the point cloud is horizontally compressed. If too large, it is horizontally stretched.
How to find it: Use your camera's calibration matrix K[0][0], or compute it from the horizontal field of view FoV_h:
fx = (image_width / 2) / tan(FoV_h / 2)
fy — vertical focal length (pixels)
Default: None (auto-derived as image.width × 0.85, same as fx)
The vertical focal length in pixels. For cameras with square pixels, fy ≈ fx. Cameras with non-square sensors may have fy ≠ fx.
Effect on the point cloud: Controls vertical spread analogously to fx. A fy that does not match your sensor produces vertically compressed or stretched geometry.
How to find it: K[1][1] from the calibration matrix, or:
fy = (image_height / 2) / tan(FoV_v / 2)
cx — horizontal principal point (pixels)
Default: None (auto-derived as image.width / 2)
The horizontal image coordinate of the optical axis — ideally the exact centre of the sensor.
Effect on the point cloud: Shifts the entire point cloud left or right. A cx that does not match your sensor makes the scene appear viewed from an off-centre vantage point, introducing a lateral tilt.
cy — vertical principal point (pixels)
Default: None (auto-derived as image.height / 2)
The vertical image coordinate of the optical axis.
Effect on the point cloud: Shifts the entire point cloud up or down. Like cx, a value that does not match your sensor introduces a tilt — vertical in this case.
Default values
| Parameter | Default | Auto-derive formula |
|---|---|---|
| fx | None | image.width × 0.85 (~61° FOV) |
| fy | None | image.width × 0.85 (same as fx) |
| cx | None | image.width / 2 |
| cy | None | image.height / 2 |
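As a sketch, the auto-derivation in the table amounts to the following; the function name is illustrative, not part of the vizion3d API:

```python
def auto_intrinsics(width: int, height: int) -> tuple[float, float, float, float]:
    """Reproduce the default intrinsics derived from image dimensions."""
    fx = width * 0.85   # ~61° horizontal FOV
    fy = width * 0.85   # same as fx (square pixels assumed)
    cx = width / 2      # principal point at the image centre
    cy = height / 2
    return fx, fy, cx, cy

print(auto_intrinsics(1280, 720))  # (1088.0, 1088.0, 640.0, 360.0)
```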
Usage: Direct Python
```python
import numpy as np

from vizion3d.lifting import (
    DepthEstimation,
    DepthEstimationAdvanceConfig,
    DepthEstimationCommand,
)

# Supply calibrated intrinsics (e.g. Intel RealSense D435 at 1280×720)
config = DepthEstimationAdvanceConfig(
    fx=909.15,
    fy=908.48,
    cx=640.0,
    cy=360.0,
)

with open("scene.png", "rb") as f:
    img_bytes = f.read()

result = DepthEstimation().run(
    DepthEstimationCommand(
        image_input=img_bytes,
        return_point_cloud=True,
        advanced_config=config,
    )
)

points = np.asarray(result.point_cloud.points)
print(f"Points: {len(points)}")
```
Omit the config entirely to use auto-derived intrinsics (suitable for arbitrary photos):
```python
result = DepthEstimation().run(
    DepthEstimationCommand(
        image_input=img_bytes,
        return_point_cloud=True,
    )
)
```
Usage: REST API
All four intrinsic fields are optional form fields on the POST /lifting/depth-estimation endpoint. Omit any field to auto-derive it from the image dimensions.
```bash
# Supply calibrated intrinsics
curl -X POST "http://localhost:8000/lifting/depth-estimation" \
  -F "image=@scene.png" \
  -F "return_point_cloud=true" \
  -F "fx=909.15" \
  -F "fy=908.48" \
  -F "cx=640.0" \
  -F "cy=360.0"
```
Python requests equivalent:
```python
import requests

with open("scene.png", "rb") as f:
    img_bytes = f.read()

response = requests.post(
    "http://localhost:8000/lifting/depth-estimation",
    files={"image": ("scene.png", img_bytes, "image/png")},
    data={
        "return_point_cloud": "true",
        "fx": "909.15",
        "fy": "908.48",
        "cx": "640.0",
        "cy": "360.0",
    },
)
data = response.json()
print(f"Depth range: {data['min_depth']:.4f} → {data['max_depth']:.4f}")
```
Usage: gRPC API
The DepthEstimationAdvanceConfig proto message mirrors the Python model. All fields are optional, so any omitted field auto-derives from the image on the server side.
```python
import grpc

from vizion3d.proto import lifting_pb2, lifting_pb2_grpc

channel = grpc.insecure_channel("localhost:50051")
stub = lifting_pb2_grpc.LiftingServiceStub(channel)

with open("scene.png", "rb") as f:
    img_bytes = f.read()

request = lifting_pb2.DepthEstimationRequest(
    image_bytes=img_bytes,
    return_point_cloud=True,
    advanced_config=lifting_pb2.DepthEstimationAdvanceConfig(
        fx=909.15,
        fy=908.48,
        cx=640.0,
        cy=360.0,
    ),
)
response = stub.RunDepthEstimation(request)
print(f"Depth range: {response.min_depth:.4f} → {response.max_depth:.4f}")
```
How to get your camera intrinsics
Option 1: camera datasheet or SDK
Most camera SDKs expose the intrinsic matrix directly:
```python
# Intel RealSense
import pyrealsense2 as rs

pipeline = rs.pipeline()
profile = pipeline.start()
intr = profile.get_stream(rs.stream.color).as_video_stream_profile().intrinsics

config = DepthEstimationAdvanceConfig(
    fx=intr.fx, fy=intr.fy, cx=intr.ppx, cy=intr.ppy
)
```
Option 2: OpenCV calibration
Run a standard checkerboard calibration with cv2.calibrateCamera. The returned camera_matrix is:
```
[[fx,  0, cx],
 [ 0, fy, cy],
 [ 0,  0,  1]]
```
```python
import cv2
import numpy as np

# After calibrating…
_, camera_matrix, _, _, _ = cv2.calibrateCamera(
    obj_points, img_points, image_size, None, None
)
config = DepthEstimationAdvanceConfig(
    fx=float(camera_matrix[0, 0]),
    fy=float(camera_matrix[1, 1]),
    cx=float(camera_matrix[0, 2]),
    cy=float(camera_matrix[1, 2]),
)
```
Option 3: approximate from field of view
If you know the camera's horizontal field of view FoV_h (in degrees) and image dimensions:
```python
import math

image_width = 1920
image_height = 1080
fov_h_deg = 69.0  # horizontal FoV in degrees

fx = (image_width / 2) / math.tan(math.radians(fov_h_deg / 2))
fy = fx  # assumes square pixels
cx = image_width / 2 - 0.5
cy = image_height / 2 - 0.5

config = DepthEstimationAdvanceConfig(fx=fx, fy=fy, cx=cx, cy=cy)
```
Common camera presets
These are approximate values for common cameras. Always prefer calibrated values over these presets.
| Camera | Resolution | fx | fy | cx | cy |
|---|---|---|---|---|---|
| PrimeSense / Kinect v1 | 640×480 | 525.0 | 525.0 | 319.5 | 239.5 |
| Intel RealSense D415 | 1920×1080 | 1382.0 | 1382.0 | 960.5 | 540.5 |
| Intel RealSense D435 | 1280×720 | 909.0 | 908.0 | 640.0 | 360.0 |
| iPhone 14 wide (approx.) | 4032×3024 | 5500.0 | 5500.0 | 2016.0 | 1512.0 |
| Webcam 1080p (typical) | 1920×1080 | 1400.0 | 1400.0 | 960.0 | 540.0 |
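As an illustrative sketch, preset rows can be kept as plain dicts and unpacked into the config with `DepthEstimationAdvanceConfig(**preset)`. The `PRESETS` mapping below is hypothetical (not part of the vizion3d API), with values copied from the table; remember these are approximations, and calibrated values should always win.

```python
# Hypothetical preset table; values copied from the rows above.
PRESETS = {
    "kinect_v1":      dict(fx=525.0,  fy=525.0,  cx=319.5, cy=239.5),
    "realsense_d415": dict(fx=1382.0, fy=1382.0, cx=960.5, cy=540.5),
    "realsense_d435": dict(fx=909.0,  fy=908.0,  cx=640.0, cy=360.0),
}

preset = PRESETS["kinect_v1"]
print(preset["fx"])  # 525.0
```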