3D-Image Shape Representation

3 min read · Dec 20, 2020

Most real-world objects are three-dimensional. In my opinion, that makes it important to represent images and train models in three dimensions.

3D vision is an active research area in computer vision. Nowadays, it combines traditional algorithms with deep learning. In this post, I will share some existing representations of 3D shapes.

These representations include:

1. Depth Map

A depth map gives, for each pixel, the distance from the camera to the object in the real world. When a depth map is merged with the source (input) image, it forms an RGB-D image (sometimes called 2.5D, since only the surfaces visible to the camera are captured). Some types of 3D sensors can also record depth maps directly. To create a depth map from a 2D image, a fully convolutional network can be used to predict it. However, the major problem with depth maps is scale ambiguity: a small, close object and a large, distant one can look identical in a single image. This can be minimised by using a scale-invariant loss.
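To make the scale-invariant loss concrete, here is a minimal NumPy sketch. It assumes the commonly used log-depth formulation, where the squared log error is reduced by a term that cancels any global scale factor. The function name `scale_invariant_loss` and the weight `lam` are illustrative choices, not taken from a particular library.

```python
import numpy as np

def scale_invariant_loss(pred_depth, true_depth, lam=1.0):
    """Scale-invariant error in log-depth space (assumed formulation).

    With lam=1.0, multiplying pred_depth by any positive constant
    leaves the loss unchanged.
    """
    d = np.log(pred_depth) - np.log(true_depth)  # per-pixel log difference
    n = d.size
    return np.sum(d ** 2) / n - lam * np.sum(d) ** 2 / n ** 2

# Toy 2 x 2 depth map: the prediction has the right shape but the wrong scale.
true_depth = np.array([[1.0, 2.0], [4.0, 8.0]])
pred_depth = 3.0 * true_depth

loss_scaled = scale_invariant_loss(pred_depth, true_depth)
loss_plain = np.mean((np.log(pred_depth) - np.log(true_depth)) ** 2)
print(loss_scaled, loss_plain)
```

The plain log error penalises the wrong global scale, while the scale-invariant version is (numerically) zero for this prediction.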

2. Voxel Grid

A voxel grid represents a shape as a V x V x V grid of occupancies. It’s like a segmentation mask in Mask R-CNN, but in 3D. Voxel shapes can be generated from a 2D input image using a combination of 2D and 3D CNNs, or with a voxel tube representation model. Because memory grows cubically with V, octrees can be used to scale voxels to higher resolutions.
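As a small illustration, a voxel grid can be stored as a boolean V x V x V array. The sketch below fills the grid with a sphere. The resolution `V = 16`, the cube `[-1, 1]^3`, and the radius are arbitrary values chosen for the example.

```python
import numpy as np

V = 16  # grid resolution (illustrative)

# Centers of the V cells along each axis of the cube [-1, 1]^3.
coords = (np.arange(V) + 0.5) / V * 2.0 - 1.0
x, y, z = np.meshgrid(coords, coords, coords, indexing="ij")

# A voxel is occupied if its center lies inside a sphere of radius 0.75.
occupancy = (x ** 2 + y ** 2 + z ** 2) <= 0.75 ** 2

print(occupancy.shape, occupancy.sum(), "of", V ** 3, "voxels occupied")
```

Doubling `V` multiplies the number of cells by eight, which is why sparse structures become attractive at high resolution.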

3. Implicit Surface

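An implicit surface representation learns a function that classifies arbitrary 3D points as inside or outside the shape. The term implicit means that the surface equation is not solved for any variable: the surface is simply the set of points where the function switches from inside to outside.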

Implicit Surface (Source: Google Images)
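To show the idea of an implicit surface without any training, the sketch below uses a hand-written signed distance function for a sphere in place of a learned network. The names `sphere_sdf` and `is_inside` are hypothetical; in a learned model, a neural network would play the role of `sphere_sdf`.

```python
import numpy as np

def sphere_sdf(points, radius=1.0):
    """Signed distance to a sphere at the origin: negative inside, positive outside."""
    return np.linalg.norm(points, axis=-1) - radius

def is_inside(points, radius=1.0):
    """Classify arbitrary 3D points as inside (True) or outside (False)."""
    return sphere_sdf(points, radius) < 0.0

queries = np.array([
    [0.0, 0.0, 0.0],   # center of the sphere
    [0.5, 0.5, 0.5],   # distance ~0.87 from the origin
    [1.0, 1.0, 1.0],   # distance ~1.73 from the origin
])
inside = is_inside(queries)
print(inside)
```

The surface itself is never stored; it is the set of points where `sphere_sdf` equals zero, and it can be queried at any resolution.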

4. Point Cloud

A point cloud represents a shape as a set of P points in 3D space. Self-driving cars use a point cloud representation of the world around them. A popular neural network for point clouds is PointNet.
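A point cloud is naturally stored as a P x 3 array, and because it is a set, the order of the rows should not matter. The sketch below illustrates the core idea behind PointNet-style networks: apply the same layer to every point, then aggregate with a symmetric function (max pooling). The random weights `W` and `b` stand in for learned parameters, and this is a simplified illustration rather than the actual PointNet architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

P = 128
points = rng.normal(size=(P, 3))   # point cloud: P points in 3D

# Random stand-ins for a learned per-point layer (3 -> 32 features).
W = rng.normal(size=(3, 32))
b = rng.normal(size=32)

def global_feature(pts):
    """Shared per-point layer with ReLU, followed by max pooling over points."""
    per_point = np.maximum(pts @ W + b, 0.0)  # shape (P, 32)
    return per_point.max(axis=0)              # shape (32,)

shuffled = points[rng.permutation(P)]         # same set, different order
feat = global_feature(points)
feat_shuffled = global_feature(shuffled)
print(np.allclose(feat, feat_shuffled))
```

Max pooling makes the global feature independent of the point order, which is the property a set-based representation needs.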

5. Mesh

Meshes are commonly used in computer graphics. A mesh represents a 3D shape as a set of triangles, defined by vertices and the faces that connect them. A popular deep learning architecture for meshes is Pixel2Mesh, which uses graph convolutions to predict triangle meshes.
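A triangle mesh is typically stored as two arrays: vertex positions and faces that index into them. The sketch below builds a tetrahedron (an arbitrary example shape), computes its surface area, and extracts the unique edges. Those edges form the graph that graph convolutions would operate on.

```python
import numpy as np

# Four vertices of a tetrahedron.
vertices = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

# Each face is a triangle given by three vertex indices.
faces = np.array([
    [0, 2, 1],
    [0, 1, 3],
    [0, 3, 2],
    [1, 2, 3],
])

# Triangle area is half the norm of the cross product of two edge vectors.
tri = vertices[faces]                                   # shape (F, 3, 3)
cross = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
surface_area = 0.5 * np.linalg.norm(cross, axis=1).sum()

# Unique undirected edges: the graph structure of the mesh.
edges = np.concatenate([faces[:, [0, 1]], faces[:, [1, 2]], faces[:, [2, 0]]])
edges = np.unique(np.sort(edges, axis=1), axis=0)

print(surface_area, len(edges))
```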


Written by Ayoola Olaleye
