From KAIST AI621 Computational Image Generation and Manipulation
Light Field Rendering, Focal Stacks, and Depth from Defocus
This is a light field image of a chessboard scene, obtained from (The (New) Stanford Light Field Archive — lightfield.stanford.edu, n.d.). The file is formatted in the same way as images captured by a plenoptic camera.
By the way, the original image is very large: 11200 x 6400 pixels and roughly 100MB.
If you zoom into the image, you can see that it is tiled with 16 x 16 pixel blocks (which we call “lenslets”).

Loading the light field image.
```python
import numpy as np
from PIL import Image

def load_lightfield(path: str, lenslet_size: int = 16) -> np.ndarray:
    """Read a raw plenoptic image and reshape it to L[u, v, s, t, c]."""
    img = Image.open(path).convert("RGB")
    arr = np.asarray(img)
    h, w, c = arr.shape
    assert c == 3, f"Expected 3 channels, got {c}"
    assert h % lenslet_size == 0 and w % lenslet_size == 0, (
        f"Image dimensions {w}x{h} are not divisible by lenslet_size={lenslet_size}"
    )
    t = h // lenslet_size
    s = w // lenslet_size
    # Reshape to [t, v, s, u, c] then transpose to [u, v, s, t, c]
    lightfield = arr.reshape(t, lenslet_size, s, lenslet_size, 3).transpose(3, 1, 2, 0, 4)
    return lightfield

...

if __name__ == "__main__":
    input_path = "chessboard_lightfield.png"
    lf = load_lightfield(input_path)
    print("L shape:", lf.shape)  # Expected (16, 16, 700, 400, 3) for chessboard_lightfield.png
```
Sub-aperture views.
We can reorganize the image into what we call “sub-aperture views”. (Image from (Hahne, n.d.))
Sub-aperture views can be thought of as images from many pinhole cameras, one per (u, v) position on the aperture.
```python
def subaperture_mosaic(lightfield: np.ndarray) -> np.ndarray:
    """Rearrange a light field L[u, v, s, t, c] into a mosaic of sub-aperture views.

    The resulting image stacks the 16 x 16 pinhole views into a grid where
    u increases left-to-right and v increases top-to-bottom.
    """
    if lightfield.ndim != 5:
        raise ValueError("Expected lightfield with 5 dimensions [u, v, s, t, c]")
    U, V, S, T, C = lightfield.shape
    if C != 3:
        raise ValueError(f"Expected 3 color channels, got {C}")
    # Move axes to [v, t, u, s, c] then flatten into a 2D mosaic
    return lightfield.transpose(1, 3, 0, 2, 4).reshape(V * T, U * S, C)
```
The solution is rather simple: transposing the dimensions accordingly gives the following satisfying outcome:

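As a quick sanity check (a toy example of my own, not part of the assignment code), we can verify on a tiny synthetic light field that each (u, v) view lands in the expected grid cell of the mosaic:

```python
import numpy as np

# Tiny synthetic light field: U=2, V=2, S=4, T=3, 3 channels.
# Each sub-aperture view gets a unique constant value so we can
# check where it ends up in the mosaic.
U, V, S, T, C = 2, 2, 4, 3, 3
lf = np.zeros((U, V, S, T, C), dtype=np.uint8)
for u in range(U):
    for v in range(V):
        lf[u, v] = 10 * u + v

# Same rearrangement as subaperture_mosaic: [u,v,s,t,c] -> [v,t,u,s,c] -> 2D grid.
mosaic = lf.transpose(1, 3, 0, 2, 4).reshape(V * T, U * S, C)

# View (u, v) occupies rows [v*T, (v+1)*T) and columns [u*S, (u+1)*S).
assert (mosaic[0:T, S:2 * S] == 10).all()  # view (u=1, v=0)
assert (mosaic[T:2 * T, 0:S] == 1).all()   # view (u=0, v=1)
print(mosaic.shape)  # (6, 8, 3)
```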
Refocusing and focal-stack generation
A different effect that can be achieved by appropriately combining parts of the light field is refocusing at different depths. Averaging all sub-aperture views gives
\[\int_u \int_v L(u,v,s,t,c)\,dv\,du\]
As explained in detail in Section 4 of (Ng et al., 2005), focusing at different depths requires shifting the sub-aperture images before averaging them, with the shift of each image depending on the desired focus depth and the location of its sub-aperture:
\[I(s,t,c,d) = \int_u \int_v L(u,v,\, s+du,\, t+dv,\, c)\,dv\,du\]
For \(d=0\) you can see that the image we obtain is the same as the one from the first equation.
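In practice the aperture integral becomes a finite average over the 16 x 16 sub-aperture samples; written under the same convention as the equation above, the discrete version is:
\[I(s,t,c,d) \approx \frac{1}{UV} \sum_{u} \sum_{v} L(u, v,\, s + du,\, t + dv,\, c)\]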
The resulting code looks like this:
```python
def refocus_stack(lightfield: np.ndarray, d_values: np.ndarray) -> np.ndarray:
    """Generate a focal stack I(s,t,c,d) by shift-and-add refocusing across aperture samples."""
    if lightfield.ndim != 5:
        raise ValueError("Expected lightfield with 5 dimensions [u, v, s, t, c]")
    U, V, S, T, C = lightfield.shape
    if C != 3:
        raise ValueError(f"Expected 3 color channels, got {C}")
    # Center the aperture coordinates so that d = 0 keeps every view in place.
    u_coords = np.arange(U) - (U - 1) / 2.0
    v_coords = np.arange(V) - (V - 1) / 2.0
    stack = []
    for d in d_values:
        acc = np.zeros((T, S, 3), dtype=np.float64)
        wsum = np.zeros((T, S), dtype=np.float64)
        for vi, v in enumerate(v_coords):
            for ui, u in enumerate(u_coords):
                shift_s = d * u
                shift_t = -d * v  # Fix directions!
                view = lightfield[ui, vi].transpose(1, 0, 2)  # reorder to (T, S, 3)
                shifted_view, weight_map = _bilinear_shift(view, shift_t, shift_s)
                acc += shifted_view
                wsum += weight_map
        # Normalize by the per-pixel weight so border pixels with fewer
        # contributing views are not darkened.
        wsum_safe = np.maximum(wsum, 1e-6)
        refocused = acc / wsum_safe[..., None]
        refocused = np.clip(refocused, 0, 255).astype(np.uint8)
        stack.append(refocused)
    return np.stack(stack, axis=0)
```
Because of how the original light field is constructed, the signs of the shifts are opposite for s and t:
```python
shift_s = d * u
shift_t = -d * v  # Fix directions!
```
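To see why shift-and-add brings one depth into focus, here is a self-contained toy example of my own (integer shifts via np.roll on a 1D synthetic light field; the sign convention here is for this toy only, not the assignment code). A point whose disparity matches the chosen d realigns across views and stays sharp, while at d = 0 its copies stay spread out:

```python
import numpy as np

U, V, S, T = 4, 1, 16, 1   # 1D aperture, 1D image for clarity
disparity = 2              # the scene point moves 2 pixels per aperture step
lf = np.zeros((U, V, S, T))
for u in range(U):
    lf[u, 0, 5 + disparity * u, 0] = 1.0  # bright point, shifted per view

def refocus_1d(lf, d):
    # Integer shift-and-add: shift each view by -d*u along s, then average.
    U = lf.shape[0]
    acc = np.zeros(lf.shape[2])
    for u in range(U):
        acc += np.roll(lf[u, 0, :, 0], -d * u)
    return acc / U

in_focus = refocus_1d(lf, disparity)   # all spikes realign at s = 5
out_focus = refocus_1d(lf, 0)          # spikes stay spread out
print(in_focus.max(), out_focus.max())  # 1.0 0.25
```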
Since the chessboard scene spans a range of depths, I created 8 refocused images, animated below:

And the individual refocused images:
I had to add a fix so that the weird black padding at the right and bottom does not appear when shift_t and shift_s are 0. A more elegant implementation is probably possible.
```python
def _bilinear_shift(view: np.ndarray, shift_t: float, shift_s: float):
    if shift_t == 0 and shift_s == 0:
        return view, np.ones(view.shape[:2])
    ...
```
Comment
I really enjoyed doing these assignments, even though I was quite busy with my lab schedule. Thank you for preparing the resources and classes!
References
- The (New) Stanford Light Field Archive — lightfield.stanford.edu. http://lightfield.stanford.edu/
- Hahne, C. (n.d.). The Plenoptic Camera aka Light Field Camera — plenoptic.info. https://www.plenoptic.info/pages/sub-aperture.html
- Ng, R., Levoy, M., Brédif, M., Duval, G., Horowitz, M., & Hanrahan, P. (2005). Light Field Photography with a Hand-Held Plenoptic Camera. Technical Report CTSR 2005-02, Stanford University. https://graphics.stanford.edu/papers/lfcamera/lfcamera-150dpi.pdf