From KAIST AI621 Computational Image Generation and Manipulation
Light Field Rendering, Focal Stacks, and Depth from Defocus
This is a light field image of a chessboard scene, obtained from (The (New) Stanford Light Field Archive — lightfield.stanford.edu, n.d.). The file is formatted in the same way as images captured by a plenoptic camera.
By the way, the original image is very large: 11200 x 6400 pixels and roughly 100MB.
If you zoom into the image, you can see that it is tiled with 16 x 16 pixel blocks (which we call “lenslets”).

Loading the light field image.
```python
import numpy as np
from PIL import Image

def load_lightfield(path: str, lenslet_size: int = 16) -> np.ndarray:
    """Read a raw plenoptic image and reshape it to L[u, v, s, t, c]."""
    img = Image.open(path).convert("RGB")
    arr = np.asarray(img)
    h, w, c = arr.shape
    assert c == 3, f"Expected 3 channels, got {c}"
    assert h % lenslet_size == 0 and w % lenslet_size == 0, (
        f"Image dimensions {w}x{h} are not divisible by lenslet_size={lenslet_size}"
    )
    t = h // lenslet_size
    s = w // lenslet_size
    # Reshape to [t, v, s, u, c] then transpose to [u, v, s, t, c]
    lightfield = arr.reshape(t, lenslet_size, s, lenslet_size, 3).transpose(3, 1, 2, 0, 4)
    return lightfield

...

if __name__ == "__main__":
    input_path = "chessboard_lightfield.png"
    lf = load_lightfield(input_path)
    print("L shape:", lf.shape)  # Expected (16, 16, 700, 400, 3) for chessboard_lightfield.png
```
Sub-aperture views.
We can reorganize the image into what we call “sub-aperture views”. (Image from (Hahne, n.d.))
Sub-aperture views can be thought of as images from many pinhole cameras, one per (u, v) position on the aperture.
```python
def subaperture_mosaic(lightfield: np.ndarray) -> np.ndarray:
    """Rearrange a light field L[u, v, s, t, c] into a mosaic of sub-aperture views.

    The resulting image stacks the 16 x 16 pinhole views into a grid where
    u increases left-to-right and v increases top-to-bottom.
    """
    if lightfield.ndim != 5:
        raise ValueError("Expected lightfield with 5 dimensions [u, v, s, t, c]")
    U, V, S, T, C = lightfield.shape
    if C != 3:
        raise ValueError(f"Expected 3 color channels, got {C}")
    # Move axes to [v, t, u, s, c] then flatten into a 2D mosaic
    return lightfield.transpose(1, 3, 0, 2, 4).reshape(V * T, U * S, C)
```
The solution is rather simple: transposing the dimensions accordingly gives the following satisfying outcome:

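As a quick sanity check (a toy example of my own, not part of the assignment code), we can verify on a tiny synthetic light field that each (u, v) view lands in the expected grid cell of the mosaic:

```python
import numpy as np

# Tiny synthetic light field: U=2, V=2, S=4, T=3, 3 channels.
# Each sub-aperture view gets a unique constant value so we can
# check where it ends up in the mosaic.
U, V, S, T, C = 2, 2, 4, 3, 3
lf = np.zeros((U, V, S, T, C), dtype=np.uint8)
for u in range(U):
    for v in range(V):
        lf[u, v] = 10 * u + v

# Same rearrangement as subaperture_mosaic: [u,v,s,t,c] -> [v,t,u,s,c] -> 2D grid.
mosaic = lf.transpose(1, 3, 0, 2, 4).reshape(V * T, U * S, C)

# View (u, v) occupies rows [v*T, (v+1)*T) and columns [u*S, (u+1)*S).
assert (mosaic[0:T, S:2 * S] == 10).all()  # view (u=1, v=0)
assert (mosaic[T:2 * T, 0:S] == 1).all()   # view (u=0, v=1)
print(mosaic.shape)  # (6, 8, 3)
```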
Refocusing and focal-stack generation
A different effect that can be achieved by appropriately combining parts of the light field is refocusing at different depths. Averaging all sub-aperture views gives
\[\int_u \int_v L(u,v,s,t,c)\,dv\,du\]
As explained in detail in Section 4 of (Ng et al., 2005), focusing at different depths requires shifting the sub-aperture images before averaging them, with the shift of each image depending on the desired focus depth and the location of its sub-aperture:
\[I(s,t,c,d) = \int_u \int_v L(u,v,\, s+du,\, t+dv,\, c)\,dv\,du\]
For \(d=0\) you can see that the image we obtain is the same as the one from the first equation.
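In practice the aperture integral becomes a finite average over the 16 x 16 sub-aperture samples; written under the same convention as the equation above, the discrete version is:
\[I(s,t,c,d) \approx \frac{1}{UV} \sum_{u} \sum_{v} L(u, v,\, s + du,\, t + dv,\, c)\]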
The resulting code looks like this:
```python
def refocus_stack(lightfield: np.ndarray, d_values: np.ndarray) -> np.ndarray:
    """Generate a focal stack I(s,t,c,d) by shift-and-add refocusing across aperture samples."""
    if lightfield.ndim != 5:
        raise ValueError("Expected lightfield with 5 dimensions [u, v, s, t, c]")
    U, V, S, T, C = lightfield.shape
    if C != 3:
        raise ValueError(f"Expected 3 color channels, got {C}")
    # Center the aperture coordinates so that d = 0 keeps every view in place.
    u_coords = np.arange(U) - (U - 1) / 2.0
    v_coords = np.arange(V) - (V - 1) / 2.0
    stack = []
    for d in d_values:
        acc = np.zeros((T, S, 3), dtype=np.float64)
        wsum = np.zeros((T, S), dtype=np.float64)
        for vi, v in enumerate(v_coords):
            for ui, u in enumerate(u_coords):
                shift_s = d * u
                shift_t = -d * v  # Fix directions!
                view = lightfield[ui, vi].transpose(1, 0, 2)  # reorder to (T, S, 3)
                shifted_view, weight_map = _bilinear_shift(view, shift_t, shift_s)
                acc += shifted_view
                wsum += weight_map
        # Normalize by the per-pixel weight so border pixels with fewer
        # contributing views are not darkened.
        wsum_safe = np.maximum(wsum, 1e-6)
        refocused = acc / wsum_safe[..., None]
        refocused = np.clip(refocused, 0, 255).astype(np.uint8)
        stack.append(refocused)
    return np.stack(stack, axis=0)
```
Because of how the original light field is constructed, the signs of the shifts are opposite for s and t:
```python
shift_s = d * u
shift_t = -d * v  # Fix directions!
```
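To see why shift-and-add brings one depth into focus, here is a self-contained toy example of my own (integer shifts via np.roll on a 1D synthetic light field; the sign convention here is for this toy only, not the assignment code). A point whose disparity matches the chosen d realigns across views and stays sharp, while at d = 0 its copies stay spread out:

```python
import numpy as np

U, V, S, T = 4, 1, 16, 1   # 1D aperture, 1D image for clarity
disparity = 2              # the scene point moves 2 pixels per aperture step
lf = np.zeros((U, V, S, T))
for u in range(U):
    lf[u, 0, 5 + disparity * u, 0] = 1.0  # bright point, shifted per view

def refocus_1d(lf, d):
    # Integer shift-and-add: shift each view by -d*u along s, then average.
    U = lf.shape[0]
    acc = np.zeros(lf.shape[2])
    for u in range(U):
        acc += np.roll(lf[u, 0, :, 0], -d * u)
    return acc / U

in_focus = refocus_1d(lf, disparity)   # all spikes realign at s = 5
out_focus = refocus_1d(lf, 0)          # spikes stay spread out
print(in_focus.max(), out_focus.max())  # 1.0 0.25
```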
Since the chessboard scene spans a range of depths, I created 8 refocused images, animated below:

And the individual refocused images:
I had to add a fix so that the weird black padding at the right and bottom does not appear when shift_t and shift_s are 0. A more elegant implementation is probably possible.
```python
def _bilinear_shift(view: np.ndarray, shift_t: float, shift_s: float):
    if shift_t == 0 and shift_s == 0:
        return view, np.ones(view.shape[:2])
    ...
```
Comment
I really enjoyed doing these assignments, even though I was quite busy with my lab schedule. Thank you for preparing the resources and classes!
References
- The (New) Stanford Light Field Archive — lightfield.stanford.edu. http://lightfield.stanford.edu/
- Hahne, C. (n.d.). The Plenoptic Camera aka Light Field Camera — plenoptic.info. https://www.plenoptic.info/pages/sub-aperture.html
- Ng, R., Levoy, M., Brédif, M., Duval, G., Horowitz, M., & Hanrahan, P. (2005). Light Field Photography with a Hand-Held Plenoptic Camera. Technical Report CTSR 2005-02, Stanford University. https://graphics.stanford.edu/papers/lfcamera/lfcamera-150dpi.pdf