This document is derived from the lecture by Prof. Dr. T. Brox, Dr. O. Ronneberger, and Prof. Dr. M. Teschner at the University of Freiburg in the winter semester 2015/16 (WS 15/16).
Introduction
Digital image
- regular grid I_{ij} (intensity values)
- continuous function I: \Omega \subset \mathbb{R}^2 \rightarrow \mathbb{R} with (x, y) \mapsto I(x, y)
- Ω is image domain, usually rectangular
Human vision
- Ability to adapt to changing light intensities
- Retina: Cones (color vision), rods (gray, high sensitivity)
- Analog preprocessing (smoothing, edge detection)
- Sharp image only in the center; significantly reduces the amount of data before sending it to the brain
- “digital”: spiking or not spiking, timing
- Gabor filter: linear filter for edge detection
- Sparse coding: over-complete code, only few are active for certain signal
- lower cortical areas (primary visual cortex V1): neurons respond only to signals in a very small local area
- higher cortical areas: receptive fields larger (integration of information from lower areas)
Camera
- Pinhole: object points (X, Y, Z) are projected to (x, y) by x = f\frac{X}{Z}; y = f\frac{Y}{Z}. Problem: trade-off between sharpness and light intensity, controlled by the aperture (size of the hole)
- Optics: light focusing; large aperture and sharpness possible. Thin-lens equation: \frac{1}{S_1} + \frac{1}{S_2} = \frac{1}{f}. Computer vision practice: pinhole camera model + correction of lens effects
- effect of lenses is more important: shape from defocus, image analysis in microscopy
Sampling
- Discretize image domain: \{I_{ij} | i = 1, ..., N; j = 1, ..., M\}
- Grid points: pixels
- Grid: usually rectangular point grid, equal spacing h (Grid size)
- spacing is the same in all directions: h = h_x = h_y
- if true spacing not known: h=1
- Reduce the number of pixels: downsampling. Increase: upsampling
Continuous vs discrete
- Pros Continuous: scene is continuous, limiting case for successively finer grids, rotational invariance, exact length, subpixel accuracy
- Pros Discrete models: sensor data discrete, model closer to implementation, efficient optimization possible
Aliasing and Moiré effect
- function can be represented by its frequency components (Fourier)
- Discrete signal can only represent frequencies up to a certain limit (Nyquist frequency)
- Ignoring the Nyquist frequency leads to aliasing artifacts (e.g. stepping along straight lines)
- Sampling of periodic signals is frequency modulation (multiplication of two periodic signals)
- It can lead to Moiré effects (special aliasing artifact)
- Videos: temporal aliasing
Nyquist-Shannon theorem
The input signal can be reconstructed from its samples in a unique way if the sampling rate is at least twice the bandwidth of the input signal
- two different frequencies may lead to the same discrete signal
- reduction of bandwidth (max freq) can be achieved by smoothing
- consequence: smooth your input image sufficiently before downsampling
Downsampling
- Decanting operator: ensures minimum smoothing necessary to avoid aliasing
- In case of overlap: intensity distribution to both cells according to overlap ratio
- normalization of intensity value in low res image by downsampling factor
- Operator is separable: can be applied sequentially along all axes
Upsampling
- Bilinear interpolation: weighted avg of neighboring pixels
- project fine grid point to available coarse grid
- weighted average along x, then along y (see the sketch after this list):
a_j = (1 - \alpha) I_{i,j} + \alpha I_{i+1,j}
a_{j+1} = (1 - \alpha) I_{i,j+1} + \alpha I_{i+1,j+1}
I_{k,l} = (1 - \beta) a_j + \beta a_{j+1}
- general concept to retrieve values at points between grid points
- arbitrary dimensions
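A minimal NumPy sketch of the scheme above (not from the lecture; the function names `bilinear_sample`/`upsample` and the clamping at the image border are my own choices):

```python
import numpy as np

def bilinear_sample(I, x, y):
    """Bilinearly interpolate image I at continuous coordinates (x, y).

    x indexes columns, y indexes rows; coordinates are clamped to the valid
    range so border samples simply repeat the edge pixels.
    """
    h, w = I.shape
    x = np.clip(x, 0, w - 1)
    y = np.clip(y, 0, h - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    alpha, beta = x - x0, y - y0          # fractional offsets
    # interpolate along x for the two neighboring rows, then along y
    a0 = (1 - alpha) * I[y0, x0] + alpha * I[y0, x1]
    a1 = (1 - alpha) * I[y1, x0] + alpha * I[y1, x1]
    return (1 - beta) * a0 + beta * a1

def upsample(I, factor=2):
    """Upsample I by projecting each fine grid point onto the coarse grid."""
    h, w = I.shape
    out = np.empty((h * factor, w * factor))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = bilinear_sample(I, j / factor, i / factor)
    return out
```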
Quantization
- Discretization of the co-domain: \mathbb{R} \rightarrow \{1, ..., N\}
- needed for float or integer representation
- 256 gray scales, 8 bit per px
- humans distinguish only about 40 gray levels, but thousands of colors
- optimal quantization by clustering
- assume: I(x, y) \in \mathbb{R}; 0 \leq I(x, y) \leq 255
Noise
- Additive noise: I_{ij} = I^*_{ij} + n_{ij}
- Gaussian noise: G_\sigma(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)
- Multiplicative noise: I_{ij} = I^*_{ij}(1 + n_{ij})
- Trick: \log I_{ij} = \log I^*_{ij} + \log(1 + n_{ij}) (additive noise).
- Backtransform: I^*_{ij} = \exp(\log I^*_{ij})
- Impulse noise: certain percentage of pixels is replaced by one or two fixed values (pixel defect)
- salt-and-pepper: some pixels replaced by white or black values
- Uniform noise: a certain percentage of pixels is replaced by random values
Signal-to-noise ratio (SNR)
- Measure: I versus I^* (ground truth)
- variance of image versus variance of noise
- variance of image: \sigma^2_I = \frac{1}{N}\sum_i (I^*_i - \mu)^2
- additive, zero-mean noise model: I_i = I^*_i + n_i
- variance of noise: \sigma^2_n = \frac{1}{N}\sum_i (I^*_i - I_i)^2
- Signal-to-noise ratio: SNR = 10\log_{10}\frac{\sigma^2_I}{\sigma^2_n} = 10\log_{10}\frac{\sum_i (I^*_i - \mu)^2}{\sum_i (I^*_i - I_i)^2}
- Peak signal-to-noise ratio: PSNR = 10\log_{10}\left(\frac{N(\max_i I^*_i - \min_i I^*_i)^2}{\sum_i (I^*_i - I_i)^2}\right)
- Unit: decibel (dB). Higher the better
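A small NumPy sketch of both measures, directly following the formulas above (function names `snr_db`/`psnr_db` are my own):

```python
import numpy as np

def snr_db(clean, noisy):
    """SNR in dB: variance of the (ground-truth) signal vs. variance of the noise."""
    clean = clean.astype(np.float64).ravel()
    noisy = noisy.astype(np.float64).ravel()
    signal_var = np.sum((clean - clean.mean()) ** 2)
    noise_var = np.sum((clean - noisy) ** 2)
    return 10 * np.log10(signal_var / noise_var)

def psnr_db(clean, noisy):
    """PSNR in dB, using the dynamic range of the ground-truth image as the peak."""
    clean = clean.astype(np.float64).ravel()
    noisy = noisy.astype(np.float64).ravel()
    peak = clean.max() - clean.min()
    mse = np.mean((clean - noisy) ** 2)
    return 10 * np.log10(peak ** 2 / mse)
```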
Point operations
- treat each pixel independently: u_{ij} = f(I_{ij})
- example: change brightness (u(x,y)=I(x,y)+b) (clipping needed)
Contrast enhancement
Gamma correction
- camera chips have different response curves: I_{out} \propto I_{in}^\gamma
- correction: f(I(x, y)) = I_{max}\left(\frac{I(x, y)}{I_{max}}\right)^{\frac{1}{\gamma}}, \gamma > 0
- range [0, I_{max}] not affected
- Dark areas become brighter without leading to oversaturation in the brighter areas
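A minimal sketch of the correction formula (not from the lecture; `gamma_correct` and the default I_max = 255 are my own assumptions):

```python
import numpy as np

def gamma_correct(I, gamma, I_max=255.0):
    """Apply f(I) = I_max * (I / I_max)^(1/gamma); gamma > 1 brightens dark areas."""
    I = np.clip(I.astype(np.float64), 0, I_max)
    return I_max * (I / I_max) ** (1.0 / gamma)
```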
Gray value histogram
- number of pixels that contain certain gray value
- histogram equalization: transform such that all gray values are equally frequent
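A possible NumPy sketch of histogram equalization via the cumulative histogram (assuming 8-bit images; the name `equalize_histogram` is mine):

```python
import numpy as np

def equalize_histogram(I, levels=256):
    """Map gray values through the normalized cumulative histogram so that
    the output histogram becomes approximately flat."""
    hist, _ = np.histogram(I.ravel(), bins=levels, range=(0, levels))
    cdf = hist.cumsum().astype(np.float64)
    cdf = (cdf - cdf[0]) / (cdf[-1] - cdf[0])            # normalize to [0, 1]
    lut = np.round(cdf * (levels - 1)).astype(np.uint8)  # gray-value lookup table
    return lut[I.astype(np.uint8)]
```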
Difference image
- Gray: I_\Delta = |I_1 - I_2|
- Color: I_\Delta = \frac{1}{3}\sum_{k=1}^{3} |I_{k,1} - I_{k,2}|
Linear filters
- improve signal by taking neighboring points into account
- filters defined in the Fourier domain (global Frequency analysis on all pixels)
- convolution theorem translates filtering in the Fourier domain to the spatial domain: \mathcal{F}(f) \cdot \mathcal{F}(h) = \mathcal{F}(f * h)
- filters are usually defined in the spatial domain (h instead of \mathcal{F}(h)): more interest in local analysis, easier to generalize to nonlinear filters, better behavior at boundaries
Convolution
- With a static filter h: linear operation. (f * h)(x) = \int h(-x') f(x + x')\, dx'
- in 2D: (I * h)(x, y) = \int h(-x', -y') I(x + x', y + y')\, dx'\, dy'
- properties:
- linearity: (\alpha f + \beta g) * h = \alpha(f * h) + \beta(g * h)
- shift invariance: f(x) * g(x + \delta) = (f * g)(x + \delta)
- commutativity: f * g = g * f
- associativity: (f * g) * h = f * (g * h)
- correlation (in contrast): (f \star h)(x) = \int h(x') f(x + x')\, dx'
Discrete convolution
- 1D: (f * h)_i = \sum_{k=1}^{n} f_k h_{i-k}
- 2D: (f * h)_{i,j} = \sum_{k=1}^{n}\sum_{l=1}^{m} f_{k,l} h_{i-k,j-l}
- Separability: a filter is separable if h(x, y) = \delta(x, y) * h_1(x, \cdot) * h_2(\cdot, y) (\delta is the Dirac distribution: \delta(x) = \infty for x = 0, 0 else, with \int \delta(x)\, dx = 1)
- Separable filters can be applied successively as 1D convolutions, which reduces the runtime complexity from O(NMnm) to O(NM(n + m))
Gauss filter
- Kernel: G_\sigma(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{x^2}{2\sigma^2}\right)
- separable: I'(x, y) = I(x, y) * G(x, \cdot) * G(\cdot, y)
- discrete stencil derived by sampling within a 3\sigma interval
- Runtime complexity O(NM\sigma) (Fourier domain: O(NM\log(NM)))
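A sketch of separable Gaussian smoothing with a stencil sampled in [-3\sigma, 3\sigma] (my own helper names; note that np.convolve pads with zeros, i.e. Dirichlet rather than the preferred mirrored boundary conditions):

```python
import numpy as np

def gaussian_kernel(sigma):
    """Sample the 1D Gaussian at integer positions in [-3*sigma, 3*sigma] and normalize."""
    r = int(np.ceil(3 * sigma))
    x = np.arange(-r, r + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def gaussian_smooth(I, sigma):
    """Separable Gaussian smoothing: one 1D convolution along each axis."""
    k = gaussian_kernel(sigma)
    I = I.astype(np.float64)
    tmp = np.apply_along_axis(lambda row: np.convolve(row, k, mode='same'), 1, I)
    return np.apply_along_axis(lambda col: np.convolve(col, k, mode='same'), 0, tmp)
```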
Boundary conditions
- Boundary of \Omega is \partial\Omega
- Dirichlet boundary conditions: I(x, y) = 0 \quad \forall (x, y) \in \partial\Omega
- Homogeneous Neumann boundary conditions: \frac{\partial}{\partial n} I(x, y) = 0 \quad \forall (x, y) \in \partial\Omega with n the outer normal vector and \frac{\partial I}{\partial n} = n^T \nabla I (mirror the image at the boundary). Preferred method (preserves the average intensity)
Box filter
- alternative to Gauss filter
- Kernel: B_\sigma(x) = \frac{1}{2\sigma} for |x| < \sigma, 0 else
- simple averaging of neighboring values
- properties:
- separable
- not rotationally invariant
- much more efficient O(NM)
- result not as smooth
- successive convolution of box filter with itself yields filters that resemble the Gauss filter
- smoothing result of filter of order k is k times differentiable
Recursive filter (Deriche filter)
- (fast) alternative to Gauss filter
- Idea: recursively propagate information in both directions of the signal
- α smoothness parameter
- approximates Gaussian convolution
- relation between \sigma and \alpha: \alpha\sigma = \frac{5}{2\sqrt{\pi}}
- properties:
- separable
- rotationally invariant
- O(NM)
- hard/impossible to implement Neumann boundary conditions
Derivative filters
- measuring the local change of intensities
- continuous functions: given by the derivative \frac{df}{dx}
- multidimensional functions: \nabla I = \left(\frac{\partial I}{\partial x}, \frac{\partial I}{\partial y}\right)^T (gradient)
- alternative notation: \nabla I = (I_x, I_y)^T
- image must be differentiable
Gaussian derivative
- ensure differentiability by combining with small amount of Gauss smoothing (Gaussian derivatives)
- Convolution and derivatives are linear operations (Applying derivative to image or to Gaussian does not matter)
- \frac{\partial}{\partial x}(I(x) * G_\sigma(x)) = I(x) * \frac{\partial}{\partial x} G_\sigma(x)
- arbitrarily high order derivatives exist
- sample the derivative of the Gaussian filter for implementation
- common mask for the first derivative is \left(-\frac{1}{2}, 0, \frac{1}{2}\right) (central difference)
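A minimal sketch of the central-difference mask and the resulting gradient magnitude (function names are my own; borders are left at zero for simplicity):

```python
import numpy as np

def gradient_central(I):
    """Approximate I_x and I_y with the central-difference mask (-1/2, 0, 1/2)."""
    I = I.astype(np.float64)
    Ix = np.zeros_like(I)
    Iy = np.zeros_like(I)
    Ix[:, 1:-1] = 0.5 * (I[:, 2:] - I[:, :-2])   # derivative along x (columns)
    Iy[1:-1, :] = 0.5 * (I[2:, :] - I[:-2, :])   # derivative along y (rows)
    return Ix, Iy

def gradient_magnitude(I):
    """Gradient magnitude |grad I|, usable as a simple edge indicator."""
    Ix, Iy = gradient_central(I)
    return np.sqrt(Ix ** 2 + Iy ** 2)
```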
Higher order derivatives
- higher derivatives than first derivative
- eg. Laplace filter: Zero-crossings of this filter correspond to image edges
- Another way to detect edges is by the gradient magnitude
Energy Minimization
- formalize model assumptions and cast the task as E(x) = A_1(x) + ... + A_n(x) (energy/loss function)
- solve x^* = \text{argmin}_x E(x) with x \in \mathbb{R}^N
Example: image denoising
- model assumptions: similar to input image, result should be smooth
- formalize:
- similar: E_D(u) = \sum_{i,j}(u_{i,j} - I_{i,j})^2
- smooth: E_S(u) = \sum_{i,j}(u_{i,j+1} - u_{i,j})^2 + (u_{i+1,j} - u_{i,j})^2
- problem to solve: u^* = \text{argmin}_u (E_D(u) + \alpha E_S(u)) (\alpha is the smoothness weight)
Advantages
- transparency
- optimality
- theoretical aspects can be analysed
- Existence and uniqueness of solutions
- Stability of solutions with respect of the input data
- Difficulty of the problem class
- fewer parameters than in heuristic multi-step methods
- combination of approaches is easy
Problems
- Formalizing is ad-hoc (use statistical interpretations)
- choosing weighting parameter(s) not easy (can be learned)
- global optimization is hard
Example: image denoising
- necessary condition: \frac{\partial E}{\partial u} = 0 \Leftrightarrow \frac{\partial E}{\partial u_{i,j}} = 0 \; \forall i, j
- compute \frac{\partial E}{\partial u_{i,j}}
- write as a system of linear equations (matrix of size N \times N for N pixels, symmetric and positive definite)
- contains one main diagonal (central pixels) and four off-diagonals (one for each of the four neighbors of a pixel)
- the matrix is sparse (almost all entries are 0)
- since A is positive definite, the inverse A^{-1} exists and we can solve for u
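A possible SciPy sketch of assembling and solving this sparse system for the quadratic denoising energy (my own function name; the Laplacian construction assumes a 4-neighborhood and grid size 1):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def denoise_quadratic(I, alpha):
    """Minimize sum (u - I)^2 + alpha * sum of squared neighbor differences.

    Setting the derivative to zero gives (Id + alpha * L) u = I, where L is the
    graph Laplacian of the 4-neighborhood grid (boundary pixels simply have
    fewer neighbors, matching the natural boundary handling).
    """
    h, w = I.shape
    n = h * w
    idx = np.arange(n).reshape(h, w)
    rows, cols, vals = [], [], []
    for di, dj in ((0, 1), (1, 0)):               # each grid edge counted once
        a = idx[:h - di, :w - dj].ravel()
        b = idx[di:, dj:].ravel()
        rows += [a, b, a, b]
        cols += [a, b, b, a]
        vals += [np.ones_like(a), np.ones_like(b), -np.ones_like(a), -np.ones_like(b)]
    L = sp.csr_matrix((np.concatenate(vals).astype(np.float64),
                       (np.concatenate(rows), np.concatenate(cols))), shape=(n, n))
    A = sp.identity(n, format='csr') + alpha * L
    u = spla.spsolve(A, I.astype(np.float64).ravel())
    return u.reshape(h, w)
```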
Convexity
- Convex
- Positive curvature
- No local minima
- Global minimum is unique
- Minimization by setting the derivative to 0
- Non-convex functions
- many local minima and maxima
- Global minimum may not be unique
- Global minimization is usually impossible, only heuristics exist
A function is convex if
f((1 - \alpha)x_1 + \alpha x_2) \leq (1 - \alpha)f(x_1) + \alpha f(x_2) \quad \forall \alpha \in [0, 1]
A function is strictly convex if
f((1 - \alpha)x_1 + \alpha x_2) < (1 - \alpha)f(x_1) + \alpha f(x_2) \quad \forall \alpha \in (0, 1), x_1 \neq x_2
- every non-negative linear combination of (strictly) convex functions is again (strictly) convex
Jacobi method
- iterative solver, preserves sparsity of the matrix
- Converges if the matrix is strictly diagonally dominant: |a_{ii}| > \sum_{j \neq i} |a_{ij}| \quad \forall i
- decompose the matrix into A = D + M (D diagonal part, M off-diagonal part)
- for a linear system: Ax = b \Leftrightarrow (D + M)x = b \Leftrightarrow Dx = b - Mx
- D^{-1}: replace the diagonal elements by their reciprocals
- compute the solution x iteratively, starting at any x^0:
x^{k+1} = D^{-1}(b - Mx^k)
- iterate until the residual r^k = Ax^k - b is smaller than a threshold or the change (x^{k+1} - x^k)^2 becomes small. If it is 0, the iteration has converged.
- properties:
- simple, parallelizable, few requirements
- slow
- convergence only for k \rightarrow \infty
- computation not in place
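A minimal sketch of the Jacobi iteration for a dense NumPy matrix (my own function name; assumes a non-zero diagonal):

```python
import numpy as np

def jacobi(A, b, x0=None, tol=1e-8, max_iter=10000):
    """Jacobi iteration x^{k+1} = D^{-1}(b - M x^k) with A = D + M.

    Converges e.g. for strictly diagonally dominant A; the diagonal must be non-zero.
    """
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(np.float64).copy()
    d = np.diag(A)                      # diagonal part D (as a vector)
    M = A - np.diag(d)                  # off-diagonal part
    for _ in range(max_iter):
        x_new = (b - M @ x) / d         # not in place: needs the full old x
        if np.linalg.norm(A @ x_new - b) < tol:   # residual check
            return x_new
        x = x_new
    return x
```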
Gauß-Seidel method
- Ax = b \Leftrightarrow Dx = b - Lx - Ux: split the off-diagonal part M into the lower triangle L and the upper triangle U
- iterate, traverse the vector x from top to bottom and use the new values for the multiplication with the lower triangle
x^{k+1} = D^{-1}(b - Lx^{k+1} - Ux^k)
- Converges if A is positive or negative definite
- properties:
- in place computation
- recursive propagation of information: faster
Successive over-relaxation (SOR)
- accelerate Gauß-Seidel by over-relaxing the solution update
x^{k+1} = (1 - \omega)x^k + \omega D^{-1}(b - Lx^{k+1} - Ux^k) (\omega = 1 is Gauß-Seidel)
- Converges for positive- or negative definite matrices (all eigenvalues positive or negative), if ω∈(0,2)
- Over-relaxation for ω>1 faster convergence
- Under-relaxation for ω<1 can help establish convergence in case of divergent iterative processes
- optimal ω must be determined empirically
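A sketch of SOR for a dense NumPy matrix; with omega = 1 it reduces to Gauß-Seidel (function name and default omega are my own choices):

```python
import numpy as np

def sor(A, b, omega=1.5, x0=None, tol=1e-8, max_iter=10000):
    """Successive over-relaxation; omega = 1 is Gauss-Seidel.

    x is updated in place, so newly computed entries are used immediately
    (the 'L x^{k+1}' part of the update).
    """
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(np.float64).copy()
    for _ in range(max_iter):
        for i in range(n):
            sigma = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
            x[i] = (1 - omega) * x[i] + omega * (b[i] - sigma) / A[i, i]
        if np.linalg.norm(A @ x - b) < tol:    # residual check
            break
    return x
```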
Conjugate gradient (CG)
- two non-zero vectors u and v are conjugate with respect to A if the inner product \langle u, v \rangle_A := u^T A v = 0 (vectors orthogonal with respect to a special scalar product)
- a set of n conjugate vectors \{p_k\} forms a basis of \mathbb{R}^n, so the solution x^* of Ax = b can be expanded as x^* = \alpha_1 p_1 + ... + \alpha_n p_n
- coefficients αk are derived
- after n computations we obtain exact solution x∗
- Good choice of {pk}: few coefficients approximate solution well
- Start at some initial point x0
- Let p_0 be the residual r_0 = b - Ax_0. This is the negative gradient (steepest descent direction) of
E(x) = \frac{1}{2}x^T Ax - b^T x, the minimizer of which is x^* (hence the name CG)
- Iteratively compute parameters
- Stop when residual is small. Guaranteed solution after n iterations
- matrix A must be symmetric and positive definite
- computing exact solution is not an option as n is the number of pixels
- number of iterations to get good solution depends on the condition number of A (largest vs smallest eigenvalue). The same holds for other iterative methods
- preconditioners P^{-1} are used so that P^{-1}A has a small condition number:
Ax = b \Leftrightarrow P^{-1}Ax = P^{-1}b
- Most simple preconditioner: Jacobi preconditioner
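A plain (unpreconditioned) CG sketch for symmetric positive definite A, following the standard recurrences (function name is mine):

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=None):
    """Plain CG for symmetric positive definite A (no preconditioning)."""
    n = len(b)
    max_iter = n if max_iter is None else max_iter
    x = np.zeros(n) if x0 is None else x0.astype(np.float64).copy()
    r = b - A @ x                 # residual = steepest descent direction
    p = r.copy()                  # first search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)         # step length along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p     # next A-conjugate direction
        rs_old = rs_new
    return x
```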
Multigrid methods
- previous linear solvers: only act locally
- due to the sparsity of the matrix, one iteration distributes the information at a pixel only to its four neighbors
- idea of multigrid solvers: shorten distances by using coarser point of view
- additional effect: fewer entries, iterations are faster at coarse levels
Unidirectional (cascadic) multigrid
- downsample image, create linear system from this
- compute an approximate solution at the coarse grid (e.g. SOR)
- take upsampled result as init guess for next finer grid
- Refine (e.g. SOR)
- properties:
- iterations at coarse level fast
- simple implementation
- coarse levels do not approximate original system well
Correcting (bidirectional) multi-grid
- do not downsample image but error
- compute first solution at fine grid
- correct error at coarse grid
- refine result at finer grid
Full multigrid
- combine cascadic and correcting multigrid
- start at coarse grid with downsampled image
- at each finer level, apply a W-cycle
Integer problems
- so far: problems with continuous variables
- segmentation and matching problems usually lead to integer problems
- these are combinatorial problems, few are solvable in polynomial time
- typical problems: linear programs, integer quadratic programs, second-order cone programs
Variational Methods
Continuous energies
- continuous formulation: u^*(x) = \text{argmin}_{u(x)} E(u(x)) with u: \Omega \rightarrow \mathbb{R}
- energy functional, optimization based on calculus of variation
- Necessary condition(s) for a minimum: Euler-Lagrange equations
- advantage: consistency with the continuous model; for a purely discrete energy there is no assurance that it is consistent with a continuous one (e.g. when measuring line lengths)
- disadvantage: gradients with respect to a function have to be computed
Consistency
- discretization of continuous quantities can lead to artifacts if done wrong
- the solution of the problem should be independent of rotation (rotation invariance)
- A discretization is called consistent if it converges to the continuous model for finer grid sizes
- Example: discrete approximation of the gradient magnitude: the error must vanish with decreasing grid size
Calculus of variation
Calculus of variation and the Gâteaux derivative to compute the gradient \frac{\delta E(u(x))}{\delta u(x)} of a functional with respect to a function u(x)
Gâteaux derivative
- functional maps each element u of a vector space (e.g. a function) to a scalar E(u)
- the Gâteaux derivative generalizes the directional derivative to infinite dimensional vector spaces:
\frac{\delta E(u(x))}{\delta u(x)}\Big|_h = \lim_{\epsilon \rightarrow 0} \frac{E(u + \epsilon h) - E(u)}{\epsilon} = \frac{dE(u + \epsilon h)}{d\epsilon}\Big|_{\epsilon = 0}
- can be interpreted as the projection of the gradient onto the direction h:
\frac{\delta E(u(x))}{\delta u(x)}\Big|_h = \left\langle \frac{dE(u)}{du}, h \right\rangle = \int \frac{dE(u)}{du}(x)\, h(x)\, dx
- the gradient \frac{dE(u)}{du}(x) is needed to minimize E(u)
Denoising example
E(u(x)) = \int_\Omega (u(x) - I(x))^2 + \alpha|\nabla u(x)|^2\, dx
- rather than over a vector, the optimization problem is over a function
- shorthand (the dependency of the functions on x is clear): E(u) = \int_\Omega (u - I)^2 + \alpha|\nabla u|^2\, dx
- condition for a minimum: \frac{\delta E(u)}{\delta u}\Big|_h = 0 \quad \forall h
- compute the Gâteaux derivative and simplify
- Add +\epsilon h to all occurrences of u
- differentiate with respect to \epsilon
- integration by parts to turn h_x and h_y into h
- directional derivative must be 0 for all directions
- yields two conditions
- gradient must be zero: Euler-Lagrange equation set to 0
- the boundary term must vanish (natural boundary conditions, coincide with homogeneous Neumann boundary conditions)
Implementation
- Discretization of \frac{dE(u)}{du} = (u - I) - \alpha(u_{xx} + u_{yy}) = 0 leads to a linear system of equations
\frac{\partial E}{\partial u_{i,j}} = (u_{i,j} - I_{i,j}) - \alpha(u_{i-1,j} + u_{i,j-1} - 4u_{i,j} + u_{i+1,j} + u_{i,j+1}) = 0
- to minimize the discrete energy, the same linear solvers can be applied
- different discretization stages do not always lead to same algorithm
Discontinuity preserving regularization
- A quadratic regularizer leads to blurred edges
- the smoothness assumption does not hold everywhere (outliers at edges)
- edge preservation can be achieved by using non-quadratic regularizers
- statistical interpretation of regularization
Bayesian approach
- Energy minimization can also emerge from a probabilistic approach taking into account likelihoods and prior probabilities
- Bayes formula: p(u|I) = \frac{p(I|u)p(u)}{p(I)}
- Find solution u that maximizes p(u|I) (maximum a-posteriori (MAP) approach)
- Marginal does not depend on u, can be ignored in maximization
Energy minimization
- MAP estimation: p(u|I) \propto p(I|u)p(u) \rightarrow \max
- Noise model: independently and identically distributed Gaussian noise: p(I|u) \propto \prod_{x \in \Omega} \exp(-(u(x) - I(x))^2)
- A-priori model: the gradient is likely to be small (smoothness). Gaussian distribution with variance \frac{1}{2\alpha} on the gradient:
p(u) \propto \prod_{x \in \Omega} \exp\left(-\frac{|\nabla u(x)|^2}{\frac{1}{\alpha}}\right)
- turn maximization into minimization of the negative logarithm (the logarithm is monotonic): energy minimization problem:
-\log p(I|u) - \log p(u) = \int_{\Omega} (u-I)^2 + \alpha|\nabla u|^2 dx \rightarrow \min
Interpretation
- Other noise models lead to different penalizers in the energy function
- Expecting outliers in the smoothness assumption, replace the Gaussian model with a Laplace distribution
- Leads to an energy function that preserves image edges
- Tikhonov regularizer \Psi (s^2) = s^2
Gradient descent
- Euler-Lagrange equation is nonlinear in the unknowns
- nonlinear system of equations
- gradient descent
- Iterative
- start with initial value
- iteratively move in direction of largest decrease in energy (negative gradient direction)
- converges to next local minimum
- if the energy is convex (e.g. a linear combination of convex terms), gradient descent will find a global optimum
- slow convergence
Faster: lagged diffusivity
- keep nonlinear prefactor fixed (regain linear system) and compute updates iteratively
- proven to converge if linear system in each step is solved exactly
- only few iterations, much faster than gradient descent
Motion estimation
- Given a sequence of images I(x, y, t), what is the motion of each pixel between subsequent frames
- Vector field (u, v)(x, y, t) that transforms the second image into the first one
- optical flow: move each point (x, y) at time t, such that it fits the corresponding point at time t + 1
- Can be used for detection or segmentation of moving objects (motion segmentation)
- scene flow: estimate 3D motion of objects
- structure from motion: estimate 3D structure
- the apparent motion in the image plane (optic flow) differs from the true motion
- Ambiguities: the motion cannot be determined from the image data alone; any black pixel could be moved to any black pixel
- assumptions are needed for a unique solution. The neighborhood (aperture) determines the solution
Optic flow constraint
- Gray value of moving pixel stays constant: I(x + u, y + v, t + 1) - I(x, y, t) = 0
- Constraint nonlinear in unknown flow: difficult estimation
- Linearize with Taylor expansion: I(x + u, y + v, t + 1) = I(x, y, t) + I_xu + I_yv + I_t + O(u^2, v^2)
- Linearized optic flow constraint: I_xu + I_yv + I_t = 0
- only sufficiently precise if the images are smooth and the flow is small
Lucas-Kanade method
- regarding a single pixel, we cannot uniquely determine its flow vector
- we need a second assumption to obtain a unique solution (two unknowns, one equation)
- assume that motion is constant within a local neighborhood
- results in multiple equations for the two unknowns
I_x(x', y', t)u + I_y(x', y', t)v + I_t(x', y', t) = 0 \qquad \forall x', y' \in \Re(x, y)
- over-determined system: not a unique solution anymore
- we seek a solution that minimizes the total squared error
\text{argmin}_{u, v} \sum_{x, y \in \Re}(I_x(x, y)u + I_y(x, y)v + I_t(x, y))^2
- condition for minimum is derivative to u and v are both zero
- linear system at each pixel:
\begin{bmatrix}\sum I_x^2 & \sum I_x I_y \\ \sum I_x I_y & \sum I_y^2\end{bmatrix} \begin{bmatrix}u\\v\end{bmatrix} = \begin{bmatrix}-\sum I_x I_t \\ -\sum I_y I_t\end{bmatrix}
- instead of a uniformly weighted, box-shaped window, we can also use a Gaussian window
- the windowed weighted sums amount to a convolution
- unique solution only if system is not singular
- local neighborhood must contain non-parallel gradients
- otherwise aperture problem still present
- assumption of locally constant motion not realistic (boundaries)
- there are no direct constraints on the smoothness of the flow field
- advantages: fast and simple
- local method: point estimates are computed independently
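A compact NumPy sketch of the per-pixel 2x2 normal equations, assuming a box-shaped window and central differences (names, window size and the singularity threshold are my own choices):

```python
import numpy as np

def lucas_kanade(I1, I2, window=7):
    """Estimate a flow vector (u, v) per pixel from two consecutive frames.

    Aggregates the linearized constraint I_x u + I_y v + I_t = 0 over a local
    window and solves the resulting 2x2 system at every pixel.
    """
    I1 = I1.astype(np.float64)
    I2 = I2.astype(np.float64)
    Ix = np.zeros_like(I1); Iy = np.zeros_like(I1)
    Ix[:, 1:-1] = 0.5 * (I1[:, 2:] - I1[:, :-2])
    Iy[1:-1, :] = 0.5 * (I1[2:, :] - I1[:-2, :])
    It = I2 - I1

    def box_sum(F):               # windowed sum via a separable box filter
        k = np.ones(window)
        F = np.apply_along_axis(lambda r: np.convolve(r, k, mode='same'), 1, F)
        return np.apply_along_axis(lambda c: np.convolve(c, k, mode='same'), 0, F)

    Sxx, Syy, Sxy = box_sum(Ix * Ix), box_sum(Iy * Iy), box_sum(Ix * Iy)
    Sxt, Syt = box_sum(Ix * It), box_sum(Iy * It)

    det = Sxx * Syy - Sxy ** 2
    valid = det > 1e-6            # skip (near-)singular systems: aperture problem
    u = np.zeros_like(I1); v = np.zeros_like(I1)
    u[valid] = (-Syy[valid] * Sxt[valid] + Sxy[valid] * Syt[valid]) / det[valid]
    v[valid] = ( Sxy[valid] * Sxt[valid] - Sxx[valid] * Syt[valid]) / det[valid]
    return u, v
```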
Global optic flow estimation
- global dependency between points due to global smoothness assumption
- global solution derived with variational methods
- basic model: Horn-Schunck method
- can be further extended to yield highly accurate estimates in many practical situations
Horn-Schunck model
- assumes gray value constancy too
- global smoothness of the flow field: |\nabla u|^2 + |\nabla v|^2 \rightarrow \min
- can be combined to energy functional, sought optical flow is minimizer of this energy
- energy is convex: we get a unique global optimum
- Use Gâteaux derivatives to obtain Euler-Lagrange equations
- discretized versions return linear equation system to solve, can be written in matrix-vector notation
- matrix is sparse with only few diagonals different from zero
- blocks structure (data term and smoothness term) due to coupling of u and v
- can be solved with linear solvers (Gauss-Seidel, SOR, Multigrid)
Problems
- illumination changes will “distract”
- motion discontinuities are blurred
- underestimation of large displacements
- outliers due to non-Gaussian noise or occlusion
Average angular error
- accuracy of optical flow: average angular error
- requires ground truth
- measures the 3D angle between correct and estimated flow vectors
- also captures errors in length of the 2D flow vector
Motion discontinuities and occlusions
- consider a robust function applied to the smoothness term to allow for motion discontinuities
- occlusions can be approached by applying a robust function to the data term
- nonlinear system can be solved with lagged nonlinearity
Illumination changes
- gray value constancy assumption not fulfilled due to illumination changes
- remedy: matching criterion that is invariant to illumination changes
- the gradient is invariant to additive changes, use that as alternative constraint
Large motion
- linearization of the constancy assumptions is only valid for small displacements (subpixel)
- to tackle larger displacements, we must refrain from such a linearization
- difficulties: non-convex, Euler-Lagrange equations get highly nonlinear
- can be handled with Gauss-Newton method and coarse-to-fine approach
Segmentation and Grouping
Segmentation
- Partition of the image domain \Omega into several (usually disjoint) regions \Omega_i
- special case: two-region segmentation (foreground-background)
- difference to edge detection: requires closed contours
- edge detection is part of the solution as it provides pieces of potential contours
- exponentially many possibilities
- a hierarchical decomposition would be ideal (object segmentation)
- impossible without prior knowledge
- can be seen as a grouping process combining pixels into superpixels
Feature space
- intensity, color, texture, motion, disparity, depth
- first-order features vs second-order features
- first-order: provided by sensor
- second-order: derived from data
Edge-based vs region-based methods
- edge-based techniques employ an edge indicator or pairwise similarities between pixels. Fit closed contours such that the contour coincides with strong edges (low similarities)
- region-based techniques employ a statistical model of each region. Regions should be as homogeneous as possible. Include pixels in different regions that are maximally different
- region-based techniques based on local statistics approach edge-based techniques in the limit
Thresholding
- simplest way to arrive at a segmentation
- convert intensity image into a binary image to label the two regions
- can be generalized to N regions by introducing N - 1 thresholds
- Problems
- no threshold that separates the objects
- context is ignored
- region-based with very simple region model
Clustering
- segmentation is similar to clustering: assign pixels to regions
- example: k-means clustering
- initialize: all pixels belong to random region
- mean feature vector for each region
- move pixel to another region if this decreases the total distance
- iterate until pixels do not move any longer
- for K = 2, k-means is like thresholding with an automatically determined threshold
- clustering ignores spatial context: only the features determine the assignment, not the position
- adding pixel coordinates to feature vector is not equivalent to enforcing smooth region contours
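A minimal sketch of k-means (Lloyd's algorithm) on pixel intensities (names and the random initialization are my own choices; only the gray value is used as a feature, so spatial context is ignored, as noted above):

```python
import numpy as np

def kmeans_segmentation(I, k=2, n_iter=50, seed=0):
    """Cluster pixel intensities into k regions and return a label image."""
    rng = np.random.default_rng(seed)
    f = I.astype(np.float64).ravel()
    means = rng.choice(f, size=k, replace=False)          # initial region means
    labels = np.zeros(f.shape, dtype=np.int64)
    for _ in range(n_iter):
        # assign each pixel to the closest region mean
        labels = np.argmin(np.abs(f[:, None] - means[None, :]), axis=1)
        new_means = np.array([f[labels == j].mean() if np.any(labels == j) else means[j]
                              for j in range(k)])
        if np.allclose(new_means, means):                  # no pixel moved any longer
            break
        means = new_means
    return labels.reshape(I.shape), means
```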
Greedy heuristics
- Region growing: seed points are initial regions
- for each point in boundary: if a neighbor is similar enough, it is assigned to the region
- grow until there are no more similar pixels along the boundary
- Region merging: initially all pixels are their own region
- two most similar regions are successively merged into a larger region
- repeat until similarity threshold or a given number of regions is reached
- some dissimilarity criteria
- Euclidean distance of the feature means: d^2(\Re_1, \Re_2) = (\mu_1 - \mu_2)^2
- mean Euclidean distance along the common boundary
Watershed segmentation
- Illustrative description: regard the image gradient magnitude as mountains and let it rain
- water flows downhill and gathers in catchment basins: the regions
- regions meet at the watersheds (edges)
- tiny gradient fluctuations result in over-segmentation (pre-smooth the image!)
Energy minimization: snakes model
- use energy functional
E(C) = - \int_0^1 |\nabla I(C(s))|^2 ds + \alpha \int_0^1 |C_s(s)|^2 ds
- C : [0, 1] \rightarrow \Omega is parametric contour. C_s denotes first derivative
- first term: external energy (depends on input image)
- second term: internal energy (independent of the image data)
- minimizing the external energy drives the contour to follow maxima of the gradient
- consider all possible contours and choose the one that best captures the maxima
- external energy alone would lead to a fractal contour with infinite length
- avoided by the internal energy, which penalizes the length of the contour
- together: prefer a compromise of a short contour which captures as much image gradient as possible
- with calculus of variation and gradient descent: local minima
Implicit representation of contours
- indicator function \phi: \Omega \rightarrow [-1, 1]
- the zero-level line represents the contour: C = \{x \in \Omega | \phi(x) = 0\}
- to evolve C, evolve \phi
- allows topological changes, can be applied in any dimension
- represents contour and enclosed region
- level set
Region-based active contours
- energy minimization based on region statistics
- states the optimal separation of pixel intensities
- similar to k-means clustering (two-means) but with an additional constraint on the length of the separating contour
- expressed using an implicit representation of the contour; the optimal region means can be found analytically as the mean intensities inside the two regions
Graph cuts for segmentation
- combinatorial optimization
- structure for min-cut
- each pixel represented by node, connected to neighbors via edges
- neighborhoods can have different complexity (most simple: 4-neighborhood)
- two extra nodes (source and target) connect to all pixel nodes
- goal: find minimum cut through graph that separates source and target
- source and target nodes correspond to regional models
- connection between neighbors enforces a regular labeling
- can be formulated as energy problem
- two terms:
- first term: comprises the links to the source and target nodes
- second term: comprises the n-links between neighboring pixels. Usually fixed weights, but can depend on image gradient
- for fixed means \mu_1, \mu_2 it can be solved in polynomial (avg. case: linear) time
Semantic segmentation
- run classifier which yields a score S_i(x) for each class i
- minimize with level sets or graph cuts
Interest points and local descriptors
- matching of local structures
Block matching
- straightforward way to match points in images
- regard the image patch around each point in image 1
- compare it to image patches around all points in image 2
- Expensive: O(kN^2) with k the patch size and N the number of pixels per image
- not invariant to typical appearance changes
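A basic SSD block-matching sketch for a single point (names and the limited search radius are my own; the reference point is assumed to lie far enough from the border of image 1):

```python
import numpy as np

def match_point(I1, I2, y, x, patch=7, search=15):
    """Find the best match in I2 for the patch around (y, x) in I1 (SSD criterion).

    A limited search radius keeps the cost manageable; exhaustive search over
    the whole image would give the O(k N^2) complexity mentioned above.
    """
    r = patch // 2
    I1 = I1.astype(np.float64); I2 = I2.astype(np.float64)
    ref = I1[y - r:y + r + 1, x - r:x + r + 1]
    best, best_ssd = (y, x), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if yy - r < 0 or xx - r < 0 or yy + r + 1 > I2.shape[0] or xx + r + 1 > I2.shape[1]:
                continue
            cand = I2[yy - r:yy + r + 1, xx - r:xx + r + 1]
            ssd = np.sum((ref - cand) ** 2)     # sum of squared differences
            if ssd < best_ssd:
                best_ssd, best = ssd, (yy, xx)
    return best, best_ssd
```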
Interest points
- only limited number of matches needed
- idea: do not match all points, but only promising subsets
- requirements
- point must come with enough information for unique matching
- subset in image 2 must contain matches from subset in image 1
Corner detection
- choose points with high information content and clear localization (corner points)
- with structure tensor, measure cornerness (fast to compute)
- eigenvalue decomposition of the structure tensor
- smoothing integrates gradients from the neighborhood
- eigenvectors yield the dominant orientation in this neighborhood and perpendicular orientation
- eigenvalues yield the structure magnitude in these directions
- a large second eigenvalue indicates strong structures in multiple directions (corners)
- corner: local maxima of second eigenvalue
- problem: depends on image scale
Characteristic scale
- consider a Gaussian pyramid of smoothed images
- characteristic scale can be computed based on the Laplacian
- yields an estimate of scale shift between two images
- uniqueness is not ensured
- may be multiple local maxima
- even global maximum need not be unique
- maximum operator is not robust, little noise can lead to different outcome
- alternative to Harris-Laplace detector
- considers local maxima of Laplacian scale space
- advantage: does not mix apples and oranges (corner detector and Laplacian)
- Laplacian focuses on blobs rather than corners (complementary information, one might be interested in both)
Block matching at interest points
- positive issue of interest points
- significantly reduced complexity
- with 100 detected points in both images, one has to compare only 10000 patches instead of 96 billion in a 640x480 image
- negative issues
- non-dense displacement fields (important matches might be missed)
- corresponding patches can be slightly shifted
- further problems (independent of interest points)
- patches in both images may look very different due to rotation, transformations, lighting changes, blurring
- subpixel accuracy not available
Invariant: local descriptors
- local descriptors are vectors that contain information about the local neighborhood of an interest point
- simplest local descriptor: block of certain size centered at interest point
- goal: design local descriptors that are invariant under the mentioned transformations
scale invariance
- use characteristic scale from interest point detection
- choose and normalize the size of the blocks: structures have same scale in both images
affine invariance
- affine transformation: f(x) = Ax + t
- maps a circle to an ellipse (or vice-versa)
- approximation of a projective transformation
- parameters can be estimated (region detector, maximally stable extremal regions)
affine region detector
- maximally stable extremal regions
- encircled by large gradients
- obtained by watershed-like algorithm
- apart from scale also yields elongation of fitted ellipses
histograms
- alternative to normalized neighborhood: derive invariant features within the fixed block
- gray value histogram: rotational invariance, invariant to blurring
- sensitive to lighting changes (bad)
- loss of information (very bad)
- histogram of gradient directions (orientation histograms)
- invariant to (additive) lighting changes
- building block of many successful descriptors
SIFT/HOG descriptor
- based on local assembly of orientation histograms and adaptive local neighborhoods
- extract SIFT feature points
- strongest responses of Laplacian scale space
- fit quadratic function to obtain subpixel accuracy
- create orientation histogram at selected scale
- the peak of the smoothed histogram gives the estimated orientation
- in case of two peaks, create two feature points
- estimation of position, scale and orientation
- affine invariance provided with MSER
- object recognition: dense sampling of such points at all positions and scales, no rotation invariance
Algorithm
- compute gradient orientation and magnitude at each pixel
- compute orientation indicator at each pixel
- create N \times M \times 8 array and init with zero
- quantize the orientation at each pixel and add respective magnitude to the respective entry in array
- local integration results in orientation histogram (smooth array with gaussian kernel)
- smooth in orientation direction
- sample feature vectors from the image
- normalize the feature vector
Shape context
- descriptor for matching edge maps
- insensitive to small deformations and spurious edges
Descriptors learned with convolutional networks
- CNNs are trained on large datasets with class labels
- network learns a representation that is good for object classification
- intermediate layer outputs turn out to be good generic descriptors
Unsupervised training to trigger invariant features
- CNN to discriminate surrogate classes
- applied transformations: translation, rotation, scaling, color, contrast, brightness, blur
- transformations define invariance properties of the features to be learned
Shapes from X
- reconstruct the 3D structure of a scene
- missing cue is depth of an observed point
- any point along the projection ray that passes through the camera center and the image point could have caused the observation
- thus from the position of a single image point we cannot determine the coordinates of the corresponding 3D point
- many ways (hence shape from X)
- prominent: reconstruction from binocular images
- center of second camera must not coincide with the one of the first camera
Stereo reconstruction
- need two corresponding image points (disparity estimation)
- reconstruct the two projection rays; their intersection is the sought 3D point
- projection properties of camera needed
- comprised in the projection matrix P (camera calibration)
Multiview
- add further cameras
- geometrically, only two cameras needed
- practice
- find corresponding points
- no correspondence for occluded points
- accuracy is limited by pixel grid
- more cameras decrease the noise in the correspondences and increase the overall accuracy of the depth estimates
- drawback: multiple cameras are more difficult to calibrate
Structure from motion
- single moving camera observes static 3D point
- advantages
- plenty of images from single camera
- reasonable frame rate: displacements are small
- problems
- no fixed, calibrated camera setup: the camera motion (egomotion) must be estimated together with the structure
- scene must be static, otherwise ambiguities
- sometimes the camera motion is approximately known (other sensors!)
Decomposition of the projection matrix
- factorize projection matrix P = KM
- camera calibration matrix K, contains the camera's internal parameters
\begin{bmatrix}\alpha_x & s & x_0 \\ 0 & \alpha_y & y_0 \\ 0 & 0 & 1 \end{bmatrix}
- pose of the camera relative to a world coordinate system M = (R|t):
rotation matrix and a translation vector with 3 degrees of freedom (external parameters)
How it works
- camera’s internal parameters are calibrated
- calibrate external parameters online (can be estimated from point correspondences)
- internal parameters now stay fixed
- in case of autofocus, this is not the case. Internal parameters can also be estimated online (autocalibration), but this decreases the robustness of the estimation
- when P = KM (fully calibrated), we can estimate the 3D structure
loop closing
- robotics: self-localization and mapping (SLAM)
- localization errors accumulate over time
- when camera returns to earlier position and if localization errors are small enough, the image can be matched to earlier image (loop closing)
- special case of object recognition (instance recognition)
- loop closing is the only way to correct accumulated errors (drift)
Shape from silhouette
- correspondences: established for points whose projection rays hit surface in nearly orthogonal direction
- if projection is tangential to surface: structures on surface are severely deformed (no robust matching)
- shape from silhouette is a kind of contrary approach: the shape is inferred from points where the surface is tangential to the projection rays
- silhouette: restricts the volume that can be occupied by the observed object
- points along projection rays that see background cannot belong to the object volume
- points along projection rays that see foreground may or may not belong to object volume
- with more cameras, the volume that can be occupied is successively reduced
- remaining volume is the maximal volume that is consistent with all silhouettes (visual hull)
- helps find good initializations for other reconstruction methods (e.g. stereo)
as 3D segmentation
- finding the silhouette in the images (segmentation) and reconstructing the surface can also be considered as a single 3D segmentation problem
- task: find a surface that consistently separates all images into a foreground and a background region
shape from shading
- given a single image, compute shape, albedo, scattering, illumination
- inverse rendering: reconstruct from the observed image
- severely under-constrained problem
Bidirectional reflectance distribution function (BRDF)
- single light source and surface element
- direction of light in polar coordinates (\theta, \phi) and incoming energy E(\theta, \phi)
- light emitted by surface in directing (\theta_e, \phi_e) given by L(\theta_e, \phi_e)
- function describing reflectance properties of material called bidirectional reflectance distribution function (BRDF):
L(\theta_e, \phi_e) = f(\theta, \phi, \theta_e, \phi_e)E(\theta, \phi)
- in more detailed models, the BRDF can have more than four parameters
Lambertian reflectance
- a Lambertian surface emits incoming light uniformly in all directions; the radiance only depends on the angle between the surface normal n and the light source direction s: I(x, y) = R(X, Y, Z) = \rho s^T n
- rough materials (stone, textiles, …)
- most of these materials absorb some incoming light. Non-absorbed fraction is called albedo \rho
- model often used in computer vision (stereo matching)
- constant \rho and a given light source direction s lead to basic shape from shading
Horn
- sought depth map Z(x, y) can be related to surface normal direction n. Estimate n
- derivatives of surface S(x, y) = (x, y, Z(x, y)) with respect to x and y yield two vector fields that span the tangential plane
\partial_x S = \begin{bmatrix}1\\0\\p\end{bmatrix} \qquad \partial_y S = \begin{bmatrix}0\\1\\q\end{bmatrix}
- cross product of two vector fields yields the normal field
n = \frac{1}{\sqrt{p^2+q^2+1}} \begin{bmatrix}-p\\-q\\1\end{bmatrix}
- plug into the Lambertian radiance function: like in optical flow, one equation, two unknowns
Ikeuchi-Horn
- assuming smooth surface, leads to variational approach
- derive Euler-Lagrange equations, minimizer for p and q can be found by gradient descent
- Energy is non-convex and can have multiple local minima. Coarse-to-fine methods can be used as heuristic to find global minimum
SIRFS: Shape, Illumination and Reflectance from Shading
- recent framework by Barron and Malik considers more components of inverse rendering
- includes multiple sources of prior knowledge
- priors learned from training data
- reflectance is mostly smooth and histogram is sparse
- shape is mostly smooth
- isotropy of shapes and the fact that an observed surface is likely to point towards observer
- natural lighting prior
Shape from defocus
- series of images with focus set to different depths
- estimate the relative blur d \sigma needed to match the other image
- amount of blur needed determines the depth
Shape from texture
- if appearance of texture is known (regular, repetitive pattern)
- usually: texture not known. Practical assumptions are that the texture is homogeneous, isotropic and stationary
- from repetitive nature of texture elements under these conditions we can estimate the non-deformed shape of such an element and the surface that causes its deformation
Object Recognition and deep learning
- instance recognition seeks to detect the presence of a known object in a new image
- the object should also be localized in the image (detection)
- major problem: the object might look quite different (viewpoint, lighting, occlusions, …)
- important difference to object class recognition or object class localization: the same instance of an object class is seen in the two images
- there can be large differences between two instances of an object class (eg. two dogs)
Classification problem
- two main parts: set of features that describe the object and a classifier that separates objects
- classical approach: use a handcrafted feature representation and learn the classifier
- deep learning: learn both
- typical classifiers: support vector machine (two-class), logistic regression (multi-class, used in deep networks)
- nearest neighbor classifier (multi-class)
Support vector machine
- Basic idea: learn a decision function with maximum margin (distance to most critical training points)
- decision function modeled as linear combination of features y(x) = w^T \phi(x) + b
- the large margin concept leads to a convex optimization problem:
\text{argmin}_{w,b} \frac{1}{2} ||w||^2 subject to t_n(w^T \phi(x_n) + b) \geq 1
- efficient code publicly available
Nearest neighbor classifier
- assign the class label of most similar training point
- advantages
- simple concept
- works for multiple classes
- drawbacks
- all training samples must be stored (no generalization)
- the search for the most similar training sample is time-consuming
Feature representation
- color
- local descriptors (e.g. SIFT)
- discriminative properties good
- easy to extract from input images
- describe the object locally (robust to occlusions, local variation)
- fixed level of abstraction
- deep representations
- multiple levels of abstraction
- learned from training data
Invariance requirements in object recognition
- object should be recognized even if appearance has changed due to typical transformations
- descriptors and classifiers are called invariant with respect to transformation
- tradeoff between invariance and discriminative power of descriptor
- instance recognition: invariance needed with respect to
- viewpoint
- background
- lighting
- partial occlusion
- class recognition: needs complex invariant features learned from training examples
Some popular local descriptors
- distortion corrected patches
- SIFT/HOG (orientation histograms)
- Shape context
Feature learning
- instead of manual descriptor design, let computer find optimal descriptor for a task from a training set
- task: object classification
- training set consists of images and their class labels
- \text{Image} \rightarrow f(I) \rightarrow \text{features} \rightarrow C(f(I)) \rightarrow \text{Class}
- shallow modeling of function f(I) is not efficient to cover all the variation that appears in an object class
deep network for image classification
- each layer produces more abstract representation of the input
- layers are similar in structure
- weights of connections between layers (filters) learned from training data
- fully connected layer
- 200x200 image: 40k hidden units, 2 billion parameters
- spatial correlation is local
- waste of resources, not enough training samples
- locally connected layer
- 200x200 images, 40k hidden units, Filter size: 10x10, 4M parameters
- parametrization is good when input image is registered (face recognition)
- stationary: statistics are similar at different locations
- convolutional layer
- share the same parameters across different locations: convolutions with learned kernels
- single layer: max-pooling replaces small local area of feature map by maximum value in that area (downsampling)
- data reduction, loss of localization accuracy
- allows for larger receptive fields in next layer
- ReLU nonlinearity sets negative values to 0 (no activation)
Large ConvNet
- up to 30 layers
- cost function for training the weights, classification error on training set
w^* = \text{argmin}_w \left(-\sum_i y_i \log(f_w(x_i))\right)
- optimization by gradient descent (back-propagation)
Back-propagation
- initialize the weights
- Compute f_w(x_i) by forward-propagating a sample through the network
- gradient with regard to a weight in a certain layer: use chain rule
- update the weights
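A tiny NumPy sketch of one forward/backward pass for a two-layer fully connected net with ReLU and softmax cross-entropy; this is only an illustration of the chain rule used in back-propagation, not the lecture's ConvNet (all names and shapes are my own assumptions):

```python
import numpy as np

def backprop_step(x, y, W1, W2, lr=1e-2):
    """One gradient-descent step; x: input vector, y: one-hot target."""
    # forward pass
    h_pre = W1 @ x
    h = np.maximum(h_pre, 0.0)                 # ReLU sets negative values to 0
    logits = W2 @ h
    p = np.exp(logits - logits.max())
    p /= p.sum()                               # softmax probabilities
    loss = -np.sum(y * np.log(p + 1e-12))      # cross-entropy

    # backward pass (chain rule, layer by layer)
    d_logits = p - y                           # gradient of the loss w.r.t. logits
    dW2 = np.outer(d_logits, h)
    d_h = W2.T @ d_logits
    d_h_pre = d_h * (h_pre > 0)                # ReLU derivative
    dW1 = np.outer(d_h_pre, x)

    # weight update
    W1 -= lr * dW1
    W2 -= lr * dW2
    return loss
```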
Nice properties
- filters of different layers capture different scales due to max-pooling
- filters at low layers are well localized and consider only small image area (small receptive field)
- filters at high layers are badly localized and capture context
- feature sharing: same features obtained at low layers can be useful for assembling complex features at higher layers
- successive abstraction
- features close to image input resemble simple edge filters
- features close to class output are invariant to the typical appearance variations
Localization tasks
- image classification only provides a class label per image
- object localization: provide bounding box of the object
- object detection: bounding boxes for potentially many object instances in the image
- semantic segmentation: for each pixel, which object class does it belong to?
- instance segmentation: additionally separates different class instances
Sliding window approach
- define features in local window
- consider many (or all) positions and scales and make a binary decision: is this window a car or not?
- non-maximum suppression: keep only local maxima
- convolutional networks applied in a sliding-window fashion contain redundant computation
- exploit this for greater efficiency
deep network classification on object window proposals (R-CNN)
- input image \rightarrow extract region proposals \rightarrow compute CNN features \rightarrow classify regions
- instead of all windows, only \approx 1000 region proposals are considered
- faster implementations first compute the ConvNet activations of the whole image and use them to classify the proposal windows
Object recognition benchmarks
- common benchmark needed
- ImageNet, PASCAL Visual Object Classes, Microsoft COCO
- training set and test set
- detection performance is assessed by precision-recall curves
Precision-recall curves
- precision: which part of the detected objects is correct: \frac{\text{true positives}}{\text{true positives} + \text{false positives}}
- recall: which percentage of objects is detected \frac{\text{true positives}}{\text{true positives} + \text{false negatives}}
- Precision-recall curves obtained by different thresholds on the classification score
- single number: average precision
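A sketch of how such a curve and a simple (step-wise) average precision can be computed from detection scores (function name is mine; benchmarks like PASCAL use slightly different, interpolated AP definitions):

```python
import numpy as np

def precision_recall(scores, is_true_positive, n_objects):
    """Precision/recall over all score thresholds plus a simple average precision.

    scores: classifier scores of the detections, is_true_positive: boolean per
    detection, n_objects: number of ground-truth objects (denominator of recall).
    """
    order = np.argsort(-np.asarray(scores, dtype=np.float64))   # highest score first
    tp_flags = np.asarray(is_true_positive, dtype=np.float64)[order]
    tp = np.cumsum(tp_flags)
    fp = np.cumsum(1.0 - tp_flags)
    precision = tp / (tp + fp)
    recall = tp / n_objects
    # step-wise area under the precision-recall curve
    recall_prev = np.concatenate(([0.0], recall[:-1]))
    ap = np.sum((recall - recall_prev) * precision)
    return precision, recall, ap
```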
Rendering Pipeline
Rendering
- generate an image given a virtual camera, objects and light sources
- various techniques: rasterization, raytracing
- major research topics in computer graphics
Rasterization
- generate 2D images from 3D scenes
- transform geometric primitives such as lines and polygons into raster image representations
- 3D objects are approximated by vertices (points), lines and polygons
Rendering pipline
- processing stages comprise the rendering pipeline
- supported by commodity graphics hardware
- OpenGL and DirectX
- Task: 3D input (camera, objects, light source, textures) \rightarrow 2D output
- functionality: resolve visibility, lighting model, compute shadows, apply textures
Main stages
- vertex processing / geometry stage / vertex shader
- process vertices independently in same way
- transformations per vertex, computes lighting per vertex
- geometry shader
- generates, modifies, discards primitives
- primitive assembly and rasterization / rasterization stage
- assembles primitives such as points, lines, triangles
- converts primitives into raster image
- generates fragments / pixel candidates
- fragment attributes are interpolated from vertices of a primitive
- fragment processing /fragment shader
- processes all fragments independently in the same way
- fragments are processed, discarded or stored in framebuffer
Vertex processing (Geometry stage)
- position, orientate and scale each object and respective vertices in scene with model transform
- position and orientate camera by view transform
- i.e. inverse view transform is the transform that places camera at the origin of coordinate system, facing z-direction
- entire scene is transformed with inverse view transform
Lighting
- interaction of light sources and surfaces is represented with a lighting / illumination model
- lighting computes a color for each vertex (light source position and properties, materials)
- the projection transform P maps the view volume to the canonical view volume
- view volume depends on camera properties (orthographic projection vs perspective projection)
- canonical view volume is a cube from (-1, -1, -1) to (1, 1, 1)
- view volume is specified by near, far, left, right, bottom, top
- view volume (cuboid or frustum) is transformed into a cube
- objects inside the view volume are transformed accordingly
- orthographic
- combination of translation and scaling
- all objects are translated and scaled in the same way
- perspective
- complex transformation
- scaling factor depends on distance of an object to viewer
- objects farther away appear smaller
Clipping
- primitives that intersect the boundary of the view volume are clipped
- projected primitive coordinates (x_p, y_p, z_p) are transformed to screen coordinates (x_s, y_s)
- screen coordinates together with depth value are window coordinates (x_s, y_s, z_w)
- (x_p, y_p) are translated and scaled from range (-1, 1) to actual pixel positions (x_s, y_s) on the display
- z_p is generally translated and scaled from the range of (-1, 1) to (0, 1) for z_w
- screen coordinates (x_s, y_s) represent the pixel position of a fragment that is generated in a subsequent step
- z_w, the depth value, is an attribute of this fragment used for further processing
Fragment processing
Fragment attributes are processed and tests are performed
Attribute processing
- texture lookup
- texturing
- fog
- antialiasing
Tests
- scissor test: check if fragment is inside a specified rectangle
- used for, e.g., masked rendering
- alpha test: check if the alpha value fulfills a certain requirement
- stencil test: check if stencil value in the framebuffer at the position of the fragment fulfills a certain requirement
- depth test
- compare depth value of fragment and depth value of the framebuffer at the position of the fragment
- used for resolving visibility
Blending / Merging
- combines the fragment color with the framebuffer color at the position of the fragment
- usually: determined by the alpha values
- dithering
- finite number of colors
- map color value to one of the nearest renderable colors
- logical operations / masking
OpenGL
- graphics rendering API: display of geometric representations and attributes. Independent from OS and window system
- interaction with GPUs, hardware-accelerated
OpenGL 1.0
- fixed-function pipeline
- focus on parallelized implementation
- provided a quasi-standard for all components of a rasterization-based renderer
OpenGL 2.0
- fixed-function pipeline with programmable vertex and fragment processing
- optional: shaders (work on vertex / fragment)
OpenGL 3.0
- programmable vertex and fragment processing
- no fixed-function pipeline
- focus on core functionality
- improved handling of large data
- improved flexibility
- implementation of non-standard effects not restricted to “misusing” pipeline functionality
- programming
- not just a set of parameters of standard functionality
- concepts of vertex and fragment processing are not “nice to know”, but required knowledge
- OpenGL Mathematics glm
OpenGL 3.2
- geometry shader
- optional, modify geometric primitives
- generation of geometry no longer restricted to CPU
- flexible use of texture data
OpenGL 4.1
- tessellation shader
- optional
- tessellate patches
- flexible generation of large and detailed geometries
OpenGL 4.3
- compute shader
- optional
- perform arbitrary computations
- are not part of the rendering pipeline
GPU Data Flow
- data transfer to GPU (vertices with attributes and connectivity)
- vertex shader: program that is executed for each vertex. Input and output is vertex
- rasterizer
- fragment shader (each fragment)
- framebuffer update
Data Transfer
- Vertex Buffer Object VBO
- copy memory from CPU to GPU
- arbitrary data: typically vertex attributes
- Vertex Array Object VAO
- link between VBO and shader programs
- specifies how to interpret VBO data
- specifies the mapping to input variables of shaders
Shader
written in OpenGL Shading Language GLSL, runs on GPU, vertex and fragment shader are mandatory
Vertex shader
- works on vertices
- minimum functionality: transform from local to clip space
- can compute/set a color at vertices
- rasterizer can interpolate attributes from vertices to fragments
Fragment shader
- works on fragments; computes/sets the fragment color
Transformations
- congruent transformations (Euclidean transformations)
- preserve shape and size
- translation, rotation, reflection
- similarity transformations
- preserve shape
- translation, rotation, reflection, scale
- affine transformations
- preserve collinearity: points on a line are transformed to points on a line
- preserve proportions: ratios of distances between points are preserved
- preserve parallelism: parallel lines stay parallel
- angles and lengths are not preserved
- translation, rotation, reflection, scale, shear are affine
- orthographic projection is a combination of affine transformations
- perspective projection is not affine
- 3D point: p' = T(p) = Ap + t (A: 3x3 matrix for scale and rotation; t: translation)
- preserve affine combinations: T(\sum_i \alpha_i p_i) = \sum_i \alpha_i T(p_i) \qquad \forall \sum_i \alpha_i = 1
- e.g.: Line can be transformed by transforming control points
- all affine transformations are represented with one matrix-vector multiplication
Points and Vectors
- rendering pipeline transforms vertices, normals, colors, texture coordinates
- points (e.g. vertices) specify a location in space
- vectors (e.g. normals) specify a direction
- relations between points and vectors:
- p_1 - p_2 = \vec{v}
- p_1 + \vec{v} = p_2
- \vec{v_1} + \vec{v_2} = \vec{v_3}
- p_1 + p_2 undef
- \vec{p} = p - O \qquad p = O + \vec{p}
- transformations can have different effects on points and vectors
- translation
- move point to a different position
- does not change the vector
- using homogeneous coordinates, transformations of vectors and points can be handled in a unified way
Homogeneous Coordinates of Points
- \begin{bmatrix}x\\y\\z\\w\end{bmatrix} with w \neq 0 are homogeneous coordinates of the 3D point \begin{bmatrix}\frac{x}{w}\\\frac{y}{w}\\\frac{z}{w}\end{bmatrix}
- \begin{bmatrix}\lambda x\\\lambda y\\\lambda z\\\lambda w\end{bmatrix} represent the same point \begin{bmatrix}\frac{\lambda x}{\lambda w}\\\frac{\lambda y}{\lambda w}\\\frac{\lambda z}{\lambda w}\end{bmatrix} = \begin{bmatrix}\frac{x}{w}\\\frac{y}{w}\\\frac{z}{w}\end{bmatrix} for all \lambda \neq 0
Homogeneous Coordinates of Vectors
- a direction vector is represented with homogeneous component 0, i.e. \begin{bmatrix}x\\y\\z\\0\end{bmatrix}, so it is unaffected by translation
- general form of a homogeneous transformation matrix:
\begin{bmatrix}m_{00} & m_{01} & m_{02} & t_0 \\ m_{10} & m_{11} & m_{12} & t_1 \\ m_{20} & m_{21} & m_{22} & t_2 \\ p_{0} & p_{1} & p_{2} & w \\ \end{bmatrix}
- m_{ij}: rotation, scale
- t: translation
- p_i: projection
- w is analogous to the fourth component of points/vectors
Examples
- Translation
- Rotation
- inverse rotation
- mirroring / reflection
- scale
- shear
Orthogonal matrices
- rotation and reflection matrices are orthogonal RR^T = R^TR = I and R^{-1} = R^T
- R_1, R_2 are orthogonal \Rightarrow R_1R_2 is orthogonal
- rotation: \det R = 1 reflection: \det R = -1
- length of a vector does not change ||Rv|| = ||v||
- angles are preserved: \langle Ru, Rv \rangle = \langle u, v \rangle
- translation: p_2 = p_1 - t, i.e. p_2 = T(-t)p_1
- rotation: by rotation matrix
application
- view transform can be seen as basis transform
- objects are placed with respect to a global coordinate system C_1 = (O_1, \{e_1, e_2, e_3\})
- camera is positioned at O_2 and oriented at \{b_1, b_2, b_3\} given by viewing direction and up-vector
- after view transform, all objects are represented in the eye or camera coordinate system C_2 = (O_2, \{b_1, b_2, b_3\})
- placing and orienting the camera corresponds to the application of the inverse transform to the objects
- rotating the camera by R and translating it by T corresponds to translating the objects by T^{-1} and rotating them by R^{-1}
planes and normals
- planes can be represented by a surface normal n and a point r. All points p with n^T(p - r) = 0 form the plane.
- Composition is achieved by matrix multiplication.
- E.g.: apply T to p and then R: R(Tp) = (RT)p
- Note that: TR \neq RT
- order matters!
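A small NumPy sketch of points vs. vectors in homogeneous coordinates and of composition order (helper names and the concrete numbers are my own):

```python
import numpy as np

def translation(t):
    """4x4 homogeneous translation matrix."""
    T = np.eye(4)
    T[:3, 3] = t
    return T

def rotation_z(angle):
    """4x4 homogeneous rotation about the z-axis."""
    c, s = np.cos(angle), np.sin(angle)
    R = np.eye(4)
    R[:2, :2] = [[c, -s], [s, c]]
    return R

p = np.array([1.0, 0.0, 0.0, 1.0])      # point  (homogeneous component 1)
v = np.array([1.0, 0.0, 0.0, 0.0])      # vector (homogeneous component 0)

T = translation([0.0, 2.0, 0.0])
R = rotation_z(np.pi / 2)

print(T @ p, T @ v)                      # the vector is unaffected by translation
print((R @ T) @ p, (T @ R) @ p)          # composition order matters: TR != RT
```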
Projection
in 2D
- projection from v onto l maps point p onto p'
- p' is the intersection of the line through p and v with the line l
- v is viewpoint, center of perspectivity
- l is the viewline
- the line through p and v is the projector
- v is not on line l, p \neq v
- if homogeneous component of viewpoint v is not equal to zero, we have perspective projection
- if v is at infinity, we have parallel projection
Classification
- location of viewpoint and orientation of viewline determine the type of projection
- parallel (viewpoint at infinity, parallel projectors)
- perspective (non-parallel projectors)
General Case
- 2D projection is represented by the matrix M = v l^T - (l \cdot v) I_3
- l = \{ax + by + c = 0\} = (a, b, c)^T
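A short numpy sketch of this formula, projecting the point (2, 4) from the origin onto the line y = 1 (the example values are assumptions):

```python
import numpy as np

# Projection in 2D from viewpoint v onto line l, both in homogeneous coordinates.
# Line y = 1 -> 0*x + 1*y - 1 = 0 -> l = (0, 1, -1); viewpoint = origin -> v = (0, 0, 1).
l = np.array([0.0, 1.0, -1.0])
v = np.array([0.0, 0.0,  1.0])

M = np.outer(v, l) - (l @ v) * np.eye(3)   # M = v l^T - (l . v) I_3

p = np.array([2.0, 4.0, 1.0])              # point (2, 4)
q = M @ p
print(q / q[2])                            # [0.5 1.  1. ] -> intersection of the projector with y = 1
```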
Discussion
- matrices M and \lambda M represent the same transformation \lambda M p = \lambda p'
- 2D transformations in homogeneous form
\begin{bmatrix}m_{11} & m_{12} & t_x \\ m_{21} & m_{22} & t_y \\ w_x & w_y & h \\ \end{bmatrix}
- w_x and w_y map the homogeneous component w of a point to a value w' that depends on x and y
- therefore, the scaling of a point depends on x and / or y
- in perspective 3D projections, this is employed to scale the x and y components with respect to z, the distance to the viewer
in 3D
- projection from v onto n, maps a point p onto p'
- p' is the intersection of the line through p and v with plane n
- v is the viewpoint, center of perspectivity
- n is the viewplane
- the line through p and v is a projector
- v is not on the plane n, p \neq v
- represented by a matrix M = v n^T - (n \cdot v) I_4
- n = \{ax + by + cz + d = 0\} = (a, b, c, d)^T
OpenGL: View Volume
- projection transformation maps a view volume to the canonical view volume
- the view volume is specified by its boundary
- the canonical view volume is a cube from (-1, -1, -1) to (1, 1, 1)
- transformation maps
- from eye coordinates
- to clip coordinates
- to normalized device coordinates NDC
- x component of a point from (left, right) to (-1, 1)
- y component of a point from (bottom, top) to (-1, 1)
- z component of a point from (near, far) to (-1, 1)
Perspective projection
- to obtain the x and y components of a projected point, the point is first projected onto the near plane (viewplane)
- x_p = \frac{nx_e}{-z_e} \qquad y_p = \frac{ny_e}{-z_e}
- transform the view volume, the pyramidal frustum to the canonical view volume
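A hedged sketch of a glFrustum-style matrix that realizes this: x and y are divided by w = -z_e (the projection onto the near plane), and the frustum is mapped to the canonical view volume; the boundary values below are illustrative.

```python
import numpy as np

def frustum(l, r, b, t, n, f):
    """glFrustum-style perspective matrix; after division by w = -z_e the
    pyramidal frustum is mapped to the canonical view volume [-1, 1]^3."""
    return np.array([
        [2*n/(r-l), 0.0, (r+l)/(r-l), 0.0],
        [0.0, 2*n/(t-b), (t+b)/(t-b), 0.0],
        [0.0, 0.0, -(f+n)/(f-n), -2*f*n/(f-n)],
        [0.0, 0.0, -1.0, 0.0]])

P = frustum(-1.0, 1.0, -1.0, 1.0, 1.0, 10.0)
p_eye  = np.array([0.5, -0.5, -2.0, 1.0])   # a point in eye coordinates
p_clip = P @ p_eye
p_ndc  = p_clip / p_clip[3]                 # perspective division
print(p_ndc)                                # x, y match n*x_e/-z_e = 0.25, n*y_e/-z_e = -0.25 here
```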
Parallel projection
- view volume is represented by cuboid
- the projection transform maps the cuboid to the canonical view volume
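For the parallel case, a sketch of the affine map (glOrtho-style) from the cuboid (left, right) x (bottom, top) x (near, far) to the canonical cube; the boundary values below are illustrative.

```python
import numpy as np

def ortho(left, right, bottom, top, near, far):
    """Maps the cuboid view volume to the canonical cube [-1, 1]^3 (glOrtho-style)."""
    return np.array([
        [2.0 / (right - left), 0.0, 0.0, -(right + left) / (right - left)],
        [0.0, 2.0 / (top - bottom), 0.0, -(top + bottom) / (top - bottom)],
        [0.0, 0.0, -2.0 / (far - near), -(far + near) / (far - near)],
        [0.0, 0.0, 0.0, 1.0]])

P = ortho(-2.0, 2.0, -1.0, 1.0, 1.0, 10.0)
print(P @ np.array([-2.0,  1.0,  -1.0, 1.0]))  # corner (left, top, near)    -> [-1.  1. -1.  1.]
print(P @ np.array([ 2.0, -1.0, -10.0, 1.0]))  # corner (right, bottom, far) -> [ 1. -1.  1.  1.]
```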
OpenGL Matrices
- Objects are transformed from object to eye space with \text{GL_MODELVIEW} \cdot p
- Objects are transformed from object space to clip space with \text{GL_PROJECTION} \cdot \text{GL_MODELVIEW} \cdot p
- colors are transformed with color matrix GL_COLOR
- texture coordinates are transformed with texture matrix GL_TEXTURE
Matrix Stack
- each matrix type has a stack
- matrix on top of stack is used
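A plain-Python sketch of the stack semantics in the spirit of glPushMatrix / glMultMatrix / glPopMatrix (not actual OpenGL calls):

```python
import numpy as np

# Minimal sketch of a matrix stack: the matrix on top is the one currently applied.
stack = [np.eye(4)]                      # start with the identity

def push():
    stack.append(stack[-1].copy())       # duplicate the current top (like glPushMatrix)

def pop():
    stack.pop()                          # restore the previous matrix (like glPopMatrix)

def mult(M):
    stack[-1] = stack[-1] @ M            # right-multiply onto the current top (like glMultMatrix)

T = np.eye(4); T[:3, 3] = [1.0, 0.0, 0.0]
push(); mult(T)                          # local transform for one object
print(stack[-1][:3, 3])                  # [1. 0. 0.]
pop()
print(stack[-1][:3, 3])                  # [0. 0. 0.] -> back to the previous state
```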
Lighting
Radiometric Quantities
- radiant energy Q
- radiant flux \Phi (radiant power P)
- rate of flow of radiant energy per unit time: \Phi = \frac{dQ}{dt}
- flux density (irradiance, radiant exitance)
- radiant flux per unit area: E = \frac{d\Phi}{dA}
- irradiance measures overall radiant flux \Phi (light flow, photons per unit time) into a surface element
- radiant intensity
- radiant flux per unit solid angle I = \frac{d \Phi}{d \omega}
- radiant flux \Phi (light flow, photons per unit time) incident on, emerging from, or passing through a point in a certain direction
Solid Angle
- 2D angle in 3D space
- measured in steradians
- d \omega = \frac{dA}{r^2}
Inverse square law
- point light source with radiant intensity I in direction \omega
- irradiance along a ray in direction \omega is inversely proportional to the square of the distance r from the source E = \frac{I}{r^2}
- the number of photons emitted into d\omega and hitting the surface area dA at distance r is inversely proportional to r^2
- the area subtended by a solid angle is proportional to r^2
- E = \frac{d\Phi}{dA} = \frac{d\Phi}{r^2 d\omega} = \frac{I}{r^2}
Radiance
- radiant flux per unit solid angle per unit projected area incident on, emerging from, or passing through a surface element in a certain direction: L = \frac{d^2 \Phi}{dA_p \, d\omega} = \frac{d^2 \Phi}{dA \cos\theta \, d\omega}
- describes strength and direction of radiation
- constant radiance in all directions corresponds to a radiant intensity proportional to \cos\theta; constant radiant intensity, in contrast, results in direction-dependent radiance
- measured by sensors
- computed in computer-generated images
- preserved along lines in space
- does not change with distance
Visible color spectrum
- photons characterized by wavelength in visible spectrum 390 \text{nm} to 750 \text{nm}
- light consists of a set of photons
- the distribution of wavelengths within this set is referred to as spectrum of light
- spectra are perceived as colors
- if spectrum consists of dominant wavelength, humans perceive a “rainbow” color (monochromatic)
- equally distributed wavelengths are perceived as shade of gray (achromatic)
- otherwise, colors “mixed from rainbow colors” are perceived (chromatic)
Human Eye
- sensitive to radiation within the visible spectrum
- has sensors for day and night vision (cones, rods)
- perceived light is radiation spectrum weighted with absorption spectra of the eye
- in daylight, three cone signals are interpreted by the brain
CIE Color Space
- XYZ color space
- X = \int_\lambda I(\lambda) \bar{x}(\lambda) d\lambda
- Y = \int_\lambda I(\lambda) \bar{y}(\lambda) d\lambda
- Z = \int_\lambda I(\lambda) \bar{z}(\lambda) d\lambda
- spectrum I is represented by X, Y, Z given the color-matching functions \bar{x}, \bar{y}, \bar{z}
- color-matching functions correspond to spectral sensitivities of the cones of a standard observer
CIE xy Chromaticity diagram
- XYZ represents color and brightness / luminance
- two values are sufficient to represent color
- x = \frac{X}{X + Y + Z}
- y = \frac{Y}{X + Y + Z}
- monochromatic colors are on the boundary
- the center is achromatic
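A toy numerical sketch of the X, Y, Z integrals and the chromaticity coordinates; the sampled spectra below are made-up placeholders, not the real CIE 1931 tables.

```python
import numpy as np

# Toy sketch: spectrum I and color-matching functions sampled at a few wavelengths.
lam  = np.array([450.0, 550.0, 650.0])   # wavelengths in nm
I    = np.array([0.2, 1.0, 0.4])         # spectral distribution of the light (made up)
xbar = np.array([0.3, 0.4, 0.9])         # placeholder color-matching functions
ybar = np.array([0.04, 1.0, 0.3])
zbar = np.array([1.7, 0.01, 0.0])

X = np.trapz(I * xbar, lam)              # X = integral of I(lambda) xbar(lambda) dlambda
Y = np.trapz(I * ybar, lam)
Z = np.trapz(I * zbar, lam)

x, y = X / (X + Y + Z), Y / (X + Y + Z)  # chromaticity: brightness is divided out
print(X, Y, Z, x, y)
```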
Display Devices
- use three primary colors, can only reproduce colors within spanned triangle
RGB Color Space
- three primaries: red, green, blue
- light source color: what is contained in spectrum
- object color: what it reflects, what it absorbs
Lighting models
- compute interaction of objects with light
- account for variety of light sources and material properties
- but: keep it fast
- only use local information
- interaction between objects neglected
- interreflections among objects not handled
diffuse vs specular
- diffuse: reflected into many different directions, matte surfaces
- specular (mirror like): is reflected into a dominant direction (shiny surface)
- material properties
- general: combination of both
Lambert’s Cosine Law
- computation of diffuse reflection is governed by Lambert’s cosine law
- radiant intensity reflected from a “Lambertian” (matte, diffuse) surface is proportional to the cosine of the angle between view direction and surface normal n
- irradiance on an oriented surface patch above a Lambertian surface is constant
- irradiance on a surface is proportional to the cosine of the angle between surface normal n and light source direction l
specular reflection
- perceived radiance depends on viewing direction
- maximum radiance, if viewer direction v corresponds to the reflection direction r or, if the halfway vector h corresponds to the surface normal
- Phong and Blinn-Phong do not account for energy preservation
- radiance depends on angle \theta and exponent m
- radiant exitance from surface depends on m
- normalized Blinn-Phong versions account for energy conservation
ambient light
- diffuse and specular reflection are modeled for directional light from a point light source or from parallel light
- ambient light is an approx. for indirect light sources
- the effect is generally approximated by a constant
Phong illumination model
- combines ambient, diffuse, and specular components
- Phong allows to set different light colors for different components
- multiple light sources possible
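An illustrative per-light Blinn-Phong evaluation in Python; the material and light values are assumptions, not from the lecture.

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def blinn_phong(n, l, v, light_col, k_a, k_d, k_s, shininess):
    """Per-light evaluation: ambient + diffuse (Lambert) + specular (Blinn-Phong)."""
    n, l, v = normalize(n), normalize(l), normalize(v)
    h = normalize(l + v)                                   # halfway vector
    diffuse  = k_d * max(np.dot(n, l), 0.0)                # Lambert's cosine law
    specular = k_s * max(np.dot(n, h), 0.0) ** shininess   # highlight where h is close to n
    return k_a + light_col * (diffuse + specular)

color = blinn_phong(n=np.array([0.0, 0.0, 1.0]),
                    l=np.array([0.0, 1.0, 1.0]),
                    v=np.array([0.0, 0.0, 1.0]),
                    light_col=np.array([1.0, 1.0, 1.0]),
                    k_a=np.array([0.1, 0.1, 0.1]),
                    k_d=np.array([0.6, 0.2, 0.2]),
                    k_s=np.array([0.5, 0.5, 0.5]),
                    shininess=32.0)
print(color)
```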
Considering Distances
- between object surface and light source
- irradiance of a surface illuminated by a point light source is inversely proportional to the square of the distance from the surface to the light source
- light source attenuation
- between object surface and viewer
- atmospheric effects, e.g. fog, influence the visibility of objects
- refers to transparency of air
- low visibility, radiance converges to a “fog color”
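A hedged sketch of both distance effects; the attenuation constants and fog range are assumed example values.

```python
import numpy as np

# Light source attenuation with distance d_l, and linear fog with distance d_v to the viewer.
def attenuate(color, d_l, kc=1.0, kl=0.0, kq=0.1):
    return color / (kc + kl * d_l + kq * d_l * d_l)         # common 1/(kc + kl d + kq d^2) model

def apply_fog(color, fog_color, d_v, start=10.0, end=50.0):
    f = np.clip((end - d_v) / (end - start), 0.0, 1.0)      # f = 1 near, f = 0 far
    return f * color + (1.0 - f) * fog_color                # radiance converges to the fog color

c = np.array([0.8, 0.4, 0.2])
print(attenuate(c, d_l=3.0))
print(apply_fog(c, fog_color=np.array([0.6, 0.6, 0.6]), d_v=40.0))
```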
Shading models
- specify whether lighting is evaluated per vertex or per fragment
- if per vertex: colors are interpolated or not
- per fragment: surface normals are interpolated across a primitive
- flat shading (Constant shading)
- evaluation of the lighting model per vertex
- primitives are colored with color of one specific vertex
- Gouraud shading
- evaluation of lighting model per vertex
- primitives are colored by bilinear interpolation of vertex colors
- Phong shading
- bilinear interpolation of vertex normals during rasterization
- evaluation of the lighting model per fragment
Mach Band Effect
- mach bands are illusions due to our neural processing
- the bright bands at 45 degrees and 135 degrees are illusory. The intensity inside each square is the same
Shading models
- flat shading (constant shading): simple, fast
- Gouraud shading: more realistic, suffers from Mach band effect, local highlights are not resolved if the highlight is not captured by a vertex
- Phong shading: highest quality, most expensive
Rasterization
Transform geometric primitives into a raster (pixel positions).
Line
- the start and end points of a line have integer components p_b = (x_b, y_b) and p_e = (x_e, y_e)
- lines are represented as y = mx + b or F(x, y) = ax + by + c = 0
- algorithms are often restricted to 0 \leq m \leq 1
- algorithms consist of an initialization step and a loop
- line rasterization algorithms are usually described for a subset of lines and generalized using symmetries
- incremental updates are often employed
- Bresenham's algorithm avoids floating-point arithmetic
- improved algorithms address aliasing / clipping artifacts
- note that the algorithms do not compute all pixels that are intersected by the line
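A sketch of Bresenham's algorithm for the restricted slope range 0 \leq m \leq 1 (other octants follow from the symmetries mentioned above):

```python
def bresenham(xb, yb, xe, ye):
    """Integer-only line rasterization for slopes 0 <= m <= 1."""
    dx, dy = xe - xb, ye - yb
    d = 2 * dy - dx              # decision variable, initialized once
    y = yb
    pixels = []
    for x in range(xb, xe + 1):
        pixels.append((x, y))
        if d > 0:                # choose the NE pixel: also step in y
            y += 1
            d -= 2 * dx
        d += 2 * dy              # incremental update, no floating-point arithmetic
    return pixels

print(bresenham(0, 0, 8, 3))
```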
Circle
- center (0, 0) and radius r
- implicit representation F(x, y) = x^2 + y^2 - r^2 = 0
- algorithms compute only one eighth of a circle (the rest follows by symmetry)
- incremental updates are often employed
- floating-point arithmetic is avoided
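A sketch of a midpoint-style circle rasterization for one octant with integer arithmetic only; the decision variable is derived from the implicit form F(x, y) above.

```python
def midpoint_circle(r):
    """Rasterizes one octant of a circle centered at (0, 0); integer arithmetic only.
    The remaining seven octants follow from symmetry."""
    x, y = 0, r
    d = 1 - r                      # decision variable for F(x, y) = x^2 + y^2 - r^2
    pixels = [(x, y)]
    while x < y:
        if d < 0:                  # midpoint inside the circle: step east
            d += 2 * x + 3
        else:                      # midpoint outside: step south-east
            d += 2 * (x - y) + 5
            y -= 1
        x += 1
        pixels.append((x, y))
    return pixels

print(midpoint_circle(8))
```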
Polygons
- a polygon is defined by edges
- should be closed to allow inside / outside classification
- rasterization estimates all pixel positions inside a polygon
- in general simple, but
- if adjacent polygons share an edge, each pixel on the edge should belong to exactly one polygon
- no pixel along the edge should be missed
- no pixel along the edge should be rasterized twice
Shadow
- improve the realism in rendered images
- illustrate spatial relations between objects
- Goal: determination of shadowed parts in 3D scenes
- only geometry is considered
- not standard functionality of rasterization-based rendering pipeline
Projection Shadows
- project the primitives of occluders onto a receiver plane, based on the light source location
- projected primitives form shadow geometry
- implementation: draw receiver plane, disable depth test, draw shadow geometry, enable depth test, draw occluders
- issues: restricted to planar receivers, no self-shadowing, antishadows
Shadow Maps
- see shadow casting as visibility problem
- scene points are visible from light source (illuminated), others invisible from light source (in shadow)
- resolving visibility is standard functionality in rendering pipeline (z-buffer algorithm)
- algorithm:
- render scene from light source
- store all distances to visible (illuminated) scene points in a shadow map
- render scene from camera
- compare the distance of rendered scene points to the light with stored values in the shadow map
- if both distances are equal, the rendered scene point is illuminated
Scene rendering
- use two depth buffers
- “usual” depth buffer for view point
- second depth buffer for light position (shadow map)
- render scene from light position into shadow map
- render scene from view position into depth buffer
- transform depth buffer values to shadow map values
- compare transformed depth buffer values and shadow map values to decide whether a fragment is shadowed or not
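An illustrative per-fragment shadow-map test in Python; the light matrix, map contents, and bias are assumed placeholders.

```python
import numpy as np

def in_shadow(p_world, light_matrix, shadow_map, bias=1e-3):
    """light_matrix maps world space to the light's NDC; shadow_map stores the
    nearest depth seen from the light per texel (filled in the first render pass)."""
    p = light_matrix @ np.append(p_world, 1.0)
    p = p / p[3]                                        # perspective division
    u = int((p[0] * 0.5 + 0.5) * (shadow_map.shape[1] - 1))
    v = int((p[1] * 0.5 + 0.5) * (shadow_map.shape[0] - 1))
    depth_from_light = p[2] * 0.5 + 0.5                 # map to [0, 1] like the stored depths
    return depth_from_light > shadow_map[v, u] + bias   # farther than stored depth -> shadowed

shadow_map = np.full((4, 4), 0.5)                       # pretend everything at depth 0.5 is lit
print(in_shadow(np.array([0.0, 0.0, 0.0]), np.eye(4), shadow_map))  # depth 0.5 -> not shadowed
```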
Aliasing
- discretized representation of depth values can cause an erroneous classification of scene points
- offset of shadow map value reduces aliasing artifacts
Sampling
- in large scenes, sampling artifacts can occur
- uniform sampling of the shadow map can result in non-uniform resolution for shadows in a scene
- shadow map resolution tends to be too coarse for nearby objects and too high for distant objects
- Solution: perspective shadow maps
- scene and light are transformed using the projective transformation of the camera
- compute the shadow map in post-perspective space
Shadow volumes
- employ a polygonal representation of the shadow volume
- point-in-volume test
- implementation issues: classification of shadow volume polygons into front and back faces
- stencil buffer values count the number of intersected front and back faces
- algorithm: Z-pass
- render scene to initialize depth buffer
- stencil enter / leave counting approach
- render shadow volume twice using face culling
- do not update depth and color
- if a pixel is in shadow, the stencil value is non-zero
- otherwise, the stencil value is zero
- implementation
- render the scene with only ambient lighting
- render front facing shadow volume polygons to determine how many front face polygons are in front of the depth buffer pixels
- render back facing shadow volume polygons
- render the scene with full shading where stencil is zero
- can miss intersections due to near plane clipping
Z-Fail
- basic Z-Fail
- counts the difference of occluded front and back faces
- misses intersections behind the far plane
- with depth clamping
- do not clip primitives at the far plane
- clamp the actual depth value to the far plane
- Z-fail with far plane at infinity
- adapt the perspective projection matrix
Texture Mapping
- add per-pixel surface details without raising geometric complexity of a scene
- keep number of vertices and primitives low
- textures are represented as 2D images
- can represent variety of properties: colors, normals, etc.
- positions of texture pixels are characterized by texture coordinates (u, v) in texture space
- texture mapping is a transformation from object space to texture space (x, y, z) \rightarrow (u, v)
- texture mapping is generally applied per fragment
Pipeline
- object space location
- texture coordinates are defined for object space locations
- if the object moves in world space, move texture along with it
- projector function
- maps a 3D object space location to a 2D parameter-space value
- corresponder functions
- map from parameter space to texture space
- texture-space values are used to obtain values from a texture
- value transform
- a function that transforms obtained texture values
Projector functions
- planar projection, cylindrical projection or spherical projection
- can be an initialization step for the surface parameterization
- use different projections for surface patches
- minimize distortions, avoid artifacts at seams
- several texture coordinates can be assigned to a single vertex
Corresponder functions
- one or more corresponder functions transform from parameter space to texture space
- transform (u, v) into the valid range from 0 to 1, e.g.
- repeat, mirror, clamp to edge, clamp to border (see the sketch after this list)
- arbitrary transformations, e.g. represented by a matrix in homogeneous form, can be applied to texture values
- texture blending functions
- define how to use the texture value
- replace the original surface color with texture color
- linearly combine the original surface color with the texture
- multiply, add, subtract surface color and texture color
- alternatively, texture values can be used in the computation of illumination model
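A sketch of the wrap-mode corresponder functions listed above; the function names are illustrative.

```python
def repeat(u):
    return u % 1.0                                  # tile the texture

def mirror(u):
    t = u % 2.0
    return t if t <= 1.0 else 2.0 - t               # reflect every second tile

def clamp_to_edge(u):
    return min(max(u, 0.0), 1.0)                    # reuse the border texel

for u in (-0.25, 0.3, 1.7):
    print(u, repeat(u), mirror(u), clamp_to_edge(u))
```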
Image texturing
- a 2D image is mapped / glued onto a triangulated object surface
- certain aspects affect the quality
- interpolation of texture coordinates
- sampling
- the quality of applied textures can be improved by
- perspective-correct interpolation
- considering magnification and minification
Transparency and Reflection
- simplified transparency model
- semitransparent objects are filters / attenuators of occluded objects
- refraction and object thickness are neglected
- algorithms are based on stipple patterns, color blending per pixel
Stipple patterns
- screen-door transparency
- transparent object is rendered with a fill / stipple pattern, e.g. checkerboard (pattern of opaque and transparent fragments)
- limited number of fill patterns results in limited number of transparency levels
- aliasing artifacts
- simple methods
Color Blending
- alpha value: describes the opacity of a fragment, stored with the RGB color (RGBA)
- order matters
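A small sketch of back-to-front "over" blending showing that the order matters; the colors and alpha values are example choices.

```python
import numpy as np

# Back-to-front "over" blending of RGBA fragments onto an opaque background.
def blend_over(dst_rgb, src_rgb, src_alpha):
    return src_alpha * src_rgb + (1.0 - src_alpha) * dst_rgb

background = np.array([0.0, 0.0, 0.0])
red_glass  = (np.array([1.0, 0.0, 0.0]), 0.5)
blue_glass = (np.array([0.0, 0.0, 1.0]), 0.5)

# Blending red then blue gives a different result than blue then red.
c1 = blend_over(blend_over(background, *red_glass), *blue_glass)
c2 = blend_over(blend_over(background, *blue_glass), *red_glass)
print(c1, c2)   # [0.25 0.   0.5 ] vs. [0.5  0.   0.25]
```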
Depth ordering
- polygons / fragments have to be rendered in sorted depth order
- hardware generally renders in object order
- depth test only returns the nearest fragment per pixel, sorting is not realized
- intersecting polygons have to be handled
- dynamic scenes require re-sorting
Convex objects
- exactly two depth layers for arbitrary viewing directions
- first depth layer defined by front faces
- second depth layer defined by back faces
- algorithm: render back faces in first pass, blend with front faces in second pass
arbitrarily shaped objects
- object-space methods
- use pre-computed spatial data structures (e.g. binary partition tree)
- useful for static geometry
- varying viewer positions and orientations can be handled
- screen-space methods
- employ the functionality of the rendering pipeline
- several rendering passes compute depth layers
- final pass renders the ordered depth layers
- useful for dynamic / deforming geometry and arbitrary views
- no pre-computation is required
Binary space partitioning (BSP)
- BSP tree is hierarchical spatial data structure
- 3D space is subdivided by means of arbitrarily oriented planes
- leaves represent convex space cells
- applications
- visible surface algorithm
- depth sorting
- collision detection
- precomputed for static scenes
- all planar primitives are represented in the tree
- balancing is less important, as the entire tree has to be traversed
- one rendering pass
Querying (rendering)
- back to front rendering
- render the branch on the far side of the viewpoint
- render the root (node) polygon
- render the branch on the near side of the viewpoint
- recursively applied to sub-trees
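A minimal Python sketch of this back-to-front traversal; the node layout and the example tree are assumptions for illustration.

```python
# Back-to-front BSP traversal (node layout is an assumption for this sketch).
class Node:
    def __init__(self, plane, polygon, front=None, back=None):
        self.plane = plane          # (a, b, c, d) of a x + b y + c z + d = 0
        self.polygon = polygon
        self.front, self.back = front, back

def side(plane, p):
    a, b, c, d = plane
    return a * p[0] + b * p[1] + c * p[2] + d   # > 0: p lies in the front half-space

def render_back_to_front(node, viewpoint, draw):
    if node is None:
        return
    if side(node.plane, viewpoint) > 0:                     # viewer in front half-space
        render_back_to_front(node.back, viewpoint, draw)    # far branch first
        draw(node.polygon)
        render_back_to_front(node.front, viewpoint, draw)   # near branch last
    else:                                                   # viewer in back half-space
        render_back_to_front(node.front, viewpoint, draw)
        draw(node.polygon)
        render_back_to_front(node.back, viewpoint, draw)

# Tiny example: two polygons split by the plane x = 0, viewer at x = 5.
tree = Node((1.0, 0.0, 0.0, 0.0), "splitting polygon",
            front=Node((1.0, 0.0, 0.0, -2.0), "polygon near viewer"),
            back=Node((1.0, 0.0, 0.0, 2.0), "polygon far from viewer"))
render_back_to_front(tree, (5.0, 0.0, 0.0), print)
```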
Discussion
- not only visible surface generation, but depth sorting of all primitives per pixel position
- additional data structure
- can be pre-computed
- requires polygon splits
- dynamic scenes require an update
Depth peeling
- use the functionality of rendering pipeline
- several rendering passes compute depth layers
- final pass renders the ordered depth layers
- useful for dynamic / deforming geometry
- no pre-computation is required / can be employed
- algorithm
- first render pass gives the front-most fragment color / depth
- each successive render pass extracts the next-nearest fragment on a per-pixel basis (screen-space approach)
- two depth buffers are used
- object is rendered once for each depth layer
- two separate depth tests per fragment
- depth peeling strips away one depth layer with each successive rendering pass
- implementation based on shadow mapping
Discussion
- screen-space algorithm
- multiple rendering passes generate depth layers per pixel position
- view dependent (in contrast to BSP)
- appropriate for dynamic scenes
- quality and performance are determined by number of rendering passes
Reflection
Angle of incidence is equal to the angle of reflection.
Planar surfaces
- original and reflected geometry is rendered
- reflected geometry is generated with respect to the reflection plane with surface normal n = (n_x, n_y, n_z) and point p on the reflection plane
- semi-transparent reflection plane, e.g. with color blending (original geometry)
- render the reflection plane into the stencil buffer; draw the reflected geometry only where the stencil is set
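A sketch of a homogeneous reflection matrix for a plane given by a unit normal n and a point p on the plane; the plane and point values below are illustrative.

```python
import numpy as np

def reflection_matrix(n, p):
    """Homogeneous 4x4 reflection about the plane with unit normal n through point p."""
    n = n / np.linalg.norm(n)
    M = np.eye(4)
    M[:3, :3] = np.eye(3) - 2.0 * np.outer(n, n)
    M[:3, 3]  = 2.0 * np.dot(p, n) * n
    return M

# Mirror plane z = 1 (normal along z, containing the point (0, 0, 1)).
M = reflection_matrix(np.array([0.0, 0.0, 1.0]), np.array([0.0, 0.0, 1.0]))
print(M @ np.array([2.0, 3.0, 4.0, 1.0]))   # [ 2.  3. -2.  1.] -> mirrored across z = 1
```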
Arbitrary surfaces: Environment Mapping
- cube mapping: approximates reflections of the environment on arbitrary surfaces
- project the environment onto an object-embedding shape (sphere or cube)
- view-dependent mapping
- approximate implementation of reflections off arbitrary surfaces
- cannot handle changing reflections of moving objects
Steps
- generate or load a 2D map of environment
- for each fragment of a reflective object: compute the normal n
- compute the reflection vector r from the view vector v and the normal n at the surface point
- use the reflection vector to compute an index into the environment map that represents the objects in the reflection direction
- use the texel data (texture value) from the environment map to color the current fragment
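A sketch of the per-fragment reflection vector r = v - 2 (v \cdot n) n; the cube-map lookup is only hinted at by selecting the dominant axis, and all vectors are example values.

```python
import numpy as np

def reflect(v, n):
    n = n / np.linalg.norm(n)
    return v - 2.0 * np.dot(v, n) * n      # angle of incidence = angle of reflection

v = np.array([0.0, 0.0, -1.0])             # view vector: from the eye towards the surface
n = np.array([0.0, 1.0, 1.0])              # surface normal at the fragment (45 degrees tilted)
r = reflect(v, n)
face = np.argmax(np.abs(r))                # dominant axis hints at the cube-map face to sample
print(r, face)                             # [0. 1. 0.] 1 -> reflection points "up"
```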
Cubic mapping
- place a camera at the center of the object and take pictures in 6 directions
Spherical mapping
- orthographically project an image of a mirrored sphere
- map stores colors seen by reflected rays
- the sphere map contains information about both the environment in front of the sphere and behind the sphere
- can be obtained by ray tracing or by warping automatically generated cubic maps
Discussion
- disadvantages: maps are hard to create on the fly; sampling is non-linear, non-uniform, and view-point dependent
- advantages: no interpolation across map seams
- objects should be small compared to environment
- issues with self-reflections and non-convex objects
- separate map for each object
- maps may need to be changed in case of a change in viewpoint due to non-uniform sampling
- translated objects might require a map update