About
Click here to check the course website.
Below are some sample answers to the supervision questions, inlined and highlighted.
Supervision 1
Warmup questions

What is an image? How are digital images represented in memory?
An image can be thought of as a 2D function (colour as a function of location). A digital image is commonly represented as a 2D array of pixels.

What is colour banding?
Colour banding is a visual artefact that appears when an insufficient number of bits is allocated to the colour depth. It most commonly appears in gradient images (slow monotonic change, see slide 14), and it looks worse than expected because the visual system emphasises edges (Mach banding).

What is quantisation?
Quantisation is the process of mapping a continuous variable to a discrete one. E.g. storing linear 0..1 values on 8 bits (256 discrete levels). The more bits you use, the more accurate the signal remains after quantisation.
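As a minimal sketch of the idea (function names are mine, not from the course):

```python
def quantise(x, bits):
    """Map a linear value in [0, 1] to one of 2**bits discrete levels."""
    levels = (1 << bits) - 1              # e.g. 255 for 8 bits
    return round(x * levels)

def dequantise(q, bits):
    """Map a stored level back into [0, 1]."""
    levels = (1 << bits) - 1
    return q / levels

# The reconstruction error shrinks as the bit depth grows:
err_2bit = abs(dequantise(quantise(0.3, 2), 2) - 0.3)
err_8bit = abs(dequantise(quantise(0.3, 8), 8) - 0.3)
```

With 2 bits the error is a few percent; with 8 bits it is at most half a quantisation step, which is where banding becomes hard to see.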

What are the ray parameters of the intersection points between ray (1,1,1) + t(−1,−1,−1) and the sphere centred at the origin with radius 1?
t = 1 − 1/√3 and t = 1 + 1/√3. Of these two intersections we typically use the smaller non-negative one in graphics (the nearer visible surface).
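The result can be checked with a small quadratic-formula sketch (names are mine):

```python
import math

def ray_sphere(o, d, r):
    """Return sorted ray parameters t where o + t*d hits the sphere |p| = r
    centred at the origin, or [] if the ray misses.
    Solves (d.d) t^2 + 2 (o.d) t + (o.o - r^2) = 0."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    a = dot(d, d)
    b = 2 * dot(o, d)
    c = dot(o, o) - r * r
    disc = b * b - 4 * a * c
    if disc < 0:
        return []
    s = math.sqrt(disc)
    return sorted([(-b - s) / (2 * a), (-b + s) / (2 * a)])

# Ray (1,1,1) + t(-1,-1,-1) against the unit sphere:
ts = ray_sphere((1, 1, 1), (-1, -1, -1), 1)
# ts == [1 - 1/sqrt(3), 1 + 1/sqrt(3)]
```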

Why do we need antialiasing? Why is a random grid better than a regular grid?
Aliasing artefacts include: (1) disappearing thin objects, (2) jagged edges, (3) Moiré effects https://en.wikipedia.org/wiki/Moir%C3%A9_pattern
Antialiasing reduces these aliasing artefacts, commonly by taking multiple samples in each pixel.
A regular supersampling grid can remove most artefacts, but it might interfere with the sampled pattern (e.g. sampling a brick wall, or anything with a rectangular structure). A random sampling grid is unlikely to cause interference (Moiré-like) artefacts.
Longer questions

Explain the three components of the Phong reflection model. What colour should the specular highlights be?
Slides 42–45; please provide both the equations and the reason/rationale.
E.g. see (a) in https://www.cl.cam.ac.uk/teaching/exams/solutions/2017/2017p04q03solutions.pdf

What information would you need to define a raytracing viewing volume / frustum (look these up if you are not sure what they mean)?
https://en.wikipedia.org/wiki/Viewing_frustum
There are a few options here, but a common approach is to include (1) the camera position, (2) the forward look vector of the camera, (3) the up vector of the camera, (4) the horizontal field of view in degrees or radians, and (5) the screen aspect ratio. For numerical precision it also makes sense to define the distance to a near and a far clipping plane, which gives the "clipped pyramid" shape.

Write pseudocode for the ray tracing algorithm, where the first line of code is as stated below.
(if you use slide 32, make sure you explain each line in detail)
for each pixel:
    calculate the ray
    set closest_object to unknown
    for each object:
        if this object is closer than closest_object:
            set closest_object to object
    calculate colour based on closest_object (expensive!)
    set pixel colour
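The inner closest-hit loop can be sketched in runnable Python; the scene representation and the `intersect`/`colour` fields are hypothetical stand-ins for real object intersection and shading:

```python
import math

def trace(ray_o, ray_d, objects):
    """Find the closest object hit by the ray and return its colour.
    Each object provides intersect(o, d) -> t (or None) and a colour."""
    closest_t, closest_obj = math.inf, None
    for obj in objects:
        t = obj["intersect"](ray_o, ray_d)
        if t is not None and 0 < t < closest_t:
            closest_t, closest_obj = t, obj
    # Colour is only computed once, for the winning object (the expensive step).
    return closest_obj["colour"] if closest_obj else (0, 0, 0)  # background

# Two hypothetical "objects" whose intersect functions return fixed t values:
scene = [
    {"intersect": lambda o, d: 5.0, "colour": (255, 0, 0)},
    {"intersect": lambda o, d: 2.0, "colour": (0, 255, 0)},  # closer: wins
]
```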

Explain how Ray tracing can achieve the following effects:
In all examples we tend to offset the ray start location by a small epsilon value, to avoid the new ray immediately re-intersecting the originating surface at t = 0.

reflections
Upon an object intersection compute the vector of perfect reflection, then recursively calculate colour. The recursion often has a maximum depth (maximum bounce count). Reflected and local colour are often mixed (interpolated) depending on surface properties. I.e. in a perfect mirror we only use the reflected colour, in a nonreflective surface we don't use the reflected colour at all. 
refraction
Same as reflection, but the direction of the recursive ray is computed using Snell's law of refraction (the mix between reflected and refracted contributions can be governed by the Fresnel equations).
shadows
Shoot a ray from the object towards each light source. Only take into account light sources for which there is no object intersecting the ray between the start location and the light source.

Provide two examples of distributed ray tracing and explain how the selected techniques work.
- antialiasing via supersampling: shoot multiple samples within each pixel and average them
- motion blur: compute the image multiple times over a time interval and average the results
- depth of field: instead of using a pinhole eye model, shoot multiple rays with small eye offsets over each pixel and average them
- soft shadows: instead of a single shadow ray, shoot multiple rays over the finitely-sized light source and compute what % of them are blocked.
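The first of these (supersampling with a random grid) can be sketched as follows; `shade` is a hypothetical per-sample radiance function:

```python
import random

def render_pixel(px, py, shade, n=16, rng=random):
    """Average n jittered samples inside pixel (px, py) instead of taking
    a single sample at its centre."""
    total = 0.0
    for _ in range(n):
        # A random position inside the pixel avoids regular-grid Moire artefacts.
        x = px + rng.random()
        y = py + rng.random()
        total += shade(x, y)
    return total / n

# A hard vertical edge at x = 0.5: supersampling yields a soft average.
edge = lambda x, y: 1.0 if x < 0.5 else 0.0
value = render_pixel(0, 0, edge, n=1000)
# value lands near 0.5 rather than snapping to 0 or 1
```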
Supervision 2
Warmup questions

What is OpenGL? What does it mean that it's an API?
OpenGL is a cross-language, cross-platform 2D and 3D graphics API. The fact that it's an API (application programming interface) means that there are numerous implementations; embedded hardware will achieve the same pipeline step in a different way than the latest PC GPU. You can think of it as a Java interface with hundreds of implementing classes.

How is Vulkan different from OpenGL?
Vulkan is a lower-level API which gives finer control over the graphics hardware, at the cost of typically (even) more boilerplate code.

We use a lot of triangles to approximate stuff in computer graphics. Why are they good? Why are they bad? Can you think of any alternatives?
+ve: Triangles are always coplanar; three points always describe an unambiguous primitive.
−ve: curved surfaces (e.g. a sphere) take a lot of triangles to approximate well.
?: we could use some non-polygon objects such as Bézier patches.

Put the following stages of the OpenGL rendering pipeline in the correct order. Very briefly explain what each stage does and comment whether each stage is programmable.
This is the most likely solution, but actual hardware implementation might deviate e.g. when it comes to clipping and rasterisation.

- Vertex shader: transforms the vertices to screen coordinates. Programmable.
- Primitive setup: groups vertices together into primitives (typically triangles) using the vertex shader output and the index (element) buffer. This is best done after vertex shading, so vertices shared by multiple triangles are only `shaded` once. Not programmable.
- Clipping: removes triangles outside the screen. Cannot be done before primitive setup; the tricky part is handling triangles that are only partially inside. Not programmable.
- Rasterisation: breaks up each triangle into fragments (or pixels). No colour is computed yet, but the vertex properties (as computed by the vertex shader) are interpolated over the triangle using barycentric coordinates. Not programmable.
- Fragment shader: computes the colour of the final fragment/pixel based on the interpolated data. Could also use textures here; might just run the Phong equation or something fancier. Programmable.

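The barycentric interpolation used during rasterisation can be sketched as follows (a 2D sketch; function names are mine):

```python
def barycentric(p, a, b, c):
    """Barycentric coordinates of 2D point p w.r.t. triangle (a, b, c)."""
    det = (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])
    wb = ((p[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (p[1] - a[1])) / det
    wc = ((b[0] - a[0]) * (p[1] - a[1]) - (p[0] - a[0]) * (b[1] - a[1])) / det
    wa = 1.0 - wb - wc
    return wa, wb, wc

def interpolate(p, tri, attrs):
    """Interpolate per-vertex attributes (e.g. colour) at point p."""
    wa, wb, wc = barycentric(p, *tri)
    return tuple(wa * x + wb * y + wc * z for x, y, z in zip(*attrs))

tri = [(0, 0), (4, 0), (0, 4)]
colours = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]   # red, green, blue vertices
# At the centroid all three weights are 1/3, so the colour is an even mix:
centroid_colour = interpolate((4 / 3, 4 / 3), tri, colours)
```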

What are “in”, “out” and “uniform” variables in GLSL? How are the values of these variables set?
in: an input variable, e.g. a per-vertex attribute in the vertex shader. Read-only inside the shader; its value comes from vertex buffers (or, in later stages, from the previous stage's out variables).
out: an output variable, e.g. a property of the transformed vertex in the vertex shader. Computed in the shader itself.
uniform: a value that is constant during a single draw call, but can change between subsequent draw calls (unlike a constant, which is constant forever). Set from the host program (e.g. via glUniform* calls); might be something like the camera matrix.
Longer questions

Similarly to last supervision, write a few lines of pseudocode for rendering with OpenGL (rasterisation):
function draw_triangles(triangles):
    for each triangle in triangles:
        transform triangle to screen space          # vertex shader
        for each pixel in triangle:                 # rasterise
            if this pixel is closer than the current z value:
                calculate colour                    # fragment shader
                set colour
Notice how this is very similar to ray tracing; notice also how the nested for loops are swapped, which affects performance (e.g. in OpenGL subsequent colour calculations are normally on the same object, which means that the corresponding textures can be cached; this is not true for ray tracing).

Describe the Model, View, and Projection transformations. Comment on why we use homogeneous coordinates.
- Model: transforms from model coordinates (typically the object sitting at the origin) into world coordinates. Allows using the same object many times. Typically involves translation, rotation and scaling. Translation cannot be represented as a 3x3 matrix transform, but we ideally want to represent everything as matrices and their products, hence we need homogeneous coordinates.
- View: transforms from world coordinates to view coordinates. No space distortion; it just makes sure that the camera is at the origin with the forward vector pointing down the negative z axis (if using a right-handed coordinate system). Can be thought of as the inverse of the transform that takes the camera from the origin to its position. https://learnopengl.com/Getting-started/Camera
- Projection: transforms the view coordinates to screen coordinates, i.e. pixels covering each other will have the same x and y coordinates. Can be perspective or orthographic. If perspective, then this is a space-distorting transform that effectively turns the viewing frustum into a cube. http://www.songho.ca/opengl/gl_projectionmatrix.html
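As an illustration of the perspective case, a sketch of the common gluPerspective-style matrix (conventions may differ slightly from the slides):

```python
import math

def perspective(fov_y_deg, aspect, near, far):
    """A common OpenGL-style perspective matrix (as built by gluPerspective):
    maps the view frustum to the [-1, 1] clip cube."""
    f = 1.0 / math.tan(math.radians(fov_y_deg) / 2.0)
    return [
        [f / aspect, 0,  0,                            0],
        [0,          f,  0,                            0],
        [0,          0,  (far + near) / (near - far),  2 * far * near / (near - far)],
        [0,          0, -1,                            0],
    ]

def project(m, v):
    """Apply m to homogeneous point v = (x, y, z, 1), then perspective-divide."""
    out = [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]
    return [x / out[3] for x in out[:3]]

m = perspective(90, 1.0, 0.1, 100.0)
# A point on the near plane maps to z = -1, one on the far plane to z = +1:
near_z = project(m, [0, 0, -0.1, 1])[2]
far_z = project(m, [0, 0, -100.0, 1])[2]
```

Note the -1 in the bottom row: it copies -z into w, so the perspective divide by depth happens automatically (this is why we need homogeneous coordinates).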

When transforming objects into world coordinates using matrix M, position vectors are premultiplied with M. Discuss whether this matrix is suitable to transform the objects' normals. If not, can you suggest an alternative?
The inverse of the transpose of the upper-left 3x3 of M (see slide 99); using M directly skews the normals under non-uniform scaling, so they would no longer be perpendicular to the surface.
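A small numeric check of why the inverse transpose (rather than M itself) is needed, assuming a non-uniform scale; all names are mine:

```python
def normal_matrix(m):
    """Inverse transpose of a 3x3 matrix. Since inv(m) = adj(m) / det(m),
    the inverse transpose is simply the cofactor matrix divided by det(m)."""
    c = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            # 2x2 minor obtained by deleting row i and column j
            rows = [r for r in range(3) if r != i]
            cols = [col for col in range(3) if col != j]
            minor = (m[rows[0]][cols[0]] * m[rows[1]][cols[1]]
                     - m[rows[0]][cols[1]] * m[rows[1]][cols[0]])
            c[i][j] = ((-1) ** (i + j)) * minor
    det = sum(m[0][j] * c[0][j] for j in range(3))   # cofactor expansion
    return [[c[i][j] / det for j in range(3)] for i in range(3)]

mul = lambda m, v: [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

M = [[2.0, 0, 0], [0, 1.0, 0], [0, 0, 1.0]]   # non-uniform scale: x doubled
N = normal_matrix(M)
tangent = mul(M, [1.0, -1.0, 0.0])            # surface tangent, moved by M
normal = mul(N, [1.0, 1.0, 0.0])              # surface normal, moved by N
dot = sum(t * n for t, n in zip(tangent, normal))   # stays 0: perpendicular
```

Transforming the normal with M itself instead gives a dot product of 3 here, i.e. a normal that is visibly wrong.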
2010 Paper 4 Question 4
https://www.cl.cam.ac.uk/teaching/exams/solutions/2010/2010p04q04solutions.pdf

2017 Paper 4 Question 3
https://www.cl.cam.ac.uk/teaching/exams/solutions/2017/2017p04q03solutions.pdf

Describe the z buffering algorithm. Compare the projection matrix on slide 86 with the projection matrix in the 2010P4Q4 past paper, and discuss which one you need to use for Z buffering
Z buffering uses a screen-sized floating-point (or fixed-point) buffer which stores the z value of the nearest pixel seen so far. Initially each value in the z buffer is set to infinity / the maximum value. Whenever a new pixel is drawn, we calculate its z value and check whether it is closer or further away than the stored one. If it is closer (passes the z test), we write the colour to the colour buffer and update the z value in the z buffer.
The 2010P4Q4 matrix results in every vector having z=1, which mathematically speaking is indeed a projection, but makes z buffering impossible. Hence slide 86 is better.
On another note for slide 86: it outputs 1/z rather than z, which is useful for perspective-correct texturing (see https://en.wikipedia.org/wiki/Texture_mapping#Perspective_correctness ). This means, however, that the z buffering algorithm needs some tweaking: e.g. initialise the z buffer to 0 (smaller than any 1/z value) and, upon drawing, check whether the new 1/z value is larger than the previous one.
Either way, z buffering is a pretty simple brute-force, high-memory-footprint approach that is very popular.
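The per-fragment z test can be sketched as follows (plain-z convention, hypothetical names; the real thing lives in GPU hardware):

```python
import math

def draw(zbuffer, framebuffer, x, y, z, colour_fn):
    """Z-test one fragment: run the (expensive) colour computation and write
    only if this fragment is nearer than what is already stored."""
    if z < zbuffer[y][x]:              # passes the depth test
        zbuffer[y][x] = z
        framebuffer[y][x] = colour_fn()
        return True
    return False

W = H = 2
zbuf = [[math.inf] * W for _ in range(H)]   # initialise to "infinitely far"
fbuf = [[None] * W for _ in range(H)]

draw(zbuf, fbuf, 0, 0, 5.0, lambda: "far triangle")
draw(zbuf, fbuf, 0, 0, 2.0, lambda: "near triangle")   # closer: overwrites
draw(zbuf, fbuf, 0, 0, 9.0, lambda: "hidden")          # fails the z test
```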

What is the worst case scenario, in terms of a number of times a pixel colour is computed, when rendering N triangles using the Zbuffer algorithm? How could we avoid such a worstcase scenario?
The worst case is that triangles overlap and we get unlucky in their ordering: if we draw the furthest first and the closest last, then z buffering will not save any colour calculations (the pixel colour is computed N times, N−1 of which are discarded). Remembering that colour calculation can be very expensive, some games opt to render objects that are known to be close first (e.g. UI elements, or the weapon in an FPS), and the background (e.g. the skybox) last.
Supervision 3
Texturing

How could you use the following texture types to texture a sphere in OpenGL?

2D
Commonly done with a Mercator projection, but there are a few alternatives; expect distortions around the poles.
3D
Think of it as layers of 2D textures, or voxels (a Minecraft-like world of coloured cubes). Very expensive in terms of storage, and for a sphere it is probably not worth it (only a very small fraction of the voxels actually lies on the surface of the sphere).
CUBE_MAP
Six 2D textures describing the faces of a cube. For sampling, shoot a ray from the centre of the sphere and intersect it with the cube. Slightly more memory usage and more expensive sampling than 2D, but fewer distortions (there are some around the cube vertices, but not as bad as Mercator poles). Much cheaper in terms of storage than 3D.

For downsampling an image, explain how each of the following sampling techniques work (feel free to use khronos.org when unsure). Discuss performance, storage and visual quality.
- GL_NEAREST
- GL_LINEAR
- GL_NEAREST_MIPMAP_NEAREST
- GL_LINEAR_MIPMAP_NEAREST
- GL_NEAREST_MIPMAP_LINEAR
- GL_LINEAR_MIPMAP_LINEAR
Bookwork with many good online resources, e.g. https://learnopengl.com/Getting-started/Textures
GL_LINEAR_MIPMAP_LINEAR (trilinear filtering) is the most expensive, but hardware is typically optimised for it, so it is the most commonly used, unless we know there is no downsampling (e.g. when showing UI elements), when nearest or linear might be preferred.

Search for "normal map" images on the internet. Why do they tend to have an overall blue shade?
Normal vectors have x, y, z coordinates where x and y range from −1 to +1 and z (the out-of-surface component) typically ranges from 0 to 1. When encoding this as a colour, each value is stored in 0..255 (assuming an 8-bit image), so e.g. x is encoded as (x+1)/2*255. This effectively maps a neutral normal (0, 0, 1) to roughly (128, 128, 255), a pale blue.
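A sketch of the encoding (helper names are mine):

```python
def encode_normal(n):
    """Encode a unit normal with components in [-1, 1] as an 8-bit RGB pixel."""
    return tuple(round((c + 1) / 2 * 255) for c in n)

def decode_normal(rgb):
    """Invert the encoding back to (approximately) the original normal."""
    return tuple(v / 255 * 2 - 1 for v in rgb)

flat = encode_normal((0.0, 0.0, 1.0))   # a normal pointing straight out
# flat == (128, 128, 255): the pale blue that dominates normal maps
```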

How could you implement a reflective water surface in OpenGL using Frame Buffer Objects? What if you wanted to add reflection onto a spherical surface? (Ray tracing is tempting, but you are to think about the OpenGL way here :) ).
Reflection, refraction and shadows in OpenGL are done using multiple render passes, where the entire scene is rendered from additional cameras and stored in textures before the final image is computed. For water, we can render the scene from an imaginary underwater camera and use the resulting texture to compute the final image. See fun example
For a spherical surface, we can use 6 cameras to build up a cube map and use environment mapping.
Colour / perception

What is the difference between luma and luminance?
Luma is the internal pixel-intensity representation, often gamma-compressed. Luminance is a physical quantity: a measure of light weighted by the achromatic response of the eye.

Why is gamma correction needed?
When sending the signal from the GPU to the monitor, the signal is quantised to a few bits (e.g. 8 bits). The eye finds it easier to distinguish small absolute differences at low luminance levels, so banding artefacts would be very visible in the dark parts of an image. Gamma correction/compression makes the encoding more perceptually uniform; the compression is undone by the screen during display (see slide 208).
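A sketch with a simple power-law gamma of 2.2 (real sRGB encoding uses a slightly different piecewise curve):

```python
def gamma_encode(linear, gamma=2.2):
    """Gamma-compress a linear [0, 1] intensity before 8-bit quantisation."""
    return linear ** (1.0 / gamma)

def gamma_decode(encoded, gamma=2.2):
    """What the display does: undo the compression."""
    return encoded ** gamma

# Gamma compression spends more of the 256 codes on dark values, where the
# eye is most sensitive to banding:
dark_step = gamma_encode(0.02) - gamma_encode(0.01)    # large encoded step
bright_step = gamma_encode(0.99) - gamma_encode(0.98)  # small encoded step
```

The same linear difference of 0.01 consumes roughly ten times more of the encoded range in the dark region than in the bright one.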

What are the differences between rods and cones?
Rods operate in low-light conditions, are mostly outside the fovea (the centre of the retina), and are colour blind.
Cones operate in daylight conditions, are mostly inside the fovea, and as there are three types of them (L, M, S) they can encode colour.
A fun fact here: when stargazing, you don't have enough light to trigger your cones, so you actually want to look slightly off the star (outside the fovea), where you have rods.

How can two colour spectra appear the same? What are these called then?
Metamers are colour spectra that are physically different but whose LMS responses are identical. E.g. pure yellow and some combination of red + green appear identical to a human.

What is the relation between LMS cone sensitivities, CIE XYZ and the RGB space of a monitor?
These are all trichromatic colour spaces (i.e. each has 3 primary colours). As such, there are 3x3 matrices that can transform one into the other (for a monitor's RGB this holds for the linear values, i.e. after undoing the gamma encoding).
If interested, take a look at the Math page of http://www.brucelindbloom.com/index.html
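As an illustration of such a matrix, the standard linear-sRGB-to-XYZ transform (coefficients from the sRGB specification; note it applies to linear, not gamma-encoded, RGB):

```python
# Standard sRGB (linear, D65 white point) -> CIE XYZ matrix.
RGB_TO_XYZ = [
    [0.4124, 0.3576, 0.1805],
    [0.2126, 0.7152, 0.0722],
    [0.0193, 0.1192, 0.9505],
]

def apply(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

# White (1, 1, 1) in linear sRGB maps to the D65 white point, with Y ~= 1.
white_xyz = apply(RGB_TO_XYZ, [1.0, 1.0, 1.0])
# The middle row gives each primary's contribution to luminance (Y): note
# how green dominates, matching the eye's achromatic sensitivity.
```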

Explain the purpose of tonemapping and displayencoding steps in a rendering pipeline.
Slides 230–232.

What is the rationale behind sigmoidal tonecurves?
Slides 238–240.