
Below are some sample answers to the supervision questions, inlined and highlighted.

Supervision 1

Warmup questions

  1. What is an image? How are digital images represented in memory?
    An image can be thought of as a 2D function (colour as a function of location). A digital image is commonly represented as a 2D array of pixels.

  2. What is colour banding?
    Colour banding is a visual artefact that appears when an insufficient number of bits is allocated to the colour depth. It most commonly appears in gradients (slow monotonic change, see slide 14) and looks worse than it is because the visual system emphasises edges (aka Mach banding).

  3. What is quantisation?
    Quantisation is the process of mapping a continuous variable to a discrete one. E.g. storing linear 0..1 values on 8 bits (256 discrete levels). The more bits you use, the more accurate the signal remains after quantisation.
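
    The idea above can be sketched in a few lines of Python (function names are my own):

```python
def quantise(x, bits=8):
    """Map a linear value in [0, 1] to one of 2**bits discrete levels."""
    levels = (1 << bits) - 1          # 255 for 8 bits
    return round(x * levels)

def dequantise(q, bits=8):
    """Map a discrete level back into [0, 1]."""
    return q / ((1 << bits) - 1)

# With 8 bits the round trip loses at most half a quantisation step:
x = 0.3
err = abs(dequantise(quantise(x)) - x)   # <= 0.5 / 255
```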

  4. What are the ray parameters of the intersection points between ray (1,1,1) + t(−1,−1,−1) and the sphere centred at the origin with radius 1?
    t = 1 + 1/√3 and t = 1 − 1/√3. From these two intersections we are likely to use the smaller non-negative one in graphics.
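
    For reference, a small Python check, obtained by substituting the ray into the implicit sphere equation and solving the resulting quadratic (function name is mine):

```python
import math

def ray_sphere(o, d, r):
    """Return the ray parameters t at which o + t*d meets a sphere of
    radius r centred at the origin (empty list if it misses)."""
    a = sum(di * di for di in d)
    b = 2 * sum(oi * di for oi, di in zip(o, d))
    c = sum(oi * oi for oi in o) - r * r
    disc = b * b - 4 * a * c
    if disc < 0:
        return []
    s = math.sqrt(disc)
    return sorted([(-b - s) / (2 * a), (-b + s) / (2 * a)])

ts = ray_sphere((1, 1, 1), (-1, -1, -1), 1)
# ts == [1 - 1/sqrt(3), 1 + 1/sqrt(3)]
```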

  5. Why do we need anti-aliasing? Why is a random grid better than a regular grid?
    Aliasing artefacts include: (1) disappearing thin objects, (2) jagged edges, (3) Moire effects https://en.wikipedia.org/wiki/Moir%C3%A9_pattern
    Anti-aliasing reduces these aliasing artefacts, commonly by taking multiple samples in each pixel.
    A regular super-sampling grid can remove most artefacts, but it might interfere with the sampled pattern (e.g. when sampling a brick wall, or anything with a regular rectangular structure). A random sampling grid is unlikely to cause interference (Moire-like) artefacts.
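
    The difference between the two grids can be sketched like this (a jittered grid, one random sample per cell, is a common compromise between regular and fully random sampling; names are mine):

```python
import random

def regular_grid(n):
    """n*n sample positions on a regular sub-pixel grid in [0,1)^2."""
    step = 1.0 / n
    return [((i + 0.5) * step, (j + 0.5) * step)
            for i in range(n) for j in range(n)]

def jittered_grid(n, rng=random.Random(0)):
    """Same grid, but each sample is displaced randomly within its own
    cell, breaking up the regular pattern that causes interference."""
    step = 1.0 / n
    return [((i + rng.random()) * step, (j + rng.random()) * step)
            for i in range(n) for j in range(n)]
```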

Longer questions

  1. Explain the three components of the Phong reflection model. What colour should the specular highlights be?
    slides 42-45; please provide both the equations and the reason/rationale.
    E.g. see (a) in https://www.cl.cam.ac.uk/teaching/exams/solutions/2017/2017-p04-q03-solutions.pdf

  2. What information would you need to define a ray-tracing viewing volume / frustum (look these up if you are not sure what they mean)?
    There are a few options here, but a common approach is to include (1) camera position (2) forward look vector from camera (3) up vector from the camera (4) horizontal field of view in deg or rad (5) screen aspect ratio. It also makes sense for numerical precision to define the distance to a near and far clipping plane, which gives the "clipped pyramid" shape.

  3. Write pseudo-code for the ray tracing algorithm, where the first line of code is as stated below.
    (if you use slide 32, make sure you explain each line in detail)

            for each pixel:
                calculate the ray
                set closest_object to unknown
                for each object:
                    if this object is closer than closest_object:
                        set closest_object to object
                calculate colour based on closest object  (expensive!!!!)
                set pixel colour
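
    A runnable Python version of the same loop; `make_ray`, `shade` and the objects' `intersect` method are hypothetical stand-ins:

```python
def trace_image(width, height, objects, make_ray, shade, background):
    """Brute-force ray tracing loop: for each pixel, find the nearest
    object along the primary ray, then shade it (the expensive part)."""
    image = [[background] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            ray = make_ray(x, y)                  # primary ray through pixel
            closest_t, closest_obj = float("inf"), None
            for obj in objects:                   # linear search for nearest hit
                t = obj.intersect(ray)            # smallest non-negative t, or None
                if t is not None and t < closest_t:
                    closest_t, closest_obj = t, obj
            if closest_obj is not None:
                image[y][x] = shade(closest_obj, ray, closest_t)  # expensive!
    return image
```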

  4. Explain how Ray tracing can achieve the following effects:
    In all examples we tend to offset the ray start location by a small epsilon value to avoid a t = 0 self-intersection in the new ray
    • reflections
      Upon an object intersection compute the direction of perfect reflection, then recursively calculate the colour along that ray. The recursion usually has a maximum depth (maximum bounce count). Reflected and local colour are often mixed (interpolated) depending on surface properties: a perfect mirror uses only the reflected colour, while a non-reflective surface uses none of it.
    • refraction
      Same as reflection, but we compute the recursive ray direction using Snell's law (the Fresnel equations additionally tell us how much light is reflected vs refracted)
    • shadows
      Shoot a ray from the object towards each light source. Only take into account light sources, where there is no object intersecting the ray between the start location and the light source.
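
    The reflection direction and the epsilon offset mentioned above can be sketched as follows, using r = d − 2(d·n)n for a unit normal n (names are mine):

```python
def reflect(d, n):
    """Perfect mirror reflection of direction d about unit normal n:
    r = d - 2 (d . n) n."""
    dn = sum(di * ni for di, ni in zip(d, n))
    return tuple(di - 2 * dn * ni for di, ni in zip(d, n))

EPS = 1e-4  # offset along the normal so the new ray does not re-hit at t = 0

def offset_origin(p, n):
    """Start the secondary ray slightly off the surface point p."""
    return tuple(pi + EPS * ni for pi, ni in zip(p, n))

# A ray coming in at 45 degrees bounces off a horizontal surface:
# reflect((1, -1, 0), (0, 1, 0)) -> (1, 1, 0)
```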
  5. Provide two examples of distributed ray tracing and explain how the selected techniques work
    • anti-aliasing via super-sampling: shoot multiple samples within each pixel and average them
    • motion blur: compute image multiple times over a time interval and average them
    • depth of field: instead of using a pinhole eye model, shoot multiple rays with small eye offsets over each pixel and average them
    • soft shadows: instead of a single shadow ray, shoot multiple rays over the finitely-sized light source and compute what % of them are blocked.

Supervision 2

Warmup questions

  1. What is OpenGL? What does it mean that it's an API?
    OpenGL is a cross-language, cross-platform 2D and 3D graphics API. The fact that it is an API (application programming interface) means that there are numerous implementations: embedded hardware will achieve the same pipeline step in a different way than the latest PC GPU. You can think of it as a Java interface with hundreds of implementing classes.

  2. How is Vulkan different from OpenGL?
    Vulkan is a lower-level API which gives finer control over the graphics hardware at the cost of typically (even) more boilerplate code.

  3. We use a lot of triangles to approximate stuff in computer graphics. Why are they good? Why are they bad? Can you think of any alternatives?
    +ve: Triangles are always coplanar, three points always describe an unambiguous primitive.
    -ve: curves (e.g. a sphere) take a lot of triangles to approximate well
    ?: we could use some non-polygon objects such as Bezier patches

  4. Put the following stages of the OpenGL rendering pipeline in the correct order. Very briefly explain what each stage does and comment whether each stage is programmable.
    This is the most likely solution, but actual hardware implementation might deviate e.g. when it comes to clipping and rasterisation.

    • Vertex shader: transforms the vertices to screen co-ordinates. Programmable
    • Primitive setup: groups vertices together into primitives (typically triangles) using the vertex shader output and the index (element) buffer. This is best done after vertex shading, so vertices shared by multiple triangles are only `shaded` once.
    • Clipping: remove triangles outside the screen. Cannot be done before primitive setup. Tricky part is if you have triangles partially inside.
    • Rasterization: break up each triangle into fragments (or pixels). No colour computed yet, but the vertex properties (as computed by the vertex shader) are interpolated over the triangles using barycentric co-ordinates.
    • Fragment shader: compute the colour of the final fragment/pixel based on the interpolated data. Could also use textures here. Might just run the Phong equation or something fancier. Programmable
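
    The interpolation step in rasterisation can be sketched with barycentric co-ordinates (a 2D-only sketch, ignoring perspective correction; names are mine):

```python
def barycentric(p, a, b, c):
    """Barycentric co-ordinates (u, v, w) of 2D point p in triangle (a, b, c)."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    den = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    u = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / den
    v = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / den
    return u, v, 1 - u - v

def interpolate(p, tri, values):
    """Interpolate per-vertex values (e.g. colours) at point p, as the
    rasteriser does for each fragment inside the triangle."""
    u, v, w = barycentric(p, *tri)
    return tuple(u * va + v * vb + w * vc
                 for va, vb, vc in zip(*values))
```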

  5. What are “in”, “out” and “uniform” variables in GLSL? How are the values of these variables set?
    in: field in the input class. E.g. a property of each vertex in the vertex shader. immutable
    out: field in the output class. E.g. a property of the transformed vertex in the vertex shader. Computed in the shader itself.
    uniform: a value that is constant during a single draw call, but it can change between subsequent draw calls (unlike a constant which is constant forever). Might be something like camera co-ordinates.

Longer questions

  1. Similarly to last supervision, write a few lines of pseudo-code for rendering with OpenGL (rasterisation):
    function draw_triangles(triangles):
        for each triangle in triangles:
            transform triangle to screen space   # vertex shader
            for each pixel in triangle:          # rasterise
                if this pixel is closer than the current z value:
                    calculate colour             # fragment shader
                    set colour

    Notice how this is very similar to ray tracing, but with the nested loops swapped, which affects performance: in OpenGL subsequent colour calculations are normally on the same object, so the corresponding textures can be cached. This is not true for ray tracing.

  2. Describe the Model, View, and Projection transformations. Comment on why we use homogeneous co-ordinates.
    • Model: transform from model co-ordinates (typically the object sitting on the origin) into world co-ordinates. Allows using the same object many times. Typically involves translation, rotation and scaling. Translation cannot be represented as a 3x3 matrix transform, but we ideally want to represent everything as matrices and their products, hence we need homogeneous co-ordinates.
    • View: transform from world co-ordinates to view co-ordinates. No space distortion, just make sure that the camera is in the origin with the forward vector pointing down the negative z axis (if using right-handed co-ordinate system). Can be thought of as the inverse of the transform that takes the origin to the camera. https://learnopengl.com/Getting-started/Camera
    • Projection: transforms the view co-ordinates to screen co-ordinates. I.e. pixels covering each other will have the same x and y co-ordinates. Can be perspective or orthographic. If perspective then this is a space-distorting transform that effectively turns the viewing frustum into a cube http://www.songho.ca/opengl/gl_projectionmatrix.html
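
    A tiny Python illustration of why homogeneous co-ordinates are needed (plain 4x4 matrices, no libraries; names are mine):

```python
def mat_vec(M, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return tuple(sum(M[i][j] * v[j] for j in range(4)) for i in range(4))

def translation(tx, ty, tz):
    """Translation is not a linear map in 3D, but with homogeneous
    co-ordinates it becomes a single 4x4 matrix multiply."""
    return [[1, 0, 0, tx],
            [0, 1, 0, ty],
            [0, 0, 1, tz],
            [0, 0, 0, 1]]

# Position vectors use w = 1 (affected by translation),
# direction vectors use w = 0 (unaffected):
M = translation(5, 0, 0)
# mat_vec(M, (1, 2, 3, 1)) -> (6, 2, 3, 1)
# mat_vec(M, (1, 2, 3, 0)) -> (1, 2, 3, 0)
```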

  3. When transforming objects into world co-ordinates using matrix M , position vectors are pre-multiplied with M . Discuss whether this matrix is suitable to transform the objects' normals. If not, can you suggest an alternative?
    The inverse of the transpose of the top-left 3x3 of M (see slide 99). M itself is not suitable because non-uniform scaling would break the perpendicularity of the transformed normals.
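
    A small numeric check of why M itself fails for normals, using a non-uniform scale whose inverse-transpose is easy to write down by hand (names are mine):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def scale(sx, sy, sz):
    return [[sx, 0, 0], [0, sy, 0], [0, 0, sz]]

def apply(M, v):
    return tuple(dot(row, v) for row in M)

M = scale(2, 1, 1)      # non-uniform model transform
t = (1, -1, 0)          # tangent of the surface x + y = 1
n = (1, 1, 0)           # its normal; perpendicular, so t . n == 0

# Transforming the normal by M breaks perpendicularity:
#   dot(apply(M, t), apply(M, n)) == 3, not 0
# The inverse-transpose (for this diagonal M, just diag(1/2, 1, 1))
# restores it:
N = scale(0.5, 1, 1)
#   dot(apply(M, t), apply(N, n)) == 0
```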
  4. 2010 Paper 4 Question 4
  5. 2017 Paper 4 Question 3
  6. Describe the z buffering algorithm. Compare the projection matrix on slide 86 with the projection matrix in the 2010P4Q4 past paper, and discuss which one you need to use for Z buffering
    Z buffering involves using a screen-sized floating point or fixed point buffer which stores the z value of the nearest pixel we have seen so far. Initially each value in the z buffer might be set to infinity / max. Whenever a new pixel is drawn, we can calculate its new z value and check whether this is closer or further away. If it is closer (passes the z test) then we write the colour to the colour buffer and update the z value in the z buffer.
    The 2010P4Q4 matrix results in every vector having z=1, which, mathematically speaking, is indeed a projection, but makes z buffering impossible. Hence slide 86 is better.
    On another note for slide 86: this outputs 1/z rather than z which is useful for perspectively-correct texturing (see https://en.wikipedia.org/wiki/Texture_mapping#Perspective_correctness ). This means however that the z buffering algorithm might need some tweaking. E.g. initialise the z buffer to 0 (smaller than any 1/z value), and upon drawing check if this 1/z value is larger than the previous 1/z value.
    Either way, z buffering is a pretty simple brute-force, high-memory-footprint approach that is very popular.
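
    The plain variant described above (storing z, initialised to infinity, smaller z wins) can be sketched as:

```python
import math

def zbuffer_render(width, height, fragments, background="bg"):
    """Brute-force z-buffering: `fragments` is an iterable of
    (x, y, z, colour) tuples in arbitrary draw order; smaller z = closer."""
    colour = [[background] * width for _ in range(height)]
    depth = [[math.inf] * width for _ in range(height)]   # init to "far"
    for x, y, z, c in fragments:
        if z < depth[y][x]:          # z test: is this fragment closer?
            depth[y][x] = z
            colour[y][x] = c
    return colour

# Draw order no longer matters for correctness:
frags = [(0, 0, 5.0, "far"), (0, 0, 2.0, "near"), (0, 0, 3.0, "mid")]
# zbuffer_render(1, 1, frags) -> [["near"]]
```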

  7. What is the worst case scenario, in terms of a number of times a pixel colour is computed, when rendering N triangles using the Z-buffer algorithm? How could we avoid such a worst-case scenario?
    The worst case is that triangles overlap and we get unlucky with their ordering: if we draw the furthest first and the closest last, z buffering saves no colour calculations (the pixel's colour is computed N times, N−1 of which are discarded). Remembering that colour calculation can be very expensive, some games opt to render objects that are known to be close first (e.g. UI elements, or a weapon in an FPS), and the background (e.g. a skybox) last.

Supervision 3


  1. How could you use the following texture types to texture a sphere in OpenGL?
    • 2D
      Commonly done with a Mercator projection, but there are a few alternatives; expect distortions around the poles.
    • 3D
      Think of it as layers of 2D textures, i.e. voxels (a Minecraft-like world of coloured cubes). Very expensive in terms of storage, and for a sphere probably not worth it (only a very small fraction of the voxels actually lie on the surface of the sphere).
    • CUBE_MAP
      6 2D textures describing faces of a cube. For sampling, shoot a ray from the centre of the sphere and intersect with the cube. Slightly more memory usage and more expensive sampling than 2D, but fewer distortions (there are some around the cube vertices, but not as bad as Mercator poles). Much cheaper in terms of storage than 3D.
    How do these techniques compare in terms of visual quality and storage? see above
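
    The cube-map lookup (shoot a ray from the centre; the dominant axis of the direction picks the face) can be sketched as follows. Face naming and orientation conventions here are illustrative; real APIs fix their own:

```python
def cube_map_lookup(d):
    """Pick the cube-map face and (u, v) in [0, 1]^2 for direction d.
    The component of d with the largest magnitude chooses the face;
    the other two components, divided by it, give the face co-ordinates."""
    x, y, z = d
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:
        face, u, v = ("+x" if x > 0 else "-x"), -z / x, y / ax
    elif ay >= az:
        face, u, v = ("+y" if y > 0 else "-y"), x / ay, -z / y
    else:
        face, u, v = ("+z" if z > 0 else "-z"), x / az, y / az
    return face, (u + 1) / 2, (v + 1) / 2

# Looking straight down +x hits the centre of the +x face:
# cube_map_lookup((1, 0, 0)) -> ("+x", 0.5, 0.5)
```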

  2. For downsampling an image, explain how each of the following sampling techniques work (feel free to use khronos.org when unsure). Discuss performance, storage and visual quality.
      Bookwork with many good online resources, e.g. https://learnopengl.com/Getting-started/Textures
      GL_LINEAR_MIPMAP_LINEAR is the most expensive, but hardware is typically optimised for it, so the most commonly used. Unless we know that there is no down-sampling (e.g. showing UI elements) when nearest or linear might be preferred.

  3. Search for "normal map" images on the internet. Why do they tend to have an overall blue shade?
    Normal vectors have x, y, z co-ordinates, where x and y range from -1 to +1 and z (the component pointing out of the surface) typically ranges from 0 to 1. When encoding this as a colour, each value is mapped onto 0 to 255 (assuming an 8-bit image), e.g. x is encoded as (x+1)/2*255. This maps the neutral normal (0, 0, 1) to roughly (128, 128, 255), i.e. blue-dominant.
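
    The encoding can be sketched as (exact values depend on the rounding convention):

```python
def encode_normal(n):
    """Map a unit normal with components in [-1, 1] to 8-bit RGB:
    c = round((v + 1) / 2 * 255) per component."""
    return tuple(round((v + 1) / 2 * 255) for v in n)

# The "neutral" tangent-space normal points straight out of the surface,
# so most of a normal map is close to this blue-ish colour:
# encode_normal((0, 0, 1)) -> (128, 128, 255)
```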

  4. How could you implement a reflective water surface in OpenGL using Frame Buffer Objects? What if you wanted to add reflection onto a spherical surface? (Ray tracing is tempting, but you are to think about the OpenGL way here :) ).
    Reflection, refraction and shadows in OpenGL are done using multiple render passes, where the entire scene is rendered from additional cameras and stored in textures before the final image is computed. For water, we can render the scene from an imaginary underwater camera and use the resulting texture when computing the final image.
    For a spherical surface, we can use 6 cameras to build up a cube map and use environment mapping.

Colour / perception

  1. What is the difference between luma and luminance?
    Luma is the internal pixel intensity representation, often gamma-compressed. Luminance is a physical unit, the measure of light weighted by the achromatic response of the eye.

  2. Why is gamma correction needed?
    When sending the signal from the GPU to the monitor, the signal is quantised to a few bits (e.g. 8 bits). The eye finds it easier to distinguish small absolute differences at low luminance levels, so banding artefacts would be very visible in the dark parts of an image. Gamma correction/compression makes the encoding more perceptually uniform; the gamma compression is undone by the screen during display. See slide 208.
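
    A small sketch of gamma encoding/decoding around quantisation (assuming a simple pure power function of 2.2 rather than the exact sRGB curve; names are mine):

```python
def encode(v, gamma=2.2, bits=8):
    """Gamma-compress a linear value in [0, 1] before quantising,
    spending more of the 2**bits codes on dark values."""
    levels = (1 << bits) - 1
    return round(v ** (1 / gamma) * levels)

def decode(q, gamma=2.2, bits=8):
    """What the display does: expand back to (approximately) linear light."""
    levels = (1 << bits) - 1
    return (q / levels) ** gamma

# Neighbouring codes near black are much closer together in linear
# light than with naive linear quantisation: decode(1) - decode(0) is
# about 5e-6, versus 1/255 (about 0.004) without gamma.
```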

  3. What are the differences between rods and cones?
    rods operate in low-light conditions, sit mostly outside the fovea (the centre of the retina) and are colour-blind
    cones operate in daylight conditions, sit mostly inside the fovea and, as there are three types of them (L, M, S), they can encode colour.

    A fun fact here: when star gazing, you don't have enough light to trigger your cones, so you actually want to look slightly off the star (outside the fovea), where you have rods.

  4. How can two colour spectra appear the same? What are these called then?
    Metamers are colour spectra that are physically different but produce identical LMS responses, e.g. pure yellow and some combination of red+green appear identical to a human.

  5. What is the relation between LMS cone sensitivities, CIE XYZ and the RGB space of a monitor?
    These are all trichromatic colour spaces (i.e. there are 3 primary colours). As such, there are 3x3 matrices that can transform one to the other.
    If interested, take a look at the Math page of http://www.brucelindbloom.com/index.html
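
    As an illustration, converting linear sRGB to CIE XYZ is a single 3x3 matrix multiply; the matrix below is the standard sRGB (D65) one, quoted to 4 decimal places:

```python
RGB_TO_XYZ = [
    [0.4124, 0.3576, 0.1805],
    [0.2126, 0.7152, 0.0722],
    [0.0193, 0.1192, 0.9505],
]

def rgb_to_xyz(rgb):
    """Transform a linear (not gamma-encoded!) sRGB triple to CIE XYZ."""
    return tuple(sum(row[j] * rgb[j] for j in range(3))
                 for row in RGB_TO_XYZ)

# White (1, 1, 1) maps to the D65 white point, approximately
# (0.9505, 1.0000, 1.0890):
w = rgb_to_xyz((1.0, 1.0, 1.0))
```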

  6. Explain the purpose of tone-mapping and display-encoding steps in a rendering pipeline.
    slides 230-232

  7. What is the rationale behind sigmoidal tone-curves?
    slide 238-240