Computer vision

We encode an image as a greyscale or RGB matrix in which each element corresponds to a pixel, but it is practically impossible to extract useful information from an individual pixel on its own.
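To make the encoding concrete, here is a minimal NumPy sketch. It assumes the common "channels-last" layout (height × width × 3 for RGB, as used by matplotlib and PIL) and 8-bit pixel values; the arrays themselves are made-up examples.

```python
import numpy as np

# A tiny 2x3 greyscale image: one matrix, each element is one pixel's intensity in [0, 255]
grey = np.array([[0, 128, 255],
                 [64, 192, 32]], dtype=np.uint8)

# An RGB image of the same size: three values (red, green, blue) per pixel,
# stored channels-last as (height, width, 3)
rgb = np.zeros((2, 3, 3), dtype=np.uint8)
rgb[0, 0] = [255, 0, 0]   # top-left pixel is pure red

print(grey.shape)  # (2, 3)
print(rgb.shape)   # (2, 3, 3)
print(grey[0, 2])  # 255
```

A single entry such as `grey[0, 2]` is just a brightness value; any meaningful structure (edges, shapes) only appears when neighbouring pixels are considered together.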

Pixels, however, have certain properties

This means that we cannot use the linear operations we have been using so far (see Linear regression), as:

This is solved by using convolution, which produces a new image $Y = f(X, K)$ in which each output pixel is the sum of the element-wise products between the filter (kernel) $K$ and the image pixels it currently covers.

Using convolution

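  1. A 2D matrix $X$ of size $h \times w$ with pixel values $X_{i,j} \in [0, 255]$
  2. A convolution kernel $K$ of size $k \times k$ with values $K_{i,j} \in \mathbb{R}$, which produces an image $Y$ of size $h \times w$ (this assumes $X$ is padded at its borders; without padding, $Y$ is $(h-k+1) \times (w-k+1)$). $k$ is usually odd, so that the kernel has a centre pixel.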

For all $i \in [1, h]$ and $j \in [1, w]$:

$$Y_{i,j} = \frac{1}{k^2} \sum_{m=1}^{k} \sum_{n=1}^{k} X_{i+m-1,\,j+n-1} \, K_{m,n}$$
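The formula can be transcribed almost literally into loops. This is a minimal sketch, not an efficient implementation: `conv_formula` is a hypothetical name, and it assumes $k$ is odd and that $X$ is zero-padded by $k // 2$ pixels on every side so that $Y$ keeps the $h \times w$ shape. Note that, exactly like the formula, it slides $K$ over the image without flipping it.

```python
import numpy as np

def conv_formula(X, K):
    """Y_{i,j} = 1/k^2 * sum_m sum_n X_{i+m-1, j+n-1} K_{m,n}, evaluated on a zero-padded X.

    Assumes K is k x k with k odd; X is padded by k // 2 zeros on every side
    so that the output Y has the same h x w shape as X.
    """
    X = np.asarray(X, dtype=float)
    K = np.asarray(K, dtype=float)
    h, w = X.shape
    k = K.shape[0]
    p = k // 2
    Xp = np.pad(X, p, mode="constant", constant_values=0)  # zero padding
    Y = np.zeros((h, w))
    for i in range(h):              # Python is 0-indexed, so the formula's i - 1 becomes i
        for j in range(w):
            total = 0.0
            for m in range(k):      # X_{i+m-1, j+n-1} becomes Xp[i + m, j + n] with 0-indexing
                for n in range(k):
                    total += Xp[i + m, j + n] * K[m, n]
            Y[i, j] = total / k**2  # the 1/k^2 normalisation term
    return Y

# A 4x4 image with a bright 2x2 square in the middle, filtered with a 3x3 box kernel
X = np.array([[0, 0, 0, 0],
              [0, 9, 9, 0],
              [0, 9, 9, 0],
              [0, 0, 0, 0]])
K = np.ones((3, 3))
Y = conv_formula(X, K)
print(Y.shape)  # (4, 4)
print(Y[1, 1])  # 4.0
```

With an all-ones kernel each output pixel is the mean of the $3 \times 3$ neighbourhood around it, which is why the bright square gets "smeared" into its surroundings.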

[Figure: Convolution example]

$\frac{1}{k^2}$ is the normalisation term. It is sometimes removed, in which case we normalise the kernel itself instead.
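A small sketch of why the two options can be interchangeable, assuming "normalising the kernel" means scaling it so its entries sum to 1. For the all-ones box kernel, dividing by $k^2$ and dividing the kernel by its sum give the same result; for other kernels the two scalings generally differ. The patch values are arbitrary.

```python
import numpy as np

k = 3
patch = np.arange(k * k, dtype=float).reshape(k, k)  # arbitrary 3x3 image patch (values 0..8)
box = np.ones((k, k))                                # un-normalised box kernel

# Option 1: keep the 1/k^2 normalisation term from the formula
y_term = (patch * box).sum() / k**2

# Option 2: drop the term and normalise the kernel so its entries sum to 1
box_normalised = box / box.sum()
y_kernel = (patch * box_normalised).sum()

# Both compute the mean of the patch (4), up to floating-point rounding
print(np.isclose(y_term, y_kernel))  # True
```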

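```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import correlate

# Our convolution function
def convolution(image, kernel):
    # Work in floating point so products and sums cannot overflow uint8 pixels
    image = np.asarray(image, dtype=float)

    # Flip the kernel horizontally and vertically. correlate() slides the kernel
    # without flipping it, so flipping first turns cross-correlation into a true
    # convolution (for symmetric kernels the flip makes no difference)
    kernel = np.flipud(np.fliplr(kernel))

    # Get the dimensions of the image and kernel
    image_rows, image_cols = image.shape
    kernel_rows, kernel_cols = kernel.shape

    # Convolve using SciPy. mode='valid' keeps only positions where the kernel
    # fits entirely inside the image, so the output has shape
    # (image_rows - kernel_rows + 1, image_cols - kernel_cols + 1)
    output = correlate(image, kernel, mode='valid')

    # Note that this is equivalent to the explicit loop below
    """
    # Loop through the image, applying the (flipped) kernel at each position
    output = np.zeros((image_rows - kernel_rows + 1, image_cols - kernel_cols + 1))
    for x in range(image_rows - kernel_rows + 1):
        for y in range(image_cols - kernel_cols + 1):
            output[x, y] = (kernel * image[x:x+kernel_rows, y:y+kernel_cols]).sum()
    """

    return output

# Display image in matplotlib
# (image_conv1 is assumed to be the result of an earlier call such as convolution(image, kernel))
plt.imshow(image_conv1, cmap='gray')
plt.show()
```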
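To check the claim that the SciPy call and the explicit loop are equivalent, the sketch below runs both on a random image and an asymmetric random kernel. `convolution_scipy` and `convolution_loop` are hypothetical names for the two variants of the function above; the random seed and sizes are arbitrary.

```python
import numpy as np
from scipy.signal import correlate

def convolution_scipy(image, kernel):
    # Flip the kernel, then correlate over 'valid' positions (as in convolution() above)
    image = np.asarray(image, dtype=float)
    kernel = np.flipud(np.fliplr(kernel))
    return correlate(image, kernel, mode='valid')

def convolution_loop(image, kernel):
    # The same computation written as the explicit double loop
    image = np.asarray(image, dtype=float)
    kernel = np.flipud(np.fliplr(kernel))
    kr, kc = kernel.shape
    rows, cols = image.shape[0] - kr + 1, image.shape[1] - kc + 1
    output = np.zeros((rows, cols))
    for x in range(rows):
        for y in range(cols):
            output[x, y] = (kernel * image[x:x + kr, y:y + kc]).sum()
    return output

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(8, 10)).astype(np.uint8)  # random greyscale image
kernel = rng.normal(size=(3, 3))                              # random, asymmetric kernel

a = convolution_scipy(image, kernel)
b = convolution_loop(image, kernel)
print(a.shape)            # (6, 8)
print(np.allclose(a, b))  # True
```

`np.allclose` is used instead of exact equality because SciPy may choose an FFT-based method internally, which can introduce tiny rounding differences.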

Flipping the kernel

Answer here, will revisit someday
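A minimal sketch of what the flip changes, using SciPy's `scipy.signal.correlate2d` (slides the kernel as-is, like the formula above) and `scipy.signal.convolve2d` (a true convolution). The edge image and the left/right difference kernel are made-up examples. With an asymmetric kernel the flip reverses the sign of the edge response; flipping and then correlating matches `convolve2d`; with a symmetric kernel the flip has no effect.

```python
import numpy as np
from scipy.signal import correlate2d, convolve2d

# 3x5 image with a vertical dark-to-bright edge between columns 2 and 3
image = np.array([[0, 0, 0, 255, 255],
                  [0, 0, 0, 255, 255],
                  [0, 0, 0, 255, 255]], dtype=float)

# Asymmetric kernel: right neighbour minus left neighbour
edge = np.array([[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]], dtype=float)

no_flip = correlate2d(image, edge, mode='valid')                        # no flip (cross-correlation)
flipped = correlate2d(image, np.flipud(np.fliplr(edge)), mode='valid')  # flip, then correlate
true_conv = convolve2d(image, edge, mode='valid')                       # SciPy's convolution

# no_flip is [[0, 765, 765]] and flipped is [[0, -765, -765]]
print(np.allclose(flipped, true_conv))  # True

# For a symmetric kernel (here a box filter) flipping changes nothing
box = np.ones((3, 3))
print(np.allclose(correlate2d(image, box, mode='valid'),
                  convolve2d(image, box, mode='valid')))  # True
```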