Project 2: Stereo matching and homographies

Stereo matching (50 points)

From left to right: left image, right image, disparities from simple stereo matching, ground truth.

Image pair (left and right views) from the Middlebury stereo evaluation dataset.

Simple stereo matching:

(3 points). Load the left and right camera images (we suggest using the image pair above), and convert them to float arrays.
(5 points). Establish a maximum disparity value based on the size of the image. For this input pair, a maximum disparity of about 1/3 the height of the image works well.
(10 points). Compute the disparity space image (DSI), a 3D array with coordinates (x, y, d). Each entry is the difference between the color of pixel (x, y) in the left image and the color of the right-image pixel shifted horizontally to the left by d, i.e., pixel (x - d, y). Hint 1: To take the difference between two colors, you can use the sum of squared differences between their RGB components. Hint 2: Visualize some slices through the DSI at different values of the disparity d to verify that objects that are either close or far have low entries in the DSI for appropriate choices of d. Hint 3: The sum of squared differences between vectors x and y is Σ_i (x_i - y_i)², where the sum runs over the 3 color channels.
(10 points). For each disparity d, perform spatial aggregation on the DSI by applying a Gaussian filter across the spatial coordinates (x, y). A Gaussian filter is available in the skimage library as skimage.filters.gaussian.
(5 points). For each pixel (x, y), choose the disparity d that gives the smallest DSI value after the aggregation step. This gives a map of disparities d for each (x, y).
(5 points). Visualize your disparity map and compare it with the ground truth disparity map for your image pair (you can use numpy.load to load the ground truth .npy file for the image pair above). Print the root-mean-square (RMS) distance between the ground truth disparity and your disparity. Experiment with changing the σ of the Gaussian filter, and report in the readme submitted with your assignment the lowest RMS distance you can obtain with Gaussian filtering. Also include an image visualizing the stereo matching result for the Gaussian filter.
(5 points). One issue with the previous algorithm is that it does not preserve sharp edges when aggregating spatially with the Gaussian filter. Experiment with replacing the Gaussian spatial aggregation with an edge-preserving aggregation. We suggest using the bilateral filter, which blurs an image while preserving strong edges. An implementation is available in OpenCV as cv2.ximgproc.jointBilateralFilter(joint, src, d, sigmaColor, sigmaSpace) -> result image (see also the C++ OpenCV documentation for this function; the ximgproc module ships with the opencv-contrib-python package). The joint bilateral filter blurs a source image src (here, the slice through the disparity space image at each disparity d) so that the blur does not cross edges in the joint image joint (here, the left color image). Experiment with sigmaColor and sigmaSpace to see whether you can obtain a better result than the Gaussian filter in terms of the RMS distance between your result and the ground truth. Report in your readme the lowest RMS distance from the ground truth that you can obtain with bilateral filtering. Include a visualization of the stereo matching result. Hint: the joint bilateral filter prefers a greyscale joint image in float32 with a maximum value not exceeding 1.
(7 points). A second issue with the previous algorithm is that it does not handle occluded or mismatched regions well. Implement what is called a left-right consistency check. This performs two stereo matchings, from the left image to the right image and from the right image to the left image, and checks whether they are consistent. Check whether corresponding disparities differ by more than a threshold number of pixels (I used a threshold of 15). (Hint: when you look up pixel (x, y) in the left disparity image and find disparity d, look up the right disparity image at the pixel shifted left by d, i.e., at (x - d, y).) If they differ by more than the threshold, label the pixel as mismatched/occluded. Visualize the disparities (for the left image matched to the right image) only for the pixels that are not mismatched/occluded, with the mismatched/occluded pixels masked out.

Compute the RMS distance between your result and ground truth only for pixels that are not labelled as mismatched/occluded, and report this in your readme. You should find that this RMS distance is lower than the previous one.
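The stereo steps above (DSI, per-slice aggregation, picking the cheapest disparity, RMS, and the left-right check) can be sketched in NumPy as follows. This is a minimal sketch under several assumptions: left and right are float arrays of shape (H, W, 3); the DSI is stored in NumPy's row-major order as dsi[y, x, d]; columns with x < d, which have no partner in the right image, receive the largest observed cost (an illustrative choice); and the aggregation uses scipy.ndimage.gaussian_filter, swapped in for the skimage Gaussian the assignment mentions (skimage.filters.gaussian should give similar results, though its default border mode differs). The helper names compute_dsi, match, match_right_to_left, lr_consistent and rms are hypothetical.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def compute_dsi(left, right, max_disp):
    """DSI[y, x, d] = sum over RGB of (left[y, x] - right[y, x - d])**2."""
    h, w, _ = left.shape
    dsi = np.full((h, w, max_disp + 1), np.nan)
    for d in range(max_disp + 1):
        diff = left[:, d:, :] - right[:, :w - d, :]
        dsi[:, d:, d] = np.sum(diff ** 2, axis=2)
    # Columns x < d have no partner in the right image: give them the worst observed cost.
    dsi[np.isnan(dsi)] = np.nanmax(dsi)
    return dsi


def match(left, right, max_disp, filter_slice):
    """Aggregate each DSI slice with filter_slice, then take the cheapest d per pixel."""
    dsi = compute_dsi(left, right, max_disp)
    agg = np.stack([filter_slice(dsi[:, :, d]) for d in range(max_disp + 1)], axis=2)
    return np.argmin(agg, axis=2)


def match_right_to_left(left, right, max_disp, filter_slice):
    """Disparities for the right image (right[y, x] ~ left[y, x + d]), via mirroring."""
    return match(right[:, ::-1], left[:, ::-1], max_disp, filter_slice)[:, ::-1]


def lr_consistent(disp_l, disp_r, threshold=15):
    """True where the left disparity agrees with the right disparity at (x - d, y)."""
    h, w = disp_l.shape
    ys, xs = np.mgrid[0:h, 0:w]
    xr = np.clip(xs - disp_l, 0, w - 1)  # where left pixel (x, y) lands in the right image
    return np.abs(disp_l - disp_r[ys, xr]) <= threshold


def rms(disp, gt, mask=None):
    """Root-mean-square disparity error, optionally restricted to a boolean mask."""
    if mask is None:
        mask = np.ones(disp.shape, dtype=bool)
    return float(np.sqrt(np.mean((disp[mask] - gt[mask]) ** 2.0)))
```

For the Gaussian experiments, filter_slice could be lambda s: gaussian_filter(s, sigma). For the bilateral variant only filter_slice changes, e.g. lambda s: cv2.ximgproc.jointBilateralFilter(joint, s.astype(np.float32), -1, sigma_color, sigma_space), where joint is a greyscale float32 version of the image being matched, scaled to [0, 1] as the hint suggests (a non-positive d should make OpenCV derive the neighbourhood size from sigmaSpace). For the consistency check, display disp_l only where lr_consistent(disp_l, disp_r) is True and pass that mask to rms.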

Panorama stitching using homographies (50 points)

sparse feature descriptors / SIFT


Figure panels: Input A, Input B, Resulting Composite.

At left: the two input images A and B. At right: the desired result of the program, a composite of image A and image B warped to the coordinate system of image A.

Steps:

(2 points). Load images A and B using cv2.imread().

Hint 1: OpenCV's SIFT expects images in the default 8-bit unsigned integer format, so keep them in that format.

Hint 2: OpenCV uses BGR channel order (blue is channel 0, green is channel 1, red is channel 2), the reverse of the RGB order used by most other Python packages. If red and blue appear swapped when you save or display an image A with another package, you can use A[:,:,::-1] to reverse the order of the channels.

(5 points). Compute keypoints and SIFT descriptors for images A and B. Match from image A keypoints to image B keypoints. Each match gives a correspondence between a 2D SIFT keypoint in image A and one in image B. Reject matches that fail the ratio test (from the slide titled "Feature-space outlier rejection").

To make this easier, you can follow along with this OpenCV code, which shows how to do this. (Note that in that code, the line assigning sift should be changed to sift = cv2.xfeatures2d.SIFT_create() for OpenCV version 3+; in OpenCV 4.4 and later, SIFT is also available in the main module as cv2.SIFT_create(). You can check your OpenCV version by printing cv2.__version__ in Python.)
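The ratio test itself needs only the descriptor arrays. Below is a minimal NumPy sketch, assuming des_a and des_b are the descriptor arrays (one row per keypoint) returned by detectAndCompute, and a ratio threshold of 0.75, a commonly used value (OpenCV's feature-matching tutorial uses it too); treat the threshold as tunable. The function name ratio_test_matches is hypothetical. It returns index pairs (i, j), meaning keypoint i in A matches keypoint j in B, analogous to the queryIdx/trainIdx fields of OpenCV's DMatch.

```python
import numpy as np


def ratio_test_matches(des_a, des_b, ratio=0.75):
    """Match each descriptor in A to its nearest neighbour in B, keeping only matches
    whose best distance is < ratio * second-best distance (needs len(des_b) >= 2)."""
    des_a = np.asarray(des_a, dtype=np.float64)
    des_b = np.asarray(des_b, dtype=np.float64)
    # Squared Euclidean distances via |a|^2 + |b|^2 - 2 a.b, shape (len(des_a), len(des_b)).
    d2 = (des_a ** 2).sum(1)[:, None] + (des_b ** 2).sum(1)[None, :] - 2.0 * des_a @ des_b.T
    dist = np.sqrt(np.maximum(d2, 0.0))
    nn = np.argsort(dist, axis=1)[:, :2]  # indices of the two nearest neighbours in B
    rows = np.arange(len(des_a))
    best, second = dist[rows, nn[:, 0]], dist[rows, nn[:, 1]]
    keep = best < ratio * second
    return [(int(i), int(nn[i, 0])) for i in np.nonzero(keep)[0]]
```

With OpenCV, kp_a, des_a = sift.detectAndCompute(img_a, None) (and likewise for B) supplies the inputs, and [kp_a[i].pt for i, j in matches] gives the matched image A points. The full distance matrix is fine for a few thousand keypoints; cv2.BFMatcher's knnMatch with k=2 does the same job inside OpenCV.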

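(5 points). Write a function that applies a homography matrix H to a point (x_b, y_b) in image B, returning a point (x_a, y_a) in image A. The homography is discussed on the slide titled "Homographies." As a reminder, if the homography H is:

$$H = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & 1 \end{bmatrix}$$

then the mapping from a point (x_b, y_b) in image B to a point (x_a, y_a) in image A is:

$$x_a = \frac{a x_b + b y_b + c}{g x_b + h y_b + 1}, \qquad y_a = \frac{d x_b + e y_b + f}{g x_b + h y_b + 1} \tag{1}$$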
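A minimal sketch of applying a homography to one point, assuming H is stored as a 3x3 NumPy array with H[2, 2] = 1: multiply the homogeneous point (x_b, y_b, 1) by H and divide by the third coordinate, which is exactly the ratio in equation (1). The name apply_homography is hypothetical.

```python
import numpy as np


def apply_homography(H, x_b, y_b):
    """Map point (x_b, y_b) in image B to (x_a, y_a) in image A."""
    u, v, w = H @ np.array([x_b, y_b, 1.0])  # homogeneous coordinates in image A
    return u / w, v / w                      # divide out the projective scale
```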

(10 points). Write a function that fits a homography. This function should take a list of four 2D points in image A and a list of four 2D points in image B and return the 3x3 matrix for the homography mapping from image B points to image A points.

By multiplying both sides of the mapping equations (1) by their denominators, you obtain a linear system of 8 equations (two per point pair) in the 8 unknowns (a, b, c, d, e, f, g, h). Construct an 8x8 matrix A and a length-8 vector b for this system so that the unknowns can be found by solving the linear system Ax = b (here A and b denote the linear system, not image A or the homography entry b). You can use a linear algebra routine to solve this system (e.g., numpy.linalg.lstsq).
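As a worked step, clearing the denominator in equation (1) for one point pair and moving every unknown to the left-hand side gives two equations that are linear in a through h:

$$
\begin{aligned}
a x_b + b y_b + c - g\,x_a x_b - h\,x_a y_b &= x_a \\
d x_b + e y_b + f - g\,y_a x_b - h\,y_a y_b &= y_a
\end{aligned}
$$

so each point pair contributes these two rows to the system:

$$
\begin{bmatrix}
x_b & y_b & 1 & 0 & 0 & 0 & -x_a x_b & -x_a y_b \\
0 & 0 & 0 & x_b & y_b & 1 & -y_a x_b & -y_a y_b
\end{bmatrix}
\begin{bmatrix} a & b & c & d & e & f & g & h \end{bmatrix}^{\top}
=
\begin{bmatrix} x_a \\ y_a \end{bmatrix}
$$

Stacking the rows for all four point pairs yields the 8x8 matrix and the length-8 right-hand side.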

You can check that your homography fitting routine works correctly by using the image A points [(0, 0), (1, 0), (0, 1), (1, 1)], image B points [(1, 2), (3, 2), (1, 4), (3, 4)], which should result in the homography matrix [[0.5, 0, -0.5], [0, 0.5, -1], [0, 0, 1]].

Hint: We recommend passing in the four points in image A and in image B as arrays, initializing the matrix A and vector b with numpy.zeros, and populating these arrays in a single loop over all four points, filling two rows (one per equation) for each point. This avoids an unnecessary explosion of code and algebra.
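Following that hint, a minimal sketch of the fitting function might look like this. It assumes the points are given as (x, y) pairs and normalizes H so that H[2, 2] = 1; the name fit_homography is hypothetical, and the right-hand side is called rhs to avoid clashing with the homography entry b.

```python
import numpy as np


def fit_homography(pts_a, pts_b):
    """Fit H (3x3, H[2, 2] = 1) mapping four image-B points onto four image-A points."""
    A = np.zeros((8, 8))
    rhs = np.zeros(8)
    for i, ((xa, ya), (xb, yb)) in enumerate(zip(pts_a, pts_b)):
        # x_a (g x_b + h y_b + 1) = a x_b + b y_b + c, rearranged to be linear in a..h
        A[2 * i] = [xb, yb, 1, 0, 0, 0, -xa * xb, -xa * yb]
        rhs[2 * i] = xa
        # y_a (g x_b + h y_b + 1) = d x_b + e y_b + f
        A[2 * i + 1] = [0, 0, 0, xb, yb, 1, -ya * xb, -ya * yb]
        rhs[2 * i + 1] = ya
    sol, *_ = np.linalg.lstsq(A, rhs, rcond=None)  # sol = (a, b, c, d, e, f, g, h)
    return np.append(sol, 1.0).reshape(3, 3)
```

Calling fit_homography([(0, 0), (1, 0), (0, 1), (1, 1)], [(1, 2), (3, 2), (1, 4), (3, 4)]) should reproduce the check matrix given above, up to floating-point rounding.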

(20 points). Write a function that uses RANSAC to compute the best homography mapping image B coordinates to image A coordinates. Consult the slides titled "RANSAC" and "RANSAC for estimating homography" for a description of how to do this. RANSAC runs a loop for some fixed number of iterations, with the following steps inside the loop:

(2 points). Use random.sample to randomly sample four pairs of matched features (from the matches you computed earlier).
(5 points). Fit a homography H that maps image B points to image A points, using the homography-fitting function you implemented above. Note that the (x, y) position of keypoint i can be extracted with (key_list[i].pt[0], key_list[i].pt[1]).
(10 points). Count inliers. Loop through all matched pairs. If (a_i, b_i) are the image A and B points in a matched pair, consider the pair an inlier if || H(b_i) - a_i || < ε, where H(b_i) is b_i mapped through the homography and ε is a threshold in pixels that accounts for noise. To apply the homography, you can simply use the function you implemented earlier for mapping a point from image B to image A.

(3 points). RANSAC should then return the homography H with the largest set of inliers. You can skip the optional step of refitting H to all of the inliers.
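The RANSAC loop described above can be sketched as follows. This is a minimal, self-contained sketch under a few assumptions: the matched keypoint positions have already been collected into two (N, 2) arrays pts_a and pts_b (row i of each forms one match), a vectorized map_points helper replaces the single-point function, and n_iters = 1000 and eps = 3.0 pixels are illustrative defaults rather than recommended values. All function names are hypothetical.

```python
import random
import numpy as np


def fit_homography(pts_a, pts_b):
    """Least-squares H (H[2, 2] = 1) mapping 4 image-B points to 4 image-A points."""
    A, rhs = np.zeros((8, 8)), np.zeros(8)
    for i, ((xa, ya), (xb, yb)) in enumerate(zip(pts_a, pts_b)):
        A[2 * i] = [xb, yb, 1, 0, 0, 0, -xa * xb, -xa * yb]
        A[2 * i + 1] = [0, 0, 0, xb, yb, 1, -ya * xb, -ya * yb]
        rhs[2 * i], rhs[2 * i + 1] = xa, ya
    sol, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return np.append(sol, 1.0).reshape(3, 3)


def map_points(H, pts):
    """Map an (N, 2) array of image-B points into image A."""
    ph = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    with np.errstate(divide="ignore", invalid="ignore"):  # degenerate H gives inf/nan
        return ph[:, :2] / ph[:, 2:3]


def ransac_homography(pts_a, pts_b, n_iters=1000, eps=3.0, seed=None):
    """Return (H, inlier_count) for the sampled H with the most inliers."""
    pts_a, pts_b = np.asarray(pts_a, float), np.asarray(pts_b, float)
    rng = random.Random(seed)
    best_H, best_count = None, -1
    for _ in range(n_iters):
        idx = rng.sample(range(len(pts_a)), 4)       # four random matched pairs
        H = fit_homography(pts_a[idx], pts_b[idx])
        err = np.linalg.norm(map_points(H, pts_b) - pts_a, axis=1)
        count = int(np.sum(err < eps))               # nan/inf errors compare False
        if count > best_count:
            best_H, best_count = H, count
    return best_H, best_count
```

Degenerate samples (for example, nearly collinear points) simply produce a poor H with few inliers, so they are never selected. The strict comparison keeps the first H that reaches the best inlier count.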

(5 points). Warp image B to image A's coordinate system by applying the homography H, and composite the two images. We supply a helper routine that you can use to do this for you:

import skimage, skimage.transform, numpy, numpy.linalg

def composite_warped(a, b, H):
    "Warp images a and b to a's coordinate system using the homography H which maps b coordinates to a coordinates."
    out_shape = (a.shape[0], 2*a.shape[1])                               # Output image (height, width)
    p = skimage.transform.ProjectiveTransform(numpy.linalg.inv(H))       # Inverse of homography (used for inverse warping)
    bwarp = skimage.transform.warp(b, p, output_shape=out_shape)         # Inverse warp b to a coords
    bvalid = numpy.zeros(b.shape, 'uint8')                               # Establish a region of interior pixels in b
    bvalid[1:-1,1:-1,:] = 255
    bmask = skimage.transform.warp(bvalid, p, output_shape=out_shape)    # Inverse warp interior pixel region to a coords
    apad = numpy.hstack((skimage.img_as_float(a), numpy.zeros(a.shape))) # Pad a with black pixels on the right
    return skimage.img_as_ubyte(numpy.where(bmask==1.0, bwarp, apad))    # Select either bwarp or apad based on mask

Note that this routine assumes it is given unsigned 8-bit integer input images and returns an image in the same format.

(3 points). Include in your readme the best homography matrix found by RANSAC and the composite panorama image.

(Optional extra credit: 5 points). The final composite image does not look great because it has a visible seam. Determine the overlap region between the two images. Inside this overlap region, modify the function composite_warped to use linear blending, smoothly interpolating each pixel's color between the color drawn from image a and the corresponding color drawn from the warped image b (bwarp). In particular, you can use a distance transform to find each pixel's distance to the region where only a pixels exist and to the region where only b pixels exist, and use these distances to make a smooth interpolation.
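One way to turn those distances into blending weights is sketched below, assuming boolean masks a_valid and b_valid (same height and width as the composite) and that both the a-only and b-only regions are nonempty. It uses scipy.ndimage.distance_transform_edt as the distance transform; the weight for a, dist_to_b_only / (dist_to_a_only + dist_to_b_only), is one common choice that ramps linearly from 1 at the a-only edge to 0 at the b-only edge. The name blend_weights is hypothetical.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt


def blend_weights(a_valid, b_valid):
    """Per-pixel weight for image a (b gets 1 - weight), ramping linearly across the overlap."""
    a_only = a_valid & ~b_valid
    b_only = b_valid & ~a_valid
    # distance_transform_edt gives each nonzero pixel its distance to the nearest zero pixel,
    # so passing ~region yields the distance to that region (0 inside it).
    dist_to_a_only = distance_transform_edt(~a_only)
    dist_to_b_only = distance_transform_edt(~b_only)
    total = dist_to_a_only + dist_to_b_only
    w = np.divide(dist_to_b_only, total, out=np.zeros_like(total), where=total > 0)
    w[a_only] = 1.0
    w[b_only] = 0.0
    return w
```

Inside composite_warped, a_valid could be the columns with index less than a.shape[1], b_valid could be bmask[:, :, 0] == 1.0, and the result could be skimage.img_as_ubyte(w[..., None] * apad + (1 - w[..., None]) * bwarp). Pixels covered by neither image stay black because both apad and bwarp are zero there.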

CS 4501 -- Introduction to Computer Vision

Due: Fri, Mar 17 (11:59 PM)
