I want to write down what our group learned in our final project for the Computational Photography course. We wanted to explore how to capture a 360° environment map in the real world and use it to perform image-based lighting in Maya. We tried two approaches: mirror ball unwrapping and panorama stitching. I will mainly talk about the panorama-based approach, since that is what I worked on.
I mainly followed the slides and problem sets from MIT's 6.815 Digital and Computational Photography course to implement the auto-stitching part. Auto-stitching produces high-quality pairs of corresponding feature points between two images in four stages: corner detection, descriptor creation, correspondence search, and RANSAC.
In the first stage, we use the Harris corner detector to produce the feature points, which involves computing the structure tensors, calculating the corner responses, and finding the local maxima of the responses. We also need a way to describe the neighborhood of a feature point, called a “descriptor”, to help us measure the similarity of two feature points. We used a simple patch descriptor, which is just a $k \times k$ patch around a feature point with some Gaussian blur applied and its values normalized. After getting the descriptors, we used the L2 distance to determine how "close" any two descriptors are and kept the descriptor pairs whose distance is under a threshold. We also used the second-best test to filter out matches that are too ambiguous: a match is kept only if its best candidate is clearly closer than the runner-up. Even after the second-best test, our descriptor pairs will usually still contain some outliers, so we used RANSAC to filter out these final outliers.
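To make the correspondence search concrete, here is a minimal Python sketch of the distance threshold plus the second-best test. It is not our actual C++ code. The function names, the zero-mean/unit-variance normalization, the ratio form of the second-best test, and the `max_dist` and `ratio` values are all assumptions chosen for illustration.

```python
import math

def l2(a, b):
    """Euclidean (L2) distance between two equal-length descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalize(patch):
    """Shift a flattened patch to zero mean and scale it to unit standard deviation.

    Assumption: the post only says the values are normalized; this is one common choice.
    """
    mean = sum(patch) / len(patch)
    var = sum((v - mean) ** 2 for v in patch) / len(patch)
    std = math.sqrt(var) or 1.0  # flat patch: avoid dividing by zero
    return [(v - mean) / std for v in patch]

def match(descs_a, descs_b, max_dist=1.0, ratio=0.7):
    """Return (i, j) index pairs that pass both tests.

    max_dist and ratio are hypothetical thresholds, not values from the project.
    """
    matches = []
    for i, d in enumerate(descs_a):
        # Rank every candidate in the other image by distance to descriptor i.
        ranked = sorted((l2(d, e), j) for j, e in enumerate(descs_b))
        if len(ranked) < 2:
            continue  # the second-best test needs at least two candidates
        (best, j), (second, _) = ranked[0], ranked[1]
        # Keep the match only if it is close enough AND clearly better
        # than the runner-up (the second-best test).
        if best < max_dist and best < ratio * second:
            matches.append((i, j))
    return matches
```

With `ratio=0.7`, a feature whose two closest candidates are almost equally far away is dropped, which is exactly the ambiguous case the second-best test is meant to catch.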
RANSAC is really powerful for eliminating outliers in descriptor matches, as demonstrated in the image below. The green lines show the inlier matches, while the red lines show the outlier matches. The blue lines show the matches selected by RANSAC to calculate the final rotation matrix $R$ between the two images.
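The RANSAC loop itself is short. The sketch below is a simplified illustration, not our implementation: to stay self-contained it fits a plain 2D translation between matched points instead of the rotation matrix $R$ we actually estimated, and the function name, `threshold`, `iterations`, and `seed` are all assumptions.

```python
import random

def ransac_translation(pairs, threshold=2.0, iterations=200, seed=0):
    """Estimate a 2D translation from noisy matches with RANSAC.

    pairs: list of ((x1, y1), (x2, y2)) candidate matches.
    Returns ((tx, ty), inlier_indices), or (None, []) if there are no pairs.
    Simplification: a translation model stands in for the rotation R.
    """
    if not pairs:
        return None, []
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        # 1. Draw a minimal sample: one match fully determines a translation.
        (x1, y1), (x2, y2) = rng.choice(pairs)
        tx, ty = x2 - x1, y2 - y1
        # 2. Count the matches that agree with this hypothesis.
        inliers = [
            i for i, ((ax, ay), (bx, by)) in enumerate(pairs)
            if (ax + tx - bx) ** 2 + (ay + ty - by) ** 2 <= threshold ** 2
        ]
        # 3. Keep the hypothesis with the largest consensus set.
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # 4. Refit on all inliers; for a translation this is the mean offset.
    n = len(best_inliers)
    tx = sum(pairs[i][1][0] - pairs[i][0][0] for i in best_inliers) / n
    ty = sum(pairs[i][1][1] - pairs[i][0][1] for i in best_inliers) / n
    return (tx, ty), best_inliers
```

The structure carries over to the rotation case: only the minimal sample size, the model-fitting step, and the error measure change.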
Once we could stitch two images together with auto-stitching, we still needed to figure out how to compose a full panorama by stitching N images. We can either stitch the images pair by pair locally, or use a global optimization approach. Due to time constraints, we decided to use the local approach. We found that if you first stitch all pitch angles of a single yaw, then stitch the results across different yaw angles, you tend to get much less ghosting in the output image.
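The ordering can be sketched as follows. This is only an illustration of the pitch-first, yaw-second order: it assumes every shot is tagged with its yaw and pitch angle, and `stitch_pair` is a hypothetical stand-in for the two-image auto-stitching step.

```python
from collections import defaultdict
from functools import reduce

def stitch_all(shots, stitch_pair):
    """Stitch shots pitch-first within each yaw, then across yaws.

    shots: list of (yaw_deg, pitch_deg, image) tuples (assumed labeling).
    stitch_pair: hypothetical function (image, image) -> image.
    """
    if not shots:
        raise ValueError("need at least one shot")
    # Group the shots into vertical columns that share the same yaw.
    columns = defaultdict(list)
    for yaw, pitch, image in shots:
        columns[yaw].append((pitch, image))
    strips = []
    for yaw in sorted(columns):
        # Within one yaw, stitch neighbors in order of increasing pitch.
        ordered = [img for _, img in sorted(columns[yaw], key=lambda p: p[0])]
        strips.append(reduce(stitch_pair, ordered))
    # Finally stitch the vertical strips together in order of increasing yaw.
    return reduce(stitch_pair, strips)
```

For example, with a toy `stitch_pair = lambda a, b: f"({a}+{b})"` and four shots named `a` to `d`, the result is `((a+b)+(c+d))` when `a`, `b` share one yaw and `c`, `d` share the next.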
And this is how we automatically generated the panorama image above from a series of individual images. Compared to the mirror ball unwrapping approach, the panorama-based approach can produce environment maps with much higher resolution. However, with the panorama-based approach, the areas around the north and south poles cause a lot of problems. For example, if the sky is a single color, the corner detector won't be able to find any corners and the whole pipeline will fail. To generate the final image combining a real-world photo with a virtual scene, we had to use some tricks (such as adding a hat to the mirror ball) to hide the holes in the north and south pole areas.
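To see why a uniform sky breaks the pipeline, here is a minimal sketch of the Harris response on a tiny grayscale image. It is an illustration rather than our code: it sums the structure tensor over a 3×3 box window instead of Gaussian weights, and `k=0.04` is a commonly used value that I am assuming here.

```python
def harris_response(img, k=0.04):
    """Harris corner response for a 2D list of grayscale floats.

    Assumptions: central-difference gradients, a 3x3 box window for the
    structure tensor, and k = 0.04. Border pixels are left at 0.
    """
    h, w = len(img), len(img[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    resp = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Structure tensor entries summed over the 3x3 neighborhood.
            sxx = sxy = syy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    sxy += gx * gy
                    syy += gy * gy
            det = sxx * syy - sxy * sxy
            trace = sxx + syy
            resp[y][x] = det - k * trace * trace
    return resp

# A single-color "sky": every gradient is zero, so no pixel responds.
sky = [[0.5] * 8 for _ in range(8)]
# A bright square in the bottom-right quadrant: its corner responds strongly.
square = [[1.0 if (y >= 4 and x >= 4) else 0.0 for x in range(8)] for y in range(8)]

sky_max = max(max(row) for row in harris_response(sky))
square_max = max(max(row) for row in harris_response(square))
```

On the uniform image `sky_max` is zero, so there are no local maxima to pick as feature points and nothing to match against the neighboring images, while `square_max` is positive at the corner of the square.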
We further compared these two methods in this table:

| | Mirror ball unwrapping | Panorama stitching |
|---|---|---|
| Advantages | Requires only a few photos. Contains the sky and ground regions naturally. | Higher resolution output. Can capture up to 360 degrees in yaw. |
| Disadvantages | Output resolution is low. Doesn’t capture the full 180 degrees in yaw (using only photos captured from 1 angle since our code does not stitch equirectangular maps generated from the ball). | Time-consuming photo capturing process. Can’t stitch images with just the sky, resulting in a hole in the north pole. |
And this is basically everything we learned in this project! I want to thank my teammates, Josephine Nguyen and Yang Qi, for their amazing work. Without them I wouldn't have had the confidence to tackle such a broad project during COVID. I also had a lot of fun writing a job system to parallelize our C++ image processing code, but that will probably need its own post. Also, just for reference, Greg Zaal from HDRI Haven has a great blog post explaining what equipment he uses to capture environment maps. I wish our media center also had a slide like his that lets the camera rotate around the entrance pupil...