General Information

Student: Andrea Burns
Office: 432 CoRE
School: Tulane University
E-mail: ab1748@scarletmail.rutgers.edu
Project: Machine Learning from Multimodal Data

Project Description

Traditionally, many applications in machine learning, ranging from object detection and classification to human activity recognition and surveillance, have relied on Red-Green-Blue (RGB) images and videos for decision making. Recent advances in sensor technology, however, have enabled us to gather multimodal imaging and video data that goes beyond simple RGB data; examples of such data include ultraviolet (UV), infrared (IR), and LIDAR data. While one expects that joint exploitation of these multimodal data would lead to better outcomes, most machine learning algorithms are incapable of such joint exploitation. The INSPIRE lab at Rutgers ECE has received government funding to procure sensors that enable acquisition of data in seven different modalities. The goal of this REU project is to curate datasets using these sensors and develop machine learning algorithms for exploitation of these multimodal datasets for applications such as object detection, classification, and human activity recognition.

Weekly Log

Week 1:
For the first week of the REU, there are two main goals. First, I need to familiarize myself with the different cameras we will be using to capture objects or environments. These cameras will be our different modalities, so it is important to understand each of their capabilities in order to use them appropriately in our research. After logging information about the cameras, I can start taking sample pictures to see how their settings affect the image and what information/data we can make use of from each mode. Second, I will be finding multimodal datasets online to practice training and testing supervised classification algorithms on multimodal data. This practice will be necessary before I can curate and classify my own multimodal datasets.
Week 2:
This week involves a continuation of my initial goals. I have spent some time in the BME lab with the different cameras, and it has been a bit of a slow start: I have to investigate the software quirks of each camera and how to save the images, and some cameras required an initial setup. The difficulty with image capture is that ideally we would like these pictures to be taken "simultaneously," which is not possible due to physical constraints. I am trying to minimize the error and differences between the images I take in different modes. This becomes increasingly difficult as I switch between cameras of different sizes, which affects the overall focal length and thus the viewing frame. I have had to adjust the physical location of the camera and then adjust its focus as well. I have now taken pictures of several objects with the Point Grey monochromatic visible light camera, using a handful of filters, each with a 10 nm range, between 405 and 730 nm. In addition to this camera, I have used the PCO ultraviolet camera, but it actually has higher sensitivity at shorter wavelengths (such as below 150-200 nm). The only filter I have in the UV range (100-400 nm) is at 355 nm, a wavelength for which the camera does not have high quantum efficiency (approximately 30%). Thus, when I used this filter, the image was completely black, and I was only able to take one image of my current set of objects, using the lens with no filter. There has been some difficulty in beginning to classify multimodal data, as techniques differ vastly depending on the purpose and context. I need to refocus on data-driven feature learning, which is a key step between capturing images (or having a ready image dataset) and training a classifier. As of now, we are planning on using a kernel support vector machine (KSVM) learning algorithm. I will be meeting with two graduate students this week to clear up some theoretical confusion, as Dr. Bajwa is out of town.
Week 3:
My current task is to replicate some results from the paper "Hyperspectral Image Classification Using Dictionary-Based Sparse Representation" by Yi Chen, Nasser M. Nasrabadi, and Trac D. Tran. I downloaded the AVIRIS hyperspectral data set of the Indian Pines, and am practicing classification of it using a kernel SVM. Although the paper presents a new dictionary learning algorithm for feature learning, it is not necessary for my purposes, as the data is labeled on a pixel-by-pixel level. I have ground truth information stored for each pixel, and I am going to follow the procedure of the paper as closely as possible to replicate its results (the authors compare their new algorithm to an SVM model, as well as to others). The data contains 200 different modes/spectral ranges of the same image, with each of the 200 bands covering a 10 nm wavelength range. My goal is to train and test on the samples for several different sections of the 200 bands, to see how different wavelengths ultimately affect the accuracy of the classification. This practice is crucial for understanding how hyperspectral data can benefit our research, as well as for viewing data at the level of labeling we are interested in. We may move to image-level labels, which will result in the need for feature learning. Additionally, this week I will finish familiarizing myself with the two other cameras: the SWIR (short wave infrared) camera and the MWIR/LWIR (medium and long wave infrared) camera, which switches between MWIR and LWIR by a change of lens. There have been some software issues with the SWIR camera that we are still trying to resolve.
Week 4:
The focus of this week was finishing the SVM model of the AVIRIS data set. I was consistently getting much lower accuracy than the paper's results, and I needed to pinpoint the problem before I could move on to my own data curation, as it is important that I understand how the model works and which components must be adjusted for useful results. I tried adjusting the training and testing sample sizes, as the paper peculiarly used only 10% of the data for training and 90% for testing. These proportions are usually reversed, as we want to fit the model as well as possible before testing it; additionally, as sample sizes are usually limited, the testing portion tends to be small. I also learned that the parameters of the optimization problem that forms the SVM's hyperplane can greatly affect the accuracy. If they are not tuned correctly, some component of the data may greatly outweigh the rest of the data's influence in creating the model. This can also occur when sample sizes differ greatly between the classes. I used cross-validation to investigate the effects of C (which affects the degree of influence each training sample has over the fit). I also learned about the pros and cons of normalizing data. At first I normalized the data thinking it would help the model, only to realize that it was actually slightly detrimental. Depending on the data, normalization can discard information that would have been useful for classification (say, if one class is well defined by having extremely high values, this extremity may no longer show once the data is normalized). I also switched between the one against one and one against all multiclass strategies. One against one creates n(n-1)/2 binary classifiers, one per pair of classes, and determines a sample's classification by majority vote, whereas one against all creates n classifiers, each separating one class from the rest combined (n being the number of classes).
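As a small worked example of the classifier counts described above, here is a sketch in Python for the 16 classes of the Indian Pines data. The function names and the tie-breaking rule in the vote are assumptions made for illustration; they are not taken from the paper or from the code used in this project.

```python
from collections import Counter
from itertools import combinations


def count_binary_classifiers(n_classes):
    """Return (one-against-one, one-against-all) binary classifier counts."""
    one_vs_one = n_classes * (n_classes - 1) // 2  # one classifier per pair of classes
    one_vs_all = n_classes                         # one classifier per class
    return one_vs_one, one_vs_all


def majority_vote(pairwise_winners):
    """Return the class label that wins the most pairwise contests.

    Ties go to the smallest label, an arbitrary choice made for this sketch.
    """
    votes = Counter(pairwise_winners)
    best = max(votes.values())
    return min(label for label, count in votes.items() if count == best)


# Enumerate every class pair that one against one would train a classifier for.
pairs = list(combinations(range(16), 2))
ovo, ova = count_binary_classifiers(16)
print(ovo, ova, len(pairs))
```

With 16 classes, one against one needs 120 binary classifiers while one against all needs only 16, which is the trade-off being weighed here.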
The paper uses one against one, but in my experience it seemed less logical when dealing with 16 classes. I used one against all, which improved the accuracy, but not significantly. The cause turned out to be none of these possibilities. I was actually just randomizing the training and testing portions of the data in a funky way, and when I used 5-fold cross-validation to randomly determine the training and testing groups, the accuracy immediately matched the results of the paper. Although I spent a significant amount of time debugging and trying things that were completely unrelated to the source of the problem, I ultimately learned much more about using an SVM in practice and about ways to deal with unfriendly data (which most data is).
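The log does not record exactly what was wrong with the original split, so the following is only a sketch of one sound way to form random 5-fold training and testing groups: shuffle the sample indices once, then cut them into folds. The function name, the seed, and the sample count of 23 are hypothetical.

```python
import random


def shuffled_k_fold(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k folds over shuffled indices."""
    indices = list(range(n_samples))
    # Shuffle once up front so every fold draws from the whole data set,
    # rather than from one contiguous (and possibly single-class) region.
    random.Random(seed).shuffle(indices)
    # Spread any remainder so fold sizes differ by at most one sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


for train, test in shuffled_k_fold(23, k=5):
    print(len(train), len(test))
```

Each sample lands in the testing group of exactly one fold and in the training group of the other four.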
Week 5:
This week I was able to return to the initial goal of using the hyperspectral AVIRIS Indian Pines data set. Now that I had successfully set up my code with the SVM for all 200 spectral bands, I wanted to see how using different spectral ranges would affect the accuracy when the same number of bands is used to train the SVM. I tried 20, 40, 50, 100, and 150 (as well as 200) bands at a time, varying their spectral ranges to see if certain ranges improve or worsen prediction accuracy. It appears that the ranges do noticeably affect accuracy, as I presumed they would. However, which ranges increase accuracy is not very consistent and definitely depends on the data. This data is hyperspectral imaging of crops, so certain crops must be more easily distinguished at certain wavelengths, but this could easily change given a different data set with different intrinsic properties. In addition to doing this process on raw data, I'll be applying the same investigation to the data once PCA has been performed. I'm still looking at how the number of components ("directions") kept from the PCA affects the accuracy with 200 bands, but once this is finished I can continue experimenting with a lower number of bands and differing ranges. After I finish this, I'll try to repeat the process with 2-3 other data sets to see if there are any other noticeable patterns. I also started imaging for my own experiment. I want to see if using multimodality will improve the classification of liquids. My plan is to have 12 classes: empty, water, rubbing alcohol, hydrogen peroxide, seltzer water, Coca Cola, Ginger Ale, cranberry juice, orange juice, milk, lemonade vitamin water (it is white), and almond milk. I have gotten through the visible light imaging, and next week I'll be moving on to the UV, SWIR, MWIR, and LWIR cameras.
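One way to enumerate band subsets of a fixed size over the 200 bands is with contiguous index windows, sketched below in Python. The log does not say how the ranges were actually chosen, so the contiguous, evenly stepped windows and the function name are assumptions.

```python
def contiguous_band_windows(n_bands, width, step=None):
    """Return (start, stop) index pairs for contiguous windows of `width` bands.

    `stop` is exclusive. When `step` is omitted it defaults to `width`,
    so the windows do not overlap.
    """
    if not 0 < width <= n_bands:
        raise ValueError("width must be between 1 and n_bands")
    step = width if step is None else step
    return [(start, start + width) for start in range(0, n_bands - width + 1, step)]


# Four non-overlapping 50-band ranges out of 200 bands.
print(contiguous_band_windows(200, 50))
# Two overlapping 150-band ranges, shifted by 50 bands.
print(contiguous_band_windows(200, 150, step=50))
```

Each (start, stop) pair can then be used to slice the band dimension of the data before training, so that every run uses the same number of bands but a different spectral range.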
Week 6:
This week I replicated the same process on the data after applying PCA. Similar results followed, showing that the number of bands greatly impacts accuracy, and the range (within the same number of bands) has a slight effect on the accuracy as well. After finishing this, I needed to investigate how to resolve some physical limitations of the cameras I am able to work with. Ideally, these images should replicate the design of a hyperspectral data set, where the images are aligned and simultaneous. This cannot be achieved under my circumstances, so I have to minimize the amount of error between the images. I have to account for differences in resolution, the moving of the cameras, the different sizes of the cameras and their corresponding focal lengths, the view frame/alignment of all of the images, and the pixel value bit size (8, 12, or 16 bit integers). I added something similar to a QR code to the corners of my setup to ensure a similar viewing frame size, and I'll be doing some processing of the images in MATLAB to resize them to the lowest resolution and then align them accordingly. Because I added this and had not been perfecting the alignment process, I had to restart my imaging. The plan is to finish my data collection and processing next week before I apply feature learning and image classification again.
Week 7:
This week consisted of two main goals: to create, perfect, and give my final presentation on my project, and to finish my data collection and post-processing. I worked on my PowerPoint presentation over the weekend and throughout the week, practicing with friends and solidifying the results I wanted to share. A decent portion of the week was taken up by this, as the presentations were between ten and twelve minutes long. Additionally, time on Thursday and Friday was spent watching all of the other presentations, in addition to our final lunch together on Friday. Putting aside the presentations, I finally finished capturing all of my images in the BME lab for all 12 liquid classes. I was able to write a short MATLAB script for the post-processing, where I resized all of my images to the lowest resolution of 480 x 640, converted all pixel values to 8 bit integers, and then registered them relative to our "reference" visible light image. Registration is the alignment process, which strives to approximate simultaneous imaging conditions. I created a 3D matrix of each class's images, which I saved in a .mat file, and I will start on the classification process next week.
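The post-processing itself was done in a MATLAB script that is not reproduced in this log. As an illustration of the bit-depth step only, here is a sketch in Python of one simple way to bring 12 or 16 bit pixel values down to 8 bits, by dropping the low-order bits. The function name and the bit-shift approach are assumptions; the actual script may have scaled the values differently.

```python
def to_8_bit(pixels, source_bits):
    """Rescale unsigned integer pixel values from `source_bits` of depth to 8 bits.

    `pixels` is a list of rows of integers. Dropping the low-order bits maps
    the full source range [0, 2**source_bits - 1] onto [0, 255].
    """
    if source_bits < 8:
        raise ValueError("source_bits must be at least 8")
    shift = source_bits - 8
    return [[value >> shift for value in row] for row in pixels]


# A tiny 12 bit image: black, mid-gray, and full-scale white.
print(to_8_bit([[0, 2048, 4095]], 12))
```

Putting every camera's output on the same 0 to 255 scale keeps one band from dominating the others purely because its sensor reports more bits.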
Week 8:
This week has been focused on the analysis of my data. I wanted to classify the 12 liquids (or rather 11 liquids and one empty class). I performed PCA on the 12 classes, where each labeled sample consisted of a 12 by 1 vector representing a pixel's values in each of the 12 bands. I cropped the images to contain only the bottle holding the liquid, so that the pixels used as samples carried information about the liquid rather than the image background. After performing PCA, I could usually keep all of the components, or drop one or two of them, and still retain 95% or more of the variance in the data. I performed classification on all classes and looked at the effects of using between 5 and 12 of the 12 bands, over different ranges. I have results that I plan on writing up next week, summarizing my findings.
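The two steps described above, forming one vector per pixel across the bands and choosing how many principal components to keep, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the images are plain nested lists, and the eigenvalues (per-component variances) are assumed to come from a PCA computed elsewhere.

```python
def pixel_vectors(band_images):
    """Turn a list of B equally sized 2-D images into one B-element vector per pixel."""
    vectors = []
    for rows in zip(*band_images):      # the same row taken from every band
        for values in zip(*rows):       # the same pixel taken from every band
            vectors.append(list(values))
    return vectors


def components_for_variance(eigenvalues, target=0.95):
    """Return the smallest number of leading components whose share of the
    total variance reaches `target`."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running / total >= target:
            return count
    return len(ordered)


# Two 2 x 2 "bands" give four samples with two values each.
print(pixel_vectors([[[1, 2], [3, 4]], [[5, 6], [7, 8]]]))
# Made-up variances for illustration only.
print(components_for_variance([8, 1, 1]))
```

With 12 bands, each cropped pixel becomes a 12-element sample, and the second function reports how many components are needed to retain the chosen share of the variance.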
Week 9:
For the final week, the focus is on report writing. I have written up my final paper detailing my work and results over the past nine weeks. Additionally, I wrote a tutorial-like paper with instructions on how I did all of my work, which will be used by future participants of this project. Lastly, I put all of my work, papers, code, and data onto a hard drive for safekeeping and future access.


Additional Information