SIFT/SURF BoW for a Large Number of Clusters

If you spend some time browsing, you can find several Python examples of SIFT/SURF bag-of-words (BoW) classifiers on the internet. They use clustering (usually K-Means, via the sklearn or cv2 clustering libraries) to build a dictionary of visual words from SIFT/SURF features. However, most of the sample code I found can't properly handle a large number (> 100) of visual words/clusters, while some papers (such as this one) show that the best results are achieved with 2000+ clusters.

Building the visual dictionary with cv2.BOWKMeansTrainer is very slow when using > 100 clusters. Using sklearn.cluster.KMeans solves the speed issue, but it requires a huge amount of memory (8 GB of RAM was still insufficient to handle > 400 clusters). That's where sklearn.cluster.MiniBatchKMeans comes into the picture.

import cv2
import numpy as np
import progressbar
from sklearn.cluster import MiniBatchKMeans

...

def build_dictionary(xfeatures2d, dir_names, file_paths, dictionary_size):
  print('Computing descriptors..')        
  desc_list = []
  num_files = len(file_paths)
  bar = progressbar.ProgressBar(maxval=num_files).start()
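  for i, p in enumerate(file_paths):
    image = cv2.imread(p)
    if image is not None:  # imread returns None for unreadable files
      gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
      kp, dsc = xfeatures2d.detectAndCompute(gray, None)
      if dsc is not None:  # None when no keypoints are detected
        desc_list.extend(dsc)
    bar.update(i + 1)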
  bar.finish()

  print('Creating BoW dictionary using K-Means clustering with k={}..'.format(dictionary_size))
  dictionary = MiniBatchKMeans(n_clusters=dictionary_size, batch_size=100, verbose=1)
  dictionary.fit(np.array(desc_list))
  return dictionary

...

# usage example
# (on OpenCV 4.4+, SIFT is also available as cv2.SIFT_create())
sift = cv2.xfeatures2d.SIFT_create()
dir_names = ['class1', 'class2', ...]
file_paths = ['/data/class1/1.jpg', '/data/class1/2.jpg', ..., '/data/class2/1.jpg', '/data/class2/2.jpg', ...]
dictionary_size = 2800
dictionary = build_dictionary(sift, dir_names, file_paths, dictionary_size)
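
If even the full list of descriptors becomes too large to hold in memory at once, MiniBatchKMeans can also be fed incrementally through partial_fit. The following is only a sketch under assumptions: build_dictionary_streaming is a hypothetical helper that is not part of the original code, and the descriptors are random 128-dimensional stand-ins rather than real SIFT output.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans


def build_dictionary_streaming(descriptor_chunks, dictionary_size, random_state=0):
  """Fit a visual dictionary one chunk of descriptors at a time.

  descriptor_chunks is any iterable of (n_i, d) arrays, for example the
  descriptors of a group of images. Hypothetical helper for illustration.
  """
  dictionary = MiniBatchKMeans(n_clusters=dictionary_size, n_init=3,
                               random_state=random_state)
  for chunk in descriptor_chunks:
    chunk = np.asarray(chunk, dtype=np.float32)
    if len(chunk) == 0:
      continue  # skip groups that produced no descriptors
    # The first chunk initializes the cluster centers, so it should hold
    # at least dictionary_size descriptors.
    dictionary.partial_fit(chunk)
  return dictionary


# Synthetic stand-in for SIFT descriptors (128 dimensions), in 5 chunks.
rng = np.random.default_rng(0)
chunks = [rng.random((500, 128)).astype(np.float32) for _ in range(5)]
dictionary = build_dictionary_streaming(chunks, dictionary_size=50)
print(dictionary.cluster_centers_.shape)  # (50, 128)
```

In a real pipeline the chunks would come from a generator that calls detectAndCompute on a few images at a time, so only one chunk of descriptors is alive at any moment.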

With this approach, I was able to complete the whole process (preprocessing the raw data of 2000+ images, building the SIFT dictionary, and cross-validating 6 classifiers) within 52 minutes (still quite long, but acceptable 😜). With SURF, it takes around 45 minutes. The complete main files for both experiments can be found here and here.
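
For readers wondering how the fitted dictionary is then used, the usual BoW step is to map each image's descriptors to their nearest visual words and count them. Here is a minimal sketch, assuming a hypothetical bow_histogram helper (not taken from the main files mentioned above) and synthetic descriptors in place of real SIFT/SURF output:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans


def bow_histogram(dictionary, descriptors):
  """Return an L1-normalized histogram of visual-word counts for one image."""
  n_words = dictionary.n_clusters
  if descriptors is None or len(descriptors) == 0:
    # An image without keypoints gets an all-zero feature vector.
    return np.zeros(n_words, dtype=np.float32)
  # predict() assigns each descriptor to its nearest cluster center.
  words = dictionary.predict(np.asarray(descriptors, dtype=np.float32))
  hist = np.bincount(words, minlength=n_words).astype(np.float32)
  return hist / hist.sum()


# Synthetic stand-ins: 1000 training descriptors, 300 for one "image".
rng = np.random.default_rng(0)
train = rng.random((1000, 128)).astype(np.float32)
dictionary = MiniBatchKMeans(n_clusters=20, n_init=3, random_state=0).fit(train)
hist = bow_histogram(dictionary, rng.random((300, 128)).astype(np.float32))
print(hist.shape)  # (20,)
```

Each histogram has one entry per cluster, so a dictionary of 2800 words yields a 2800-dimensional feature vector per image, which can then be passed to any classifier.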
