Visual Object Recognition
Perceptual Computing Seminar
Sergio Escalera, Xavier Baró, Jordi Vitrià, Petia Radeva, Oriol Pujol. BCN Perceptual Computing Lab
Index
1. Introduction
2. Recognition with Local Features: Basics.
3. Invariant representations: SIFT
4. Recognition as a Classification Problem: FERNS
5. Very large databases: Hashing
Visual Object Recognition Perceptual Computing Seminar Page 2
Introduction
The recognition of object categories in images is one of the most challenging problems in computer vision, especially when the number of categories is large.

Humans are able to recognize thousands of object types, whereas most of the existing object recognition systems are trained to recognize only a few.
Introduction
Invariance to viewpoint, illumination, “shape”, color, scale, texture, etc.
Introduction
Why do we care about recognition? (theoretical question)

Perception of function: we can perceive the 3D shape, texture, and material properties without knowing about objects. But the concept of a category also encapsulates information about what we can do with those objects.
Li Fei-Fei, Stanford; Rob Fergus, NYU; Antonio Torralba, MIT. Recognizing and Learning Object Categories. ICCV 2009 Short Course, Kyoto, September 24, 2009.
Introduction
Why is it hard?
Find the chair in this image (output of correlation).
This is a chair
Introduction
Why is it hard?
Find the chair in this image: pretty much garbage; simple template matching is not going to make it.
Introduction
Why do we care about recognition? (practical question)
Introduction
Why do we care about recognition? (practical question)
Introduction
Why do we care about recognition? (practical question)
Query Results from 5k Flickr images (demo available for 100k set)
James Philbin, Ondrej Chum, Michael Isard, Josef Sivic, Andrew Zisserman. Object retrieval with large vocabularies and fast spatial matching. CVPR 2007.
Recognition with Local Features

It is known that the visual system can use local, informative image "fragments" of a given object, rather than the whole object, to classify it into a familiar category.

This approach has some advantages over holistic methods...
Recognition with Local Features
Holistic vs. fragment-based
Recognition with Local Features
Jay Hegde, Evgeniy Bart, and Daniel Kersten. "Fragment-based learning of visual object categories". Current Biology, 2008.
Recognition with Local Features
The most basic approach is called the "bag of words" approach (it was inspired by techniques used by the natural language processing community).
Recognition with Local Features
Assumptions:
• Independent features.
• Histogram representation.
Fragments vocabulary (generic/class-based, etc.)
Image = fragments histogram
Recognition with Local Features
A more advanced approach involves several steps:
• Stage 0: Find image locations where we can reliably find correspondences with other images.
• Stage 1: Image content is transformed into local features (that are invariant to translation, rotation, and scale).
• Stage 2: Verify if they belong to a consistent configuration.
Slide credit: David Lowe
SIFT
A wonderful example of these stages can be found in David Lowe's (2004) "Distinctive image features from scale-invariant keypoints" paper, which describes the development and refinement of his Scale Invariant Feature Transform (SIFT).
Local Features, e.g. SIFT
Recognition with Local Features
Which local features?
Slide credit: A. Efros
SIFT
Stage 0: How can we find image locations where we can reliably find correspondences with other images?
A “good” location has one stable sharp extremum.
(Figure: 1D profiles f(x). A profile with one stable, sharp extremum is good; flat or repetitive profiles are bad.)
SIFT
SIFT
Stage 0: How can we find image locations where we can reliably find correspondences with other images?
How to compute extrema at a given scale:
1) We apply a Gaussian filter:
2) We compute a difference‐of‐Gaussians
3) We look for 3D extrema in the resulting structure.
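The three steps above can be sketched in one dimension (an assumed simplification of the 2D case, with illustrative parameter values): blur the signal at two nearby scales, subtract to obtain a difference-of-Gaussians (DoG) response, and mark the local extrema of that response.

```python
# 1D difference-of-Gaussians sketch: blur at sigma and k*sigma,
# subtract, and return the indices of the local extrema.
import math

def gaussian_blur(signal, sigma):
    """Blur with a sampled, normalized Gaussian kernel (edges renormalized)."""
    r = int(math.ceil(3 * sigma))
    out = []
    for i in range(len(signal)):
        acc = wsum = 0.0
        for k in range(-r, r + 1):
            j = i + k
            if 0 <= j < len(signal):
                w = math.exp(-k * k / (2 * sigma * sigma))
                acc += w * signal[j]
                wsum += w
        out.append(acc / wsum)
    return out

def dog_extrema(signal, sigma=1.6, k=math.sqrt(2)):
    """Return indices where the DoG response has a local extremum."""
    a = gaussian_blur(signal, sigma)
    b = gaussian_blur(signal, k * sigma)
    d = [bi - ai for ai, bi in zip(a, b)]
    return [i for i in range(1, len(d) - 1)
            if (d[i] > d[i - 1] and d[i] > d[i + 1])
            or (d[i] < d[i - 1] and d[i] < d[i + 1])]

signal = [0.0] * 64
signal[32] = 1.0                     # a single sharp "blob"
print(dog_extrema(signal))           # includes index 32, the blob center
```

In the real algorithm this is done in 2D, across a full scale-space pyramid, and extrema are sought over both space and scale (the "3D extrema" of the slide).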
SIFT
SIFT
These features are invariant to location and scale.
SIFT
Stage 1: Image content is transformed into local features (that are invariant to translation, rotation, and scale).
In addition to dealing with scale changes, we need todeal with (at least) in‐plane image rotation.
One way to deal with this problem is to designdescriptors that are rotationally invariant, but suchdescriptors have poor discriminability, i.e. they mapdifferent looking patches to the same descriptor.
SIFT
A better method is to estimate a dominant orientation at each detected keypoint.

1. Calculate a histogram of local gradients in the window.
2. Take the dominant gradient orientation as "up".
3. Rotate the local area before computing the descriptor.
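Steps 1 and 2 can be sketched as follows (toy gradient samples, not Lowe's exact spatial weighting): build a 36-bin orientation histogram weighted by gradient magnitude and take the peak bin as "up".

```python
# Dominant-orientation sketch: 36-bin magnitude-weighted histogram
# of gradient orientations; the peak bin gives the reference angle.
import math

def dominant_orientation(gradients, nbins=36):
    """gradients: list of (dx, dy) samples from the window around a keypoint.
    Returns the center angle (radians) of the strongest histogram bin."""
    hist = [0.0] * nbins
    for dx, dy in gradients:
        mag = math.hypot(dx, dy)
        ang = math.atan2(dy, dx) % (2 * math.pi)
        hist[int(ang / (2 * math.pi) * nbins) % nbins] += mag
    peak = max(range(nbins), key=lambda b: hist[b])
    return (peak + 0.5) * 2 * math.pi / nbins

# A patch whose gradients mostly point straight "up" (90 degrees):
grads = [(0.0, 1.0)] * 10 + [(1.0, 0.0)] * 2
print(math.degrees(dominant_orientation(grads)))  # close to 90
```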
SIFT
Lowe:
• computes a 36‐bin histogram of edge orientationsweighted by both gradient magnitude and Gaussiandistance to the center,
• finds all peaks within 80% of the global maximum,and then
• computes a more accurate orientation estimateusing a 3‐bin parabolic fit.
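The 3-bin parabolic fit in the last step can be sketched as follows: fit a parabola through the peak bin and its two neighbors and take the vertex as the sub-bin orientation estimate (a minimal sketch; the histogram values here are made up).

```python
# Parabolic refinement of a histogram peak: given peak bin b,
# interpolate a fractional bin index from (left, center, right) values.
def refine_peak(hist, b):
    n = len(hist)
    l, c, r = hist[(b - 1) % n], hist[b], hist[(b + 1) % n]
    denom = l - 2 * c + r
    if denom == 0:
        return float(b)              # degenerate: no curvature, keep b
    return b + 0.5 * (l - r) / denom # vertex of the fitted parabola

hist = [0.0] * 36
hist[8], hist[9], hist[10] = 6.0, 10.0, 6.0   # symmetric peak at bin 9
print(refine_peak(hist, 9))                    # 9.0 (symmetric, no shift)
```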
SIFT
SIFT
(Figure: local patch around the keypoint, with its gradient magnitude and gradient orientation, all taken from the Gaussian pyramid.)
SIFT
SIFT
SIFT
Even after compensating for translation, rotation, and scale changes, the local appearance of image patches will usually still vary from image to image.

How can we make the descriptor that we match more invariant to such changes, while still preserving discriminability between different (non-corresponding) patches?
SIFT
SIFT features are formed by computing the gradient at each pixel in a 16x16 window around the detected keypoint, using the appropriate level of the Gaussian pyramid at which the keypoint was detected.

The gradient magnitudes are downweighted by a Gaussian fall-off function in order to reduce the influence of gradients far from the center, as these are more affected by small misregistrations.
SIFT
In each 4x4 quadrant, a gradient orientation histogram is formed by (conceptually) adding the weighted gradient value to one of 8 orientation histogram bins.
SIFT
The resulting 128 non-negative values form a raw version of the SIFT descriptor vector.

To reduce the effects of contrast/gain (additive variations are already removed by the gradient), the 128-D vector is normalized to unit length.
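The normalization step is just an L2 rescaling; a minimal sketch (using a toy 128-D vector, not a real descriptor):

```python
# Normalize a descriptor vector to unit L2 length so that uniform
# contrast (gain) changes cancel out when descriptors are compared.
import math

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v

raw = [3.0, 4.0] + [0.0] * 126       # toy 128-D descriptor
d = normalize(raw)
print(sum(x * x for x in d))         # ~1.0 (unit length, up to rounding)
```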
SIFT
Once we have extracted features and their descriptors from two or more images, the next step is to establish some preliminary feature matches between these images.
SIFT
SIFT uses a nearest-neighbor classifier with a distance-ratio matching criterion. We can define this nearest-neighbor distance ratio as

  NNDR = d1 / d2 = ||DA - DB|| / ||DA - DC||,

where d1 and d2 are the nearest and second-nearest neighbor distances, and DA, DB, DC are the target descriptor along with its closest two neighbors.
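A minimal sketch of the distance-ratio test on toy 2-D "descriptors" (the 0.8 threshold is the commonly cited value, used here as an assumption):

```python
# Ratio test: accept a match only if the nearest neighbor is clearly
# closer than the second-nearest one (d1 < max_ratio * d2).
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ratio_match(query, candidates, max_ratio=0.8):
    """Return the index of the best candidate, or None if the match
    is ambiguous (distance ratio too close to 1)."""
    dists = sorted((euclidean(query, c), i) for i, c in enumerate(candidates))
    (d1, i1), (d2, _) = dists[0], dists[1]
    return i1 if d1 < max_ratio * d2 else None

db = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
print(ratio_match([0.5, 0.0], db))   # 0: unambiguous match
print(ratio_match([5.0, 0.0], db))   # None: two candidates equally close
```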
SIFT
SIFT
Linear method:
The simplest way to find all corresponding feature points is to compare all features against all other features in each pair of potentially matching images.

Unfortunately, this is quadratic in the number of extracted features, which makes it impractical for some applications.
SIFT
Nearest-neighbor matching is the major computational bottleneck:

• Linear search performs dn2 operations for n feature points and d dimensions.
• No exact NN methods are faster than linear search for d > 10.
• Approximate methods can be much faster, but at the cost of missing some correct matches. The failure rate gets worse for large datasets.
SIFT
A better approach is to devise an indexing structure, such as a multi-dimensional search tree or a hash table, to rapidly search for features near a given feature.

For extremely large databases (millions of images or more), even more efficient structures based on ideas from document retrieval (e.g., vocabulary trees) can be used.
SIFT
Stage 2: Verify if they belong to a consistent configuration.

The first step is to establish a set of putative correspondences.
SIFT
How can we discard erroneous correspondences?
SIFT
Stage 2: Verify if they belong to a consistent configuration.

Once we have some hypothetical (putative) matches, we can use geometric alignment to verify which matches are inliers and which ones are outliers.
SIFT
Stage 2: Verify if they belong to a consistent configuration.
• Extract features
• Compute putative matches
SIFT
Stage 2: Verify if they belong to a consistent configuration.

• Loop:
  - Hypothesize transformation T (using a small group of putative matches that are related by T)
SIFT
Stage 2: Verify if they belong to a consistent configuration.

• Loop:
  - Hypothesize transformation T (small group of putative matches that are related by T)
  - Verify transformation (search for other matches consistent with T)
SIFT
Stage 2: Verify if they belong to a consistent configuration.
SIFT
Stage 2: Verify if they belong to a consistent configuration.

2D transformation models:
• Similarity (translation, scale, rotation)
• Affine
• Projective (homography)
SIFT
Stage 2: Verify if they belong to a consistent configuration.

Fitting an affine transformation, given the point correspondences (xi, yi) <-> (xi', yi'):
Slide credit: S. Lazebnik
SIFT
Stage 2: Verify if they belong to a consistent configuration.

Fitting an affine transformation (given the point correspondences):

  [xi']   [m1 m2] [xi]   [t1]
  [yi'] = [m3 m4] [yi] + [t2]

Stacking each match into one linear system in the unknowns (m1, m2, m3, m4, t1, t2):

  [ ...  ...  ...  ...  ...  ... ]  [m1]   [ ...]
  [ xi   yi    0    0    1    0  ]  [m2]   [xi']
  [  0    0   xi   yi    0    1  ]  [m3] = [yi']
  [ ...  ...  ...  ...  ...  ... ]  [m4]   [ ...]
                                    [t1]
                                    [t2]
Slide credit: S. Lazebnik
SIFT
Stage 2: Verify if they belong to a consistent configuration.

Fitting an affine transformation (given the point correspondences):

• Linear system with six unknowns.
• Each match gives us two linearly independent equations: we need at least three matches to solve for the transformation parameters.
• We can solve Ax = b using the pseudo-inverse:

  x = (ATA)-1ATb
Slide credit: S. Lazebnik
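The least-squares solution x = (ATA)-1ATb can be sketched as follows, solving the 6x6 normal equations with plain Gaussian elimination (no external libraries; the three matches below are toy data related by a pure translation):

```python
# Fit affine parameters [m1, m2, m3, m4, t1, t2] from point matches
# by least squares on the stacked linear system described above.
def fit_affine(src, dst):
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        A.append([x, y, 0, 0, 1, 0]); b.append(xp)
        A.append([0, 0, x, y, 0, 1]); b.append(yp)
    n = 6
    # Normal equations: (A^T A) p = A^T b
    N = [[sum(A[k][i] * A[k][j] for k in range(len(A))) for j in range(n)]
         for i in range(n)]
    v = [sum(A[k][i] * b[k] for k in range(len(A))) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(N[r][c]))
        N[c], N[piv] = N[piv], N[c]; v[c], v[piv] = v[piv], v[c]
        for r in range(c + 1, n):
            f = N[r][c] / N[c][c]
            for j in range(c, n):
                N[r][j] -= f * N[c][j]
            v[r] -= f * v[c]
    p = [0.0] * n
    for r in range(n - 1, -1, -1):
        p[r] = (v[r] - sum(N[r][j] * p[j] for j in range(r + 1, n))) / N[r][r]
    return p

# Three matches related by a translation of (+2, +3):
src = [(0, 0), (1, 0), (0, 1)]
dst = [(2, 3), (3, 3), (2, 4)]
print(fit_affine(src, dst))   # ~ [1, 0, 0, 1, 2, 3]
```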
SIFT
Stage 2: Verify if they belong to a consistent configuration.

The process of selecting a small set of seed matches and then verifying a larger set is often called random sampling, or RANSAC.
RANSAC
RANSAC was originally formulated in: Martin A. Fischler and Robert C. Bolles (June 1981). "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography". Comm. of the ACM 24: 381-395.
RANSAC
"We approached the fitting problem in the opposite way from most previous techniques. Instead of averaging all the measurements and then trying to throw out bad ones, we used the smallest number of measurements to compute a model's unknown parameters and then evaluated the instantiated model by counting the number of consistent samples."
From "RANSAC: An Historical Perspective", Bob Bolles & Marty Fischler, 2006.
RANSAC
It's easy to understand and it's effective:

• It helps solve a common problem (i.e., filter out gross errors introduced by automatic techniques).
• The number of trials to "guarantee" a high level of success (e.g., 99.99 probability) is surprisingly small.
• The dramatic increase in computation speed made it possible to do a large number of trials (100s or 1000s).
• The algorithm can stop as soon as a good match is computed (unlike Hough techniques, which typically compute a large number of examples and then identify matches).
RANSAC
The basic idea is to repeat M times the following process:
1. A model is fitted to the hypothetical inliers, i.e. all free parameters of the model are reconstructed from the data set.
2. All other data are then tested against the fitted model; a point that fits well to the estimated model is also considered a hypothetical inlier.
3. The estimated model is reasonably good if sufficiently many points have been classified as hypothetical inliers.
4. The model is re-estimated from all hypothetical inliers, because it has only been estimated from the initial set of hypothetical inliers.
5. Finally, the model is evaluated by estimating the error of the inliers relative to the model.

This procedure is repeated a fixed number of times, each time producing either a model which is rejected because too few points are classified as inliers, or a refined model together with a corresponding error measure. In the latter case, we keep the refined model if its error is lower than that of the last saved model.
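The procedure above can be sketched compactly for the line-fitting example that follows (toy data with two gross outliers; the trial count, threshold, and seed are illustrative assumptions):

```python
# RANSAC for a 2D line: sample two points, fit a line, count inliers
# within a distance threshold, keep the best model, refit on its inliers.
import random

def ransac_line(points, trials=100, thresh=0.5, seed=0):
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(trials):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue                       # skip degenerate vertical sample
        a = (y2 - y1) / (x2 - x1)          # slope
        b = y1 - a * x1                    # intercept
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) < thresh]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Re-estimate from all inliers by least squares (step 4 above).
    n = len(best_inliers)
    sx = sum(x for x, _ in best_inliers); sy = sum(y for _, y in best_inliers)
    sxx = sum(x * x for x, _ in best_inliers)
    sxy = sum(x * y for x, y in best_inliers)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return a, (sy - a * sx) / n

pts = [(x, 2 * x + 1) for x in range(10)] + [(3, 20), (7, -5)]  # 2 outliers
print(ransac_line(pts))   # slope ~2, intercept ~1: outliers rejected
```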
RANSAC
RANSAC
Line fitting example:
Task: estimate the best line.
RANSAC
Line fitting example:
Sample two points.
RANSAC
Line fitting example:
Fit a line.
RANSAC
Line fitting example:
Count the total number of points within a threshold of the line.
RANSAC
Line fitting example:
Repeat until we get a good result.
RANSAC
RANSAC example: translation
Putative matches
Slide credit: A. Efros
RANSAC example: translation
Select one match, count inliers.
Slide credit: A. Efros
RANSAC example: translation
Find "average" translation vector.
Slide credit: A. Efros
RANSAC
Interest points (500/image)
Putative correspondences (268)
Outliers (117)
Inliers (151)
Final inliers (262)
SIFT Applications
SIFT Applications
SIFT Applications
HDRSoft
SIFT Applications
Matching and Classification

SIFT allows reliable real-time recognition, but at a computational cost that severely limits the number of points that can be handled.

A standard implementation requires 1 ms per feature point, which limits the number of feature points to 50 per frame if one requires frame-rate performance.
Matching and Classification

An alternative is to rely on statistical learning techniques to model the set of possible appearances of a patch.

The major challenge is to use simple models to allow for real-time, efficient recognition.
Matching and Classification

Can we match keypoints using simpler features, without intensive preprocessing?

We will assume that we have the possibility to train a classifier for each keypoint class.
Matching and Classification
Simple binary features

The test compares the intensities of two pixels around the keypoint:

  fi = 1 if I(mi,1) < I(mi,2), and 0 otherwise.
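A minimal sketch of one such test (the image is a plain 2-D list here, an assumed stand-in for a real grayscale patch; the offsets are illustrative):

```python
# One binary feature: compare the intensities at two pixel positions
# around the keypoint center; output 1 if I(m1) < I(m2), else 0.
def binary_feature(image, center, offset1, offset2):
    cy, cx = center
    i1 = image[cy + offset1[0]][cx + offset1[1]]
    i2 = image[cy + offset2[0]][cx + offset2[1]]
    return 1 if i1 < i2 else 0

patch = [[10, 20, 30],
         [40, 50, 60],
         [70, 80, 90]]
print(binary_feature(patch, (1, 1), (-1, -1), (1, 1)))  # 10 < 90 -> 1
```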
Matching and Classification
Without intensive preprocessing

We can synthetically generate the set of a keypoint's possible appearances under variations in perspective, lighting, noise, etc.
Matching and Classification
FERN Formulation

We model the class-conditional probabilities of a large number of binary features, which are estimated in a training phase.

At run time, these probabilities are used to select the best match for a given image patch.
Matching and Classification
FERN Formulation

fi : binary feature.
Nf : total number of features in the model.
Ck : class representing all views of an image patch around a keypoint.

Given f1, ..., fNf, select the class k such that

  k = argmax_k P(Ck | f1, f2, ..., fNf) = argmax_k P(f1, f2, ..., fNf | Ck)

(assuming a uniform prior over classes).
Mustafa Ozuysal, Michael Calonder, Vincent Lepetit, Pascal Fua. "Fast Keypoint Recognition Using Random Ferns". IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009.
Matching and Classification
FERN Formulation

However, it is not practical to model the joint distribution of all features. We group features into small sets (ferns) and assume independence between these sets (Semi-Naive Bayesian classifier):

Fj : a fern, defined to be a set of S binary features {fr, ..., fr+S-1}.
M : the number of ferns, with Nf = S x M.
Matching and Classification
FERN Formulation

Full joint model:
  P(f1, f2, ..., fNf | Ck) : 2^Nf parameters!

Complete independence (Naive Bayes):
  P(f1, f2, ..., fNf | Ck) = prod_{i=1..Nf} P(fi | Ck) : Nf parameters, but too simple.

Ferns (Semi-Naive Bayes):
  P(f1, f2, ..., fNf | Ck) = prod_{j=1..M} P(Fj | Ck) : M x 2^S parameters.
Matching and Classification
FERN Implementation

We generate a random set of binary features. A binary feature outputs a binary number, so S features together give 2^S possibilities (e.g., S = 3 gives 8 possibilities).

A fern with S nodes outputs a number between 0 and 2^S - 1.
Matching and Classification
FERN Implementation

When we have multiple patches of the same class, we can model the output of a fern with a multinomial distribution: a probability for each possible output.
Matching and Classification

(Figure sequence, slide credit V. Lepetit: each fern applies its S binary tests to the patch, producing a bit string, e.g. 110, that indexes one of the 2^S leaves, e.g. leaf 6.)
Matching and Classification

Normalize: P(f1, f2, ..., fn | C = ci).

Slide Credit: V. Lepetit
Matching and Classification
FERN Implementation

At the end of the training, we have distributions over possible fern outputs for each class.
Matching and Classification
FERN Implementation

To recognize a new patch, the fern outputs select rows of distributions (one per fern), and these are then combined assuming independence between the distributions.
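A toy end-to-end sketch of this pipeline, under stated simplifications (pure Python; 8x8 patches as flat lists; add-one smoothing; training on the exact reference patches rather than synthetically warped views, so the example is deterministic):

```python
# Fern pipeline sketch: train multinomial output counts per class,
# classify by summing per-fern log-probabilities (independence assumed).
import math, random

S, M, NCLASSES = 4, 10, 3
rng = random.Random(1)
# Each fern is S random pixel-pair tests inside a flat 8x8 = 64-pixel patch.
ferns = [[(rng.randrange(64), rng.randrange(64)) for _ in range(S)]
         for _ in range(M)]

def fern_output(patch, fern):
    idx = 0
    for p1, p2 in fern:
        idx = (idx << 1) | (1 if patch[p1] < patch[p2] else 0)
    return idx                      # a number in [0, 2^S)

def train(samples):
    """samples: list of (patch, class). Returns counts[fern][output][class]."""
    counts = [[[1] * NCLASSES for _ in range(2 ** S)] for _ in range(M)]
    for patch, c in samples:
        for k, fern in enumerate(ferns):
            counts[k][fern_output(patch, fern)][c] += 1
    return counts

def classify(patch, counts):
    scores = [0.0] * NCLASSES
    for k, fern in enumerate(ferns):
        out = fern_output(patch, fern)
        for c in range(NCLASSES):
            n_c = sum(counts[k][o][c] for o in range(2 ** S))
            scores[c] += math.log(counts[k][out][c] / n_c)
    return max(range(NCLASSES), key=lambda c: scores[c])

bases = [[rng.randrange(256) for _ in range(64)] for _ in range(NCLASSES)]
counts = train([(bases[c], c) for c in range(NCLASSES) for _ in range(50)])
print(classify(bases[0], counts))   # 0: the class the patch came from
```

A real training phase would feed many warped and noisy views per class, as described earlier, so that the multinomials capture the variability of each keypoint's appearance.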
Matching and Classification
Matching and Classification
FERN Implementation

...in 10 lines of code:

 1: for(int i = 0; i < H; i++) P[i] = 0.;        // H: number of classes
 2: for(int k = 0; k < M; k++) {                 // M: number of ferns
 3:   int index = 0, *d = D + k * 2 * S;         // D: pixel-pair offsets
 4:   for(int j = 0; j < S; j++) {               // S: tests per fern
 5:     index <<= 1;                             // shift in the next bit
 6:     if (*(K + d[0]) < *(K + d[1]))           // K: patch intensities
 7:       index++;
 8:     d += 2;
      }
 9:   p = PF + k * shift2 + index * shift1;      // PF: trained distributions
10:   for(int i = 0; i < H; i++) P[i] += p[i];   // accumulate class scores
    }
Matching and Classification

The FERN technique speeds up keypoint matching, but the training is slow and performed offline.

Hence, it is not suited for applications that require real-time online learning or incremental addition of arbitrary numbers of keypoints (e.g. SLAM).
Matching and Classification

This limitation can be removed if we train a FERN classifier to recognize a number of keypoints extracted from a reference database; all other keypoints are then characterized in terms of their response to these classification ferns (their signature).
Matching and Classification
M. Calonder, V. Lepetit, and P. Fua, Keypoint Signatures for Fast Learning and Recognition. In Proceedings of European Conference on Computer Vision, 2008.
Matching and Classification

It can be empirically shown that these signatures are stable under changes in viewing conditions.

Signatures are sparse in nature if we apply a threshold function.

Signatures do not need a training phase and scale well with the number of classes (nearest neighbor).
Matching and Classification

However, matching signatures still involves many more elementary operations than absolutely necessary.

Moreover, evaluating the signatures requires storing many distributions of the same size as the signatures themselves and, therefore, large amounts of memory.
Matching and Classification

The full response vector r(p) for all J ferns is taken to be the concatenation of the vectors storing the probability that p is one of the N reference points, where Z is a normalizer s.t. its elements sum to one.

In practice, when p truly corresponds to one of the reference keypoints, r(p) contains one element that is close to one while all others are close to zero.

Otherwise, it contains a few relatively large values that correspond to reference keypoints that are similar in appearance, and small values elsewhere.
Matching and Classification

We can compute a sparse signature by applying a point-wise threshold function with a value θ.

The result is an N-dimensional vector with only a few non-zero elements that is mostly invariant to different imaging conditions, and therefore presents a useful descriptor for matching purposes.
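The thresholding step itself is trivial; a minimal sketch (the response values and θ below are made-up toy numbers):

```python
# Sparse signature: zero out response entries below the threshold theta,
# keeping only the strong responses.
def sparse_signature(response, theta):
    return [v if v >= theta else 0.0 for v in response]

r = [0.01, 0.02, 0.85, 0.03, 0.07, 0.02]    # toy response vector, N = 6
print(sparse_signature(r, 0.05))             # [0.0, 0.0, 0.85, 0.0, 0.07, 0.0]
```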
Matching and Classification

The patch goes through J ferns; each fern yields a vector storing the probability that p is one of the N reference points.

Typical parameters: J = 50; d = 10; N = 500.
Matching and Classification

Typical parameters: J = 50; d = 10; N = 500.

We need, for each of the 2^d leaves in each of the J ferns, an N-dimensional vector of floats.

The total memory requirement is M = b · J · 2^d · N bytes, where b is the number of bytes needed to store a float (4). In practice, 4 · 50 · 1024 · 500 ≈ 100 MB!
Matching and Classification

Compressive Sensing literature:

• High-dimensional sparse vectors can be reconstructed from their linear projections into much lower-dimensional spaces.

• The Johnson-Lindenstrauss lemma states that any small set of points in a high-dimensional space can be embedded into a space of much lower dimension in such a way that distances between the points are nearly preserved.
Matching and Classification

Many kinds of matrices can be used for this purpose.

Random Ortho-Projection (ROP) matrices are a good choice and can be easily constructed by applying a Gram-Schmidt orthonormalization process to a random matrix.
Matching and Classification

In mathematics, the Gram-Schmidt process is a method for orthonormalizing a set of vectors in an inner product space, most commonly the Euclidean space R^n.

The Gram-Schmidt process takes a finite, linearly independent set S = {v1, ..., vk} for k ≤ n and generates an orthogonal set S' = {u1, ..., uk} that spans the same k-dimensional subspace of R^n as S.
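A minimal sketch of classical Gram-Schmidt (the way a random ortho-projection matrix can be built from the rows of a random matrix):

```python
# Classical Gram-Schmidt: orthonormalize a list of vectors, skipping
# any vector that is (numerically) linearly dependent on the previous ones.
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    basis = []
    for v in vectors:
        w = list(v)
        for u in basis:
            c = dot(w, u)                         # projection coefficient
            w = [wi - c * ui for wi, ui in zip(w, u)]
        norm = math.sqrt(dot(w, w))
        if norm > 1e-12:                          # keep independent vectors only
            basis.append([wi / norm for wi in w])
    return basis

B = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]])
print(dot(B[0], B[1]))    # ~0 (orthogonal)
print(dot(B[0], B[0]))    # ~1 (unit length)
```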
Matching and Classification

[Figure slides: results from M. Calonder, V. Lepetit, P. Fua, K. Konolige, J. Bowman, and P. Mihelich, Compact Signatures for High-speed Interest Point Description and Matching, in Proceedings of the International Conference on Computer Vision, 2009.]
Matching and Classification

This approach reduces the memory requirement when storing the models: for N = 512 and M = 176, the requirement drops from 93.75 MB to 175 B! The CPU time is 6.3 ms for an exhaustive NN matching of 256 points (256×256).
Internet‐scale image databases
Min HASH
How can we find similar images in very large datasets?
Can we get clusters from these images?
Min HASH
Let's suppose that we choose a LARGE bag-of-words representation of our images and that we use a binary histogram.
Min HASH
Given two different images, we can compute their histogram intersection:
Min HASH
…and their histogram union:
Min HASH
Then we can define a set similarity measure in the following way:

sim(A1, A2) = |A1 ∩ A2| / |A1 ∪ A2|

That is, the number of keypoints the two images have in common, divided by the total number of keypoints that are present in either image.
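This measure can be computed directly on binary histograms (a sketch, assuming numpy; the toy histograms are made up for illustration):

```python
import numpy as np

def set_similarity(a1, a2):
    """Histogram intersection over union of two binary BoW histograms."""
    a1, a2 = np.asarray(a1, bool), np.asarray(a2, bool)
    union = np.logical_or(a1, a2).sum()
    return np.logical_and(a1, a2).sum() / union if union else 0.0

print(set_similarity([1, 0, 1, 0, 1],
                     [1, 1, 1, 0, 0]))    # 2 shared / 4 present = 0.5
```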
Min HASH

We can perform clustering or matching of an unordered set of images with this measure, but it can be used only with a limited amount of data!
The method requires

  Σ_{i=1..w} C(d_i, 2)

similarity evaluations, where w is the size of the vocabulary and d_i is the number of regions assigned to the i-th visual word. A commonly used vocabulary size is w = 1,000,000.
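The count can be checked directly (the occupancy numbers d_i below are toy values, assumed for illustration):

```python
from math import comb

# d[i] = number of regions assigned to the i-th visual word (toy values)
d = [3, 1, 4, 2]
evaluations = sum(comb(di, 2) for di in d)   # sum_i C(d_i, 2)
print(evaluations)                           # 3 + 0 + 6 + 1 = 10
```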
Min HASH
We can perform clustering or matching of an unordered set of images with this measure, but only with a limited amount of data!

Observation: the histograms for an image are highly sparse!
Min HASH
The key idea of min-hash is to map ("hash") each row/histogram to a small amount of data Sig(A) (the signature) such that:

• Sig(A) is small enough.
• Rows A1 and A2 are highly similar if Sig(A1) is highly similar to Sig(A2).
Min HASH
Useful convention: we will refer to columns as being of four types:

A1:    1 0 1 0
A2:    1 1 0 0
Type:  a b c d

We will also use "a" as the number of columns of type a.

Notes:
• Sim(A1, A2) = a/(a+b+c)
• Most columns are of type d.
Min HASH

• Imagine the columns permuted randomly in order.
• Hash each row A to h(A), the number of the first column in which row A has a 1.

Example (after a random permutation π of the columns):

A1:  0 1 0 0 1   →  h(A1) = 2
A2:  0 1 0 0 0   →  h(A2) = 2

The probability that h(A1) = h(A2) is a/(a+b+c) = Sim(A1, A2): the hashes agree if the first column with a 1 is of type a, and disagree if it is of type b or c.
Min HASH

If we repeat the experiment with a new permutation of the columns a large number of times, say 512, we get a signature consisting of 512 column numbers for each row.

The "similarity" of these lists (the fraction of positions in which they agree) will be very close to the similarity of the rows (similar signatures mean similar rows!).
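A small simulation of this claim (a sketch, assuming numpy; the two rows and the number of permutations are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
A1 = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=bool)
A2 = np.array([1, 1, 1, 0, 0, 0, 1, 0], dtype=bool)

def h(row, perm):
    """Number of the first column, under permutation perm, where row has a 1."""
    return perm[row].min()

perms = [rng.permutation(len(A1)) for _ in range(512)]
sig_sim = np.mean([h(A1, p) == h(A2, p) for p in perms])
jaccard = (A1 & A2).sum() / (A1 | A2).sum()
print(jaccard, round(sig_sim, 2))   # the signature agreement approximates 3/5
```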
Min HASH

In fact, it is not necessary to permute the columns: we can hash each original column with 512 different hash functions and keep, for each row and each hash function, the lowest hash value over the columns in which that row has a 1. Then we look for the coincidences.
Example: row = 1 0 0 1 0 (it has 1s in columns 1 and 4).

h1:  5 1 3 2 4   →  h1(row) = min(5, 2) = 2
h2:  1 2 5 3 4   →  h2(row) = min(1, 3) = 1
h3:  3 4 1 5 2   →  h3(row) = min(3, 5) = 3
h4:  2 5 4 1 3   →  h4(row) = min(2, 1) = 1

Signature of the row: (2, 1, 3, 1).
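The example above generalizes directly; here is a sketch of this permutation-free min-hash (the linear hash family (a·c + b) mod p is an assumed choice, not from the slides):

```python
import numpy as np

def minhash_signature(row, hashes, p=2_147_483_647):
    """For each hash function (a, b), keep the lowest value of
    (a*c + b) mod p over the columns c where the row has a 1."""
    cols = np.flatnonzero(row)
    return [((a * cols + b) % p).min() for a, b in hashes]

rng = np.random.default_rng(0)
hashes = rng.integers(1, 2_147_483_647, size=(512, 2))   # 512 hash functions

r1 = np.array([1, 0, 1, 1, 0, 0, 1, 0])
r2 = np.array([1, 1, 1, 0, 0, 0, 1, 0])
s1, s2 = minhash_signature(r1, hashes), minhash_signature(r2, hashes)
agreement = np.mean([a == b for a, b in zip(s1, s2)])
print(round(agreement, 2))   # close to the true similarity of 3/5
```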
Min HASH

Row 1:  1 0 1 1 0
Row 2:  0 1 0 0 1
Row 3:  1 1 0 1 0

h1:  1 2 3 4 5   →  h1(rows) = 1, 2, 1
h2:  5 4 3 2 1   →  h2(rows) = 2, 1, 2
h3:  3 4 5 1 2   →  h3(rows) = 1, 2, 1

Similarities:

Pair   Row–Row   Sig–Sig
1–2:    0/5       0/3
1–3:    2/4       3/3
2–3:    1/4       0/3
Min Hash

For efficient retrieval, the min-hashes are grouped into n-tuples. In this example, we can form the following 2-tuples:

h1(rows) = 1, 2, 1
h2(rows) = 2, 1, 2
h3(rows) = 1, 2, 1
h4(rows) = 3, 2, 3

The retrieval procedure then estimates the full similarity only for those image pairs that have at least h identical tuples out of k tuples.
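The candidate-filtering step can be sketched as follows (function names are hypothetical; the per-row signatures are taken from the slide example):

```python
def sketches(sig, n=2):
    """Group a min-hash signature into non-overlapping n-tuples."""
    return [tuple(sig[i:i + n]) for i in range(0, len(sig), n)]

def is_candidate(sig_a, sig_b, n=2, h=1):
    """A pair is a retrieval candidate if at least h of its sketches collide."""
    hits = sum(ta == tb for ta, tb in zip(sketches(sig_a, n),
                                          sketches(sig_b, n)))
    return hits >= h

# Per-row signatures (h1, h2, h3, h4) from the example above
row1, row2, row3 = [1, 2, 1, 3], [2, 1, 2, 2], [1, 2, 1, 3]
print(is_candidate(row1, row3))   # True: both 2-tuples collide
print(is_candidate(row1, row2))   # False: no tuple collides
```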
Min Hash

[Figure slides: clustering results from 100k images; representatives of the largest clusters.]
Min Hash
Automatic localization of different buildings