
Visual Object Recognition
Perceptual Computing Seminar

Sergio Escalera, Xavier Baró, Jordi Vitrià, Petia Radeva, Oriol Pujol
BCN Perceptual Computing Lab

Index

1. Introduction
2. Recognition with Local Features: Basics
3. Invariant Representations: SIFT
4. Recognition as a Classification Problem: FERNS
5. Very Large Databases: Hashing

Visual Object Recognition                 Perceptual Computing Seminar                        Page 2

Introduction

The recognition of object categories in images is one of the most challenging problems in computer vision, especially when the number of categories is large.

Humans are able to recognize thousands of object types, whereas most of the existing object recognition systems are trained to recognize only a few.


Introduction

Invariance to viewpoint, illumination, “shape”, color, scale, texture, etc.

Introduction

Why do we care about recognition? (theoretical question)

Perception of function: we can perceive the 3D shape, texture, and material properties without knowing about objects. But the concept of category also encapsulates information about what we can do with those objects.


Li Fei‐Fei, Stanford; Rob Fergus, NYU; Antonio Torralba, MIT. Recognizing and Learning Object Categories. Short course at ICCV 2009, Kyoto, September 24, 2009.

Introduction

Why is it hard?
Find the chair in this image: output of correlation.

This is a chair



Introduction

Why is it hard?

Find the chair in this image: pretty much garbage; simple template matching is not going to make it.



Introduction
Why do we care about recognition? (practical question)

Query Results from 5k Flickr images (demo available for 100k set)


James Philbin, Ondrej Chum, Michael Isard, Josef Sivic, Andrew Zisserman: Object retrieval with large vocabularies and fast spatial matching. CVPR 2007

Recognition with Local Features

It is known that the visual system can use local, informative image «fragments» of a given object, rather than the whole object, to classify it into a familiar category.

This approach has some advantages over holistic methods...


Recognition with Local Features

Holistic vs. fragment‐based

Recognition with Local Features


Jay Hegde, Evgeniy Bart, and Daniel Kersten, "Fragment‐based learning of visual object categories", Current Biology, 2008.

Recognition with Local Features
The most basic approach is the “bag of words” approach (it was inspired by techniques used by the natural language processing community).


Recognition with Local Features
Assumptions:

• Independent features.
• Histogram representation.

An image is represented as a histogram over a vocabulary of fragments (generic, class‐based, etc.).



Recognition with Local Features
A more advanced approach involves several steps:

• Stage 0: Find image locations where we can reliably find correspondences with other images.

• Stage 1: Image content is transformed into local features (that are invariant to translation, rotation, and scale).

• Stage 2: Verify if they belong to a consistent configuration.

Slide credit: David Lowe

SIFT
A wonderful example of these stages can be found in David Lowe’s (2004) “Distinctive image features from scale‐invariant keypoints” paper, which describes the development and refinement of his Scale Invariant Feature Transform (SIFT).

Local Features, e.g. SIFT

Recognition with Local Features
Which local features?

Slide credit: A. Efros

SIFT
Stage 0: How can we find image locations where we can reliably find correspondences with other images?

A “good” location has one stable sharp extremum.

(Figure: a good profile f(x) has one stable, sharp extremum; bad profiles are flat or have many shallow extrema.)

SIFT


SIFT
Stage 0: How can we find image locations where we can reliably find correspondences with other images?

How to compute extrema at a given scale:

1) We apply a Gaussian filter:

2) We compute a difference‐of‐Gaussians

3) We look for 3D extrema in the resulting structure. 
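The extremum test of step 3 can be sketched as follows, here on a 1‑D signal for brevity (real SIFT compares each difference‑of‑Gaussians sample against its 26 neighbours across the 2‑D image and adjacent scales; this toy version has only 8):

```cpp
#include <vector>

// Hypothetical 1-D sketch of step 3: a sample at (scale s, position x) is a
// scale-space extremum of the difference-of-Gaussians stack if it is strictly
// greater (or strictly smaller) than all 8 neighbours in the adjacent scales
// and positions.  dog[s][x] holds the DoG value at scale s, position x.
bool isDogExtremum(const std::vector<std::vector<double>>& dog, int s, int x) {
    double v = dog[s][x];
    bool isMax = true, isMin = true;
    for (int ds = -1; ds <= 1; ++ds)
        for (int dx = -1; dx <= 1; ++dx) {
            if (ds == 0 && dx == 0) continue;   // skip the sample itself
            double n = dog[s + ds][x + dx];
            if (n >= v) isMax = false;
            if (n <= v) isMin = false;
        }
    return isMax || isMin;
}
```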


SIFT


SIFT
These features are invariant to location and scale.


SIFT
Stage 1: Image content is transformed into local features (that are invariant to translation, rotation, and scale).

In addition to dealing with scale changes, we need to deal with (at least) in‐plane image rotation.

One way to deal with this problem is to design descriptors that are rotationally invariant, but such descriptors have poor discriminability, i.e. they map different‑looking patches to the same descriptor.


SIFT

A better method is to estimate a dominant orientation at each detected keypoint.

1. Calculate a histogram of local gradients in the window
2. Take the dominant gradient orientation as “up”
3. Rotate the local area before computing the descriptor


SIFT
Lowe:

• computes a 36‐bin histogram of edge orientations weighted by both gradient magnitude and Gaussian distance to the center,

• finds all peaks within 80% of the global maximum, and then

• computes a more accurate orientation estimate using a 3‐bin parabolic fit.


SIFT


SIFT

(Figure: the local patch around the keypoint, its gradient magnitude, and its gradient orientation, all taken from the Gaussian pyramid.)


SIFT


SIFT


SIFT
Even after compensating for translation, rotation, and scale changes, the local appearance of image patches will usually still vary from image to image.

How can we make the descriptor that we match more invariant to such changes, while still preserving discriminability between different (non‐corresponding) patches?


SIFT
SIFT features are formed by computing the gradient at each pixel in a 16x16 window around the detected keypoint, using the appropriate level of the Gaussian pyramid at which the keypoint was detected.

The gradient magnitudes are downweighted by a Gaussian fall‐off function in order to reduce the influence of gradients far from the center, as these are more affected by small misregistrations.


SIFT
In each 4x4 quadrant, a gradient orientation histogram is formed by (conceptually) adding the weighted gradient value to one of 8 orientation histogram bins.


SIFT

The resulting 128 non‐negative values form a raw version of the SIFT descriptor vector.

To reduce the effects of contrast/gain (additive variations are already removed by the gradient), the 128‑D vector is normalized to unit length.
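A minimal sketch of this unit‑length normalization:

```cpp
#include <cmath>
#include <vector>

// Normalize the raw 128-D SIFT vector to unit (Euclidean) length, which
// removes multiplicative (gain) changes in patch contrast.
void normalizeDescriptor(std::vector<double>& desc) {
    double norm2 = 0.0;
    for (double v : desc) norm2 += v * v;
    double norm = std::sqrt(norm2);
    if (norm > 0.0)                    // leave an all-zero vector untouched
        for (double& v : desc) v /= norm;
}
```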


SIFT
Once we have extracted features and their descriptors from two or more images, the next step is to establish some preliminary feature matches between these images.


SIFT
Once we have extracted features and their descriptors from two or more images, the next step is to establish some preliminary feature matches between these images.

SIFT uses a nearest‑neighbor classifier with a distance‑ratio matching criterion. We can define this nearest‑neighbor distance ratio as

NNDR = d1 / d2 = ||DA − DB|| / ||DA − DC||,

where d1 and d2 are the nearest and second‑nearest neighbor distances, DA is the target descriptor, and DB and DC are its closest two neighbors.
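A sketch of this ratio test, assuming Euclidean distances between descriptors (the 0.8 threshold is illustrative, not taken from the slides):

```cpp
#include <cmath>
#include <vector>

// Nearest-neighbour distance-ratio matching: accept a match for the target
// descriptor only if its nearest candidate is significantly closer than the
// second-nearest one; otherwise report the match as ambiguous (-1).
int ratioTestMatch(const std::vector<double>& target,
                   const std::vector<std::vector<double>>& candidates,
                   double maxRatio = 0.8) {
    int best = -1, second = -1;
    double d1 = 1e300, d2 = 1e300;     // nearest and second-nearest distances
    for (int i = 0; i < (int)candidates.size(); ++i) {
        double d = 0.0;
        for (size_t k = 0; k < target.size(); ++k) {
            double diff = target[k] - candidates[i][k];
            d += diff * diff;
        }
        d = std::sqrt(d);
        if (d < d1)      { d2 = d1; second = best; d1 = d; best = i; }
        else if (d < d2) { d2 = d; second = i; }
    }
    return (second >= 0 && d1 / d2 < maxRatio) ? best : -1;
}
```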

SIFT


SIFT

Linear method:

The simplest way to find all corresponding feature points is to compare all features against all other features in each pair of potentially matching images.

Unfortunately, this is quadratic in the number of extracted features, which makes it impractical for some applications.


SIFT

Nearest‐neighbor matching is the major computational bottleneck:

• Linear search performs dn² operations for n feature points and d dimensions
• No exact NN methods are faster than linear search for d > 10
• Approximate methods can be much faster, but at the cost of missing some correct matches. The failure rate gets worse for large datasets.


SIFT

A better approach is to devise an indexing structure, such as a multi‐dimensional search tree or a hash table, to rapidly search for features near a given feature.

For extremely large databases (millions of images or more), even more efficient structures based on ideas from document retrieval (e.g., vocabulary trees) can be used.


SIFT
Stage 2: Verify if they belong to a consistent configuration.

The first step is to establish a set of putative correspondences.


SIFT

How can we discard erroneous correspondences?


SIFT
Stage 2: Verify if they belong to a consistent configuration.

Once we have some hypothetical (putative) matches, we can use geometric alignment to verify which matches are inliers and which ones are outliers.


SIFT
Stage 2: Verify if they belong to a consistent configuration.

• Extract features

• Compute putative matches


SIFT
Stage 2: Verify if they belong to a consistent configuration.

• Loop:
  – Hypothesize transformation T (using a small group of putative matches that are related by T)

SIFT
Stage 2: Verify if they belong to a consistent configuration.

• Loop:
  – Hypothesize transformation T (small group of putative matches that are related by T)
  – Verify transformation (search for other matches consistent with T)

SIFT
Stage 2: Verify if they belong to a consistent configuration.


SIFT
Stage 2: Verify if they belong to a consistent configuration.

2D transformation models:
• Similarity (translation, scale, rotation)
• Affine
• Projective (homography)


SIFT
Stage 2: Verify if they belong to a consistent configuration.

Fitting an affine transformation (given the point correspondences): (x_i, y_i) → (x'_i, y'_i)

Slide credit: S. Lazebnik

SIFT
Stage 2: Verify if they belong to a consistent configuration.

Fitting an affine transformation (given the point correspondences): each correspondence (x_i, y_i) → (x'_i, y'_i) contributes two rows to a linear system in the six parameters m_1, ..., m_4, t_1, t_2:

[ x_i  y_i   0    0   1  0 ]  [m_1 m_2 m_3 m_4 t_1 t_2]^T  =  [x'_i]
[  0    0   x_i  y_i  0  1 ]                                   [y'_i]

Slide credit: S. Lazebnik

SIFT
Stage 2: Verify if they belong to a consistent configuration.

Fitting an affine transformation (given the point correspondences):

• Linear system with six unknowns
• Each match gives us two linearly independent equations: we need at least three matches to solve for the transformation parameters
• We can solve Ax = b using the pseudo‑inverse: x = (AᵀA)⁻¹Aᵀb

Slide credit: S. Lazebnik

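The pseudo‑inverse solution above can be sketched as follows, assuming at least three correspondences: the normal equations AᵀAx = Aᵀb are formed and solved directly (no pivoting or degeneracy checks, so this is a sketch rather than a robust solver):

```cpp
#include <array>
#include <cmath>
#include <vector>

// Fit the six affine parameters (m1, m2, m3, m4, t1, t2) from point
// correspondences (x, y) -> (x', y') by accumulating the normal equations
// [A^T A | A^T b] and solving the 6x6 system with Gauss-Jordan elimination.
struct Match { double x, y, xp, yp; };

std::array<double, 6> fitAffine(const std::vector<Match>& matches) {
    double N[6][7] = {};                          // augmented [A^T A | A^T b]
    for (const Match& m : matches) {
        // Two rows of A (and entries of b) per match, as in the linear system:
        double r1[7] = {m.x, m.y, 0, 0, 1, 0, m.xp};
        double r2[7] = {0, 0, m.x, m.y, 0, 1, m.yp};
        for (double* r : {r1, r2})
            for (int i = 0; i < 6; ++i)
                for (int j = 0; j < 7; ++j) N[i][j] += r[i] * r[j];
    }
    for (int c = 0; c < 6; ++c)                   // eliminate column c in every
        for (int r = 0; r < 6; ++r)               // other row (no pivoting)
            if (r != c) {
                double f = N[r][c] / N[c][c];
                for (int j = 0; j < 7; ++j) N[r][j] -= f * N[c][j];
            }
    std::array<double, 6> p;
    for (int i = 0; i < 6; ++i) p[i] = N[i][6] / N[i][i];
    return p;
}
```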

SIFT
Stage 2: Verify if they belong to a consistent configuration.

The process of selecting a small set of seed matches and then verifying a larger set is often called random sampling, or RANSAC.


RANSAC
RANSAC was originally formulated in Martin A. Fischler and Robert C. Bolles (June 1981), “Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography”, Comm. of the ACM 24: 381–395.


RANSAC
“We approached the fitting problem in the opposite way from most previous techniques. Instead of averaging all the measurements and then trying to throw out bad ones, we used the smallest number of measurements to compute a model’s unknown parameters and then evaluated the instantiated model by counting the number of consistent samples.”

From “RANSAC: An Historical Perspective”, Bob Bolles & Marty Fischler, 2006.

RANSAC

It’s easy to understand and it’s effective:

• It helps solve a common problem (i.e., filter out gross errors introduced by automatic techniques)
• The number of trials to “guarantee” a high level of success (e.g., 99.99% probability) is surprisingly small
• The dramatic increase in computation speed made it possible to do a large number of trials (100s or 1000s)
• The algorithm can stop as soon as a good match is computed (unlike Hough techniques that typically compute a large number of examples and then identify matches)


RANSAC
The basic idea is to repeat M times the following process:

1. A model is fitted to the hypothetical inliers, i.e. all free parameters of the model are reconstructed from the data set.
2. All other data are then tested against the fitted model and, if a point fits well to the estimated model, it is also considered a hypothetical inlier.
3. The estimated model is reasonably good if sufficiently many points have been classified as hypothetical inliers.
4. The model is re‑estimated from all hypothetical inliers, because it has only been estimated from the initial set of hypothetical inliers.
5. Finally, the model is evaluated by estimating the error of the inliers relative to the model.

This procedure is repeated a fixed number of times, each time producing either a model which is rejected because too few points are classified as inliers, or a refined model together with a corresponding error measure. In the latter case, we keep the refined model if its error is lower than that of the last saved model.


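The steps above can be sketched for the line‑fitting example that follows (step 4, re‑estimation from all inliers, is omitted for brevity; the threshold, trial count, and fixed seed are illustrative):

```cpp
#include <cmath>
#include <cstdlib>
#include <vector>

// RANSAC sketch for line fitting: repeatedly sample two points, hypothesize
// the line through them, count the points within a distance threshold, and
// keep the hypothesis with the most support.  Returns the best inlier count
// and the winning line ax + by + c = 0 through the output parameters.
struct Pt { double x, y; };

int ransacLine(const std::vector<Pt>& pts, double thresh, int trials,
               double& a, double& b, double& c) {
    int bestInliers = 0;
    std::srand(42);                              // fixed seed for repeatability
    for (int t = 0; t < trials; ++t) {
        const Pt& p = pts[std::rand() % pts.size()];
        const Pt& q = pts[std::rand() % pts.size()];
        double ca = p.y - q.y, cb = q.x - p.x;   // normal of the line p-q
        double n = std::hypot(ca, cb);
        if (n == 0.0) continue;                  // degenerate sample (p == q)
        double cc = -(ca * p.x + cb * p.y);
        int inliers = 0;
        for (const Pt& r : pts)
            if (std::fabs(ca * r.x + cb * r.y + cc) / n <= thresh) ++inliers;
        if (inliers > bestInliers) {
            bestInliers = inliers;
            a = ca; b = cb; c = cc;
        }
    }
    return bestInliers;
}
```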

RANSAC


RANSAC

Line fitting example:
Task: estimate the best line

RANSAC

Line fitting example:
Sample two points

RANSAC

Line fitting example:
Fit line

RANSAC

Line fitting example:
Count the total number of points within a threshold of the line.

RANSAC

Line fitting example:
Repeat until we get a good result.


RANSAC


RANSAC example: translation

Putative matches

Slide credit: A. Efros

RANSAC example: translation

Select one match, count inliers

Slide credit: A. Efros

RANSAC example: translation

Find “average” translation vector

Slide credit: A. Efros

RANSAC
Interest points (500/image) → putative correspondences (268) → outliers (117) → inliers (151) → final inliers (262)


SIFT Applications

HDRSoft

Matching and Classification

SIFT allows reliable real‑time recognition, but at a computational cost that severely limits the number of points that can be handled.

A standard implementation requires 1 ms per feature point, which limits the number of feature points to 50 per frame if one requires frame‑rate performance.


Matching and Classification

An alternative is to rely on statistical learning techniques to model the set of possible appearances of a patch.

The major challenge is to use simple models to allow for real‑time, efficient recognition.


Matching and Classification

Can we match keypoints using simpler features without intensive preprocessing?

We will assume that we have the possibility to train a classifier for each keypoint class.


Matching and Classification
Simple binary features

The test compares the intensities of two pixels m_i,1 and m_i,2 around the keypoint:

f_i = 1 if I(m_i,1) < I(m_i,2), and 0 otherwise.

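A minimal sketch of this test on a flattened patch (the indices m1 and m2 are the pre‑chosen, usually random, pixel locations):

```cpp
#include <vector>

// One fern-style binary feature: compare the intensities of two fixed pixel
// locations in the patch; output 1 if the first is darker, 0 otherwise.
inline int binaryFeature(const std::vector<unsigned char>& patch,
                         int m1, int m2) {
    return patch[m1] < patch[m2] ? 1 : 0;
}
```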

Matching and Classification
Without intensive preprocessing

We can synthetically generate the set of the keypoint’s possible appearances under various perspective, lighting, and noise changes.


Matching and Classification
FERN Formulation

We model the class conditional probabilities of a large number of binary features, which are estimated in a training phase.

At run time, these probabilities are used to select the best match for a given image patch.


Matching and Classification
FERN Formulation

f_i : Binary feature.
N_f : Total number of features in the model.
C_k : Class representing all views of an image patch around a keypoint.

Given f_1, ..., f_Nf, select the class k̂ such that

k̂ = argmax_k P(C_k | f_1, f_2, ..., f_Nf) = argmax_k P(f_1, f_2, ..., f_Nf | C_k),

where the second equality follows from Bayes’ rule under a uniform prior over classes.


Mustafa Ozuysal, Michael Calonder, Vincent Lepetit, Pascal Fua, “Fast Keypoint Recognition Using Random Ferns”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009.

Matching and Classification
FERN Formulation

However, it is not practical to model the joint distribution of all features. We group features into small sets (ferns) and assume independence between these sets (a semi‑naïve Bayesian classifier):

F_j : A fern, defined to be a set of S binary features {f_r, ..., f_r+S}.

M is the number of ferns, with N_f = S × M.


Matching and Classification
FERN Formulation

Modeling the full joint distribution

P(f_1, f_2, ..., f_Nf | C_k)

requires 2^Nf parameters: too many.

Assuming complete independence (naïve Bayes),

P(f_1, f_2, ..., f_Nf | C_k) = ∏_{i=1..Nf} P(f_i | C_k),

requires only N_f parameters, but is too simple.

The fern compromise (semi‑naïve Bayes),

P(f_1, f_2, ..., f_Nf | C_k) = ∏_{j=1..M} P(F_j | C_k),

requires M × 2^S parameters.


Matching and Classification
FERN Implementation

We generate a random set of binary features. A binary feature outputs a binary number.

A fern with S nodes outputs a number between 0 and 2^S − 1 (e.g., S = 3 gives 8 possibilities).
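A sketch of this index computation, assuming the patch is a flattened intensity array and each test is a pre‑chosen pair of pixel indices:

```cpp
#include <utility>
#include <vector>

// A fern of S pixel-pair tests maps a patch to an integer in [0, 2^S - 1]
// by concatenating the S binary test outcomes into a bit string.
int fernIndex(const std::vector<unsigned char>& patch,
              const std::vector<std::pair<int, int>>& tests) {
    int index = 0;
    for (const auto& t : tests) {
        index <<= 1;                                   // shift in the next bit
        if (patch[t.first] < patch[t.second]) ++index;
    }
    return index;
}
```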

Matching and Classification
FERN Implementation

When we have multiple patches of the same class, we can model the output of a fern with a multinomial distribution: a probability for each possible output.
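Training and matching with these multinomials can be sketched as follows (the +1 count smoothing and the log‑probability summation are common choices, assumed here rather than taken from the slides):

```cpp
#include <cmath>
#include <vector>

// Sketch of fern training and matching: for each class, count how often each
// fern output occurs over its training patches, smooth the counts (+1), and
// normalize to a multinomial.  At run time, sum log-probabilities of the
// observed fern outputs per class (independence across ferns) and take argmax.
struct FernModel {
    // logP[k][f][o]: log P(fern f outputs o | class k)
    std::vector<std::vector<std::vector<double>>> logP;
};

FernModel trainFerns(const std::vector<std::vector<std::vector<int>>>& outputs,
                     int M, int S) {
    // outputs[k][n][f]: output of fern f on the n-th training patch of class k
    int K = (int)outputs.size(), B = 1 << S;
    FernModel model;
    model.logP.assign(K, std::vector<std::vector<double>>(
                             M, std::vector<double>(B)));
    for (int k = 0; k < K; ++k)
        for (int f = 0; f < M; ++f) {
            std::vector<double> count(B, 1.0);           // +1 smoothing
            for (const auto& patch : outputs[k]) count[patch[f]] += 1.0;
            double total = (double)outputs[k].size() + B;
            for (int o = 0; o < B; ++o)
                model.logP[k][f][o] = std::log(count[o] / total);
        }
    return model;
}

int classifyFerns(const FernModel& model, const std::vector<int>& obs) {
    int best = 0;
    double bestScore = -1e300;
    for (int k = 0; k < (int)model.logP.size(); ++k) {
        double s = 0.0;                                  // sum over ferns
        for (int f = 0; f < (int)obs.size(); ++f) s += model.logP[k][f][obs[f]];
        if (s > bestScore) { bestScore = s; best = k; }
    }
    return best;
}
```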


Matching and Classification

(Figures: each fern’s binary tests on an example patch produce a bit string that is read as an integer index, e.g. 110 → 6, selecting one bin of that fern’s per‑class histogram.)

Slide Credit: V. Lepetit

Matching and Classification

Normalize the counts to obtain P(f_1, f_2, ..., f_n | C = c_i).

Slide Credit: V. Lepetit

Matching and Classification
FERN Implementation

At the end of the training we have distributions over possible fern outputs for each class.


Matching and Classification
FERN Implementation

To recognize a new patch, its fern outputs select rows of these distributions for each fern, and these are then combined assuming independence between distributions.


Matching and Classification


Matching and Classification
FERN Implementation

…in 10 lines of code…

for (int i = 0; i < H; i++) P[i] = 0.;            // reset the per-class scores
for (int k = 0; k < M; k++) {                     // for each of the M ferns
    int index = 0, *d = D + k * 2 * S;            // this fern's S pixel pairs
    for (int j = 0; j < S; j++) {                 // build the fern's output index
        index <<= 1;
        if (*(K + d[0]) < *(K + d[1]))
            index++;
        d += 2;
    }
    p = PF + k * shift2 + index * shift1;         // row of stored (log-)probabilities
    for (int i = 0; i < H; i++) P[i] += p[i];     // accumulate scores per class
}



Matching and Classification

The FERN technique speeds up keypoint matching, but the training is slow and performed offline.

Hence, it is not suited for applications that require real‑time online learning or incremental addition of arbitrary numbers of keypoints (e.g. SLAM).


Matching and Classification

This limitation can be removed if we train a FERN classifier to recognize a number of keypoints extracted from a reference database; all other keypoints are then characterized in terms of their response to these classification ferns (their signature).

Visual Object Recognition                 Perceptual Computing Seminar                        Page 100

Matching and Classificationg

Visual Object Recognition                 Perceptual Computing Seminar                        Page 101

M. Calonder, V. Lepetit, and P. Fua, Keypoint Signatures for Fast Learning and Recognition. In Proceedings of European Conference on Computer Vision, 2008.

Matching and Classification

It can be empirically shown that these signatures are stable under changes in viewing conditions.

Signatures are sparse in nature if we apply a threshold function.

Signatures do not need a training phase and scale well with the number of classes (nearest neighbor).

Matching and Classification

However, matching signatures still involves many more elementary operations than absolutely necessary.

Moreover, evaluating the signatures requires storing many distributions of the same size as the signatures themselves and, therefore, large amounts of memory.

Matching and Classification

The full response vector r(p) over all J ferns combines the per-fern vectors storing the probability that p is one of the N reference points, where Z is a normalizer such that the elements of r(p) sum to one.

In practice, when p truly corresponds to one of the reference keypoints, r(p) contains one element that is close to one while all others are close to zero.

Otherwise, it contains a few relatively large values, corresponding to reference keypoints that are similar in appearance, and small values elsewhere.

Matching and Classification

We can compute a sparse signature by applying a point-wise threshold function with a value θ.

The result is an N-dimensional vector with only a few non-zero elements that is mostly invariant to different imaging conditions and therefore constitutes a useful descriptor for matching purposes.

Matching and Classification

[Figure: the patch is fed to the J ferns, which output vectors storing the probability that p is one of the N reference points.]

Typical parameters: J=50; d=10; N=500

Matching and Classification

Typical parameters: J=50; d=10; N=500

For each of the 2^d leaves in each of the J ferns, we need an N-dimensional vector of floats.

The total memory requirement is M = b·J·2^d·N bytes, where b is the number of bytes needed to store a float (4). In practice, about 100 MB!
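Plugging the typical parameters (b=4, J=50, d=10, N=500) into the formula reproduces the ~100 MB figure:

```cpp
#include <cassert>

// Memory for the leaf distributions: M = b * J * 2^d * N bytes.
long long fern_memory_bytes(long long b, long long J, int d, long long N) {
    return b * J * (1LL << d) * N;
}
```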

Matching and Classification

Compressive Sensing literature:

• High-dimensional sparse vectors can be reconstructed from their linear projections into much lower-dimensional spaces.

• The Johnson–Lindenstrauss lemma states that any small set of points in a high-dimensional space can be embedded into a space of much lower dimension in such a way that distances between the points are nearly preserved.

Matching and Classification

Many kinds of matrices can be used for this purpose.

Random Ortho-Projection (ROP) matrices are a good choice and can be easily constructed by applying a Gram–Schmidt orthonormalization process to a random matrix.

Matching and Classification

In mathematics, the Gram–Schmidt process is a method for orthonormalizing a set of vectors in an inner product space, most commonly the Euclidean space R^n.

The Gram–Schmidt process takes a finite, linearly independent set S = {v1, …, vk}, for k ≤ n, and generates an orthogonal set S' = {u1, …, uk} that spans the same k-dimensional subspace of R^n as S.
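A minimal sketch (sizes and values are illustrative) of orthonormalizing the rows of a matrix by Gram–Schmidt and using the result as a projection, assuming the input rows are linearly independent:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

using Vec = std::vector<double>;

double dot(const Vec& a, const Vec& b) {
    double s = 0;
    for (size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Gram–Schmidt: orthonormalize the rows of a matrix in place.
// Assumes the rows are linearly independent (no zero norms).
void gram_schmidt(std::vector<Vec>& rows) {
    for (size_t i = 0; i < rows.size(); ++i) {
        for (size_t j = 0; j < i; ++j) {
            double c = dot(rows[i], rows[j]);   // project out earlier rows
            for (size_t k = 0; k < rows[i].size(); ++k)
                rows[i][k] -= c * rows[j][k];
        }
        double n = std::sqrt(dot(rows[i], rows[i]));  // normalize
        for (double& x : rows[i]) x /= n;
    }
}

// Project an N-dim signature to M dims with the M x N ortho-projection.
Vec project(const std::vector<Vec>& rop, const Vec& sig) {
    Vec out(rop.size());
    for (size_t i = 0; i < rop.size(); ++i) out[i] = dot(rop[i], sig);
    return out;
}
```

In the ROP construction, the starting rows would be drawn at random (e.g. Gaussian entries) before orthonormalization.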


M. Calonder, V. Lepetit, P. Fua, K. Konolige, J. Bowman, and P. Mihelich, Compact Signatures for High‐speed Interest Point Description and Matching. In Proceedings of International Conference on Computer Vision, 2009.


Matching and Classification

This approach reduces the memory requirement for storing the models: for N=512 and M=176, the requirement drops from 93.75 MB to 175 B!

The CPU time is 6.3 ms for an exhaustive NN matching of 256 points (256×256).

Internet-scale image databases

Min Hash

How can we find similar images in very large datasets?

Can we get clusters from these images?

Min Hash

Let's suppose that we choose a LARGE bag-of-words representation of our images and that we use a binary histogram.

Min Hash

Given two different images, we can compute their histogram intersection: for binary histograms, the set of visual words present in both images.

Min Hash

…and their histogram union: the set of visual words present in either image.

Min Hash

Then we can define a set similarity measure in the following way:

    sim(A1, A2) = |A1 ∩ A2| / |A1 ∪ A2|

That is, the number of keypoints the two images have in common, divided by the total number of keypoints that are present in either image.


Min Hash

We can perform clustering or matching of an unordered set of images with this measure, but only for a limited amount of data!

The method requires

    Σ_{i=1}^{w} d_i²

similarity evaluations, where w is the size of the vocabulary and d_i is the number of regions assigned to the i-th visual word. A commonly used vocabulary size is w = 1,000,000.

Min Hash

Observation: the histograms for an image are highly sparse!

Min Hash

The key idea of min-hash is to map ("hash") each row/histogram to a small amount of data Sig(A) (the signature) such that:

• Sig(A) is small enough.
• Rows A1 and A2 are highly similar if Sig(A1) is highly similar to Sig(A2).

Min Hash

Useful convention: we will refer to columns as being of four types:

A1:   1 0 1 0
A2:   1 1 0 0
Type: a b c d

We will also use "a" as the number of columns of type a.

Notes:
• Sim(A1, A2) = a/(a+b+c)
• Most columns are of type d.

Min Hash

• Imagine the columns permuted randomly in order.
• Hash each row A to h(A), the number of the first column in which row A has a 1.

π:  1 0 0 1 0
    1 0 0 0 0
    0 1 0 0 1   → h(A1) = 2
    0 1 0 0 0   → h(A2) = 2

The probability that h(A1) = h(A2) is a/(a+b+c) = Sim(A1, A2): the hashes agree if the first column containing a 1 is of type a, and disagree if it is of type b or c.

Min Hash

If we repeat the experiment with a new permutation of the columns a large number of times, say 512, we get a signature consisting of 512 column numbers for each row.

The "similarity" of these lists (the fraction of positions in which they agree) will be very close to the similarity of the rows (similar signatures mean similar rows!).

Min Hash

In fact, it is not necessary to permute the columns: we can hash each original column with 512 different hash functions and keep, for each row, the lowest hash value among the columns in which that row has a 1, independently for each of the 512 hash functions. Then we look for the coincidences.

row:  1 0 0 1 0

h1:   5 1 3 2 4   → h1(row) = 2
h2:   1 2 5 3 4   → h2(row) = 1
h3:   3 4 1 5 2   → h3(row) = 3
h4:   2 5 4 1 3   → h4(row) = 1

Min Hash

Row 1:  1 0 1 1 0
Row 2:  0 1 0 0 1
Row 3:  1 1 0 1 0

h1:  1 2 3 4 5   → h1(row) = 1, 2, 1
h2:  5 4 3 2 1   → h2(row) = 2, 1, 2
h3:  3 4 5 1 2   → h3(row) = 1, 2, 1

Similarities:
        Row-Row   Sig-Sig
1-2:    0/5       0/3
1-3:    2/4       3/3
2-3:    1/4       0/3

Min Hash

For efficient retrieval, the min-hashes are grouped into n-tuples. In this example, we can form the following 2-tuples:

h1(row) = 1, 2, 1
h2(row) = 2, 1, 2
h3(row) = 1, 2, 1
h4(row) = 3, 2, 3

The retrieval procedure then estimates the full similarity only for those image pairs that have at least h identical tuples out of k tuples.

Min Hash

From 100k images…

Representatives of the largest clusters.

Min Hash

Automatic localization of different buildings.