
Deqing Sun
Brown University

Stefan Roth
TU Darmstadt

Michael J. Black
Brown University
Abstract
The accuracy of optical flow estimation algorithms has been improving steadily, as evidenced by results on the Middlebury optical flow benchmark. The typical formulation, however, has changed little since the work of Horn and Schunck. We attempt to uncover what has made recent advances possible through a thorough analysis of how the objective function, the optimization method, and modern implementation practices influence accuracy. We discover that "classical" flow formulations perform surprisingly well when combined with modern optimization and implementation techniques. Moreover, we find that while median filtering of intermediate flow fields during optimization is a key to recent performance gains, it leads to higher energy solutions. To understand the principles behind this phenomenon, we derive a new objective that formalizes the median filtering heuristic. This objective includes a non-local term that robustly integrates flow estimates over large spatial neighborhoods. By modifying this new term to include information about flow and image boundaries we develop a method that ranks at the top of the Middlebury benchmark.
1. Introduction
The field of optical flow estimation is making steady progress, as evidenced by the increasing accuracy of current methods on the Middlebury optical flow benchmark [6]. After nearly 30 years of research, these methods have obtained an impressive level of reliability and accuracy [33,34,35,40]. But what has led to this progress? The majority of today's methods strongly resemble the original formulation of Horn and Schunck (HS) [18]. They combine a data term that assumes constancy of some image property with a spatial term that models how the flow is expected to vary across the image. An objective function combining these two terms is then optimized. Given that this basic structure is unchanged since HS, what has enabled the performance gains of modern approaches?
The paper has three parts. In the first, we perform an extensive study of current optical flow methods and models. The most accurate methods on the Middlebury flow dataset make different choices about how to model the objective function, how to approximate this model to make it computationally tractable, and how to optimize it. Since most published methods change all of these properties at once, it can be difficult to know which choices are most important. To address this, we define a baseline algorithm that is "classical", in that it is a direct descendant of the original HS formulation, and then systematically vary the model and method using different techniques from the literature. The results are surprising. We find that only a small number of key choices produce statistically significant improvements and that they can be combined into a very simple method that achieves accuracies near the state of the art. More importantly, our analysis reveals what makes current flow methods work so well.
Part two examines the principles behind this success. We find that one algorithmic choice produces the most significant improvements: applying a median filter to intermediate flow values during incremental estimation and warping [33,34]. While this heuristic improves the accuracy of the recovered flow fields, it actually increases the energy of the objective function. This suggests that what is being optimized is actually a new and different objective. Using observations about median filtering and L1 energy minimization from Li and Osher [23], we formulate a new non-local term that is added to the original, classical objective. This new term goes beyond standard local (pairwise) smoothness to robustly integrate information over large spatial neighborhoods. We show that minimizing this new energy approximates the original optimization with the heuristic median filtering step. Note, however, that the new objective falls outside our definition of classical methods.
Finally, once the median filtering heuristic is formulated as a non-local term in the objective, we immediately recognize how to modify and improve it. In part three we show how information about image structure and flow boundaries can be incorporated into a weighted version of the non-local term to prevent over-smoothing across boundaries. By incorporating structure from the image, this weighted version does not suffer from some of the errors produced by median filtering. At the time of publication (March 2010), the resulting approach is ranked 1st in both angular and end-point errors in the Middlebury evaluation.
In summary, the contributions of this paper are to (1) analyze current flow models and methods to understand which design choices matter; (2) formulate and compare several classical objectives descended from HS using modern methods; (3) formalize one of the key heuristics and derive a new objective function that includes a non-local term; (4) modify this new objective to produce a state-of-the-art method. In doing this, we provide a "recipe" for others studying optical flow that can guide their design choices. Finally, to enable comparison and further innovation, we provide a public MATLAB implementation [1].
2. Previous Work
It is important to separately analyze the contributions of the objective function that defines the problem (the model) and the optimization algorithm and implementation used to minimize it (the method). The HS formulation, for example, has long been thought to be highly inaccurate. Barron et al. [7] reported an average angular error (AAE) of ~30 degrees on the "Yosemite" sequence. This confounds the objective function with the particular optimization method proposed by Horn and Schunck.¹ When optimized with today's methods, the HS objective achieves surprisingly competitive results despite the expected over-smoothing and sensitivity to outliers.

¹They noted that the correct way to optimize their objective is by solving a system of linear equations, as is common today. This was impractical on the computers of the day, so they used a heuristic method.
Models: The global formulation of optical flow introduced by Horn and Schunck [18] relies on both brightness constancy and spatial smoothness assumptions, but suffers from the fact that the quadratic formulation is not robust to outliers. Black and Anandan [10] addressed this by replacing the quadratic error function with a robust formulation. Subsequently, many different robust functions have been explored [12,22,31] and it remains unclear which is best. We refer to all these spatially-discrete formulations derived from HS as "classical." We systematically explore variations in the formulation and optimization of these approaches. The surprise is that the classical model, appropriately implemented, remains very competitive.
There are many formulations beyond the classical ones that we do not consider here. Significant ones use oriented smoothness [25,31,33,40], rigidity constraints [32,33], or image segmentation [9,21,41,37]. While they deserve similar careful consideration, we expect many of our conclusions to carry forward. Note that one can select among a set of models for a given sequence [4], instead of finding a "best" model for all the sequences.
Methods: Many of the implementation details that are thought to be important date back to the early days of optical flow. Current best practices include coarse-to-fine estimation to deal with large motions [8,13], texture decomposition [32,34] or high-order filter constancy [3,12,16,22,40] to reduce the influence of lighting changes, bicubic interpolation-based warping [22,34], temporal averaging of image derivatives [17,34], graduated non-convexity [11] to minimize non-convex energies [10,31], and median filtering after each incremental estimation step to remove outliers [34].
This median filtering heuristic is of particular interest, as it makes non-robust methods more robust and improves the accuracy of all methods we tested. The effect on the objective function and the underlying reason for its success have not previously been analyzed. Least median squares estimation can be used to robustly reject outliers in flow estimation [5], but previous work has focused on the data term.
Related to median filtering, and our new non-local term, is the use of bilateral filtering to prevent smoothing across motion boundaries [36]. The approach separates a variational method into two filtering update stages, and replaces the original anisotropic diffusion process with multi-cue driven bilateral filtering. As with median filtering, the bilateral filtering step changes the original energy function.
Models that are formulated with an L1 robust penalty are often coupled with specialized total variation (TV) optimization methods [39]. Here we focus on generic optimization methods that can apply to any model and find that they perform as well as reported results for specialized methods.
Despite recent algorithmic advances, there is a lack of publicly available, easy to use, and accurate flow estimation software. The GPU4Vision project [2] has made a substantial effort to change this and provides executable files for several accurate methods [32,33,34,35]. The dependence on the GPU and the lack of source code are limitations. We hope that our public MATLAB code will not only help in understanding the "secrets" of optical flow, but also let others exploit optical flow as a useful tool in computer vision and related fields.
3. Classical Models
We write the "classical" optical flow objective function in its spatially discrete form as

E(u, v) = \sum_{i,j} \{ \rho_D(I_1(i,j) - I_2(i + u_{i,j}, j + v_{i,j}))
        + \lambda [ \rho_S(u_{i,j} - u_{i+1,j}) + \rho_S(u_{i,j} - u_{i,j+1})
        + \rho_S(v_{i,j} - v_{i+1,j}) + \rho_S(v_{i,j} - v_{i,j+1}) ] \},    (1)

where u and v are the horizontal and vertical components of the optical flow field to be estimated from images I_1 and I_2, \lambda is a regularization parameter, and \rho_D and \rho_S are the data and spatial penalty functions. We consider three different penalty functions: (1) the quadratic HS penalty \rho(x) = x^2; (2) the Charbonnier penalty \rho(x) = \sqrt{x^2 + \epsilon^2} [13], a differentiable variant of the L1 norm and the most robust convex function; and (3) the Lorentzian \rho(x) = \log(1 + x^2/(2\sigma^2)), which is a non-convex robust penalty used in [10]. Note that this classical model is related to a standard pairwise Markov random field (MRF) based on a 4-neighborhood.
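To make the discrete objective concrete, the following minimal NumPy sketch evaluates Eq. (1) for a given flow field. It is an illustration, not the released implementation [1]; the spline-based warping via SciPy's map_coordinates and the default parameter values are assumptions.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def charbonnier(x, eps=0.001):
    # Differentiable variant of the L1 norm: rho(x) = sqrt(x^2 + eps^2).
    return np.sqrt(x**2 + eps**2)

def lorentzian(x, sigma=0.03):
    # Non-convex robust penalty: rho(x) = log(1 + x^2 / (2 sigma^2)).
    return np.log1p(x**2 / (2.0 * sigma**2))

def classical_energy(u, v, I1, I2, lam=5.0, rho_d=charbonnier, rho_s=charbonnier):
    """Evaluate the spatially discrete objective of Eq. (1)."""
    H, W = I1.shape
    jj, ii = np.meshgrid(np.arange(W), np.arange(H))
    # Data term: penalize the difference between I1 and the warped I2.
    I2_warped = map_coordinates(I2, [ii + v, jj + u], order=3, mode='nearest')
    data = rho_d(I1 - I2_warped).sum()
    # Spatial term: penalize flow differences over the 4-neighborhood.
    smooth = (rho_s(np.diff(u, axis=0)).sum() + rho_s(np.diff(u, axis=1)).sum() +
              rho_s(np.diff(v, axis=0)).sum() + rho_s(np.diff(v, axis=1)).sum())
    return data + lam * smooth
```

Swapping rho_d and rho_s among the quadratic (lambda x: x**2), Charbonnier, and Lorentzian penalties reproduces the three model variants (HS, Classic-C, Classic-L) compared in Table 1.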
In the remainder of this section we define a baseline method using several techniques from the literature. This is not the "best" method, but it includes modern techniques and will be used for comparison. We only briefly describe the main choices, which are explored in more detail in the following section and the cited references, especially [30].
Quantitative results are presented throughout the remainder of the text. In all cases we report the average end-point error (EPE) on the Middlebury training and test sets, depending on the experiment. Given the extensive nature of the evaluation, only average results are presented in the main body, while the details for each individual sequence are given in [30].
3.1. Baseline methods
To gain robustness against lighting changes, we follow [34] and apply the Rudin-Osher-Fatemi (ROF) structure-texture decomposition method [28] to pre-process the input sequences and linearly combine the texture and structure components (in the proportion 20:1). The parameters are set according to [34].
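The following sketch shows the shape of this pre-processing step, with scikit-image's TV denoiser standing in for the exact ROF solver of [28]; the tv_weight value is an illustrative assumption, not the parameter setting of [34].

```python
import numpy as np
from skimage.restoration import denoise_tv_chambolle

def structure_texture_input(image, tv_weight=0.125):
    """Pre-process one frame as in Sec. 3.1: split into a structure part
    (TV-denoised) and a texture part (residual), then blend them 20:1.
    denoise_tv_chambolle is a stand-in for the ROF solver of [28]."""
    image = image.astype(np.float64)
    structure = denoise_tv_chambolle(image, weight=tv_weight)
    texture = image - structure
    # Linearly combine texture and structure in the proportion 20:1.
    return texture + structure / 20.0
```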
Optimization is performed using a standard incremental multi-resolution technique (e.g. [10,13]) to estimate flow fields with large displacements. The optical flow estimated at a coarse level is used to warp the second image toward the first at the next finer level, and a flow increment is calculated between the first image and the warped second image. The standard deviation of the Gaussian anti-aliasing filter is set to 1/\sqrt{2d}, where d denotes the downsampling factor. Each level is recursively downsampled from its nearest lower level. In building the pyramid, the downsampling factor is not critical, as pointed out in the next section; here we use the settings in [31], which uses a factor of 0.8 in the final stages of the optimization. We adaptively determine the number of pyramid levels so that the top level has a width or height of around 20 to 30 pixels. At each pyramid level, we perform 10 warping steps to compute the flow increment.
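A pyramid construction consistent with this description might look as follows; the min_size cutoff and the cubic resampling call are assumptions made for illustration.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def build_pyramid(image, factor=0.5, min_size=25):
    """Recursively downsample `image` by `factor` until the coarsest level
    has a width or height of roughly 20-30 pixels (min_size is illustrative)."""
    pyramid = [image]
    while min(pyramid[-1].shape) * factor >= min_size:
        d = factor
        # Gaussian anti-aliasing filter with std 1/sqrt(2d), as in Sec. 3.1.
        sigma = 1.0 / np.sqrt(2.0 * d)
        smoothed = gaussian_filter(pyramid[-1], sigma)
        pyramid.append(zoom(smoothed, d, order=3))  # cubic resampling (assumed)
    return pyramid[::-1]  # coarsest level first
```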
At each warping step, we linearize the data term, which involves computing terms of the type \frac{\partial}{\partial x} I_2(i + u^k_{i,j}, j + v^k_{i,j}), where \partial/\partial x denotes the partial derivative in the horizontal direction, and u^k and v^k denote the current flow estimate at iteration k. As suggested in [34], we compute the derivatives of the second image using the 5-point derivative filter \frac{1}{12}[-1\ 8\ 0\ -8\ 1], and warp the second image and its derivatives toward the first using the current flow estimate by bicubic interpolation. We then compute the spatial derivatives of the first image, average them with the warped derivatives of the second image (c.f. [17]), and use this average in place of \partial I_2 / \partial x. For pixels moving out of the image boundaries, we set both their corresponding temporal and spatial derivatives to zero. After each warping step, we apply a 5×5 median filter to the newly computed flow field to remove outliers [34].
                          Avg. Rank   Avg. EPE
Classic-C                   14.9       0.408
HS                          24.6       0.501
Classic-L                   19.8       0.530
HS [31]                     35.1       0.872
BA (Classic-L) [31]         30.9       0.746
Adaptive [33]               11.5       0.401
Complementary OF [40]       10.1       0.485

Table 1. Models. Average rank and end-point error (EPE) on the Middlebury test set using different penalty functions. Two current methods are included for comparison.
For the Charbonnier (Classic-C) and Lorentzian (Classic-L) penalty functions, we use a graduated non-convexity (GNC) scheme [11], as described in [31], that linearly combines a quadratic objective with a robust objective in varying proportions, from fully quadratic to fully robust. Unlike [31], a single regularization weight \lambda is used for both the quadratic and the robust objective functions.
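A hedged sketch of this GNC blending; the staging schedule is left to the caller and is not the exact schedule of [31].

```python
import numpy as np

def lorentzian(x, sigma=0.03):
    # Non-convex robust spatial penalty from Sec. 3.
    return np.log1p(x**2 / (2.0 * sigma**2))

def gnc_penalty(x, alpha, robust=lorentzian):
    # Linear GNC blend: fully quadratic at alpha = 0, fully robust at alpha = 1.
    return (1.0 - alpha) * x**2 + alpha * robust(x)
```

A typical schedule solves the flow problem to convergence for a few values of alpha (e.g. 0, 0.5, 1), warm-starting each stage from the previous solution.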
3.2. Baseline results
The regularization parameter \lambda is selected among a set of candidate values to achieve the best average end-point error (EPE) on the Middlebury training set. For the Charbonnier penalty function, the candidate set is [1, 3, 5, 8, 10], and 5 is optimal. The Charbonnier penalty uses \epsilon = 0.001 for both the data and the spatial term in Eq. (1). The Lorentzian uses \sigma = 1.5 for the data term and \sigma = 0.03 for the spatial term. These parameters are fixed throughout the experiments, except where mentioned.
Table 1 summarizes the EPE results of the basic model with three different penalty functions on the Middlebury test set, along with the two top performers at the time of publication (considering only published papers). The classic formulations with two non-quadratic penalty functions (Classic-C and Classic-L) achieve competitive results despite their simplicity. The baseline optimization of HS and BA (Classic-L) results in significantly better accuracy than previously reported for these models [31]. Note that the analysis also holds for the training set (Table 2).
At the time of publication, Classic-C ranks 13th in average EPE and 15th in AAE on the Middlebury benchmark despite its simplicity, and it serves as the baseline below. It is worth noting that the spatially discrete MRF formulation taken here is competitive with variational methods such as [33]. Moreover, our baseline implementation of HS has a lower average EPE than many more sophisticated methods.

                          Avg. EPE   significance   p-value
Classic-C                   0.298        —             —
HS                          0.384        1           0.0078
Classic-L                   0.319        1           0.0078
Classic-C-brightness        0.288        0           0.9453
HS-brightness               0.387        1           0.0078
Classic-L-brightness        0.325        0           0.2969
Gradient                    0.305        0           0.4609

Table 2. Pre-Processing. Average end-point error (EPE) on the Middlebury training set for the baseline method (Classic-C) using different pre-processing techniques. Significance is always with respect to Classic-C.
4. Secrets Explored
We evaluate a range of variations from the baseline approach that have appeared in the literature, in order to illuminate which may be of importance. This analysis is performed on the Middlebury training set by changing only one property at a time. Statistical significance is determined using a Wilcoxon signed rank test between each modified method and the baseline Classic-C; a p-value less than 0.05 indicates a significant difference.
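For concreteness, the significance test is a single SciPy call; epe_baseline and epe_variant below are hypothetical arrays holding one EPE value per training sequence.

```python
from scipy.stats import wilcoxon

def significantly_different(epe_baseline, epe_variant, alpha=0.05):
    """Paired Wilcoxon signed rank test over per-sequence EPE values;
    a p-value below alpha indicates a significant difference."""
    _, p_value = wilcoxon(epe_baseline, epe_variant)
    return p_value < alpha, p_value
```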
Pre-Processing. For each method, we optimize the regularization parameter \lambda for the training sequences and report the results in Table 2. The baseline uses a non-linear pre-filtering of the images to reduce the influence of illumination changes [34]. Table 2 shows the effect of removing this and using a standard brightness constancy model (*-brightness). Classic-C-brightness actually achieves lower EPE on the training set than Classic-C, but significantly lower accuracy on the test set: Classic-C-brightness = 0.726, HS-brightness = 0.759, and Classic-L-brightness = 0.603; see Table 1 for comparison. This disparity suggests that overfitting is more severe for the brightness constancy assumption. Gradient only imposes constancy of the gradient vector at each pixel as proposed in [12] (i.e. it robustly penalizes the Euclidean distance between image gradients) and has similar performance on both the training and test sets (c.f. Table 8). See [30] for results of more alternatives.

Secrets: Some form of image filtering is useful, but simple derivative constancy is nearly as good as the more sophisticated texture decomposition method.
Coarse-to-fine estimation and GNC. We vary the number of warping steps per pyramid level and find that 3 warping steps give similar results to using 10 (Table 3). For the GNC scheme, [31] uses a downsampling factor of 0.8 for non-convex optimization. A downsampling factor of 0.5 (Down-0.5), however, has nearly identical performance. Removing the GNC step for the Charbonnier penalty function (w/o GNC) results in higher EPE on most sequences and higher energy on all sequences (Table 4). This suggests that the GNC method is helpful even for the convex Charbonnier penalty function due to the nonlinearity of the data term.
                                   Avg. EPE   significance   p-value
Classic-C                            0.298        —             —
3 warping steps                      0.304        0           0.9688
Down-0.5                             0.298        0           1.0000
w/o GNC                              0.354        0           0.1094
Bilinear                             0.302        0           0.1016
w/o TAVG                             0.306        0           0.1562
Central derivative filter            0.300        0           0.7266
7-point derivative filter [13]       0.302        0           0.3125
Bicubic-II                           0.290        1           0.0391
GC-0.45 (\lambda = 3)                0.292        1           0.0156
GC-0.25 (\lambda = 0.7)              0.298        0           1.0000
MF 3×3                               0.305        0           0.1016
MF 7×7                               0.305        0           0.5625
2× MF                                0.300        0           1.0000
5× MF                                0.305        0           0.6875
w/o MF                               0.352        1           0.0078
Classic++                            0.285        1           0.0078

Table 3. Model and Methods. Average end-point error (EPE) on the Middlebury training set for the baseline method (Classic-C) using different algorithm and modeling choices.

Figure 1. Different penalty functions for the spatial terms: Charbonnier (\epsilon = 0.001), generalized Charbonnier (a = 0.45 and a = 0.25), and Lorentzian (\sigma = 0.03).
Secrets: The downsampling factor does not matter when using a convex penalty; a standard factor of 0.5 is fine. Some form of GNC is useful even for a convex robust penalty like the Charbonnier because of the nonlinear data term.

Interpolation method and derivatives. We find that bicubic interpolation is more accurate than bilinear (Table 3, Bilinear), as already reported in previous work [34]. Removing temporal averaging of the gradients (w/o TAVG), using central difference filters, or using a 7-point derivative filter [13] all reduce accuracy compared to the baseline, but not significantly. The MATLAB built-in function interp2 is based on a cubic convolution approximation [20]. The spline-based interpolation scheme [26] is consistently better (Bicubic-II). See [30] for more discussion.

Secrets: Use spline-based bicubic interpolation with a 5-point derivative filter. Temporal averaging of the derivatives is probably worthwhile for a small computational expense.

Penalty functions. We find that the convex Charbonnier penalty performs better than the more robust, non-convex Lorentzian on both the training and test sets. One reason might be that non-convex functions are more difficult to optimize, causing the optimization scheme to find a poor local optimum.
Figure 2. Estimated flow fields on the "RubberWhale" sequence using Classic-C with and without (w/o MF) the median filtering step; color coding as in [6]. (a) With median filtering (w/ MF), energy 502,387; (b) without median filtering (w/o MF), energy 449,290. The median filtering step helps reach a solution free from outliers, but with a higher energy.
We investigate a generalized Charbonnier penalty function \rho(x) = (x^2 + \epsilon^2)^a that is equal to the Charbonnier penalty when a = 0.5 and non-convex when a < 0.5 (see Figure 1). We optimize the regularization parameter \lambda again. We find that a slightly non-convex penalty with a = 0.45 (GC-0.45) performs consistently better than the Charbonnier penalty, whereas more non-convex penalties (GC-0.25 with a = 0.25) show no improvement.
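For reference, the generalized Charbonnier penalty and its derivative (as needed by gradient-based or IRLS solvers) are compact; the derivative expression is standard calculus, not code from [1].

```python
import numpy as np

def generalized_charbonnier(x, a=0.45, eps=0.001):
    # rho(x) = (x^2 + eps^2)^a; a = 0.5 recovers the Charbonnier penalty,
    # while a < 0.5 (e.g. GC-0.45) is slightly non-convex.
    return (x**2 + eps**2) ** a

def generalized_charbonnier_deriv(x, a=0.45, eps=0.001):
    # rho'(x) = 2 a x (x^2 + eps^2)^(a - 1).
    return 2.0 * a * x * (x**2 + eps**2) ** (a - 1.0)
```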
Secrets: The less-robust Charbonnier is preferable to the Lorentzian, and a slightly non-convex penalty function (GC-0.45) is better still.
Median filtering. The baseline 5×5 median filter (MF 5×5) is better than both MF 3×3 [34] and MF 7×7, but the difference is not significant (Table 3). When we perform 5×5 median filtering twice (2× MF) or five times (5× MF) per warping step, the results are worse. Finally, removing the median filtering step (w/o MF) makes the computed flow significantly less accurate, with larger outliers, as shown in Table 3 and Figure 2.
Secrets: Median filtering the intermediate flow results once after every warping iteration is the single most important secret; 5×5 is a good filter size.
4.1. Best Practices
Combining the analysis above into a single approach means modifying the baseline to use the slightly non-convex generalized Charbonnier penalty and the spline-based bicubic interpolation. This leads to a statistically significant improvement over the baseline (Table 3, Classic++). This method is directly descended from HS and BA, yet updated with the current best optimization practices known to us. This simple method ranks 9th in EPE and 12th in AAE on the Middlebury test set.
5. Models Underlying Median Filtering
Our analysis reveals the practical importance of median filtering during optimization to denoise the flow field. We ask whether there is a principle underlying this heuristic.
One interesting observation is that flow fields obtained with median filtering have substantially higher energy than those without (Table 4 and Figure 2). If the median filter is helping to optimize the objective, it should lead to lower energies. Higher energies and more accurate estimates suggest that incorporating median filtering changes the objective function being optimized.
The insight that follows from this is that the median filtering heuristic is related to the minimization of an objective function that differs from the classical one. In particular, the optimization of Eq. (1), with interleaved median filtering, approximately minimizes

E_A(u, v, \hat{u}, \hat{v}) = \sum_{i,j} \{ \rho_D(I_1(i,j) - I_2(i + u_{i,j}, j + v_{i,j}))
        + \lambda [ \rho_S(u_{i,j} - u_{i+1,j}) + \rho_S(u_{i,j} - u_{i,j+1})
        + \rho_S(v_{i,j} - v_{i+1,j}) + \rho_S(v_{i,j} - v_{i,j+1}) ] \}
        + \lambda_2 (\|u - \hat{u}\|^2 + \|v - \hat{v}\|^2)
        + \sum_{i,j} \sum_{(i',j') \in N_{i,j}} \lambda_3 (|\hat{u}_{i,j} - \hat{u}_{i',j'}| + |\hat{v}_{i,j} - \hat{v}_{i',j'}|),    (2)

where \hat{u} and \hat{v} denote an auxiliary flow field, N_{i,j} is the set of neighbors of pixel (i,j) in a possibly large area, and \lambda_2 and \lambda_3 are scalar weights. The term in braces is the same as the flow energy from Eq. (1), while the last term is new. This non-local term [14,15] imposes a particular smoothness assumption within a specified region of the auxiliary flow field \hat{u}, \hat{v}.² Here we take this term to be a 5×5 rectangular region to match the size of the median filter in Classic-C. A third (coupling) term encourages \hat{u}, \hat{v} and u, v to be the same (c.f. [33,39]).

²Bruhn et al. [13] also integrated information over a local region in a global method, but did so for the data term.
The connection to median filtering (as a denoising method) derives from the fact that there is a direct relationship between the median and L1 minimization. Consider a simplified version of Eq. (2) with just the coupling and non-local terms:

E(\hat{u}) = \lambda_2 \|u - \hat{u}\|^2 + \sum_{i,j} \sum_{(i',j') \in N_{i,j}} \lambda_3 |\hat{u}_{i,j} - \hat{u}_{i',j'}|.    (3)

While minimizing this is similar to median filtering u, there are two differences. First, the non-local term minimizes the L1 distance between the central value and all flow values in its neighborhood except itself. Second, Eq. (3) incorporates information about the data term through the coupling term; median filtering the flow ignores the data term.
               Venus   Dimetrodon   Hydrangea   RubberWhale   Grove2   Grove3   Urban2   Urban3
Classic-C      0.5     0.748        0.866       0.502         1.816    2.317    1.126    1.424
w/o GNC        0.593   0.750        0.870       0.506         1.845    2.518    1.142    1.465
w/o MF         0.517   0.701        0.668       0.449         1.418    1.830    1.066    1.395

Table 4. Eq. (1) energy (×10⁶) for the optical flow fields computed on the Middlebury training set. Note that Classic-C uses graduated non-convexity (GNC), which reduces the energy, and median filtering, which increases it.
The formal connection between Eq. (3) and median filtering³ is provided by Li and Osher [23], who show that minimizing Eq. (3) is related to a different median computation:

\hat{u}^{(k+1)}_{i,j} = \mathrm{median}(\mathrm{Neighbors}^{(k)} \cup \mathrm{Data}),    (4)

where \mathrm{Neighbors}^{(k)} = \{\hat{u}^{(k)}_{i',j'}\} for (i',j') \in N_{i,j}, \hat{u}^{(0)} = u, and

\mathrm{Data} = \{u_{i,j},\ u_{i,j} \pm \frac{\lambda_3}{\lambda_2},\ u_{i,j} \pm \frac{2\lambda_3}{\lambda_2},\ \ldots,\ u_{i,j} \pm \frac{|N_{i,j}| \lambda_3}{2 \lambda_2}\},

where |N_{i,j}| denotes the (even) number of neighbors of (i,j). Note that the set of "data" values is balanced, with an equal number of elements on either side of the value u_{i,j}, and that information about the data term is included through u_{i,j}. Repeated application of Eq. (4) converges rapidly [23]. Observe that, as \lambda_3/\lambda_2 increases, the weighted data values on either side of u_{i,j} move away from the values of Neighbors and cancel each other out. As this happens, Eq. (4) approximates the median at the first iteration:

\hat{u}^{(1)}_{i,j} \approx \mathrm{median}(\mathrm{Neighbors}^{(0)} \cup \{u_{i,j}\}).    (5)

³Hsiao et al. [19] established the connection in a slightly different way.
Eq. (2) thus combines the original objective with an approximation to the median, the influence of which is controlled by \lambda_3/\lambda_2. Note that in practice the weight \lambda_2 on the coupling term is usually small or is steadily increased from small values [34,39]. We optimize the new objective (2) by alternately minimizing
E_O(u, v) = \sum_{i,j} \rho_D(I_1(i,j) - I_2(i + u_{i,j}, j + v_{i,j}))
        + \lambda [ \rho_S(u_{i,j} - u_{i+1,j}) + \rho_S(u_{i,j} - u_{i,j+1})
        + \rho_S(v_{i,j} - v_{i+1,j}) + \rho_S(v_{i,j} - v_{i,j+1}) ]
        + \lambda_2 (\|u - \hat{u}\|^2 + \|v - \hat{v}\|^2)    (6)

and

E_M(\hat{u}, \hat{v}) = \lambda_2 (\|u - \hat{u}\|^2 + \|v - \hat{v}\|^2)
        + \sum_{i,j} \sum_{(i',j') \in N_{i,j}} \lambda_3 (|\hat{u}_{i,j} - \hat{u}_{i',j'}| + |\hat{v}_{i,j} - \hat{v}_{i',j'}|).    (7)
Note that an alternative formulation would drop the coupling term and impose the non-local term directly on u and v. We find that optimization of the coupled set of equations is superior in terms of EPE performance.
The alternating optimization strategy first holds \hat{u}, \hat{v} fixed and minimizes Eq. (6) w.r.t. u, v. Then, with u, v fixed, we minimize Eq. (7) w.r.t. \hat{u}, \hat{v}. Note that Eqs. (3) and (7) can be minimized by repeated application of Eq. (4); we use this approach with 5 iterations.
              Avg. EPE   significance   p-value
Classic-C       0.298        —             —
Classic-C-A     0.305        0           0.8125

Table 5. Average end-point error (EPE) on the Middlebury training set for the new model with alternating optimization (Classic-C-A).
We perform 10 steps of alternating optimization at every pyramid level and change \lambda_2 logarithmically from 10^{-4} to 10^2. During the first and second GNC stages, we set u, v to be \hat{u}, \hat{v} after every warping step (this step helps reach solutions with lower energy and EPE [30]). In the end, we take \hat{u}, \hat{v} as the final flow field estimate. The other parameters are \lambda = 5, \lambda_3 = 1.

Alternately optimizing this new objective function (Classic-C-A) leads to similar results as the baseline Classic-C (Table 5). We also compare the energy of these solutions under the new objective and find that the alternating optimization produces the lowest energy solutions, as shown in Table 6. To do so, we set both the flow field u, v and the auxiliary flow field \hat{u}, \hat{v} to be the same in Eq. (2).

In summary, we show that the heuristic median filtering step in Classic-C can now be viewed as energy minimization of a new objective with a non-local term. The explicit formulation emphasizes the value of robustly integrating information over large neighborhoods and enables the improved model described below.
6. Improved Model
By formalizing the median filtering heuristic as an explicit objective function, we can find ways to improve it. While median filtering in a large neighborhood has advantages, as we have seen, it also has problems. A neighborhood centered on a corner or thin structure is dominated by the surround, and computing the median results in over-smoothing, as illustrated in Figure 3(a).
Examining the non-local term suggests a solution. For a given pixel, if we know which other pixels in the area belong to the same surface, we can weight them more highly. The modification to the objective function is achieved by introducing a weight into the non-local term [14,15]:

\sum_{i,j} \sum_{(i',j') \in N_{i,j}} w_{i,j,i',j'} (|\hat{u}_{i,j} - \hat{u}_{i',j'}| + |\hat{v}_{i,j} - \hat{v}_{i',j'}|),    (8)

where w_{i,j,i',j'} represents how likely pixel (i',j') is to belong to the same surface as (i,j).
                   Venus   Dimetrodon   Hydrangea   RubberWhale   Grove2   Grove3   Urban2   Urban3
Classic-C          0.817   0.903        1.202       0.674         2.166    3.144    1.954    2.153
Classic-C w/o MF   0.886   0.945        1.299       0.725         2.315    3.513    2.234    2.712
Classic-C-A        0.784   0.8          1.139       0.666         2.0      2.976    1.922    2.049

Table 6. Eq. (2) energy (×10⁶) for the computed flow fields on the Middlebury training set. The alternating optimization strategy (Classic-C-A) produces the lowest energy solutions.
Figure 3. Median filtering over-smoothes the rifle in the "Army" sequence, while the proposed weighted non-local term preserves the detail. Results of (a) Classic++ and (b) Classic+NL.
Of course, we do not know w_{i,j,i',j'}, but we can approximate it. We draw ideas from [29,36,38] to define the weights according to the spatial distance, the color-value distance, and the occlusion state of the pixels:

w_{i,j,i',j'} \propto \exp\{ -\frac{|i - i'|^2 + |j - j'|^2}{2\sigma_1^2} - \frac{|I(i,j) - I(i',j')|^2}{2\sigma_2^2} \} \frac{o(i',j')}{o(i,j)},    (9)

where the occlusion variable o(i,j) is calculated using Eq. (22) in [29], I(i,j) is the color vector in the Lab space, and \sigma_1 = 7, \sigma_2 = 7. Examples of such weights are shown for several 15×15 neighborhoods in Figure 4; bright values indicate higher weights. Note the neighborhood labeled (d), corresponding to the rifle. Since pixels on the rifle are in the minority, an unweighted median would oversmooth; the weighted term instead robustly estimates the motion using values on the rifle. A closely related piece of work is [27], which uses the intervening contour to define affinities among neighboring pixels for the local Lucas and Kanade [24] method. However, it only uses this scheme to estimate motion for sparse points and then interpolates the dense flow field.
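The spatial and color terms of the weights in Eq. (9) can be computed as below; the occlusion ratio o(i',j')/o(i,j), which requires Eq. (22) of [29], is deliberately omitted, so this is only a partial sketch.

```python
import numpy as np

def nonlocal_weights(lab, i, j, radius=7, sigma1=7.0, sigma2=7.0):
    """Weights of Eq. (9) in a (2*radius+1)^2 window around (i, j), using the
    spatial and Lab color-distance terms only. `lab` is an HxWx3 Lab image;
    radius=7 gives the 15x15 neighborhood used in the boundary regions."""
    H, W, _ = lab.shape
    r0, r1 = max(i - radius, 0), min(i + radius + 1, H)
    c0, c1 = max(j - radius, 0), min(j + radius + 1, W)
    ii, jj = np.mgrid[r0:r1, c0:c1]
    spatial = ((ii - i) ** 2 + (jj - j) ** 2) / (2.0 * sigma1 ** 2)
    color = (np.sum((lab[r0:r1, c0:c1] - lab[i, j]) ** 2, axis=-1)
             / (2.0 * sigma2 ** 2))
    return np.exp(-spatial - color)
```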
We approximately solve Eq. (9) for \hat{u}, \hat{v} as the following weighted median problem:

\min_{\hat{u}_{i,j}} \sum_{(i',j') \in N_{i,j} \cup \{i,j\}} w_{i,j,i',j'} |\hat{u}_{i,j} - u_{i',j'}|,    (10)

using formula (3.13) in [23] for all the pixels (Classic+NL-Full). Note that if all the weights are equal, the solution is just the median. In practice, we can adopt a fast version (Classic+NL) without performance loss. Given a current estimate of the flow, we detect motion boundaries using a Sobel edge detector and dilate these edges with a 5×5 mask to obtain flow boundary regions. In these regions we use the weighting in Eq. (9) in a 15×15 neighborhood. In the non-boundary regions, we use equal weights in a 5×5 neighborhood to compute the median.
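A standard weighted-median computation can stand in for formula (3.13) of [23] when solving Eq. (10); the tie-handling (taking the lower weighted median) is an assumption.

```python
import numpy as np

def weighted_median(values, weights):
    """Solve min_x sum_k w_k |x - values_k| (Eq. 10): sort the values and
    return the first one at which the cumulative weight reaches half the
    total weight."""
    order = np.argsort(values)
    values, weights = np.asarray(values)[order], np.asarray(weights)[order]
    cumulative = np.cumsum(weights)
    return values[np.searchsorted(cumulative, 0.5 * cumulative[-1])]
```

At a boundary pixel, `values` would hold the flow values in the 15×15 window (including the center) and `weights` the corresponding weights from Eq. (9); with equal weights this reduces to the plain median used in non-boundary regions.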
Figure 4. Neighbor weights of the proposed weighted non-local term at different positions (a)-(e) in the "Army" sequence.
                  Avg. EPE   significance   p-value
Classic+NL          0.221        —             —
Classic+NL-Full     0.222        0           0.8203

Table 7. Average end-point error (EPE) on the Middlebury training set for the fast and full versions of the improved model.
                    Avg. Rank   Avg. EPE
Classic++             13.4       0.406
Classic++Gradient     15.1       0.430
Classic+NL             6.2       0.319
Classic+NL-Full        6.6       0.316

Table 8. Average end-point error (EPE) on the Middlebury test set for the Classic++ model with two different preprocessing techniques and its improved model.
Tables 7 and 8 show that the weighted non-local term (Classic+NL) improves the accuracy on both the training and the test sets. Note that the fine detail of the rifle is preserved in Figure 3(b). At the time of publication, Classic+NL ranks 1st in both AAE and EPE in the Middlebury evaluation and has the lowest average AAE and EPE among all listed algorithms. The running time on the test "Urban" sequence is about 70 minutes for Classic+NL-Full and about 16 minutes for Classic+NL in MATLAB.
7. Conclusions
Implemented using modern practices, classical optical flow formulations produce competitive results on the Middlebury training and test sets. To understand the "secrets" that help such basic formulations work well, we quantitatively studied various aspects of flow approaches from the literature, including their implementation details. Among the good practices, we found that using median filtering to denoise the flow after every warping step is key to improving accuracy, but that it increases the energy of the final result. Exploiting connections between median filtering and L1-based denoising, we showed that algorithms relying on a median filtering step are approximately optimizing a different objective that regularizes flow over a large spatial neighborhood. This principle enabled us to design and optimize improved models that weight the neighbors adaptively in an extended image region. At the time of publication (March 2010), the resulting algorithm ranks 1st in both angular and end-point errors in the Middlebury evaluation. The MATLAB code is publicly available [1].
How far can 2-frame classical methods be pushed? Our sense is that they are likely to improve incrementally for several years to come, but that the big gains will come from methods that go beyond the classical formulation to reason more explicitly about surfaces and boundaries and how they move over time.
Acknowledgments. DS and MJB were supported by a gift from Intel Corp. and NSF CRCNS award IIS-0904875. We thank the reviewers for constructive comments, especially the connection between our original "area" term and non-local regularization; P. Yadollahpour for his early work on implementing the HS method; S. Zuffi for suggesting the color version of the non-local term; T. Brox, A. Wedel, and M. Werlberger for clarifying details about their papers; and D. Scharstein for maintaining the online optical flow benchmark.
References
[1] http://www.cs.brown.edu/people/dqsun/
[2] http://gpu4vision.icg.tugraz.at/
[3] E. H. Adelson, C. H. Anderson, J. R. Bergen, P. J. Burt, and J. M. Ogden. Pyramid methods in image processing. RCA Engineer, 29(6):33-41, 1984.
[4] O. M. Aodha, G. Brostow, and M. Pollefeys. Segmenting video into classes of algorithm-suitability. In CVPR, 2010.
[5] A. Bab-Hadiashar and D. Suter. Robust optic flow computation. IJCV, 29(1):59-77, 1998.
[6] S. Baker, D. Scharstein, J. Lewis, S. Roth, M. J. Black, and R. Szeliski. A database and evaluation methodology for optical flow. In ICCV, 2007.
[7] J. Barron, D. Fleet, S. Beauchemin, and T. Burkitt. Performance of optical flow techniques. IJCV, 92:236-242, 1994.
[8] J. Bergen, P. Anandan, K. Hanna, and R. Hingorani. Hierarchical model-based motion estimation. In ECCV, pages 237-252, 1992.
[9] M. Black and A. Jepson. Estimating optical-flow in segmented images using variable-order parametric models with local deformations. PAMI, 18(10):972-986, 1996.
[10] M. J. Black and P. Anandan. The robust estimation of multiple motions: Parametric and piecewise-smooth flow fields. CVIU, 63:75-104, 1996.
[11] A. Blake and A. Zisserman. Visual Reconstruction. The MIT Press, Cambridge, Massachusetts, 1987.
[12] T. Brox, A. Bruhn, N. Papenberg, and J. Weickert. High accuracy optical flow estimation based on a theory for warping. In ECCV, pages 25-36, 2004.
[13] A. Bruhn, J. Weickert, and C. Schnörr. Lucas/Kanade meets Horn/Schunck: combining local and global optic flow methods. IJCV, 61(3):211-231, 2005.
[14] A. Buades, B. Coll, and J. Morel. A non-local algorithm for image denoising. In CVPR, 2005.
[15] G. Gilboa and S. Osher. Nonlocal operators with applications to image processing. Multiscale Model. Simul., 7:1005-1028, 2008.
[16] F. Glazer, G. Reynolds, and P. Anandan. Scene matching by hierarchical correlation. In CVPR, pages 432-441, 1983.
[17] B. Horn. Robot Vision. MIT Press, 1986.
[18] B. Horn and B. Schunck. Determining optical flow. Artificial Intelligence, 16:185-203, Aug. 1981.
[19] I. Hsiao, A. Rangarajan, and G. Gindi. A new convex edge-preserving median prior with applications to tomography. IEEE TMI, 22(5):580-585, 2003.
[20] R. G. Keys. Cubic convolution interpolation for digital image processing. IEEE TASSP, 29:1153-1160, 1981.
[21] C. Lei and Y.-H. Yang. Optical flow estimation on coarse-to-fine region-trees using discrete optimization. In ICCV, 2009.
[22] V. Lempitsky, S. Roth, and C. Rother. FusionFlow: Discrete-continuous optimization for optical flow estimation. In CVPR, 2008.
[23] Y. Li and S. Osher. A new median formula with applications to PDE based denoising. Commun. Math. Sci., 7(3):741-753, 2009.
[24] B. Lucas and T. Kanade. An iterative image registration technique with an application to stereo vision. In IJCAI, pages 674-679, 1981.
[25] H.-H. Nagel and W. Enkelmann. An investigation of smoothness constraints for the estimation of displacement vector fields from image sequences. PAMI, 8(5):565-593, 1986.
[26] W. H. Press, W. T. Vetterling, S. A. Teukolsky, and B. P. Flannery. Numerical Recipes in C++: The Art of Scientific Computing. Cambridge University Press, New York, NY, USA, 2002.
[27] X. Ren. Local grouping for optical flow. In CVPR, 2008.
[28] L. I. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms. Phys. D, 60(1-4):259-268, 1992.
[29] P. Sand and S. Teller. Particle video: Long-range motion estimation using point trajectories. IJCV, 80(1):72-91, 2008.
[30] D. Sun, S. Roth, and M. J. Black. A quantitative analysis of current practices in optical flow estimation and the principles behind them. Technical report, Brown-CS-10-03, 2010.
[31] D. Sun, S. Roth, J. Lewis, and M. J. Black. Learning optical flow. In ECCV, volume 3, pages 83-97, 2008.
[32] A. Wedel, T. Pock, J. Braun, U. Franke, and D. Cremers. Duality TV-L1 flow with fundamental matrix prior. In IVCNZ, 2008.
[33] A. Wedel, T. Pock, and D. Cremers. Structure- and motion-adaptive regularization for high accuracy optic flow. In ICCV, 2009.
[34] A. Wedel, T. Pock, C. Zach, D. Cremers, and H. Bischof. An improved algorithm for TV-L1 optical flow. In Dagstuhl Motion Workshop, 2008.
[35] M. Werlberger, W. Trobin, T. Pock, A. Wedel, D. Cremers, and H. Bischof. Anisotropic Huber-L1 optical flow. In BMVC, 2009.
[36] J. Xiao, H. Cheng, H. Sawhney, C. Rao, and M. Isnardi. Bilateral filtering-based optical flow estimation with occlusion detection. In ECCV, pages I:211-224, 2006.
[37] L. Xu, J. Chen, and J. Jia. A segmentation based variational model for accurate optical flow estimation. In ECCV, pages 671-684, 2008.
[38] K. Yoon and I. Kweon. Adaptive support-weight approach for correspondence search. PAMI, 28(4):650-656, 2006.
[39] C. Zach, T. Pock, and H. Bischof. A duality based approach for realtime TV-L1 optical flow. In DAGM, 2007.
[40] H. Zimmer, A. Bruhn, J. Weickert, L. Valgaerts, A. Salgado, B. Rosenhahn, and H.-P. Seidel. Complementary optic flow. In EMMCVPR, 2009.
[41] C. Zitnick, N. Jojic, and S. B. Kang. Consistent segmentation for optical flow estimation. In ICCV, pages 1308-1315, 2005.
