Secrets of Optical Flow Estimation and Their Principles

Deqing Sun Brown University

Stefan Roth

TU Darmstadt

Michael J. Black

Brown University

Abstract

The accuracy of optical flow estimation algorithms has been improving steadily, as evidenced by results on the Middlebury optical flow benchmark. The typical formulation, however, has changed little since the work of Horn and Schunck. We attempt to uncover what has made recent advances possible through a thorough analysis of how the objective function, the optimization method, and modern implementation practices influence accuracy. We discover that “classical” flow formulations perform surprisingly well when combined with modern optimization and implementation techniques. Moreover, we find that while median filtering of intermediate flow fields during optimization is a key to recent performance gains, it leads to higher energy solutions. To understand the principles behind this phenomenon, we derive a new objective that formalizes the median filtering heuristic. This objective includes a non-local term that robustly integrates flow estimates over large spatial neighborhoods. By modifying this new term to include information about flow and image boundaries, we develop a method that ranks at the top of the Middlebury benchmark.

1. Introduction

The field of optical flow estimation is making steady progress, as evidenced by the increasing accuracy of current methods on the Middlebury optical flow benchmark [6]. After nearly 30 years of research, these methods have obtained an impressive level of reliability and accuracy [33, 34, 35, 40]. But what has led to this progress? The majority of today's methods strongly resemble the original formulation of Horn and Schunck (HS) [18]. They combine a data term that assumes constancy of some image property with a spatial term that models how the flow is expected to vary across the image. An objective function combining these two terms is then optimized. Given that this basic structure is unchanged since HS, what has enabled the performance gains of modern approaches?

The paper has three parts. In the first, we perform an extensive study of current optical flow methods and models. The most accurate methods on the Middlebury flow dataset make different choices about how to model the objective function, how to approximate this model to make it computationally tractable, and how to optimize it. Since most published methods change all of these properties at once, it can be difficult to know which choices are most important. To address this, we define a baseline algorithm that is “classical”, in that it is a direct descendant of the original HS formulation, and then systematically vary the model and method using different techniques from the art. The results are surprising. We find that only a small number of key choices produce statistically significant improvements and that they can be combined into a very simple method that achieves accuracies near the state of the art. More importantly, our analysis reveals what makes current flow methods work so well.

Part two examines the principles behind this success. We find that one algorithmic choice produces the most significant improvements: applying a median filter to intermediate flow values during incremental estimation and warping [33, 34]. While this heuristic improves the accuracy of the recovered flow fields, it actually increases the energy of the objective function. This suggests that what is being optimized is actually a new and different objective. Using observations about median filtering and L1 energy minimization from Li and Osher [23], we formulate a new non-local term that is added to the original, classical objective. This new term goes beyond standard local (pairwise) smoothness to robustly integrate information over large spatial neighborhoods. We show that minimizing this new energy approximates the original optimization with the heuristic median filtering step. Note, however, that the new objective falls outside our definition of classical methods.

Finally, once the median filtering heuristic is formulated as a non-local term in the objective, we immediately recognize how to modify and improve it. In part three we show how information about image structure and flow boundaries can be incorporated into a weighted version of the non-local term to prevent over-smoothing across boundaries. By incorporating structure from the image, this weighted version does not suffer from some of the errors produced by median filtering. At the time of publication (March 2010), the resulting approach is ranked 1st in both angular and end-point errors in the Middlebury evaluation.

In summary, the contributions of this paper are to (1) analyze current flow models and methods to understand which design choices matter; (2) formulate and compare several classical objectives descended from HS using modern methods; (3) formalize one of the key heuristics and derive a new objective function that includes a non-local term; (4) modify this new objective to produce a state-of-the-art method. In doing this, we provide a “recipe” for others studying optical flow that can guide their design choices. Finally, to enable comparison and further innovation, we provide a public MATLAB implementation [1].

2. Previous Work

It is important to separately analyze the contributions of the objective function that defines the problem (the model) and the optimization algorithm and implementation used to minimize it (the method). The HS formulation, for example, has long been thought to be highly inaccurate. Barron et al.

[7] reported an average angular error (AAE) of ~30 degrees on the “Yosemite” sequence. This confounds the objective function with the particular optimization method proposed by Horn and Schunck¹. When optimized with today's methods, the HS objective achieves surprisingly competitive results despite the expected over-smoothing and sensitivity to outliers.

Models: The global formulation of optical flow introduced by Horn and Schunck [18] relies on both brightness constancy and spatial smoothness assumptions, but suffers from the fact that the quadratic formulation is not robust to outliers. Black and Anandan [10] addressed this by replacing the quadratic error function with a robust formulation. Subsequently, many different robust functions have been explored [12, 22, 31] and it remains unclear which is best. We refer to all these spatially-discrete formulations derived from HS as “classical.” We systematically explore variations in the formulation and optimization of these approaches. The surprise is that the classical model, appropriately implemented, remains very competitive.

There are many formulations beyond the classical ones that we do not consider here. Significant ones use oriented smoothness [25, 31, 33, 40], rigidity constraints [32, 33], or image segmentation [9, 21, 41, 37]. While they deserve similar careful consideration, we expect many of our conclusions to carry forward. Note that one can select among a set of models for a given sequence [4], instead of finding a “best” model for all the sequences.

Methods: Many of the implementation details that are thought to be important date back to the early days of optical flow. Current best practices include coarse-to-fine estimation to deal with large motions [8, 13], texture decomposition [32, 34] or high-order filter constancy [3, 12, 16, 22, 40] to reduce the influence of lighting changes, bicubic interpolation-based warping [22, 34], temporal averaging of image derivatives [17, 34], graduated non-convexity [11] to minimize non-convex energies [10, 31], and median filtering after each incremental estimation step to remove outliers [34].

¹They noted that the correct way to optimize their objective is by solving a system of linear equations, as is common today. This was impractical on the computers of the day, so they used a heuristic method.

This median filtering heuristic is of particular interest, as it makes non-robust methods more robust and improves the accuracy of all methods we tested. The effect on the objective function and the underlying reason for its success have not previously been analyzed. Least median squares estimation can be used to robustly reject outliers in flow estimation [5], but previous work has focused on the data term.

Related to median filtering, and our new non-local term, is the use of bilateral filtering to prevent smoothing across motion boundaries [36]. The approach separates a variational method into two filtering update stages, and replaces the original anisotropic diffusion process with multi-cue driven bilateral filtering. As with median filtering, the bilateral filtering step changes the original energy function.

Models that are formulated with an L1 robust penalty are often coupled with specialized total variation (TV) optimization methods [39]. Here we focus on generic optimization methods that can apply to any model and find that they perform as well as reported results for specialized methods.

Despite recent algorithmic advances, there is a lack of publicly available, easy to use, and accurate flow estimation software. The GPU4Vision project [2] has made a substantial effort to change this and provides executable files for several accurate methods [32, 33, 34, 35]. The dependence on the GPU and the lack of source code are limitations. We hope that our public MATLAB code will not only help in understanding the “secrets” of optical flow, but also let others exploit optical flow as a useful tool in computer vision and related fields.

3. Classical Models

We write the “classical” optical flow objective function in its spatially discrete form as

    E(u, v) = \sum_{i,j} \{ \rho_D(I_1(i,j) - I_2(i + u_{i,j}, j + v_{i,j}))   (1)
        + \lambda [\rho_S(u_{i,j} - u_{i+1,j}) + \rho_S(u_{i,j} - u_{i,j+1})
        + \rho_S(v_{i,j} - v_{i+1,j}) + \rho_S(v_{i,j} - v_{i,j+1})] \},

where u and v are the horizontal and vertical components of the optical flow field to be estimated from images I_1 and I_2, λ is a regularization parameter, and ρ_D and ρ_S are the data and spatial penalty functions. We consider three different penalty functions: (1) the quadratic HS penalty ρ(x) = x²; (2) the Charbonnier penalty ρ(x) = √(x² + ε²) [13], a differentiable variant of the L1 norm, the most robust convex function; and (3) the Lorentzian ρ(x) = log(1 + x²/(2σ²)), which is a non-convex robust penalty used in [10]. Note that this classical model is related to a standard pairwise Markov random field (MRF) based on a 4-neighborhood.
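As a concrete sketch (not the authors' released MATLAB code), the three penalties and the discrete energy of Eq. (1) can be written in a few lines of NumPy. Here `I2w` is assumed to be the second image already warped by the current flow, so the data term reduces to a per-pixel difference; the function names are illustrative:

```python
import numpy as np

def charbonnier(x, eps=0.001):
    """Charbonnier penalty: a differentiable approximation to the L1 norm."""
    return np.sqrt(x**2 + eps**2)

def lorentzian(x, sigma=0.03):
    """Lorentzian penalty: non-convex and robust to outliers."""
    return np.log(1 + x**2 / (2 * sigma**2))

def classical_energy(u, v, I1, I2w, lam, rho_d, rho_s):
    """Eq. (1): data term on I1 - I2w plus penalized first differences
    of both flow components over the 4-neighborhood."""
    data = rho_d(I1 - I2w).sum()
    smooth = (rho_s(np.diff(u, axis=1)).sum() + rho_s(np.diff(u, axis=0)).sum()
              + rho_s(np.diff(v, axis=1)).sum() + rho_s(np.diff(v, axis=0)).sum())
    return data + lam * smooth
```

With the quadratic penalty `lambda x: x**2` this reduces to the HS objective; swapping `rho_d`/`rho_s` changes the model without touching the optimization code.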

In the remainder of this section we define a baseline method using several techniques from the literature. This is not the “best” method, but includes modern techniques and will be used for comparison. We only briefly describe the main choices, which are explored in more detail in the following section and the cited references, especially [30].

Quantitative results are presented throughout the remainder of the text. In all cases we report the average end-point error (EPE) on the Middlebury training and test sets, depending on the experiment. Given the extensive nature of the evaluation, only average results are presented in the main body, while the details for each individual sequence are given in [30].

3.1. Baseline methods

To gain robustness against lighting changes, we follow [34] and apply the Rudin-Osher-Fatemi (ROF) structure texture decomposition method [28] to pre-process the input sequences and linearly combine the texture and structure components (in the proportion 20:1). The parameters are set according to [34].

Optimization is performed using a standard incremental multi-resolution technique (e.g. [10, 13]) to estimate flow fields with large displacements. The optical flow estimated at a coarse level is used to warp the second image toward the first at the next finer level, and a flow increment is calculated between the first image and the warped second image. The standard deviation of the Gaussian anti-aliasing filter is set to 1/√(2d), where d denotes the downsampling factor. Each level is recursively downsampled from its nearest lower level. In building the pyramid, the downsampling factor is not critical, as pointed out in the next section; here we use the settings in [31], which uses a factor of 0.8 in the final stages of the optimization. We adaptively determine the number of pyramid levels so that the top level has a width or height of around 20 to 30 pixels. At each pyramid level, we perform 10 warping steps to compute the flow increment.
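The level-count rule and the anti-aliasing sigma above can be sketched as follows; the function name and the 20-pixel stopping size are illustrative choices, not taken from the paper's code:

```python
import math

def pyramid_settings(width, height, d=0.5, min_size=20):
    """Pick the number of pyramid levels so the top level is roughly
    20-30 pixels across, and return the Gaussian anti-aliasing sigma
    1/sqrt(2d) for downsampling factor d (sketch of Sec. 3.1)."""
    n_levels, size = 1, min(width, height)
    while size * d >= min_size:   # stop before the next level gets too small
        size *= d
        n_levels += 1
    return n_levels, 1.0 / math.sqrt(2 * d)
```

For a 640×480 sequence with d = 0.5 this yields five levels, with the top level 30 pixels tall, inside the 20-to-30-pixel range quoted above.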

At each warping step, we linearize the data term, which involves computing terms of the type (∂/∂x) I_2(i + u^k_{i,j}, j + v^k_{i,j}), where ∂/∂x denotes the partial derivative in the horizontal direction, and u^k and v^k denote the current flow estimate at iteration k. As suggested in [34], we compute the derivatives of the second image using the 5-point derivative filter (1/12)[−1 8 0 −8 1], and warp the second image and its derivatives toward the first using the current flow estimate by bicubic interpolation. We then compute the spatial derivatives of the first image, average with the warped derivatives of the second image (c.f. [17]), and use this in place of ∂I_2/∂x. For pixels moving out of the image boundaries, we set both their corresponding temporal and spatial derivatives to zero. After each warping step, we apply a 5×5 median filter to the newly computed flow field to remove outliers [34].

                          Avg. Rank   Avg. EPE
    Classic-C             14.9        0.408
    HS                    24.6        0.501
    Classic-L             19.8        0.530
    HS [31]               35.1        0.872
    BA (Classic-L) [31]   30.9        0.746
    Adaptive [33]         11.5        0.401
    Complementary OF [40] 10.1        0.485

Table 1. Models. Average rank and end-point error (EPE) on the Middlebury test set using different penalty functions. Two current methods are included for comparison.

For the Charbonnier (Classic-C) and Lorentzian (Classic-L) penalty functions, we use a graduated non-convexity (GNC) scheme [11], as described in [31], that linearly combines a quadratic objective with a robust objective in varying proportions, from fully quadratic to fully robust. Unlike [31], a single regularization weight λ is used for both the quadratic and the robust objective functions.
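The GNC blend can be illustrated as a convex combination of the quadratic and robust penalties; the stage schedule follows [11, 31], and the function below is only an illustrative sketch:

```python
import numpy as np

def gnc_penalty(x, alpha, eps=0.001):
    """GNC blend: fully quadratic at alpha = 0, moving linearly to the
    robust Charbonnier penalty at alpha = 1."""
    return (1.0 - alpha) * x**2 + alpha * np.sqrt(x**2 + eps**2)
```

Optimizing with `alpha` stepped from 0 to 1 lets the easy convex problem guide the solution before the robust (and, for the Lorentzian, non-convex) objective takes over.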

3.2. Baseline results

The regularization parameter λ is selected from a set of candidate values to achieve the best average end-point error (EPE) on the Middlebury training set. For the Charbonnier penalty function, the candidate set is [1, 3, 5, 8, 10] and 5 is optimal. The Charbonnier penalty uses ε = 0.001 for both the data and the spatial term in Eq. (1). The Lorentzian uses σ = 1.5 for the data term and σ = 0.03 for the spatial term. These parameters are fixed throughout the experiments, except where mentioned.

Table 1 summarizes the EPE results of the basic model with three different penalty functions on the Middlebury test set, along with the two top performers at the time of publication (considering only published papers). The classic formulations with two non-quadratic penalty functions (Classic-C) and (Classic-L) achieve competitive results despite their simplicity. The baseline optimization of HS and BA (Classic-L) results in significantly better accuracy than previously reported for these models [31]. Note that the analysis also holds for the training set (Table 2).

At the time of publication, Classic-C ranks 13th in average EPE and 15th in AAE in the Middlebury benchmark despite its simplicity, and it serves as the baseline below. It is worth noting that the spatially discrete MRF formulation taken here is competitive with variational methods such as [33]. Moreover, our baseline implementation of HS has a lower average EPE than many more sophisticated methods.

                            Avg. EPE   significance   p-value
    Classic-C               0.298      —              —
    HS                      0.384      1              0.0078
    Classic-L               0.319      1              0.0078
    Classic-C-brightness    0.288      0              0.9453
    HS-brightness           0.387      1              0.0078
    Classic-L-brightness    0.325      0              0.2969
    Gradient                0.305      0              0.4609

Table 2. Pre-Processing. Average end-point error (EPE) on the Middlebury training set for the baseline method (Classic-C) using different pre-processing techniques. Significance is always with respect to Classic-C.

4. Secrets Explored

We evaluate a range of variations from the baseline approach that have appeared in the literature, in order to illuminate which may be of importance. This analysis is performed on the Middlebury training set by changing only one property at a time. Statistical significance is determined using a Wilcoxon signed rank test between each modified method and the baseline Classic-C; a p-value less than 0.05 indicates a significant difference.
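For small paired samples like the per-sequence EPEs used here, the two-sided Wilcoxon signed rank p-value can be computed exactly by enumerating all sign assignments. The sketch below assumes distinct absolute differences and is not the authors' implementation:

```python
from itertools import product

def wilcoxon_signed_rank_p(x, y):
    """Exact two-sided Wilcoxon signed-rank p-value for paired samples
    (small n; distinct absolute differences assumed)."""
    d = [b - a for a, b in zip(x, y) if b != a]        # drop zero differences
    rank = {v: r + 1 for r, v in enumerate(sorted(abs(v) for v in d))}
    ranks = [rank[abs(v)] for v in d]
    w = sum(r for r, v in zip(ranks, d) if v > 0)      # observed W+
    total = sum(ranks)
    lo = min(w, total - w)
    # exact null distribution: every sign pattern is equally likely
    count = 0
    for signs in product((0, 1), repeat=len(d)):
        s = sum(r for r, sg in zip(ranks, signs) if sg)
        if min(s, total - s) <= lo:
            count += 1
    return count / 2 ** len(d)
```

A returned p-value below 0.05 would mark the compared variant as significantly different from the baseline, matching the protocol above.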

Pre-Processing. For each method, we optimize the regularization parameter λ for the training sequences and report the results in Table 2. The baseline uses a non-linear pre-filtering of the images to reduce the influence of illumination changes [34]. Table 2 shows the effect of removing this and using a standard brightness constancy model (*-brightness). Classic-C-brightness actually achieves lower EPE on the training set than Classic-C, but significantly lower accuracy on the test set: Classic-C-brightness = 0.726, HS-brightness = 0.759, and Classic-L-brightness = 0.603; see Table 1 for comparison. This disparity suggests that overfitting is more severe for the brightness constancy assumption. Gradient only imposes constancy of the gradient vector at each pixel as proposed in [12] (i.e., it robustly penalizes the Euclidean distance between image gradients) and has similar performance on both the training and test sets (c.f. Table 8). See [30] for results of more alternatives.

Secrets: Some form of image filtering is useful, but simple derivative constancy is nearly as good as the more sophisticated texture decomposition method.

Coarse-to-fine estimation and GNC. We vary the number of warping steps per pyramid level and find that 3 warping steps gives similar results to using 10 (Table 3). For the GNC scheme, [31] uses a downsampling factor of 0.8 for non-convex optimization. A downsampling factor of 0.5 (Down-0.5), however, has nearly identical performance. Removing the GNC step for the Charbonnier penalty function (w/o GNC) results in higher EPE on most sequences and higher energy on all sequences (Table 4). This suggests that the GNC method is helpful even for the convex Charbonnier penalty function due to the nonlinearity of the data term.

                                   Avg. EPE   significance   p-value
    Classic-C                      0.298      —              —
    3 warping steps                0.304      0              0.9688
    Down-0.5                       0.298      0              1.0000
    w/o GNC                        0.354      0              0.1094
    Bilinear                       0.302      0              0.1016
    w/o TAVG                       0.306      0              0.1562
    Central derivative filter      0.300      0              0.7266
    7-point derivative filter [13] 0.302      0              0.3125
    Bicubic-II                     0.290      1              0.0391
    GC-0.45 (λ = 3)                0.292      1              0.0156
    GC-0.25 (λ = 0.7)              0.298      0              1.0000
    MF 3×3                         0.305      0              0.1016
    MF 7×7                         0.305      0              0.5625
    2×MF                           0.300      0              1.0000
    5×MF                           0.305      0              0.6875
    w/o MF                         0.352      1              0.0078
    Classic++                      0.285      1              0.0078

Table 3. Model and Methods. Average end-point error (EPE) on the Middlebury training set for the baseline method (Classic-C) using different algorithm and modeling choices.

Figure 1. Different penalty functions for the spatial terms: Charbonnier (ε = 0.001), generalized Charbonnier (a = 0.45 and a = 0.25), and Lorentzian (σ = 0.03).

Secrets: The downsampling factor does not matter when using a convex penalty; a standard factor of 0.5 is fine. Some form of GNC is useful even for a convex robust penalty like Charbonnier because of the nonlinear data term.

Interpolation method and derivatives. We find that bicubic interpolation is more accurate than bilinear (Table 3, Bilinear), as already reported in previous work [34]. Removing temporal averaging of the gradients (w/o TAVG), using Central difference filters, or using a 7-point derivative filter [13] all reduce accuracy compared to the baseline, but not significantly. The MATLAB built-in function interp2 is based on cubic convolution approximation [20]. The spline-based interpolation scheme [26] is consistently better (Bicubic-II). See [30] for more discussion.

Secrets: Use spline-based bicubic interpolation with a 5-point filter. Temporal averaging of the derivatives is probably worthwhile for a small computational expense.

Figure 2. Estimated flow fields on sequence “RubberWhale” using Classic-C with and without (w/o MF) the median filtering step. Color coding as in [6]. (a) With MF, energy 502,387; (b) without MF, energy 449,290. The median filtering step helps reach a solution free from outliers but with a higher energy.

Penalty functions. We find that the convex Charbonnier penalty performs better than the more robust, non-convex Lorentzian on both the training and test sets. One reason might be that non-convex functions are more difficult to optimize, causing the optimization scheme to find a poor local optimum. We investigate a generalized Charbonnier penalty function ρ(x) = (x² + ε²)^a that is equal to the Charbonnier penalty when a = 0.5, and non-convex when a < 0.5 (see Figure 1). We optimize the regularization parameter λ again. We find that a slightly non-convex penalty with a = 0.45 (GC-0.45) performs consistently better than the Charbonnier penalty, whereas more non-convex penalties (GC-0.25 with a = 0.25) show no improvement.
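The generalized Charbonnier family is one line of code; the midpoint check in the usage note below illustrates the loss of convexity for a < 0.5 (an illustrative sketch):

```python
def gen_charbonnier(x, a=0.45, eps=0.001):
    """Generalized Charbonnier rho(x) = (x^2 + eps^2)^a; equals the
    Charbonnier penalty at a = 0.5 and is non-convex for a < 0.5."""
    return (x**2 + eps**2) ** a
```

For a = 0.25, the midpoint value ρ(1) exceeds the average of ρ(0) and ρ(2), violating the defining inequality of a convex function; at a = 0.5 the function coincides with the Charbonnier penalty.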

Secrets: The less-robust Charbonnier is preferable to the Lorentzian, and a slightly non-convex penalty function (GC-0.45) is better still.

Median filtering. The baseline 5×5 median filter (MF 5×5) is better than both MF 3×3 [34] and MF 7×7, but the difference is not significant (Table 3). When we perform 5×5 median filtering twice (2×MF) or five times (5×MF) per warping step, the results are worse. Finally, removing the median filtering step (w/o MF) makes the computed flow significantly less accurate, with larger outliers, as shown in Table 3 and Figure 2.

Secrets: Median filtering the intermediate flow results once after every warping iteration is the single most important secret; 5×5 is a good filter size.

4.1. Best Practices

Combining the analysis above into a single approach means modifying the baseline to use the slightly non-convex generalized Charbonnier and the spline-based bicubic interpolation. This leads to a statistically significant improvement over the baseline (Table 3, Classic++). This method is directly descended from HS and BA, yet updated with the current best optimization practices known to us. This simple method ranks 9th in EPE and 12th in AAE on the Middlebury test set.

5. Models Underlying Median Filtering

Our analysis reveals the practical importance of median filtering during optimization to denoise the flow field. We ask whether there is a principle underlying this heuristic.

One interesting observation is that flow fields obtained with median filtering have substantially higher energy than those without (Table 4 and Figure 2). If the median filter is helping to optimize the objective, it should lead to lower energies. Higher energies and more accurate estimates suggest that incorporating median filtering changes the objective function being optimized.

The insight that follows from this is that the median filtering heuristic is related to the minimization of an objective function that differs from the classical one. In particular, the optimization of Eq. (1), with interleaved median filtering, approximately minimizes

    E_A(u, v, \hat{u}, \hat{v}) = \sum_{i,j} \{ \rho_D(I_1(i,j) - I_2(i + u_{i,j}, j + v_{i,j}))   (2)
        + \lambda [\rho_S(u_{i,j} - u_{i+1,j}) + \rho_S(u_{i,j} - u_{i,j+1})
        + \rho_S(v_{i,j} - v_{i+1,j}) + \rho_S(v_{i,j} - v_{i,j+1})] \}
        + \lambda_2 (\|u - \hat{u}\|^2 + \|v - \hat{v}\|^2)
        + \sum_{i,j} \sum_{(i',j') \in N_{i,j}} \lambda_3 (|\hat{u}_{i,j} - \hat{u}_{i',j'}| + |\hat{v}_{i,j} - \hat{v}_{i',j'}|),

where û and v̂ denote an auxiliary flow field, N_{i,j} is the set of neighbors of pixel (i,j) in a possibly large area, and λ2 and λ3 are scalar weights. The term in braces is the same as the flow energy from Eq. (1), while the last term is new. This non-local term [14, 15] imposes a particular smoothness assumption within a specified region of the auxiliary flow field û, v̂². Here we take this term to be a 5×5 rectangular region to match the size of the median filter in Classic-C. A third (coupling) term encourages û, v̂ and u, v to be the same (c.f. [33, 39]).

The connection to median filtering (as a denoising method) derives from the fact that there is a direct relationship between the median and L1 minimization. Consider a simplified version of Eq. (2) with just the coupling and non-local terms, where

    E(\hat{u}) = \lambda_2 \|u - \hat{u}\|^2 + \sum_{i,j} \sum_{(i',j') \in N_{i,j}} \lambda_3 |\hat{u}_{i,j} - \hat{u}_{i',j'}|.   (3)

While minimizing this is similar to median filtering u, there are two differences. First, the non-local term minimizes the L1 distance between the central value and all flow values in its neighborhood except itself. Second, Eq. (3) incorporates information about the data term through the coupling equation; median filtering the flow ignores the data term.

The formal connection between Eq. (3) and median filtering³ is provided by Li and Osher [23], who show that minimizing Eq. (3) is related to a different median computation

    \hat{u}^{(k+1)}_{i,j} = \mathrm{median}(\mathrm{Neighbors}^{(k)} \cup \mathrm{Data})   (4)

where Neighbors^{(k)} = \{\hat{u}^{(k)}_{i',j'}\} for (i',j') \in N_{i,j} and \hat{u}^{(0)} = u, as well as

    \mathrm{Data} = \{u_{i,j},\ u_{i,j} \pm \frac{\lambda_3}{\lambda_2},\ u_{i,j} \pm \frac{2\lambda_3}{\lambda_2},\ \ldots,\ u_{i,j} \pm \frac{|N_{i,j}| \lambda_3}{2\lambda_2}\},

where |N_{i,j}| denotes the (even) number of neighbors of (i,j). Note that the set of “data” values is balanced, with an equal number of elements on either side of the value u_{i,j}, and that information about the data term is included through u_{i,j}. Repeated application of Eq. (4) converges rapidly [23]. Observe that, as λ3/λ2 increases, the weighted data values on either side of u_{i,j} move away from the values of Neighbors and cancel each other out. As this happens, Eq. (4) approximates the median at the first iteration

    \hat{u}^{(1)}_{i,j} \approx \mathrm{median}(\mathrm{Neighbors}^{(0)} \cup \{u_{i,j}\}).   (5)

²Bruhn et al. [13] also integrated information over a local region in a global method, but did so for the data term.
³Hsiao et al. [19] established the connection in a slightly different way.

                Venus   Dimetrodon   Hydrangea   RubberWhale   Grove2   Grove3   Urban2   Urban3
    Classic-C   0.5     0.748        0.866       0.502         1.816    2.317    1.126    1.424
    w/o GNC     0.593   0.750        0.870       0.506         1.845    2.518    1.142    1.465
    w/o MF      0.517   0.701        0.668       0.449         1.418    1.830    1.066    1.395

Table 4. Eq. (1) energy (×10⁶) for the optical flow fields computed on the Middlebury training set. Note that Classic-C uses graduated non-convexity (GNC), which reduces the energy, and median filtering, which increases it.
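Eq. (4) is easy to state in code. The sketch below performs one update for a single pixel given its neighborhood values (a hypothetical helper, not the released implementation); with a large λ3/λ2 it reproduces the plain median of Eq. (5), and with a tiny ratio the data values pin the result near u_{i,j}:

```python
import numpy as np

def li_osher_median_step(nbrs, u_ij, lam3, lam2):
    """One application of Eq. (4): the median of the neighboring auxiliary
    values and a balanced set of data values u_ij +- k*lam3/lam2."""
    n = len(nbrs)                                   # |N_ij|, assumed even
    data = [u_ij] + [u_ij + s * k * lam3 / lam2
                     for k in range(1, n // 2 + 1) for s in (1, -1)]
    return float(np.median(np.array(list(nbrs) + data)))
```

The combined set has 2|N_{i,j}| + 1 elements, so the median is always well defined, and repeating the update corresponds to the iteration \hat{u}^{(k+1)} in Eq. (4).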

Eq. (2) thus combines the original objective with an approximation to the median, the influence of which is controlled by λ3/λ2. Note that, in practice, the weight λ2 on the coupling term is usually small or is steadily increased from small values [34, 39]. We optimize the new objective (2) by alternately minimizing

    E_O(u, v) = \sum_{i,j} \rho_D(I_1(i,j) - I_2(i + u_{i,j}, j + v_{i,j}))
        + \lambda [\rho_S(u_{i,j} - u_{i+1,j}) + \rho_S(u_{i,j} - u_{i,j+1})
        + \rho_S(v_{i,j} - v_{i+1,j}) + \rho_S(v_{i,j} - v_{i,j+1})]
        + \lambda_2 (\|u - \hat{u}\|^2 + \|v - \hat{v}\|^2)   (6)

and

    E_M(\hat{u}, \hat{v}) = \lambda_2 (\|u - \hat{u}\|^2 + \|v - \hat{v}\|^2)   (7)
        + \sum_{i,j} \sum_{(i',j') \in N_{i,j}} \lambda_3 (|\hat{u}_{i,j} - \hat{u}_{i',j'}| + |\hat{v}_{i,j} - \hat{v}_{i',j'}|).

Note that an alternative formulation would drop the coupling term and impose the non-local term directly on u and v. We find that optimization of the coupled set of equations is superior in terms of EPE performance.

The alternating optimization strategy first holds û, v̂ fixed and minimizes Eq. (6) w.r.t. u, v. Then, with u, v fixed, we minimize Eq. (7) w.r.t. û, v̂. Note that Eqs. (3) and (7) can be minimized by repeated application of Eq. (4); we use this approach with 5 iterations. We perform 10 steps of alternating optimization at every pyramid level and change λ2 logarithmically from 10⁻⁴ to 10². During the first and second GNC stages, we set u, v to be û, v̂ after every warping step (this step helps reach solutions with lower energy and EPE [30]). In the end, we take û, v̂ as the final flow field estimate. The other parameters are λ = 5, λ3 = 1.

Alternately optimizing this new objective function (Classic-C-A) leads to similar results as the baseline Classic-C (Table 5). We also compare the energy of these solutions under the new objective and find that the alternating optimization produces the lowest-energy solutions, as shown in Table 6. To do so, we set both the flow field u, v and the auxiliary flow field û, v̂ to be the same in Eq. (2).

In summary, we show that the heuristic median filtering step in Classic-C can now be viewed as energy minimization of a new objective with a non-local term. The explicit formulation emphasizes the value of robustly integrating information over large neighborhoods and enables the improved model described below.

                  Avg. EPE   significance   p-value
    Classic-C     0.298      —              —
    Classic-C-A   0.305      0              0.8125

Table 5. Average end-point error (EPE) on the Middlebury training set for the new model with alternating optimization (Classic-C-A).

6. Improved Model

By formalizing the median filtering heuristic as an explicit objective function, we can find ways to improve it. While median filtering in a large neighborhood has advantages, as we have seen, it also has problems. A neighborhood centered on a corner or thin structure is dominated by the surround, and computing the median results in oversmoothing, as illustrated in Figure 3 (a).

Examining the non-local term suggests a solution. For a given pixel, if we know which other pixels in the area belong to the same surface, we can weight them more highly. The modification to the objective function is achieved by introducing a weight into the non-local term [14, 15]:

    \sum_{i,j} \sum_{(i',j') \in N_{i,j}} w_{i,j,i',j'} (|\hat{u}_{i,j} - \hat{u}_{i',j'}| + |\hat{v}_{i,j} - \hat{v}_{i',j'}|),   (8)

where w_{i,j,i',j'} represents how likely pixel (i',j') is to belong to the same surface as (i,j).

                        Venus   Dimetrodon   Hydrangea   RubberWhale   Grove2   Grove3   Urban2   Urban3
    Classic-C           0.817   0.903        1.202       0.674         2.166    3.144    1.954    2.153
    Classic-C w/o MF    0.886   0.945        1.299       0.725         2.315    3.513    2.234    2.712
    Classic-C-A         0.784   0.8          1.139       0.666         2.0      2.976    1.922    2.049

Table 6. Eq. (2) energy (×10⁶) for the computed flow fields on the Middlebury training set. The alternating optimization strategy (Classic-C-A) produces the lowest energy solutions.

Figure 3. Median filtering over-smoothes the rifle in the “Army” sequence, while the proposed weighted non-local term preserves the detail. Results of (a) Classic++ and (b) Classic+NL.

Of course, we do not know w_{i,j,i',j'}, but we can approximate it. We draw ideas from [29, 36, 38] to define the weights according to their spatial distance, their color-value distance, and their occlusion state as

    w_{i,j,i',j'} \propto \exp\Big\{ -\frac{|i - i'|^2 + |j - j'|^2}{2\sigma_1^2} - \frac{|I(i,j) - I(i',j')|^2}{2\sigma_2^2} \Big\} \frac{o(i',j')}{o(i,j)},   (9)

where the occlusion variable o(i,j) is calculated using Eq. (22) in [29], I(i,j) is the color vector in the Lab space, and σ1 = 7, σ2 = 7. Examples of such weights are shown for several 15×15 neighborhoods in Figure 4; bright values indicate higher weights. Note the neighborhood labeled d, corresponding to the rifle. Since pixels on the rifle are in the minority, an unweighted median would oversmooth. The weighted term instead robustly estimates the motion using values on the rifle. A closely related piece of work is [27], which uses the intervening contour to define affinities among neighboring pixels for the local Lucas and Kanade [24] method. However, it only uses this scheme to estimate motion for sparse points and then interpolates the dense flow field.

We approximately solve Eq. (8) for û, v̂ as the following weighted median problem

\min_{\hat{u}_{i,j}} \sum_{(i',j') \in N_{i,j} \cup \{i,j\}} w_{i,j,i',j'} |\hat{u}_{i,j} - u_{i',j'}|,   (10)

using formula (3.13) in [23] for all the pixels (Classic+NL-Full). Note that if all the weights are equal, the solution is just the median. In practice, we can adopt a fast version (Classic+NL) without performance loss. Given a current estimate of the flow, we detect motion boundaries using a Sobel edge detector and dilate these edges with a 5×5 mask to obtain flow boundary regions. In these regions we use the weighting in Eq. (9) in a 15×15 neighborhood. In the non-boundary regions, we use equal weights in a 5×5 neighborhood to compute the median.
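The weighted median subproblem in Eq. (10) can be solved exactly by sorting: the minimizer of a weighted sum of absolute deviations is the first value at which the cumulative weight reaches half the total weight. The sketch below covers only this plain weighted-median case, not the full formula (3.13) of [23], which also handles the data term; the function name and API are our own.

```python
import numpy as np

def weighted_median(values, weights):
    """Minimizer of sum_k w_k * |x - v_k|, i.e. the problem in Eq. (10).

    Sort the values, accumulate the weights, and return the first value
    at which the cumulative weight reaches half the total.  With equal
    weights this reduces to the ordinary median, as noted in the text.
    """
    v = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    order = np.argsort(v)
    v, w = v[order], w[order]
    cum = np.cumsum(w)
    return v[np.searchsorted(cum, 0.5 * cum[-1])]
```

For example, `weighted_median([0.0, 10.0], [3.0, 1.0])` returns 0.0, since the heavily weighted first value pins the minimizer there, while with equal weights the result is the ordinary median.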

Figure 4. Neighbor weights of the proposed weighted non-local term at different positions (a)-(e) in the "Army" sequence.

                  Avg. EPE   significance   p-value
Classic+NL        0.221      --             --
Classic+NL-Full   0.222      0              0.8203

Table 7. Average end-point error (EPE) on the Middlebury training set is shown for the fast and full versions of the improved model.

                    Avg. Rank   Avg. EPE
Classic++           13.4        0.406
Classic++Gradient   15.1        0.430
Classic+NL          6.2         0.319
Classic+NL-Full     6.6         0.316

Table 8. Average end-point error (EPE) on the Middlebury test set for the Classic++ model with two different preprocessing techniques and its improved model.

Tables 7 and 8 show that the weighted non-local term (Classic+NL) improves the accuracy on both the training and the test sets. Note that the fine detail of the "rifle" is preserved in Figure 3(b). At the time of publication, Classic+NL ranks 1st in both AAE and EPE in the Middlebury evaluation and has the lowest average AAE and EPE among all listed algorithms. The running time on the test "Urban" sequence is about 70 minutes for Classic+NL-Full and about 16 minutes for Classic+NL in MATLAB.

7. Conclusions

Implemented using modern practices, classical optical flow formulations produce competitive results on the Middlebury training and test sets. To understand the "secrets" that help such basic formulations work well, we quantitatively studied various aspects of flow approaches from the literature, including their implementation details. Among the good practices, we found that using median filtering to denoise the flow after every warping step is key to improving accuracy, but that it increases the energy of the final result. Exploiting connections between median filtering and L1-based denoising, we showed that algorithms relying on a median filtering step are approximately optimizing a different objective that regularizes flow over a large spatial neighborhood. This principle enables us to design and optimize improved models that weight the neighbors adaptively in an extended image region. At the time of publication (March 2010), the resulting algorithm ranks 1st in both angular and end-point errors in the Middlebury evaluation. The MATLAB code is publicly available [1].

How far can the 2-frame classical methods be pushed? Our sense is that they are likely to improve incrementally for several years to come, but that the big gains will come from methods that go beyond the classical formulation to reason more explicitly about surfaces and boundaries and how they move over time.

Acknowledgments. DS and MJB were supported by a gift from Intel Corp. and NSF CRCNS award IIS-0904875. We thank the reviewers for constructive comments, especially the connection between our original "area" term and non-local regularization; P. Yadollahpour for his early work on implementing the HS method; S. Zuffi for suggesting the color version of the non-local term; T. Brox, A. Wedel, and M. Werlberger for clarifying details about their papers; and D. Scharstein for maintaining the online optical flow benchmark.

References

[1] http://www.cs.brown.edu/people/dqsun/
[2] http://gpu4vision.icg.tugraz.at/
[3] E. H. Adelson, C. H. Anderson, J. R. Bergen, P. J. Burt, and J. M. Ogden. Pyramid methods in image processing. RCA Engineer, 29(6):33-41, 1984.
[4] O. M. Aodha, G. Brostow, and M. Pollefeys. Segmenting video into classes of algorithm-suitability. In CVPR, 2010.
[5] A. Bab-Hadiashar and D. Suter. Robust optic flow computation. IJCV, 29(1):59-77, 1998.
[6] S. Baker, D. Scharstein, J. Lewis, S. Roth, M. J. Black, and R. Szeliski. A database and evaluation methodology for optical flow. In ICCV, 2007.
[7] J. Barron, D. Fleet, S. Beauchemin, and T. Burkitt. Performance of optical flow techniques. IJCV, 92:236-242, 1994.
[8] J. Bergen, P. Anandan, K. Hanna, and R. Hingorani. Hierarchical model-based motion estimation. In ECCV, pages 237-252, 1992.
[9] M. Black and A. Jepson. Estimating optical-flow in segmented images using variable-order parametric models with local deformations. PAMI, 18(10):972-986, 1996.
[10] M. J. Black and P. Anandan. The robust estimation of multiple motions: Parametric and piecewise-smooth flow fields. CVIU, 63:75-104, 1996.
[11] A. Blake and A. Zisserman. Visual Reconstruction. The MIT Press, Cambridge, Massachusetts, 1987.
[12] T. Brox, A. Bruhn, N. Papenberg, and J. Weickert. High accuracy optical flow estimation based on a theory for warping. In ECCV, pages 25-36, 2004.
[13] A. Bruhn, J. Weickert, and C. Schnörr. Lucas/Kanade meets Horn/Schunck: combining local and global optic flow methods. IJCV, 61(3):211-231, 2005.
[14] A. Buades, B. Coll, and J. Morel. A non-local algorithm for image denoising. In CVPR, 2005.
[15] G. Gilboa and S. Osher. Nonlocal operators with applications to image processing. Multiscale Model. Simul., 7:1005-1028, 2008.
[16] F. Glazer, G. Reynolds, and P. Anandan. Scene matching by hierarchical correlation. In CVPR, pages 432-441, 1983.
[17] B. Horn. Robot Vision. MIT Press, 1986.
[18] B. Horn and B. Schunck. Determining optical flow. Artificial Intelligence, 16:185-203, Aug. 1981.
[19] I. Hsiao, A. Rangarajan, and G. Gindi. A new convex edge-preserving median prior with applications to tomography. IEEE TMI, 22(5):580-585, 2003.
[20] R. G. Keys. Cubic convolution interpolation for digital image processing. IEEE TASSP, 29:1153-1160, 1981.
[21] C. Lei and Y.-H. Yang. Optical flow estimation on coarse-to-fine region-trees using discrete optimization. In ICCV, 2009.
[22] V. Lempitsky, S. Roth, and C. Rother. FusionFlow: Discrete-continuous optimization for optical flow estimation. In CVPR, 2008.
[23] Y. Li and S. Osher. A new median formula with applications to PDE based denoising. Commun. Math. Sci., 7(3):741-753, 2009.
[24] B. Lucas and T. Kanade. An iterative image registration technique with an application to stereo vision. In IJCAI, pages 674-679, 1981.
[25] H.-H. Nagel and W. Enkelmann. An investigation of smoothness constraints for the estimation of displacement vector fields from image sequences. PAMI, 8(5):565-593, 1986.
[26] W. H. Press, W. T. Vetterling, S. A. Teukolsky, and B. P. Flannery. Numerical Recipes in C++: the art of scientific computing. Cambridge University Press, New York, NY, USA, 2002.
[27] X. Ren. Local grouping for optical flow. In CVPR, 2008.
[28] L. I. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms. Phys. D, 60(1-4):259-268, 1992.
[29] P. Sand and S. Teller. Particle video: Long-range motion estimation using point trajectories. IJCV, 80(1):72-91, 2008.
[30] D. Sun, S. Roth, and M. J. Black. A quantitative analysis of current practices in optical flow estimation and the principles behind them. Technical report, Brown-CS-10-03, 2010.
[31] D. Sun, S. Roth, J. Lewis, and M. J. Black. Learning optical flow. In ECCV, volume 3, pages 83-97, 2008.
[32] A. Wedel, T. Pock, J. Braun, U. Franke, and D. Cremers. Duality TV-L1 flow with fundamental matrix prior. In IVCNZ, 2008.
[33] A. Wedel, T. Pock, and D. Cremers. Structure- and motion-adaptive regularization for high accuracy optic flow. In ICCV, 2009.
[34] A. Wedel, T. Pock, C. Zach, D. Cremers, and H. Bischof. An improved algorithm for TV-L1 optical flow. In Dagstuhl Motion Workshop, 2008.
[35] M. Werlberger, W. Trobin, T. Pock, A. Wedel, D. Cremers, and H. Bischof. Anisotropic Huber-L1 optical flow. In BMVC, 2009.
[36] J. Xiao, H. Cheng, H. Sawhney, C. Rao, and M. Isnardi. Bilateral filtering-based optical flow estimation with occlusion detection. In ECCV, pages I:211-224, 2006.
[37] L. Xu, J. Chen, and J. Jia. A segmentation based variational model for accurate optical flow estimation. In ECCV, pages 671-684, 2008.
[38] K. Yoon and I. Kweon. Adaptive support-weight approach for correspondence search. PAMI, 28(4):650-656, 2006.
[39] C. Zach, T. Pock, and H. Bischof. A duality based approach for realtime TV-L1 optical flow. In DAGM, 2007.
[40] H. Zimmer, A. Bruhn, J. Weickert, L. Valgaerts, A. Salgado, B. Rosenhahn, and H.-P. Seidel. Complementary optic flow. In EMMCVPR, 2009.
[41] C. Zitnick, N. Jojic, and S. B. Kang. Consistent segmentation for optical flow estimation. In ICCV, pages 1308-1315, 2005.
