
Video Models of People and Pixels

Jathushan Rajasegaran

Electrical Engineering and Computer Sciences, University of California, Berkeley

Technical Report No. UCB/EECS-2025-65

/Pubs/TechRpts/2025/EECS-2025-65.html

May 15, 2025

Copyright © 2025, by the author(s).

All rights reserved.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.

Acknowledgement

I thank my advisor Jitendra Malik for guiding me through this wonderful PhD journey. I thank my committee members Angjoo Kanazawa, Alyosha Efros and Bruno Olshausen for giving valuable feedback over the years and helping me with research. Finally I thank my parents for their unconditional love and support.

Video Models of People and Pixels

By

Jathushan Rajasegaran

A dissertation submitted in partial satisfaction of the requirements for the degree of

Doctor of Philosophy

in

Engineering - Electrical Engineering and Computer Sciences

in the

Graduate Division

of the

University of California, Berkeley

Committee in charge:

Arthur J. Chick Professor Jitendra Malik, Chair

Assistant Professor Angjoo Kanazawa

Howard Friesen Professor Alexei (Alyosha) Efros

Professor Bruno Olshausen

Spring 2025

Video Models of People and Pixels

Copyright 2025

by

Jathushan Rajasegaran


Abstract

Video Models of People and Pixels

by

Jathushan Rajasegaran

Doctor of Philosophy in Engineering - Electrical Engineering and Computer Sciences

University of California, Berkeley

Arthur J. Chick Professor Jitendra Malik, Chair

From the moment we are born, we continuously witness the “video” of our own lives—hundreds of thousands of hours of rich, unfolding scenes. These visual experiences, streaming in seamlessly over time, form the foundation of how we understand the world: by tracking motion, recognizing people, and anticipating what comes next. In many ways, our perception begins with tracking—following a pixel, a person, or a motion—enabling higher-order understanding such as object permanence, social interaction, and physical causality. This thesis explores how to build visual models that can track, recognize, and predict.

First, I will discuss tracking people in monocular videos with PHALP (Predicting Human Appearance, Location, and Pose). By aggregating 3D representations into tracklets, temporal models predict future states, enabling persistent tracking. Next, I will discuss human action recognition from a Lagrangian perspective using these tracklets. LART (Lagrangian Action Recognition with Tracking), a transformer-based model, demonstrates the benefits of explicit 3D pose (SMPL) and location for predicting actions. LART fuses 3D pose dynamics with contextualized appearance features along tracklets, significantly improving performance on the AVA dataset, especially for interactive and complex actions. Finally, I will discuss large-scale self-supervised learning through autoregressive video prediction with Toto, a family of causal transformers. Trained on next-token prediction using over a trillion visual tokens from diverse image and video datasets, Toto learns powerful, general-purpose visual representations with minimal inductive biases. An empirical study of architectural and tokenization choices shows these representations achieve competitive performance on downstream tasks including classification, tracking, object permanence, and robotics. We also analyze the power-law scaling of these video models.


To Appa and Amma

for your unconditional love and support


Contents

1 Introduction
2 Tracking People by Predicting 3D Appearance, Location and Pose
  2.1 Introduction
  2.2 Related work
  2.3 Method
  2.4 Experiments
  2.5 Discussion
  2.6 Additional Details
  2.7 Implementation details
  2.8 Experimental details
  2.9 Failure cases
3 On the Benefits of 3D Tracking and Pose for Human Action Recognition
  3.1 Introduction
  3.2 Related Work
  3.3 Method
  3.4 Experiments
  3.5 Conclusion
  3.6 Additional Results
  3.7 Implementation details
4 An Empirical Study of Autoregressive Pre-training from Videos
  4.1 Introduction
  4.2 Related work
  4.3 Approach
  4.4 Experiments
  4.5 Limitations
  4.6 Conclusion
  4.7 Acknowledgments
  4.8 Additional Details
5 Conclusion
Bibliography


Acknowledgments

This dissertation wouldn’t have been possible without the incredible people who’ve been part of this journey. What follows is just a small attempt to thank them for the constant support, encouragement, and inspiration they’ve given me along the way.

My advisor, Jitendra Malik. I still remember the first call we had during admissions—it was a turning point in my life. It is really hard to put into words how grateful I am to you for giving me these amazing five years at Berkeley. You taught me so many things: how to run proper ablations, how to write a good introduction, how to give a talk, how to think about long-term research, and countless other pieces of advice I will continue to realize over time. I will forever cherish our long weekend meetings; they are some of the best memories of my life. Thanks for all the book recommendations and gelatos. You are the reason I found amazing friends, great collaborators, beautiful memories, and all the research outcomes—thank you, Jitendra, for everything.

The welcoming culture at Berkeley enabled so many great interactions and collaborations. I was so fortunate to work with Angjoo Kanazawa, and I am forever grateful to her for finding my application. It was a dark time during COVID, but working with Angjoo somehow balanced it; her energy and positivity made me productive in research. Alyosha Efros, thanks for all the late-night discussions and for buying food for the late-night crew, and of course, for taking us to all the beautiful places in Berkeley and SF. I will cherish all the hikes and the fun conversations we had during the hikes and gelatos. Thanks, Alyosha, for making this journey a beautiful one. Bruno Olshausen, I met him first during my quals, and from that point onward he has been so kind to me, always pushing me and asking challenging questions. Thanks, Bruno, for making this dissertation a memorable one for me. I was also fortunate to work with other faculty members at Berkeley, Trevor Darrell and Sergey Levine. It was such a great experience working with them, and thanks for making BAIR an amazing place for collaboration.

Georgios Pavlakos is the first person I go to whenever I have a problem, whether it is about an SMPL mesh or about life. He has been a great friend, collaborator, and mentor. I am grateful for his support during my early years. Without him, I cannot imagine how my first years would have been—or how I would have rotated a mesh without him pointing me to the right codebase. My last years at Meta were made beautiful thanks to Christoph Feichtenhofer. Thanks for letting me explore new ideas and for all of the actionable feedback.

My undergraduate advisor Ranga Rodrigo, thanks for finding me and pushing me to do research during my undergrad days. He guided me through tough times and was always there whenever I needed him. Thank you, Ranga Sir, for everything. Suranga Seneviratne gave me an internship at Data61, and I am forever grateful for this amazing opportunity. Chamira Edusooiya and Tharaka Samarasinghe were so supportive of me during my time at Moratuwa and after; thank you so much. Kirthevasan Kandasamy was the first person I started doing research with; fun fact, it was about style transfer. Thanks, Kandasamy, for finding me and pushing me to do research during my undergrad days. I wrote my first paper with Sameera Ramasinghe. Thanks for allowing me to explore and try new things during this time. After my undergraduate studies, I was a bit lost; Salman Khan gave me a hand, a wonderful opportunity to work with him, and the freedom to do research and explore new ideas. Thanks, Salman, for being there for me when I needed it the most.


Beyond the support from my mentors, I’ve been lucky to work with some truly amazing collaborators, who have become friends forever. This thesis wouldn’t exist without their hard work, energy, and constant push to keep going. I’m especially grateful to them for putting up with my flaws and always pushing me to do better. Shubham is a great friend and mentor—he taught me how to write good code. We used to stay up late at night and debug models, and play cricket. It was a pleasure to have crossed paths with Shub. Sasha is an amazing friend, a great co-op player, and sometimes even an in-house therapist. Thanks, Sasha, for helping me beat Elden Ring. Working with Ilija was a great experience; thanks for letting me play with robots and work on something out of my comfort zone, it was a great learning curve. Thanks, Karthik, for taking me to nice places in Berkeley during COVID times and teaching me about transformers. Vongani is a great friend who always listens to me whenever I have problems; her calm nature always amazes me. Thanks, Von, for being a great friend and supporting me through hard times. Thanks, Himanshu, for coming with me on long walks and for the fruitful discussions on research. I only started working with Jane in my last semester, but she has been an amazing collaborator and a great friend. Her calm nature during deadline time kept us calm. Boyi, working with her was such a fun experience. My deskmate, Lea—thank you for being there during tough times and for teaching me tennis. Yossi has been a great friend; I had a great time working with him, teaching CS182, and going on hikes. Our last row in Vision Bay would not be complete without Antonio; we had so much fun telling jokes and talking about research. Another member of our Vision Bay last row is Neerja; her organizational skills and enthusiasm are unmatched. Amil, thanks for the thoughtful discussions on loss landscapes. Evonne has been a great friend from day one! Thanks, Evonne, for laughing at my jokes. Vickie was there for me during the hard times; I was able to get through those days only because of the support of my friends. Ruilong is such an amazing friend; he would always help me whenever I had questions on how to install CUDA, always smiling, and thanks for driving me on long walks in Napa. Hang, thanks for all the interesting discussions on AI over our long walks in SF. Thanks, Suzie and Medhini, for managing to sit next to me every day and enduring my bad jokes. Amir is another core member of our Vision Bay last row. Thanks for all the fun evenings at the gym and the self-supervised learning discussions. It was a great experience working with Shiry; we had amazing discussions on both real and artificial neurons. I know this is a long list, but these people made my life at Berkeley a beautiful memory. I am thankful to all of them forever.

To my undergrad students from Berkeley: Rahul, thanks for choosing to work with me; we explored so many ideas together. It was such a beautiful experience to see you grow from a student to a researcher, and I am glad I played a small part in your research career. Tanish, thanks for working with me last year; it was a great experience to try new ideas and explore with you.

To my metamates, Andrea, Bernie, Danial, and Vincent: it was an amazing time working with you, sharing ideas, and having whiteboard sessions. Thank you, Xinlei, for exploring new directions and trying new ideas with me.

To my best friend Vinoj—it’s been about two years without you. We wrote our first paper together, went to our first conference together; it is hard to imagine research without you. I would have never imagined writing this part in my thesis. You were a mentor, a friend, and a brother to me. Every CVPR will remind me of you. Life without you is going to be hard, but I will carry the memories of you forever. Miss you, man!


Thanks to the BAIR Admin team - Angie, Roxana, and Ami - for making BWW life pleasant and shielding me from bureaucracy and logistics, often without me even knowing it. Thanks to Wasim Younis, from BIO, who helped me so many times and always made sure that my applications were approved on time; thank you so much, Wasim.

I am thankful to my housemates Adwait, Naman, and Aayan for managing to live with me for the last 5 years. Thanks for teaching me how to cook, thanks for the really long walks, and late-night gelatos. Thanks, Aayan and Arpita, for cooking me amazing food after deadlines and watching Marvel movies.

Hirunima, Athif and Sanduru, thanks for calling regularly and checking on me, listening to my problems and traveling with me. My friends from Moratuwa: Priyanthan, Thuvakaran, Mathushan, Nilakshan, Thivakaran, Keerthanan, Sivaneeban, Nirukan, Sudeera, Danial, Ravindu, Shehan, Kasthuri, and Hasitha, thank you for being there for me, and it was a great experience to study and build with all of you.

This journey started well before my time at Berkeley, and I owe a lot to the many people who’ve shaped me along the way. Friends, teachers, and professors sparked my curiosity and love for learning early on, and I’m deeply grateful for the role they’ve played in who I am today.

My Tamil teacher, Mr. Thangavel, not only taught me the language but also lessons in life, kindness, and ethical living. Mr. Murukavel was the first person who sparked my scientific curiosity, to look for the stars! We made telescopes, and we built a small lab in my house to do many experiments. Mr. Ladchumanan, my Math teacher—fun fact: he can write with both hands at the same time—challenged me to push my limits and helped me become a faster thinker. Mr. Sothilingam, my Physics teacher, always encouraged logical thinking and curiosity. And Mr. Mukunthan, my English teacher, truly cared about me—he still calls to check in and see how I’m doing; thank you.

To my school friends¹—Gowsi, Vibishanth, Kuruparan, Guruparan, Ramraj, Athavaloshan, Athavan, Santhoshan, Parthipan, Sajinthan, Sarma, Tharsan, Suthagar, and Vennilavan—thank you for being there for me at different points in my life. You’ve helped me tackle challenges, both on paper and in life. Grateful for all of you!

To my parents, there aren’t words big enough to capture how grateful I am. Your love, sacrifices, and unshakable belief in me have been the foundation of everything. You taught me what resilience, humility, and hard work truly mean. Vadivambikai Rajasegaran—Amma, I love you so much! You have cared for me from the day I was born; you left your job to take care of me, and you moved with me whenever I moved to a new school, leaving behind your home, family, and friends. Rajasegaran Balakrishar—Appa, my first teacher, you sacrificed so much for me; for years you took me on your bicycle all over Jaffna, hiding all the pain, spending all the money we had on my education. Love you, Amma, Appa!

What am I, if not sculpted by my teachers? Everyone mentioned above was at some point a teacher to me. I credit everything I have done and will do to all my teachers.

¹ Sorry if I forgot to thank someone. If you know me, you know I am a bit of a forgetful person.


Chapter 1

Introduction

Figure 1.1: An Evening Walk: We go on an evening walk near Berkeley and see a nice park with people doing various activities. Just by looking at this video, we can answer many questions about it: which tree is near to us? where is the bicyclist going? what is the person on the bench doing? etc.

It is springtime in Berkeley. The days are warm and golden, and the sun sets slowly around 8 p.m. I often go for a walk toward Albany. On my way, I see people heading home after work, long queues forming outside Cheese Board Pizza, and on Solano Street restaurants buzzing with energy. The sidewalks are lively—strollers, cyclists, couples, dogs, and laughter. Eventually, I pass by a small park—something like the one shown in Fig. 1.1. I sit down and watch the sunset. I see people doing so many activities. By watching this scene for a while (a ‘video’), I can answer all of these questions about this video: which tree is near to us? where is the bicyclist going? what is the person on the bench doing? what time could it be? how would I imitate the walking style of a person? what would the kid near the tree do next? etc.


Figure 1.2: The structure of this thesis: In this thesis, we will look into these three problems: tracking, action recognition, and prediction. We will look at how each problem connects to the others and how they need more compute (x-axis) as we remove inductive bias (y-axis).

We could say all of these questions are part of the universal problem of ‘learning from videos’, or ‘video understanding’.

In this thesis, we are going to define video understanding as the gain in information after watching a video. This could be humans watching a video and their ability to answer many questions about it, or, for a model, the change in activations at inference time or the change in weights at training time. We will treat all of these as an understanding of the video.

All the questions we see in Fig. 1.1 have been studied under various areas. For example: where is the bicyclist going? is a tracking problem; what is the person on the bench doing? is an action recognition problem; how to imitate the walking style of a person? could be a robotics problem; what would the kid near the tree do next? is a prediction problem. In this thesis, we will cover three such problems: tracking, action recognition, and prediction. Fig. 1.2 shows these three problems with an increasing scale of compute (x-axis). We can think of this in terms of going from 3D models (tracking), to fusing 3D + 2D (action recognition), to a fully 2D-based approach (prediction). We can also look at them as going from human-centric models (tracking) to more general models, and from very well-defined problems (tracking is a very well-defined problem) to the not-so-well-defined evaluation of prediction models (apart from the prediction task itself, the rest are not so well defined).

Tracking Humans in 3D [111, 110]: The first part of this thesis introduces PHALP (Predicting Human Appearance, Location, and Pose), a tracking system designed for monocular video. Unlike conventional 2D tracking approaches, PHALP reasons directly in 3D space by lifting single-frame detections into 3D representations using SMPL-based models. These representations capture not only where a person is but also how they look and move over time. By aggregating this information into tracklets, PHALP builds dynamic models for each identity and predicts their future states.


This predictive capability—across appearance, pose, and location—enables PHALP to maintain identity across frames, even through occlusions and shot transitions. PHALP incorporates learned appearance embeddings based on a 3D texture map, linearized motion prediction, and transformer-based pose forecasting. It achieves state-of-the-art results on several benchmarks and demonstrates how a good 3D representation can simplify identity persistence in complex, real-world scenes.
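
To make the tracklet representation concrete, the following is a minimal, hypothetical sketch (not the released PHALP code) of how a tracklet could accumulate per-frame 3D observations and produce a simple prediction for each attribute: a running average for appearance, a constant-velocity extrapolation for 3D location, and a placeholder that repeats the last pose where PHALP instead uses a learned forecaster. The class names, feature dimensions, and prediction rules are illustrative assumptions.

# Minimal, hypothetical sketch of a PHALP-style tracklet (illustrative only).
from dataclasses import dataclass, field
from typing import List
import numpy as np

@dataclass
class Observation:
    appearance: np.ndarray  # embedding derived from a 3D texture map
    location: np.ndarray    # 3D location of the person
    pose: np.ndarray        # flattened SMPL pose parameters

@dataclass
class Tracklet:
    history: List[Observation] = field(default_factory=list)

    def update(self, obs: Observation) -> None:
        self.history.append(obs)

    def predict(self) -> Observation:
        """Predict the next-frame state from the accumulated 3D observations."""
        last = self.history[-1]
        # Appearance changes slowly, so average it over the tracklet.
        appearance = np.mean([o.appearance for o in self.history], axis=0)
        # Location: constant-velocity extrapolation from the last two frames.
        if len(self.history) >= 2:
            velocity = last.location - self.history[-2].location
        else:
            velocity = np.zeros_like(last.location)
        # Pose: repeat the last pose (PHALP uses a transformer-based forecaster).
        return Observation(appearance, last.location + velocity, last.pose.copy())

# Toy usage with random features and a person moving along x.
track = Tracklet()
for t in range(3):
    track.update(Observation(np.random.randn(256),
                             np.array([0.1 * t, 0.0, 5.0]),
                             np.random.randn(72)))
print(track.predict().location)  # -> approximately [0.3, 0.0, 5.0]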

Recognizing Actions over Trajectories [109]: Once we can reliably track individuals in 3D, we can ask richer questions: What is this person doing? Who are they interacting with? Can we predict their future actions?

To this end, the second part of this thesis introduces LART (Lagrangian Action Recognition with Tracking), a transformer-based model that treats each human trajectory as a temporally evolving entity. Instead of analyzing videos from a fixed viewpoint—as most image- or grid-based video models do—LART adopts a Lagrangian perspective, following each person through time and fusing their 3D pose and appearance across their trajectory. This person-centric representation allows LART to reason about actions as high-level dynamics rather than as local patterns.

The model is particularly powerful for recognizing interactive actions, such as “hugging,” “dancing,” or “kissing,” which often involve subtle temporal dependencies and spatial context. On the AVA dataset, LART achieves significant improvements over prior methods, especially in difficult action classes that require understanding motion, pose, and human-object interaction. It also demonstrates that combining geometry (pose) and semantics (appearance) in a temporally structured way leads to more robust and interpretable models.
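
As an illustration of this person-centric input, the sketch below (with made-up dimensions, not the LART release) builds one token per tracked person per frame by projecting concatenated SMPL pose parameters, 3D location, and an appearance feature, and pools the tracklet with a small transformer encoder into per-class action scores.

# Hypothetical sketch of Lagrangian (person-centric) tokens for action recognition:
# one token per tracked person per frame, fusing 3D pose, 3D location, and appearance.
import torch
import torch.nn as nn

class PersonTokenizer(nn.Module):
    def __init__(self, pose_dim=72, loc_dim=3, app_dim=256, d_model=512):
        super().__init__()
        self.proj = nn.Linear(pose_dim + loc_dim + app_dim, d_model)

    def forward(self, pose, loc, app):
        # pose: (T, 72) SMPL parameters, loc: (T, 3), app: (T, 256) per-frame features
        return self.proj(torch.cat([pose, loc, app], dim=-1))  # (T, d_model)

tokenizer = PersonTokenizer()
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True), num_layers=2)
head = nn.Linear(512, 80)  # e.g. scores over 80 AVA action classes

T = 16  # tracklet length in frames
tokens = tokenizer(torch.randn(T, 72), torch.randn(T, 3), torch.randn(T, 256))
logits = head(encoder(tokens.unsqueeze(0)).mean(dim=1))  # (1, 80) per-person scores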

Learning from Massive Video Data [108]: The final part of this thesis explores how large-scale, self-supervised video models can be trained using simple objectives. We introduce Toto, a family of causal transformer models trained via next-token prediction on over a trillion visual tokens from videos and images. Unlike typical supervised video models, Toto makes minimal assumptions about inductive biases. It is trained in a fully autoregressive manner, learning to predict the next patch.
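
To ground the objective, here is a minimal sketch of autoregressive next-token pre-training on discrete visual tokens. It assumes the frames have already been mapped to token ids by some visual tokenizer, and the tiny model size, vocabulary, and sequence length are placeholders rather than Toto's actual configuration.

# Minimal sketch of next-token prediction on visual tokens with a causal transformer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyCausalTransformer(nn.Module):
    def __init__(self, vocab=8192, d_model=256, n_layers=2, n_heads=4, max_len=1024):
        super().__init__()
        self.tok = nn.Embedding(vocab, d_model)    # visual token embeddings
        self.pos = nn.Embedding(max_len, d_model)  # learned positional embeddings
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab)

    def forward(self, ids):
        _, L = ids.shape
        x = self.tok(ids) + self.pos(torch.arange(L, device=ids.device))
        # Causal mask: each position may only attend to itself and earlier tokens.
        causal = torch.triu(torch.full((L, L), float('-inf'), device=ids.device), diagonal=1)
        return self.head(self.blocks(x, mask=causal))  # (B, L, vocab) logits

model = TinyCausalTransformer()
ids = torch.randint(0, 8192, (2, 64))      # a batch of tokenized video patches
logits = model(ids[:, :-1])                # predict token t+1 from tokens <= t
loss = F.cross_entropy(logits.reshape(-1, 8192), ids[:, 1:].reshape(-1))
loss.backward()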

We study how design choices—such as tokenization granularity, frame sampling, and model architecture—affect the emergence of useful representations. Toto demonstrates strong performance on a wide range of downstream tasks, including video classification, tracking, object permanence, and robotics—often matching or exceeding more task-specific models. Perhaps more importantly, the simplicity of the learning objective reveals clear power-law scaling behavior (which is close to, but still slower than, that of language), suggesting a path forward for training ever larger and more general video models.
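
Because a power law L(C) = a * C^(-b) is a straight line in log-log space, one common sanity check is to fit a line to log-loss versus log-compute and read off the exponent. The snippet below demonstrates the procedure on purely synthetic numbers; no real training results are used.

# Illustrative check for power-law scaling, L(C) = a * C**(-b), on synthetic data:
# a power law is a straight line in log-log space, so fit log(loss) vs log(compute).
import numpy as np

rng = np.random.default_rng(0)
compute = np.logspace(18, 21, num=8)                              # pretend FLOP budgets
loss = 2.0 * compute ** -0.05 * np.exp(rng.normal(0.0, 0.01, 8))  # synthetic losses
slope, intercept = np.polyfit(np.log(compute), np.log(loss), deg=1)
print(f"recovered exponent b = {-slope:.3f}")                     # ~0.05 by construction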

Summary: The three parts of this thesis offer a coherent approach to video understanding—starting from precise tracking, moving to structured recognition, and finally to large-scale predictive learning. The core insight is that temporality matters: effective video models must reason not only about what is visible but about how it changes, persists, and unfolds over time. We do this by grounding these models in 3D human-centric representations and scaling them with minimal supervision.

In the chapters that follow, we develop each of these ideas in detail, supported by empirical results and open-source systems. My hope is that this work contributes to the broader goal of building better video models.


Chapter 2

Tracking People by Predicting 3D Appearance, Location and Pose

We present an approach for tracking people in monocular videos by predicting their future 3D representations. To achieve this, we first lift people to 3D from a single frame in a robust manner. This lifting includes information about the 3D pose of the person, their location in the 3D space, and the 3D appearance. As we track a person, we collect 3D observations over time in a tracklet representation. Given the 3D nature of our observations, we build temporal models for each one of the previous attributes. We use these models to predict the future state of the tracklet, including 3D appearance, 3D location, and 3D pose. For a future frame, we compute the similarity between the predicted state of a tracklet and the single frame observations in a probabilistic manner. Association is solved with simple Hungarian matching, and the matches are used to update the respective tracklets. We evaluate our approach on various benchmarks and report state-of-the-art results. Code and models are available at: https://brjathu.github.io/PHALP.
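
As a rough illustration of the association step (not the released implementation), the sketch below compares the tracklets' predicted 3D locations against the current frame's lifted detections and solves the assignment with Hungarian matching; in the actual method the cost combines appearance, location, and pose similarities in a probabilistic manner.

# Hypothetical sketch of the predict-match-update association step: compare each
# tracklet's predicted 3D state with the current frame's detections via a cost
# matrix, then solve the assignment with Hungarian matching.
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(predicted_locations, detected_locations, max_cost=1.0):
    """Return (tracklet_idx, detection_idx) pairs with cost below max_cost.

    Here the cost is just Euclidean distance between predicted and detected 3D
    locations; PHALP instead combines appearance, location, and pose cues.
    """
    cost = np.linalg.norm(
        predicted_locations[:, None, :] - detected_locations[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)   # minimum-cost bipartite matching
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < max_cost]

# Toy usage: two tracklets, three detections in camera coordinates (meters).
preds = np.array([[0.0, 0.0, 5.0], [1.0, 0.2, 6.0]])
dets = np.array([[0.05, 0.0, 5.1], [2.5, 0.0, 4.0], [1.1, 0.2, 6.1]])
print(associate(preds, dets))   # -> [(0, 0), (1, 2)]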

2.1 Introduction

When we watch a video, we can segment out individual people, cars, or other objects and track them over time. The corresponding task in computer vision has been studied for several decades now, with a fundamental choice being whether to do the tracking in 2D in the image plane, or of 3D objects in the world. The former seems simpler because it obviates the need for inferring 3D, but if we do take the step of back-projecting from the image to the world, other aspects such as dealing with occlusion become easier. In the 3D world the tracked object doesn’t disappear, and even young infants are aware of its persistence behind the occluder. In our recent work [111], we presented experimental evidence that performance is better with 3D representations. In this chapter, we will take this as granted, and proceed to develop a system in the 3D setting of the problem. While our approach broadly applies to any object category where parameterized 3D models are available and can be inferred from images, we will limit ourselves in this chapter to studying people, the most important case in practice.


Figure 2.1: Tracking people by predicting and matching in 3D: The top row shows our tracking results at three different frames. The results are visualized by a colored head-mask for unique identities. The second and third rows show renderings of the 3D states of the two people in their associated tracklets. The bottom row shows the bottom-up detections in each image frame which, after being lifted to 3D, will be matched with the 3D predictions of each tracklet in the corresponding frame. Note how in the middle frame of the second row, the 3D representation of the person persists even though they are occluded in the image. Readers are encouraged to watch the videos at the project website.

Once we have accepted the philosophy that we are tracking 3D objects in a 3D world, but from 2D images as raw data, it is natural to adopt the vocabulary from control theory and estimation
