




Video Models of People and Pixels
Jathushan Rajasegaran
Electrical Engineering and Computer Sciences, University of California, Berkeley
Technical Report No. UCB/EECS-2025-65
/Pubs/TechRpts/2025/EECS-2025-65.html
May 15, 2025
Copyright © 2025, by the author(s).
All rights reserved.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.
Acknowledgement
I thank my advisor Jitendra Malik for guiding me through this wonderful PhD journey. I thank my committee members Angjoo Kanazawa, Alyosha Efros and Bruno Olshausen for giving valuable feedback over the years and helping me with research. Finally, I thank my parents for their unconditional love and support.
Video Models of People and Pixels
By
Jathushan Rajasegaran
A dissertation submitted in partial satisfaction of the requirements for the degree of
Doctor of Philosophy
in
Engineering - Electrical Engineering and Computer Sciences
in the
Graduate Division
of the
University of California, Berkeley
Committee in charge:
Arthur J. Chick Professor Jitendra Malik, Chair
Assistant Professor Angjoo Kanazawa
Howard Friesen Professor Alexei (Alyosha) Efros
Professor Bruno Olshausen
Spring 2025
Video Models of People and Pixels
Copyright 2025
by
Jathushan Rajasegaran
Abstract
Video Models of People and Pixels
by
Jathushan Rajasegaran
Doctor of Philosophy in Engineering - Electrical Engineering and Computer Sciences
University of California, Berkeley
Arthur J. Chick Professor Jitendra Malik, Chair
From the moment we are born, we continuously witness the "video" of our own lives: hundreds of thousands of hours of rich, unfolding scenes. These visual experiences, streaming in seamlessly over time, form the foundation of how we understand the world: by tracking motion, recognizing people, and anticipating what comes next. In many ways, our perception begins with tracking (following a pixel, a person, or a motion), which enables higher-order understanding such as object permanence, social interaction, and physical causality. This thesis explores how to build visual models that can track, recognize, and predict.
First, I will discuss tracking people in monocular videos with PHALP (Predicting Human Appearance, Location, and Pose). By aggregating 3D representations into tracklets, temporal models predict future states, enabling persistent tracking. Next, I will discuss human action recognition from a Lagrangian perspective using these tracklets. LART (Lagrangian Action Recognition with Tracking), a transformer-based model, demonstrates the benefits of explicit 3D pose (SMPL) and location for predicting actions. LART fuses 3D pose dynamics with contextualized appearance features along tracklets, significantly improving performance on the AVA dataset, especially for interactive and complex actions. Finally, I will discuss large-scale self-supervised learning through autoregressive video prediction with Toto, a family of causal transformers. Trained on next-token prediction using over a trillion visual tokens from diverse image and video datasets, Toto learns powerful, general-purpose visual representations with minimal inductive biases. An empirical study of architectural and tokenization choices shows these representations achieve competitive performance on downstream tasks including classification, tracking, object permanence, and robotics. We also analyze the power-law scaling of these video models.
To Appa and Amma
for your unconditional love and support
Contents
1 Introduction
2 Tracking People by Predicting 3D Appearance, Location and Pose
2.1 Introduction
2.2 Related work
2.3 Method
2.4 Experiments
2.5 Discussion
2.6 Additional Details
2.7 Implementation details
2.8 Experimental details
2.9 Failure cases
3 On the Benefits of 3D Tracking and Pose for Human Action Recognition
3.1 Introduction
3.2 Related Work
3.3 Method
3.4 Experiments
3.5 Conclusion
3.6 Additional Results
3.7 Implementation details
4 An Empirical Study of Autoregressive Pre-training from Videos
4.1 Introduction
4.2 Related work
4.3 Approach
4.4 Experiments
4.5 Limitations
4.6 Conclusion
4.7 Acknowledgments
4.8 Additional Details
5 Conclusion
Bibliography
Acknowledgments
This dissertation wouldn't have been possible without the incredible people who've been part of this journey. What follows is just a small attempt to thank them for the constant support, encouragement, and inspiration they've given me along the way.
My advisor, Jitendra Malik. I still remember the first call we had during admissions; it was a turning point in my life. It is really hard to put into words how grateful I am to you for giving me these amazing five years at Berkeley. You taught me so many things: how to run proper ablations, how to write a good introduction, how to give a talk, how to think about long-term research, and countless other pieces of advice I will continue to realize over time. I will forever cherish our long weekend meetings; they are some of the best memories of my life. Thanks for all the book recommendations and gelatos. You are the reason I found amazing friends, great collaborators, beautiful memories, and all the research outcomes. Thank you, Jitendra, for everything.
The welcoming culture at Berkeley enabled so many great interactions and collaborations. I was so fortunate to work with Angjoo Kanazawa, and I am forever grateful to her for finding my application. It was a dark time during COVID, but working with Angjoo somehow balanced it; her energy and positivity made me productive in research. Alyosha Efros, thanks for all the late-night discussions and buying food for the late-night crew, and of course, taking us to all the beautiful places in Berkeley and SF. I will cherish all the hikes and the fun conversations we had during the hikes and gelatos. Thanks, Alyosha, for making this journey a beautiful one. Bruno Olshausen, I met him first during my quals; from that point onward he has been so kind to me, always pushing me and asking challenging questions. Thanks, Bruno, for making this dissertation a memorable one for me. I was also fortunate to work with other faculty members at Berkeley, Trevor Darrell and Sergey Levine. It was such a great experience working with them, and thanks for making BAIR an amazing place for collaboration.
Georgios Pavlakos is the first person I go to whenever I have a problem, whether it is about SMPL meshes or about life. He has been a great friend, collaborator, and mentor. I am grateful for his support during my early years. Without him, I cannot imagine how my first years would have been, or how I would have rotated a mesh without him pointing me to the right codebase. My last years at Meta were made beautiful thanks to Christoph Feichtenhofer. Thanks for letting me explore new ideas and for all of the actionable feedback.
My undergraduate advisor Ranga Rodrigo, thanks for finding me and pushing me to do research during my undergrad days. He guided me through tough times and was always there whenever I needed him. Thank you, Ranga Sir, for everything. Suranga Seneviratne gave me an internship at Data61, and I am forever grateful for this amazing opportunity. Chamira Edusooiya and Tharaka Samarasinghe were so supportive of me during my time at Moratuwa and after; thank you so much. Kirthevasan Kandasamy was the first person I started doing research with; fun fact, it was about style transfer. Thanks, Kandasamy, for finding me and pushing me to do research during my undergrad days. I wrote my first paper with Sameera Ramasinghe. Thanks for allowing me to explore and try new things during this time. After my undergraduate, I was a bit lost; Salman Khan gave me a hand and a wonderful opportunity to work with him, along with the freedom to do research and explore new ideas. Thanks, Salman, for being there for me when I needed it the most.
Beyond the support from my mentors, I've been lucky to work with some truly amazing collaborators, who have become friends forever. This thesis wouldn't exist without their hard work, energy, and constant push to keep going. I'm especially grateful to them for putting up with my flaws and always pushing me to do better. Shubham is a great friend and mentor; he taught me how to write good code. We used to stay up late at night and debug models, and play cricket. It was a pleasure to have crossed paths with Shub. Sasha is an amazing friend, a great co-op player, and sometimes even an in-house therapist. Thanks, Sasha, for helping me beat Elden Ring. Working with Ilija was a great experience; thanks for letting me play with robots and work on something out of my comfort zone, which was a great learning curve. Thanks, Karthik, for taking me to nice places in Berkeley during COVID times and teaching me about transformers. Vongani is a great friend who always listens to me whenever I have problems; her calm nature always amazes me. Thanks, Von, for being a great friend and supporting me through hard times. Thanks, Himanshu, for coming with me on long walks and for the fruitful discussions on research. I only started working with Jane in my last semester, but she has been an amazing collaborator and a great friend. Her calm nature during deadline time kept us calm. Boyi, working with her was such a fun experience. My deskmate, Lea, thank you for being there during tough times and for teaching me tennis. Yossi has been a great friend; I had a great time working with him, teaching CS182, and going on hikes. Our last row in Vision Bay would not be complete without Antonio; we had so much fun telling jokes and talking about research. Another member of our last row in Vision Bay is Neerja; her skills in organization and her enthusiasm are unmatched. Amil, thanks for the thoughtful discussions on loss landscapes. Evonne has been a great friend from day one! Thanks, Evonne, for laughing at my jokes. Vickie was there for me during the hard times. I was able to get through those days only because of the support of my friends. Ruilong is such an amazing friend; always smiling, he would help me whenever I had questions on how to install CUDA, and thanks for driving me to long walks in Napa. Hang, thanks for all the interesting discussions on AI over our long walks in SF. Thanks, Suzie and Medhini, for managing to sit next to me every day and enduring my bad jokes. Amir is another core member of our Vision Bay last row. Thanks for all the fun evenings at the gym and the self-supervised learning discussions. It was a great experience working with Shiry; we had amazing discussions on both real and artificial neurons. I know this is a long list, but these people made my life at Berkeley a beautiful memory. I am thankful to all of them forever.
To my undergrad students from Berkeley: Rahul, thanks for choosing to work with me; we explored so many ideas together. It was such a beautiful experience to see you grow from a student to a researcher, and I am glad I played a small part in your research career. Tanish, thanks for working with me last year; it was a great experience to try new ideas and explore with you.
To my Meta mates, Andrea, Bernie, Danial, and Vincent: it was an amazing time working with you, sharing ideas, and having whiteboard sessions. Thank you, Xinlei, for exploring new directions and trying new ideas with me.
To my best friend Vinoj: it's been about two years without you. We wrote our first paper together and went to our first conference together; it is hard to imagine research without you. I would have never imagined writing this part in my thesis. You were a mentor, a friend, and a brother to me. Every CVPR will remind me of you. Life without you is going to be hard, but I will carry the memories of you forever. Miss you, man!
Thanks to the BAIR Admin team, Angie, Roxana, and Ami, for making BWW life pleasant and shielding me from bureaucracy and logistics, often without me even knowing it. Thanks to Wasim Younis, from BIO, who helped me so many times and always made sure that my applications were approved on time. Thank you so much, Wasim.
I am thankful to my housemates Adwait, Naman, and Aayan for managing to live with me for the last 5 years. Thanks for teaching me how to cook, thanks for the really long walks, and late-night gelatos. Thanks, Aayan and Arpita, for cooking me amazing food after deadlines and watching Marvel movies.
Hirunima, Athif and Sanduru, thanks for calling regularly and checking on me, listening to my problems and traveling with me. My friends from Moratuwa: Priyanthan, Thuvakaran, Mathushan, Nilakshan, Thivakaran, Keerthanan, Sivaneeban, Nirukan, Sudeera, Danial, Ravindu, Shehan, Kasthuri, and Hasitha, thank you for being there for me; it was a great experience to study and build with all of you.
This journey started well before my time at Berkeley, and I owe a lot to the many people who've shaped me along the way: friends, teachers, and professors who sparked my curiosity and love for learning early on. I'm deeply grateful for the role they've played in who I am today.
My Tamil teacher, Mr. Thangavel, not only taught me the language but also lessons in life, kindness, and ethical living. Mr. Murukavel was the first person who sparked my scientific curiosity, to look for the stars! We made telescopes, and we built a small lab in my house to do many experiments. Mr. Ladchumanan, my Math teacher (fun fact: he can write with both hands at the same time), challenged me to push my limits and helped me become a faster thinker. Mr. Sothilingam, my Physics teacher, always encouraged logical thinking and curiosity. And Mr. Mukunthan, my English teacher, truly cared about me; he still calls to check in and see how I'm doing. Thank you.
To my school friends¹: Gowsi, Vibishanth, Kuruparan, Guruparan, Ramraj, Athavaloshan, Athavan, Santhoshan, Parthipan, Sajinthan, Sarma, Tharsan, Suthagar, and Vennilavan, thank you for being there for me at different points in my life. You've helped me tackle challenges, both on paper and in life. Grateful for all of you!
To my parents, there aren't words big enough to capture how grateful I am. Your love, sacrifices, and unshakable belief in me have been the foundation of everything. You taught me what resilience, humility, and hard work truly mean. Vadivambikai Rajasegaran, Amma, I love you so much! You have cared for me from the day I was born; you left your job to take care of me, and you moved with me whenever I moved to a new school, leaving behind your home, family, and friends. Rajasegaran Balakrishar, Appa, my first teacher, you sacrificed so much for me; for years you took me on your bicycle all over Jaffna, hiding all the pain, spending all the money we had on my education. Love you, Amma, Appa!
What am I, if not sculpted by my teachers? Everyone mentioned above was at some point a teacher to me. I credit everything I have done and will do to all my teachers.
¹ Sorry if I forgot to thank someone. If you know me, you know I am a bit of a forgetful person.
Chapter 1
Introduction
Figure 1.1: An Evening Walk: We go on an evening walk near Berkeley, where we see a nice park and people doing various activities. Just by looking at this video, we can answer many questions about it: Which tree is nearest to us? Where is the bicyclist going? What is the person on the bench doing?
It is springtime in Berkeley. The days are warm and golden, and the sun sets slowly around 8 p.m. I often go for a walk toward Albany. On my way, I see people heading home after work, long queues forming outside Cheeseboard Pizza, and restaurants on Solano street buzzing with energy. The sidewalks are lively: strollers, cyclists, couples, dogs, and laughter. Eventually, I pass by a small park, something like the one shown in Fig. 1.1. I sit down and watch the sunset. I see people doing so many activities. By watching this scene for a while ("a video"), I can answer many questions about it. Which tree is nearest to us? Where is the bicyclist going? What is the person on the bench doing? What time could it be? How could one imitate a person's walking style?
Figure 1.2: The structure of this thesis: We look into three problems: tracking, action recognition, and prediction. We examine how each problem connects to the others and how they need more compute (x-axis) as we remove inductive bias (y-axis).
What would the kid near the tree do next? And so on. We could say all of these questions are part of the universal problem of "learning from videos" or "video understanding".
In this thesis, we define video understanding as the gain in information after watching a video. This could be a human watching a video and gaining the ability to answer many questions about it, or, for a model, the change in activations at inference time or the change in weights at training time. We will treat all of these as an understanding of the video.
All the questions we see in Fig. 1.1 have been studied under various areas. For example, "Where is the bicyclist going?" is a tracking problem, "What is the person on the bench doing?" is an action recognition problem, "How could one imitate a person's walking style?" could be a robotics problem, and "What would the kid near the tree do next?" is a prediction problem. In this thesis, we will cover three such problems: tracking, action recognition, and prediction. Fig. 1.2 shows these three problems with increasing scale of compute (x-axis). We can think of this progression as going from 3D models (tracking), to fusing 3D and 2D (action recognition), to a fully 2D-based approach (prediction). We can also view it as going from human-centric models (tracking) to more general models, and from very well-defined problems (tracking) to models whose evaluation is less well defined (for prediction models, apart from the prediction task itself, the evaluations are not so well defined).
Tracking Humans in 3D [111, 110]: The first part of this thesis introduces PHALP (Predicting Human Appearance, Location, and Pose), a tracking system designed for monocular video. Unlike conventional 2D tracking approaches, PHALP reasons directly in 3D space by lifting single-frame detections into 3D representations using SMPL-based models. These representations capture not only where a person is but also how they look and move over time. By aggregating this information into tracklets, PHALP builds dynamic models for each identity and predicts their future states.
This predictive capability, across appearance, pose, and location, enables PHALP to maintain identity across frames, even through occlusions and shot transitions. PHALP incorporates learned appearance embeddings based on a 3D texture map, linearized motion prediction, and transformer-based pose forecasting. It achieves state-of-the-art results on several benchmarks and demonstrates how a good 3D representation can simplify identity persistence in complex, real-world scenes.
Recognizing Actions over Trajectories [109]: Once we can reliably track individuals in 3D, we can ask richer questions: What is this person doing? Who are they interacting with? Can we predict their future actions?
To this end, the second part of this thesis introduces LART (Lagrangian Action Recognition with Tracking), a transformer-based model that treats each human trajectory as a temporally evolving entity. Instead of analyzing videos from a fixed viewpoint, as most image- or grid-based video models do, LART adopts a Lagrangian perspective, following each person through time and fusing their 3D pose and appearance across their trajectory. This person-centric representation allows LART to reason about actions as high-level dynamics rather than as local patterns.
The model is particularly powerful for recognizing interactive actions, such as "hugging", "dancing", or "kissing", which often involve subtle temporal dependencies and spatial context. On the AVA dataset, LART achieves significant improvements over prior methods, especially in difficult action classes that require understanding motion, pose, and human-object interaction. It also demonstrates that combining geometry (pose) and semantics (appearance) in a temporally structured way leads to more robust and interpretable models.
Learning from Massive Video Data [108]: The final part of this thesis explores how large-scale, self-supervised video models can be trained using simple objectives. We introduce Toto, a family of causal transformer models trained via next-token prediction on over a trillion visual tokens from videos and images. Unlike typical supervised video models, Toto makes minimal assumptions about inductive biases. It is trained in a fully autoregressive manner, learning to predict the next patch.
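To make the next-token objective concrete, the following is a minimal NumPy sketch, not Toto's training code. It assumes each patch has already been mapped to a discrete token id and that some model has produced one row of logits per position; the function names `next_token_loss` and `causal_mask`, the vocabulary size, and the toy inputs are all hypothetical.

```python
import numpy as np

def next_token_loss(logits, tokens):
    """Average cross-entropy of predicting tokens[t + 1] from logits[t].

    logits: (T, V) array of unnormalized scores, one row per position.
    tokens: (T,) array of integer token ids in [0, V).
    """
    preds, targets = logits[:-1], tokens[1:]  # shift by one position
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = preds - preds.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return float(-log_probs[np.arange(len(targets)), targets].mean())

def causal_mask(n):
    """Boolean (n, n) mask: position i may attend to positions j <= i."""
    return np.tril(np.ones((n, n), dtype=bool))

# Toy example: 5 tokens from a hypothetical vocabulary of 8, uniform logits.
tokens = np.array([3, 1, 4, 1, 5])
logits = np.zeros((5, 8))
loss = next_token_loss(logits, tokens)
mask = causal_mask(3)
```

With uniform logits every token is equally likely, so the loss equals the log of the vocabulary size; a trained model should do better than this baseline.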
We study how design choices, such as tokenization granularity, frame sampling, and model architecture, affect the emergence of useful representations. Toto demonstrates strong performance on a wide range of downstream tasks, including video classification, tracking, object permanence, and robotics, often matching or exceeding more task-specific models. Perhaps more importantly, the simplicity of the learning objective reveals clear power-law scaling behavior (which is close to, but still slower than, that of language), suggesting a path forward for training ever larger and more general video models.
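As an illustration of what a power-law scaling analysis involves, the sketch below fits a curve of the form loss ≈ a · compute^(-b) by linear regression in log-log space. It is a generic fitting recipe under stated assumptions, not the procedure used in this thesis; the function name `fit_power_law` is hypothetical, and the data points are synthetic values generated from a known power law, not measurements.

```python
import numpy as np

def fit_power_law(compute, loss):
    """Fit loss ~ a * compute**(-b) by least squares in log-log space.

    Returns (a, b). A power law is a straight line in log-log coordinates:
    log(loss) = log(a) - b * log(compute).
    """
    slope, intercept = np.polyfit(np.log(compute), np.log(loss), deg=1)
    return float(np.exp(intercept)), float(-slope)

# Synthetic data drawn from a known power law (a = 50, b = 0.05).
compute = np.array([1e15, 1e16, 1e17, 1e18])
loss = 50.0 * compute ** -0.05
a, b = fit_power_law(compute, loss)
```

On real measurements the points would not lie exactly on a line, and the fitted exponent b is the quantity one would compare across modalities.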
Summary: The three parts of this thesis offer a coherent approach to video understanding, starting from precise tracking, moving to structured recognition, and finally to large-scale predictive learning. The core insight is that temporality matters: effective video models must reason not only about what is visible but also about how it changes, persists, and unfolds over time. This thesis pursues that goal by grounding these models in 3D human-centric representations and scaling them with minimal supervision.
In the chapters that follow, we develop each of these ideas in detail, supported by empirical results and open-source systems. I hope that this work contributes to the broader goal of building better video models.
Chapter 2
Tracking People by Predicting 3D Appearance, Location and Pose
We present an approach for tracking people in monocular videos by predicting their future 3D representations. To achieve this, we first lift people to 3D from a single frame in a robust manner. This lifting includes information about the 3D pose of the person, their location in the 3D space, and the 3D appearance. As we track a person, we collect 3D observations over time in a tracklet representation. Given the 3D nature of our observations, we build temporal models for each one of the previous attributes. We use these models to predict the future state of the tracklet, including 3D appearance, 3D location, and 3D pose. For a future frame, we compute the similarity between the predicted state of a tracklet and the single-frame observations in a probabilistic manner. Association is solved with simple Hungarian matching, and the matches are used to update the respective tracklets. We evaluate our approach on various benchmarks and report state-of-the-art results. Code and models are available at: https://brjathu.github.io/PHALP.
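The association step can be sketched with SciPy's `linear_sum_assignment`, which solves the same assignment problem as Hungarian matching. This is a minimal illustration, not the PHALP implementation: the helper name `associate`, the `max_cost` gating threshold, and the toy cost values are assumptions made for the example, and the cost matrix stands in for whatever dissimilarity is derived from the predicted tracklet states.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(cost, max_cost):
    """Match tracklets (rows) to detections (columns) at minimum total cost.

    cost[i, j] is a dissimilarity between the predicted state of tracklet i
    and detection j. Pairs costing more than max_cost are rejected
    (max_cost is a hypothetical gating threshold).
    """
    rows, cols = linear_sum_assignment(cost)
    matches = [(int(r), int(c)) for r, c in zip(rows, cols)
               if cost[r, c] <= max_cost]
    matched_rows = {r for r, _ in matches}
    matched_cols = {c for _, c in matches}
    # Tracklets with no detection this frame (for example, occluded people).
    unmatched_tracklets = [r for r in range(cost.shape[0])
                           if r not in matched_rows]
    # Detections with no tracklet (candidates for starting new tracklets).
    unmatched_detections = [c for c in range(cost.shape[1])
                            if c not in matched_cols]
    return matches, unmatched_tracklets, unmatched_detections

# Toy example: 2 tracklets, 3 detections, made-up costs.
cost = np.array([[0.1, 0.9, 0.8],
                 [0.7, 0.2, 0.9]])
matches, lost, new = associate(cost, max_cost=0.5)
```

In this toy case both tracklets find a low-cost detection, and the third detection is left over as a candidate new identity.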
2.1 Introduction
When we watch a video, we can segment out individual people, cars, or other objects and track them over time. The corresponding task in computer vision has been studied for several decades now, with a fundamental choice being whether to do the tracking in 2D in the image plane, or of 3D objects in the world. The former seems simpler because it obviates the need for inferring 3D, but if we do take the step of back-projecting from the image to the world, other aspects such as dealing with occlusion become easier. In the 3D world the tracked object doesn't disappear, and even young infants are aware of its persistence behind the occluder. In our recent work [111], we presented experimental evidence that performance is better with 3D representations. In this chapter, we will take this as granted, and proceed to develop a system in the 3D setting of the problem. While our approach broadly applies to any object category where parameterized 3D models are available and can be inferred from images, we will limit ourselves in this chapter to studying people, the most important case in practice.
Figure 2.1: Tracking people by predicting and matching in 3D: The top row shows our tracking results at three different frames. The results are visualized by a colored head-mask for unique identities. The second and third rows show renderings of the 3D states of the two people in their associated tracklets. The bottom row shows the bottom-up detections in each image frame which, after being lifted to 3D, will be matched with the 3D predictions of each tracklet in the corresponding frame. Note how in the middle frame of the second row, the 3D representation of the person persists even though they are occluded in the image. Readers are encouraged to watch the videos at the project website.
Once we have accepted the philosophy that we are tracking 3D objects in a 3D world, but from 2D images as raw data, it is natural to adopt the vocabulary from control theory and estimation