




Hardware Accelerator for Convolutional Restricted Boltzmann Machines
Junghoon Han
Electrical Engineering and Computer Sciences
University of California, Berkeley
Technical Report No. UCB/EECS-2025-29
/Pubs/TechRpts/2025/EECS-2025-29.html
May 1, 2025
Copyright © 2025, by the author(s).
All rights reserved.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.
Acknowledgement
I would like to thank Professor Sayeef Salahuddin for his continued mentorship and generous sponsorship during my master's program. I thank Pratik Brahma, who pioneered this research topic, for his close guidance, ideas, and help on this project. Thanks to the rest of the Unconventional Computing group members, Chirag Garg, Saavan Patel, and Philip Canoza, for helping out with this project in various ways.
I am profoundly grateful for the exceptional support of my family throughout the years. And thanks to all my friends, especially my fellow Ra-On band members, who made my graduate program fruitful.
Hardware Accelerator for
Convolutional Restricted Boltzmann Machines
by Junghoon Han
Research Project
Submitted to the Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, in partial satisfaction of the requirements for the degree of Master of Science, Plan II.
Approval for the Report and Comprehensive Examination:
Committee:
Professor Sayeef Salahuddin
Research Advisor
5.07.2024
(Date)
*******
Professor Sophia Shao
Second Reader
5/6/2024
(Date)
Copyright 2024
by
Junghoon Han
Abstract
Restricted Boltzmann Machines (RBMs) have gained attention for their strength in aiding Monte Carlo simulations for combinatorial optimization, quantum applications, and machine learning problems. The Convolutional RBM (CRBM), a variant of the RBM, has sparked interest due to its lower parameter count and efficient performance on translationally symmetric problems. However, the stochastic nature of the CRBM often means it takes a long time to reach the ground-state solution, which calls for a way to accelerate the computation.
In this work, we demonstrate our hardware accelerator for the CRBM, implemented in RTL and programmed on an FPGA. Software applications can use the accelerator by simply programming the weights, biases, and lattice sizes. We show that, for solving frustrated classical Hamiltonians of the Ising Shastry-Sutherland model, our hardware reaches the ground-state solution up to 5 orders of magnitude faster than GPUs.
Contents
1 Introduction
1.1 Background
1.2 Motivations and Previous work
2 Convolutional RBM (CRBM)
2.1 Restricted Boltzmann Machine (RBM)
2.2 Convolutional RBM (CRBM)
2.3 CRBM Computation Logic
2.4 Shastry-Sutherland model mapping
3 CRBM Hardware Accelerator
3.1 Background
3.2 Architecture
3.3 Input and Output (I/O) and Programming Logic
3.4 Testing
3.5 Analysis
4 Results
4.1 Time to Solution
4.2 Runtime Results
4.3 Evaluation
5 Conclusion
5.1 Future Steps
5.2 Conclusion
Bibliography
Chapter 1
Introduction
1.1 Background
In the ever-evolving landscape of Ising models, the search for efficient and robust models capable of processing complex data remains paramount. Among the many techniques that have emerged for mapping Ising models, Restricted Boltzmann Machines (RBMs) stand out as a fundamental building block of unsupervised learning. Because they can capture intricate patterns, parallelize Gibbs sampling, and map relationships between different neurons, RBMs have received considerable attention in combinatorial optimization, quantum problems, and classical Ising models.
Part of this attention is due to Convolutional Restricted Boltzmann Machines (CRBMs). CRBMs use probabilistic inference to explore solution spaces more effectively, which enables them to find optimal or near-optimal solutions to computationally challenging problems. As a convolutional variant of RBMs, CRBMs have lower parameter counts, which increases compute efficiency for training and inference. Recent work has sparked interest in their ability to optimally map translationally symmetric problems, in which the convolution weights are repeated at every stride.
CRBMs have significant practical potential for real-world problems. In materials science, for example, the ability to explore vast solution spaces with probabilistic methods lets researchers speed up the search for novel compounds and materials with desired properties. This report includes a demonstration of mapping a classical Ising Shastry-Sutherland model to CRBMs to accelerate the sampling computation needed to reach the ground-state solution.
Due to their stochastic nature, CRBMs may require many sampling iterations to reach the desired ground-state solution, and the required sample count grows with the number of neurons in the CRBM. An efficient implementation is therefore essential if CRBMs are to be used within a reasonable compute time. This motivates our approach of designing hardware accelerators for CRBMs to improve compute time and energy efficiency.
1.2 Motivations and Previous work
Motivation for Hardware Acceleration
Mapping the mathematical logic directly into digital Register Transfer Level (RTL) logic, rather than encoding it as instructions for general-purpose computers, can speed up calculations by several orders of magnitude. This approach can save computation time and can also reduce the energy required to run a given program.
The same reasoning applies to designing a custom digital hardware accelerator for Convolutional RBMs. Transistor logic can be customized and optimized for the specific requirements of CRBMs. Examples include optimizing memory access patterns, exploiting spatial parallelism at the hardware level, and implementing modules tailored for Gibbs sampling computations. The details of the hardware implementation are given in Chapter 3.
Relevant Previous work
This research is part of Salahuddin Lab's Unconventional Computing subgroup, which has been using RBMs for NP-hard combinatorial optimization. Former members of our team demonstrated hardware-accelerated RBMs for solving optimization problems such as the MAX-CUT problem and the Sherrington-Kirkpatrick spin glass. The FPGA-mapped RBM showed similar or better scaling performance compared to quantum computers such as the D-Wave 2000Q quantum adiabatic computer [1]. Subsequent work used the RBM hardware accelerator for integer factorization of 16-bit numbers and showed a runtime improvement of 10,000x over CPUs and 1,000x over GPUs [2].
Because previous research on hardware-accelerated RBMs has been fruitful, our group was motivated to design hardware accelerators for specific variants of RBMs, notably CRBMs. In this report, we explore the design, implementation, and evaluation of a hardware accelerator tailored specifically for Convolutional Restricted Boltzmann Machines and aimed at problems in the non-deterministic polynomial-time (NP) class. Through RTL descriptions and FPGA mappings, we demonstrate the efficacy and versatility of hardware-accelerated CRBMs in solving combinatorial optimization problems.
Chapter 2
Convolutional RBM (CRBM)
2.1 Restricted Boltzmann Machine (RBM)
The Restricted Boltzmann Machine (RBM) is a stochastic two-layer neural network. The two layers are called the "visible" and "hidden" layers. Every visible node is connected to every hidden node and there are no connections within a layer, so the network forms a bipartite graph. An RBM is used by repeatedly performing block Gibbs sampling between the two layers and recording the visible-layer values at every sample. These records give the probability distribution of the resulting node (neuron) values. The RBM is an energy-based model, which means that the objective of sampling is to minimize the energy associated with the weight, bias, and node values. [3]
All node values are binary: 0 or 1. The next value of a node is determined by computing the probability that it takes the value 1 and then sampling randomly according to that probability. The next set of values for each layer is sampled from the conditional probability given the other layer. All nodes in a single layer are conditionally independent given the other layer, so they are sampled simultaneously: the next hidden nodes are sampled from p(h|v), and the visible nodes from p(v|h). This form of simultaneous sampling is called block Gibbs sampling.
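The NumPy sketch below makes block Gibbs sampling concrete for a small, fully connected RBM. It assumes the standard binary RBM conditionals p(h_j = 1 | v) = σ((vW)_j + b_j) and p(v_i = 1 | h) = σ((Wh)_i + c_i). The layer sizes, random weights, and function names (`bernoulli`, `block_gibbs_step`) are illustrative only and are not taken from this report.

```python
import numpy as np


def sigmoid(x):
    # Logistic function: maps any real input to a probability in (0, 1).
    return 1.0 / (1.0 + np.exp(-x))


def bernoulli(p, rng):
    # Each node is 1 with its own probability p, independently of the others.
    return (rng.random(p.shape) < p).astype(np.int8)


def block_gibbs_step(v, W, b, c, rng):
    """One visible -> hidden -> visible sweep.

    v: (n_visible,) binary vector; W: (n_visible, n_hidden) weights;
    b: (n_hidden,) hidden bias; c: (n_visible,) visible bias.
    """
    h = bernoulli(sigmoid(v @ W + b), rng)       # whole hidden layer at once, from p(h|v)
    v_next = bernoulli(sigmoid(W @ h + c), rng)  # whole visible layer at once, from p(v|h)
    return v_next, h


rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
b, c = np.zeros(3), np.zeros(4)
v = rng.integers(0, 2, size=4)
samples = []
for _ in range(1000):
    v, h = block_gibbs_step(v, W, b, c, rng)
    samples.append(v)
# Fraction of samples in which each visible node was 1.
print(np.mean(samples, axis=0))
```

Each layer is updated in a single vectorized step. The nodes of a layer do not depend on one another given the other layer, and that independence is the property hardware can exploit by sampling all nodes of a layer in parallel.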
The nodes and edges of the RBM correspond to neurons and synaptic connections. When we map a problem to an RBM, we can therefore assign the visible nodes to physical variables (such as spins, directions, or group assignments) and the hidden nodes to the interactions between them (such as spin interactions).
2.2 Convolutional RBM (CRBM)
While RBMs have fully connected edges between the visible and hidden layers, CRBMs work with strides and convolution. CRBMs exhibit translational invariance: the pattern of weights is identical across different parts of the layer. The all-to-all connections of an RBM can be memory-heavy and compute-heavy. A CRBM relaxes this cost by reusing a single set of connections, repeated across the layer, to fully represent the probabilities for block Gibbs sampling.
Figure 2.1: Pictorial representation of RBM and CRBM.
Figure 2.1 shows the structure of an RBM and a CRBM. As seen on the right of the figure, a CRBM has the same weights repeated at every stride (in this case, a stride of 1). The figure also shows periodicity: when the stride goes out of bounds of the visible nodes, it wraps back to the first index of the hidden nodes (in this case, connecting v4 with h1). Periodicity can be turned on or off, depending on the problem formulation.
A CRBM can have multiple sets of weights. For example, in Figure 2.1, the first set of weights could be w1 = (e1, e2) = (1, 2), while the second set could be w2 = (e1, e2) = (3, 4). Each set of weights produces its own group of hidden nodes. Hereafter, we refer to these groups as convolution groups.
Energy and Probability formulation
The following formulas are derived by converting the general RBM energy and probability equations to reflect the convolutional nature of the CRBM.
The notation is as follows:
- $v_{ij}$ is the visible node at the $i$-th row and $j$-th column.
- $k$ indexes the convolution group, which corresponds to the $k$-th set of weights, also known as "filters".
- $W^k$ is the $k$-th filter, and $\tilde{W}^k$ is the $k$-th filter flipped along both the horizontal and vertical axes.
- $h^k_{ij}$ is the hidden node in group $k$ at the $i$-th row and $j$-th column.
- $b$ is the hidden bias and $c$ is the visible bias.
- $\bullet$ denotes the element-wise product followed by summation, $A \bullet B = \operatorname{tr}(A^{T}B)$.
- The $*$ operator denotes convolution, and $\sigma$ denotes the sigmoid function. [4]
$$P(h^k_{ij} = 1 \mid v) = \sigma\big((W^k * v)_{ij} + b\big) \tag{2.2}$$
The objective of our CRBM is to sample repeatedly until the energy reaches the ground-state solution. (The ground-state solution is also the output with the highest likelihood.) The probabilities are used to randomly sample a value of 0 or 1 for each visible and hidden node, which sets the next values of the nodes.
2.3 CRBM Computation Logic
The CRBM computation logic and sequence are illustrated in Figure 2.2. Note that the logic flows from visible nodes → hidden nodes → visible nodes, and repeats.
2.3.1. Visible nodes
Sampling starts from the initial state of the visible nodes. In our setting, the visible layer is configured as a 2-dimensional array of binary nodes. Figure 2.2 starts with visible nodes of size 3x3.
2.3.2. Wrapping
Wrapping incorporates periodicity into the convolution logic. Assume that the filter size is MxM. If periodicity is 'on' in the column direction, the first M-1 columns are copied after the last column. If periodicity is 'off' in the column direction, M-1 columns of zeros are inserted instead. The same logic holds for the row direction.
Figure 2.2 shows the wrapping logic for a 2x2 filter with periodicity on in both the column and row directions. The wrapped nodes are shown in orange.
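The wrapping rule can be sketched with NumPy's `np.pad`. With `mode="wrap"`, a pad placed at the end of an axis is filled with copies of the first entries, which matches copying the first M-1 columns (or rows) after the last one. With `mode="constant"` the pad is filled with zeros. The sketch assumes that, for a non-periodic direction, the zeros are appended at the end, mirroring the periodic case. The function name `wrap` is hypothetical.

```python
import numpy as np


def wrap(v, m, periodic_rows=True, periodic_cols=True):
    """Extend an L x L visible array to (L+m-1) x (L+m-1) for an m x m filter."""
    # Column direction: copy the first m-1 columns after the last one, or append zeros.
    col_mode = "wrap" if periodic_cols else "constant"
    out = np.pad(v, ((0, 0), (0, m - 1)), mode=col_mode)
    # Row direction: same rule applied to the rows of the column-extended array.
    row_mode = "wrap" if periodic_rows else "constant"
    return np.pad(out, ((0, m - 1), (0, 0)), mode=row_mode)


v = np.arange(9).reshape(3, 3)
print(wrap(v, 2))
print(wrap(v, 2, periodic_rows=False, periodic_cols=False))
```

For the 3x3 visible layer and 2x2 filter of Figure 2.2, this gives the 4x4 array that the forward convolution then scans.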
2.3.3. Convolution - Forward
Forward convolution is the convolution logic used to sample hidden nodes from visible nodes (visible → hidden). Here, convolution means multiplying the filter element-wise with the visible nodes at the current position and then accumulating the products (multiply-accumulate, MAC). This operation is repeated with a stride, which moves the filter to its next location. The stride is applied in both the column and row directions, and the process repeats until the index in each direction goes out of bounds.
The complete process above is identical for every filter, so the number of output groups equals the number of filters.
Figure 2.2: CRBM computation logic
Figure 2.2 illustrates the convolution logic for 3 different 2x2 filters with a stride of 2. For the 4x4 wrapped visible nodes (the 3x3 visible nodes plus one wrapped row and column), this process produces a 2x2 result for each filter group.
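A direct, loop-based sketch of the forward convolution follows. It slides each filter over the wrapped array with the given stride and computes an element-wise product followed by summation, which gives one output map per filter. The visible values and filter weights are made up for illustration. The shapes (3x3 input wrapped to 4x4, three 2x2 filters, stride 2) follow the description of Figure 2.2, and the name `conv_forward` is hypothetical.

```python
import numpy as np


def conv_forward(v_wrapped, filters, stride):
    """Multiply-accumulate each filter over the wrapped visible array.

    v_wrapped: (H, W) wrapped visible nodes; filters: (N, M, M).
    Returns (N, out_h, out_w): one output map per filter (convolution group).
    """
    n, m, _ = filters.shape
    out_h = (v_wrapped.shape[0] - m) // stride + 1
    out_w = (v_wrapped.shape[1] - m) // stride + 1
    out = np.zeros((n, out_h, out_w), dtype=np.result_type(v_wrapped, filters))
    for k in range(n):
        for i in range(out_h):
            for j in range(out_w):
                # Visible nodes under the filter at this stride position.
                window = v_wrapped[i * stride:i * stride + m, j * stride:j * stride + m]
                # Element-wise product followed by summation (MAC).
                out[k, i, j] = np.sum(window * filters[k])
    return out


v = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 1, 0]])
v_wrapped = np.pad(v, ((0, 1), (0, 1)), mode="wrap")  # 3x3 -> 4x4, periodicity on
filters = np.array([[[1, 0], [0, 1]],
                    [[1, 1], [1, 1]],
                    [[0, 1], [-1, 0]]])
h_pre = conv_forward(v_wrapped, filters, stride=2)
print(h_pre.shape)  # (3, 2, 2)
```

The sliding window is not flipped, so in signal-processing terms this is a cross-correlation, which is also what machine-learning libraries usually call convolution. The hardware computes every window position at once rather than in a loop.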
2.3.4. Probability and Sampling - Forward
The convolution result is passed to a sigmoid operator to obtain the probability P(h|v). The sigmoid is applied element-wise to each output of the convolution.
The sigmoid produces a probability value between 0 and 1, which is then used for random sampling. The sampler treats this probability as the likelihood that the result node equals 1. The sampler's result, either 0 or 1, becomes the next value of the hidden node. In practice, the sampler generates a random floating-point value between 0 and 1, compares it with the sigmoid output, and sets the result to 1 if the random number is less than the sigmoid output.
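The sampling rule above translates almost line for line into NumPy. In this sketch, `x` stands in for the convolution output (plus any hidden bias), and its values are made up. The second part checks empirically that the fraction of 1s at each position approaches the sigmoid probability. The name `sample_from_preactivation` is hypothetical.

```python
import numpy as np


def sample_from_preactivation(x, rng):
    """Sigmoid followed by random sampling.

    Each node becomes 1 when a uniform random number in [0, 1) is less than
    its sigmoid probability, and 0 otherwise.
    """
    p = 1.0 / (1.0 + np.exp(-x))
    u = rng.random(p.shape)
    return (u < p).astype(np.int8), p


rng = np.random.default_rng(42)
x = np.array([[2.0, 1.0], [-1.0, 0.0]])  # illustrative pre-activations for a 2x2 group
h, p = sample_from_preactivation(x, rng)

# Over many draws, the fraction of 1s at each position approaches its sigmoid value.
draws = np.stack([sample_from_preactivation(x, rng)[0] for _ in range(20000)])
print(np.round(p, 3))
print(np.round(draws.mean(axis=0), 3))
```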
2.3.5. Hidden nodes
The sampled values become the next hidden node values. With N different filters, there are N groups of hidden nodes. All hidden node values are binary as well.
2.3.6. Zero Padding
We apply a zero-padding technique to ensure that the result of reverse sampling (hidden → visible) has the same dimensions as the starting visible nodes. That is, we insert zeros between the hidden nodes in all directions.
As in the wrapping step, zero padding also copies the last columns and rows to the beginning. If periodicity is on, we copy the hidden node values along with the padded zeros. If periodicity is off, zeros are simply added at the beginning column and row positions.
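The following sketch shows one reading of the zero-padding step. It is an assumption rather than the report's exact procedure. Each hidden node (a, b) is placed at position (stride·a, stride·b) of an L x L array of zeros. Then M-1 rows and columns are prepended: circular copies of the last rows and columns when periodicity is on, or zeros when it is off. With this layout, a stride-1 pass of an MxM filter returns an L x L result that matches the visible layer. The function name and the example (L = 3, stride 2, 2x2 filter) are illustrative.

```python
import numpy as np


def zero_pad_hidden(h, visible_size, stride, m, periodic=True):
    """Prepare one hidden group (shape ceil(L/stride) squared) for the reverse pass."""
    L = visible_size
    dilated = np.zeros((L, L), dtype=h.dtype)
    # Insert zeros between hidden nodes: node (a, b) lands at (stride*a, stride*b).
    dilated[::stride, ::stride] = h
    # Prepend m-1 rows/columns: circular copies of the last ones, or zeros.
    mode = "wrap" if periodic else "constant"
    return np.pad(dilated, ((m - 1, 0), (m - 1, 0)), mode=mode)


h = np.array([[1, 1],
              [0, 1]])
print(zero_pad_hidden(h, visible_size=3, stride=2, m=2))
print(zero_pad_hidden(h, visible_size=3, stride=2, m=2, periodic=False))
```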
2.3.7. Convolution - Reverse
The convolution logic here is similar to that of the forward convolution. The key difference is that the filters are flipped in the horizontal and vertical directions. In addition, the stride is always equal to 1 in the reverse direction.
2.3.8. Accumulation
For N different convolution groups, there are N different convolution outputs. This step sums these outputs element-wise. The output of this step has the same dimensions as the visible nodes.
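The reverse pass of Sections 2.3.6 to 2.3.8 can be sketched end to end in software, with three steps. First, zero-pad each hidden group: place hidden node (a, b) at (stride·a, stride·b) of an L x L zero array and prepend M-1 rows and columns. This is an assumed reading of Section 2.3.6. Second, run a stride-1 multiply-accumulate with the filter flipped in both axes. Third, sum the N group outputs element-wise. This is a software reference, not the RTL, and the function names are hypothetical.

```python
import numpy as np


def conv_valid_stride1(x, f):
    # Multiply-accumulate f over every position of x with stride 1 and no extra padding.
    m = f.shape[0]
    out = np.zeros((x.shape[0] - m + 1, x.shape[1] - m + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + m, j:j + m] * f)
    return out


def reverse_pass(hidden_groups, filters, visible_size, stride, periodic=True):
    """Hidden -> visible pre-activation for N groups.

    hidden_groups: (N, H, H) with H = ceil(L / stride); filters: (N, M, M).
    """
    n, m, _ = filters.shape
    L = visible_size
    acc = np.zeros((L, L))
    mode = "wrap" if periodic else "constant"
    for k in range(n):
        # Zero padding: node (a, b) goes to (stride*a, stride*b); zeros elsewhere.
        dilated = np.zeros((L, L))
        dilated[::stride, ::stride] = hidden_groups[k]
        # Prepend m-1 rows/columns (circular copies of the last ones, or zeros).
        padded = np.pad(dilated, ((m - 1, 0), (m - 1, 0)), mode=mode)
        # Reverse convolution: filter flipped in both axes, stride 1.
        flipped = filters[k][::-1, ::-1]
        # Accumulation: element-wise sum over the N groups.
        acc += conv_valid_stride1(padded, flipped)
    return acc


filters = np.array([[[1, 2], [3, 4]]])   # one illustrative 2x2 filter
hidden = np.array([[[1, 0], [0, 0]]])    # one 2x2 hidden group, only node (0, 0) active
print(reverse_pass(hidden, filters, visible_size=3, stride=2))
```

In this example, the filter values land on visible nodes (0, 0), (0, 1), (1, 0), and (1, 1). These are the same four nodes that hidden node (0, 0) reads in a forward pass with 2x2 filters, stride 2, and wrapping on. Under these assumptions, the reverse pass is therefore the transpose of the forward convolution.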
Figure 2.3: Shastry-Sutherland Magnetization Phases
2.3.10. Probability and Sampling - Reverse
As in the forward direction, the sigmoid is applied to produce the probability, which is used to randomly sample the next set of visible nodes. This step produces the next set of visible node values and completes the full cycle.
2.4 Shastry-Sutherland model mapping
In our work, we map the classical Ising Shastry-Sutherland model onto the CRBM structure to solve a frustrated classical Hamiltonian. Our results suggest that the CRBM can be used to simulate translationally symmetric classical Hamiltonians in general. The Shastry-Sutherland lattice has discrete translational symmetry, in which a certain set of spin interactions is repeated across the lattice. The Shastry-Sutherland model can be mapped to a CRBM as follows: the visible nodes represent the physical variables, in this case the magnetic spins, and the hidden nodes represent the interactions between the spins.
To map the Shastry-Sutherland Ising model to the CRBM framework, we equate the physical lattice's Boltzmann distribution to the RBM's marginal distribution. The RBM weights are then mapped to be unique only up to the unit cell of the lattice, which is of size 3x3, so the filters are 3x3. The Shastry-Sutherland lattice contains 10 unique repeated interactions, which leads to 10 different filters.
We focused our experiments on 4 of the Shastry-Sutherland magnetization phases, shown in Figure 2.3. Each node, mapped to a visible node of the CRBM, represents a magnetization spin. Empty circles are represented as 1, and filled circles as 0. Different phase problems produce different filters and biases.
• AFM Phase: Anti-Ferromagnetic Phase. Nodes that are non-diagonal neighbors have opposite spins.
• FM Phase: Ferromagnetic Phase. All nodes have the same magnetization spin.
• 1/3 Fractional Phase: the rows of the lattice show a pattern of FM-phase rows sandwiched between two AFM-phase rows.
• Dimer Phase: certain diagonal sets of nodes are expected to have opposite spins (marked in green boxes).
A detailed mapping of the Shastry-Sutherland model to the CRBM will be presented in an upcoming paper from the Salahuddin Group, in work pioneered by Pratik Brahma.
Chapter 3
CRBM Hardware Accelerator
3.1 Background
The objective of the CRBM hardware accelerator is to significantly reduce the runtime needed to reach the ground-state solution of the CRBM.
The hardware design is implemented in RTL (Register Transfer Level) and mapped to a Field Programmable Gate Array (FPGA). We used the Virtex UltraScale+ VCU118 FPGA board, a product of Xilinx-AMD. This FPGA features 14nm/16nm FinFET process technology, dynamic power management, and integrated Gen3 x16 PCIe blocks. We use this FPGA together with an experiment server with an 11th Gen Intel Core i9-11900K @ 3.50GHz and 135 GB of RAM. To program the FPGA, we use Xilinx's Vivado tools.
3.2 Architecture
The hardware architecture, shown in Figure 3.1, maps each step of the CRBM to its own hardware module, and these modules correspond to the steps described in Figure 2.2.
The hardware is pipelined with 2 stages: forward and reverse. The forward stage contains the logic for sampling from visible nodes → hidden nodes (Sections 3.2.1 to 3.2.5). The reverse stage contains the logic for sampling from hidden nodes → visible nodes (Sections 3.2.5 to 3.2.9).
3.2.1. Visible Node Registers
The 2D visible node layer is stored in a single register. Since the node values are binary, each node occupies a single bit in the register. This technique minimizes LUT resource usage on the FPGA.
Figure 3.1: CRBM Hardware Accelerator Architecture
The lattice size is denoted LxL, meaning L rows and L columns of visible nodes, for a total of LxL visible nodes. The visible node register therefore holds LxL bits.
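The sketch below illustrates the single-register layout by packing an L x L binary array into one integer, one bit per node. The bit order (row-major, with bit 0 at the top-left node) is an assumption for illustration; the report does not state the accelerator's bit ordering. Python integers are unbounded, so the same code models an L·L-bit register for any L. The function names are hypothetical.

```python
import numpy as np


def pack_visible(v):
    """Pack an L x L binary array into one integer: bit (i*L + j) holds v[i, j]."""
    reg = 0
    for idx, bit in enumerate(v.reshape(-1)):  # row-major traversal
        reg |= int(bit) << idx
    return reg


def unpack_visible(reg, L):
    # Inverse of pack_visible: read bit (i*L + j) back into position (i, j).
    bits = [(reg >> idx) & 1 for idx in range(L * L)]
    return np.array(bits, dtype=np.int8).reshape(L, L)


v = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 1, 0]])
reg = pack_visible(v)
print(reg, bin(reg))  # 213 0b11010101
```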
3.2.2. Wrapper
The wrapper module follows the wrapping logic described in Section 2.3.2. It takes the visible node register and the periodicity signal as inputs, and it copies or zeros out the respective columns and rows accordingly.
3.2.3. Convoluter - Forward
The convoluter module takes the wrapper output and the filters and performs the convolution logic described in Section 2.3.3. The filter values are provided by the user's software via PCIe.
In this implementation, the convoluter takes advantage of spatial parallelism. It contains multiply-and-accumulate logic in place for each output position. The same logic is replicated at the other positions, which are separated by a distance equal to the stride value in all directions. In summary, all convolution computation is contained in a single, spatially parallelized block of combinational logic.
3.2.4. Sigmoid and LFSR - Forward
The sigmoid modules are synthesized with input and output bit-count parameters, which determine the precision of the input and output. The input is the result of the convolution, and the output is the corresponding sigmoid value. Internally, the sigmoid module contains a pre-coded LUT that works like a dictionary of keys and values (inputs and outputs). The module selects the closest corresponding sigmoid value that was synthesized with the given precision parameters.
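The next block is a software model of a precomputed sigmoid table, given as a sketch. The report states only that input and output bit counts set the precision. The following details are all assumptions made here: a signed fixed-point input with `in_frac_bits` fractional bits, rounding to the nearest output level, and saturation at the ends of the input range. The names are hypothetical.

```python
import numpy as np


def build_sigmoid_lut(in_bits, in_frac_bits, out_bits):
    """Map every signed fixed-point input code to an unsigned out_bits sigmoid code.

    Input code c (two's-complement range for in_bits) stands for c / 2**in_frac_bits;
    output code q stands for q / (2**out_bits - 1), a value in [0, 1].
    """
    codes = np.arange(-(1 << (in_bits - 1)), 1 << (in_bits - 1))
    y = 1.0 / (1.0 + np.exp(-codes / float(1 << in_frac_bits)))
    out_codes = np.rint(y * ((1 << out_bits) - 1)).astype(np.int64)
    return dict(zip(codes.tolist(), out_codes.tolist()))


def sigmoid_lookup(lut, x, in_bits, in_frac_bits):
    # Quantize x to the nearest input code, saturating at the ends of the range.
    lo, hi = -(1 << (in_bits - 1)), (1 << (in_bits - 1)) - 1
    code = int(np.clip(np.rint(x * (1 << in_frac_bits)), lo, hi))
    return lut[code]


lut = build_sigmoid_lut(in_bits=8, in_frac_bits=4, out_bits=8)  # 256 entries
for x in (-100.0, -1.0, 0.0, 1.0, 100.0):
    print(x, sigmoid_lookup(lut, x, 8, 4))
```

Widening `in_bits`/`in_frac_bits` or `out_bits` trades table size (and, in hardware, LUT resources) for precision.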
The Linear Feedback Shift Register (LFSR) module is synthesized with a set seed value. The module's internal register, initialized with the seed value, is shifted every cycle to produce pseudo-random bits. The LFSR output is converted to a value between 0 and 1.
For N filter groups, there are N sigmoid modules and N LFSR modules. The sigmoid value is compared with the output of the LFSR module. If the sigmoid value is greater, the corresponding hidden node takes the value 1; otherwise, it takes the value 0. This logic completes the first pipeline stage.
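The LFSR and comparator can be modeled in a few lines. The report does not give the LFSR width, taps, or output scaling. This sketch therefore assumes the widely used 16-bit maximal-length Fibonacci LFSR with taps 16, 14, 13, 11 and reads the register as state / 2^16. The seed value and function names are arbitrary.

```python
def lfsr16_step(state):
    """Advance a 16-bit Fibonacci LFSR with taps 16, 14, 13, 11
    (x^16 + x^14 + x^13 + x^11 + 1, a maximal-length polynomial)."""
    bit = ((state >> 0) ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def sample_node(sigmoid_value, state):
    """Comparator: the node is 1 if the sigmoid value is greater than the
    LFSR value read as a fraction in [0, 1). Returns (node, next_state)."""
    state = lfsr16_step(state)
    return int(sigmoid_value > state / 65536.0), state


seed = 0xACE1          # any nonzero seed; an all-zero register would never change
state, ones = seed, 0
for _ in range(65535):  # one full period visits every nonzero 16-bit state once
    node, state = sample_node(0.25, state)
    ones += node
print(ones, state == seed)  # 16383 True
```

Over one full period, exactly the states 1 to 16383 fall below 0.25 · 2^16, so a sigmoid value of 0.25 yields a 1 in 16383 of 65535 cycles. Consecutive LFSR states are correlated, however, which is one reason a design might use a separate LFSR per group, as the text above describes.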
The Sigmoid and LFSR hardware modules were pioneered by former researchers in our group, Saavan Patel and Philip Canoza.
3.2.5. Hidden Node Registers
For N filters, N hidden node groups are produced. Each group has a dimension of ⌊(L+1)/stride⌋ x ⌊(L+1)/stride⌋. Thus, the hidden node register contains a total of N x ⌊(L+1)/stride⌋ x ⌊(L+1)/stride⌋ bits.
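The register sizes in Sections 3.2.1 and 3.2.5 are simple arithmetic. The helper below evaluates L·L and N·⌊(L+1)/stride⌋², and its name is illustrative. For the Figure 2.2 example (3x3 visible nodes, 3 filters, stride 2), it gives 9 visible bits and 3 groups of 2x2 hidden nodes, or 12 hidden bits.

```python
def register_bits(L, n_filters, stride):
    """Return (visible_bits, hidden_bits) for an L x L lattice."""
    visible_bits = L * L                         # one bit per visible node
    hidden_side = (L + 1) // stride              # floor((L + 1) / stride)
    hidden_bits = n_filters * hidden_side ** 2   # N groups of hidden_side x hidden_side bits
    return visible_bits, hidden_bits


print(register_bits(3, 3, 2))  # (9, 12)
```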
3.2.6. Zero Padder
The zero padder module implements the logic described in Section 2.3.6. The hardware keeps an array of zeros with empty slots at the positions that take in hidden node values, and the hidden node values are inserted in a spatially parallel manner. For N filters, there are N zero padder modules.
3.2.7. Convoluter - Reverse
The convoluter module in the reverse direction is the same module used in the forward direction (Section 3.2.3). The flipped weights are inputs to this module and are provided by the user software via PCIe. For N filters, there are N reverse convoluter modules.
3.2.8. Accumulator
The outputs of the convoluter modules are accumulated in this module. Because accumulation is element-wise, simple combinational logic can add up the values at the same positions across the N groups. After the accumulator module, the N filter groups are aggregated into a single group.
The visible bias is also applied in this module. The bias values are provided by the user software. We provide an option to use an odd bias and an even bias, which allows different bias values to be applied to odd and even columns.
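The accumulator and the odd/even visible bias can be modeled as follows. The report does not state whether "odd" columns are counted from 0 or from 1, so the 0-based convention here is an assumption, as are the function and parameter names.

```python
import numpy as np


def accumulate_with_bias(group_outputs, bias_even, bias_odd):
    """Sum N reverse-convolution maps element-wise, then add a column-parity bias.

    group_outputs: array of shape (N, L, L). Columns are counted from 0, so
    columns 0, 2, 4, ... receive bias_even and 1, 3, 5, ... receive bias_odd.
    """
    total = group_outputs.sum(axis=0)            # N groups -> one L x L map
    cols = np.arange(total.shape[1])
    col_bias = np.where(cols % 2 == 0, bias_even, bias_odd)
    return total + col_bias                      # broadcast along every row


groups = np.ones((2, 3, 3))                      # two illustrative 3x3 group outputs
print(accumulate_with_bias(groups, bias_even=0.5, bias_odd=-1.0))
```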
3.2.9. Sigmoid and LFSR - Reverse
The sigmoid and LFSR modules used in the reverse direction are the same modules used in the forward direction (Section 3.2.4). The outputs of these modules determine the next visible node values.
This module completes the second pipeline stage and, with it, a full cycle of sampling.
3.3 Input and Output (I/O) and Programming Logic
The host machine and the FPGA communicate over x16 PCIe. We implement the PCIe input and output (I/O) logic with a module named Xillybus. Xillybus provides both an FPGA IP core and a driver for the host PC's operating system, and it offers customized bundles for different FPGA models.
Our hardware does not need to exchange large amounts of data at each time step. However, the host machine and the FPGA run at different clock frequencies, which creates a clock domain crossing. We therefore use a First-In-First-Out (FIFO) module to enable sequential communication between the host and the FPGA.
The Xillybus driver creates 2 device files for the FPGA: one for writing and one for reading. The user software writes and reads the following values to and from the device files:
• Write (PC to FPGA): weights, flipped weights, biases, lattice sizes (visible layer dimensions), periodicity, and a clear-last-hidden-rows signal (some applications require clamping the last row of hidden nodes to zero)
• Read (from FPGA): visible node values of each cycle
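The write path can be sketched as plain binary file I/O, since the Xillybus driver exposes the FPGA as device files. The byte layout below is hypothetical, as is the function name. It uses little-endian signed 32-bit words in the order weights, flipped weights, biases, lattice size, periodicity flag, and clear-last-hidden-rows flag. A `BytesIO` buffer stands in for the write device file so the sketch runs without hardware.

```python
import io
import struct


def encode_config(weights, flipped_weights, biases, lattice_size, periodic, clear_last_hidden_rows):
    """Serialize one accelerator configuration (hypothetical layout)."""
    words = list(weights) + list(flipped_weights) + list(biases)
    words += [lattice_size, int(periodic), int(clear_last_hidden_rows)]
    return struct.pack(f"<{len(words)}i", *words)  # little-endian int32 words


payload = encode_config(weights=[1, 2, 3, 4], flipped_weights=[4, 3, 2, 1],
                        biases=[-1, 0], lattice_size=3,
                        periodic=True, clear_last_hidden_rows=False)
dev = io.BytesIO()   # stand-in for the write device file opened in binary mode
dev.write(payload)
print(len(payload))  # 52
```

In a real setup, the hardware's expected word width, fixed-point format for weights, and field order would replace these placeholders.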
After the host machine reads the visible node values, the user software calculates the energy of the nodes.
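The report does not give the energy expression used on the host. As a stand-in, the sketch below computes a classical Ising energy E = -J Σ s_i s_j over the nearest-neighbour bonds of a periodic square lattice, with visible values mapped to spins as s = 2v - 1. The uniform coupling J and the square-lattice bonds are hypothetical simplifications. The Shastry-Sutherland model adds diagonal couplings that would need their own terms. The function names are illustrative.

```python
import numpy as np


def ising_energy(v, J=1.0):
    """E = -J * sum over nearest-neighbour bonds of s_i * s_j, periodic boundaries.

    v is an L x L binary visible array; spins are s = 2v - 1 (0 -> -1, 1 -> +1).
    Each bond is counted once through the right and down neighbours.
    """
    s = 2 * v.astype(np.int64) - 1
    right = np.roll(s, -1, axis=1)
    down = np.roll(s, -1, axis=0)
    return -J * float(np.sum(s * right) + np.sum(s * down))


def best_energy(samples, J=1.0):
    # Lowest energy among the visible samples read back from the FPGA.
    return min(ising_energy(v, J) for v in samples)


L = 4
ferro = np.ones((L, L), dtype=np.int8)
checker = np.indices((L, L)).sum(axis=0) % 2
print(ising_energy(ferro), ising_energy(checker))  # -32.0 32.0
```

Tracking the minimum over samples, as `best_energy` does, is one simple way to detect when sampling has reached a target ground-state energy.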
Figure 3.2: CRBM Hardware Accelerator complexities
3.4 Testing
Each hardware module went through behavioral testin