
Hardware Accelerator for Convolutional Restricted Boltzmann Machines

Junghoon Han

Electrical Engineering and Computer Sciences
University of California, Berkeley

Technical Report No. UCB/EECS-2025-29

/Pubs/TechRpts/2025/EECS-2025-29.html

May 1, 2025

Copyright © 2025, by the author(s).

All rights reserved.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.

Acknowledgement

I would like to thank Professor Sayeef Salahuddin for his continued mentorship and generous sponsorship during my master's program. I thank Pratik Brahma, who pioneered this research topic, for his close guidance, ideas, and help on this project. Thanks to the rest of the Unconventional Computing group members, Chirag Garg, Saavan Patel, and Philip Canoza, for helping out with this project in various ways.

I am profoundly grateful for the exceptional support of my family throughout the years. And thanks to all my friends, especially my fellow Ra-On band members, who made my graduate program fruitful.

Hardware Accelerator for Convolutional Restricted Boltzmann Machines

by Junghoon Han

Research Project

Submitted to the Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, in partial satisfaction of the requirements for the degree of Master of Science, Plan II.

Approval for the Report and Comprehensive Examination:

Committee:

Professor Sayeef Salahuddin, Research Advisor

5.07.2024

(Date)

Professor Sophia Shao, Second Reader

5/6/2024

(Date)

Copyright 2024

by

Junghoon Han



Abstract

Restricted Boltzmann Machines (RBMs) have gained attention for their strength in aiding Monte Carlo simulations for Combinatorial Optimization, Quantum Applications, and Machine Learning problems. The Convolutional RBM (CRBM), a variant of the RBM, has sparked interest due to its lower parameter count and efficient performance on translationally-symmetric problems. However, because of its stochastic nature, a CRBM often takes a long time to reach the ground-state solution, which calls for an approach to accelerate the computation.

In this work, we demonstrate our hardware accelerator for CRBM, implemented in RTL and programmed on an FPGA. Software applications can harness the accelerator by simply programming the weights, bias, and lattice sizes. We show that for solving frustrated classical Hamiltonians for the Ising Shastry-Sutherland model, our hardware accelerates reaching the ground-state solution by up to 5 orders of magnitude compared to GPUs.

Contents

1 Introduction
  1.1 Background
  1.2 Motivations and Previous Work
2 Convolutional RBM (CRBM)
  2.1 Restricted Boltzmann Machine (RBM)
  2.2 Convolutional RBM (CRBM)
  2.3 CRBM Computation Logic
  2.4 Shastry-Sutherland Model Mapping
3 CRBM Hardware Accelerator
  3.1 Background
  3.2 Architecture
  3.3 Input and Output (I/O) and Programming Logic
  3.4 Testing
  3.5 Analysis
4 Results
  4.1 Time to Solution
  4.2 Runtime Results
  4.3 Evaluation
5 Conclusion
  5.1 Future Steps
  5.2 Conclusion
Bibliography

Chapter 1

Introduction

1.1 Background

In the ever-evolving landscape of Ising models, the quest for efficient and robust models capable of processing complex data remains paramount. Among the myriad techniques that have emerged for mapping Ising models, Restricted Boltzmann Machines (RBMs) stand out as a fundamental building block in the realm of unsupervised learning. With their ability to capture intricate patterns, parallelize Gibbs sampling, and map relationships between different neurons, RBMs have garnered considerable attention and acclaim in the fields of Combinatorial Optimization, Quantum problems, and classical Ising models.

Part of this attention is ascribed to Convolutional Restricted Boltzmann Machines (CRBMs). CRBMs harness the power of probabilistic inference to explore solution spaces more effectively, thereby enabling the discovery of optimal or near-optimal solutions in computationally challenging problems. CRBMs, a convolutional variant of RBMs, have lower parameter counts, thereby increasing the compute efficiency for training and inference. Recent work has sparked interest in their ability to optimally map translationally symmetric problems, in which convolution weights are repeated every stride.

The transformative potential of CRBMs has immense practical significance in addressing real-world challenges with profound implications. In materials science, the ability to explore vast solution spaces with probabilistic methodologies enables researchers to expedite the search for novel compounds and materials with desired properties. This paper includes a partial demonstration of mapping a classical Ising Shastry-Sutherland model to CRBMs to accelerate the sampling computations needed to reach the ground-state solution.

Due to their stochastic nature, CRBMs may require a significant number of sampling iterations to reach the desired ground-state solution. The required sampling count also increases with the number of neurons in the CRBM. Thus, to harness the power of CRBMs within a reasonable compute time, an efficient implementation is essential. This motivates our approach to designing hardware accelerators for CRBMs to improve compute time and energy efficiency.


1.2 Motivations and Previous Work

Motivation for Hardware Acceleration

Mapping the mathematical logic directly into digital Register Transfer Level (RTL) logic, rather than encoding it as instructions for general-purpose computers, can speed up the calculations by several orders of magnitude. This process can not only save computation time, but also reduce the energy required to compute a desired program.

The same logic follows for designing a custom digital hardware accelerator for Convolutional RBMs. Transistor logic can be customized and optimized to suit the specific requirements of CRBMs, such as optimizing memory access patterns, exploiting spatial parallelism at the hardware level, and implementing specific modules tailored for Gibbs sampling computations. The details of the hardware implementation are noted in Chapter 3.

Relevant Previous Work

This research is part of the Salahuddin Lab's Unconventional Computing subgroup, which has been using RBMs for NP-Hard combinatorial optimizations. Our team's former members have demonstrated the use of hardware-accelerated RBMs for solving optimization problems such as the MAX-CUT problem and the Sherrington-Kirkpatrick spin glass. The FPGA-mapped RBM has demonstrated similar or better scaling performance compared to quantum computers such as the D-Wave 2000Q Quantum Adiabatic Computer [1]. Subsequent work has used the RBM hardware accelerator for integer factorization of 16-bit numbers. This work showed a runtime improvement of 10,000x over CPUs and 1,000x over GPUs [2].

As previous research on hardware-accelerated RBMs has been meaningful, our group was motivated to design hardware accelerators for specific variants of RBMs, notably CRBMs. In this paper, we explore the design, implementation, and evaluation of a hardware accelerator tailored specifically for Convolutional Restricted Boltzmann Machines for non-deterministic polynomial-time computing. Through RTL-level descriptions and FPGA mappings, we demonstrate the efficacy and versatility of hardware-accelerated CRBMs in solving combinatorial optimization problems.

Chapter 2

Convolutional RBM (CRBM)

2.1 Restricted Boltzmann Machine (RBM)

The Restricted Boltzmann Machine (RBM) is a stochastic 2-layer graph neural network. The 2 layers are called the "visible" and "hidden" layers, which are all-to-all connected, forming a bipartite graph. RBMs are used by repeatedly block Gibbs sampling between the 2 layers and tracking the visible layer values at every sample to derive the probability distribution of the resulting node (neuron) values. The RBM is an energy-based model, which means that the objective of sampling is to minimize the energy value associated with the weight, bias, and node values [3].

All node values are binary: 0 or 1. The next value of a node is determined by deriving the probability of it taking value 1 and conducting random sampling according to that probability. The next set of values for each layer is sampled from the conditional probability dependent on the other layer. The values of all nodes in a single layer are sampled jointly; the next set of hidden nodes is sampled with probability p(h|v), and the visible nodes with probability p(v|h). This form of simultaneous sampling is called block Gibbs sampling.
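As an illustration only, one block Gibbs step for a small fully-connected RBM could be sketched as follows. The function names, the weight layout `W[i][j]` (visible node i to hidden node j), and the use of Python's `random` module are assumptions made for this sketch and are not taken from the accelerator's software.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample_layer(inputs, weights, bias, rng):
    """Sample every node of one layer jointly, given the other layer's values.

    weights[i][j] connects input node i to output node j (assumed layout).
    """
    out = []
    for j in range(len(bias)):
        activation = bias[j] + sum(inputs[i] * weights[i][j] for i in range(len(inputs)))
        p_one = sigmoid(activation)                   # probability that node j is 1
        out.append(1 if rng.random() < p_one else 0)  # random draw against p_one
    return out

def gibbs_step(v, W, b_hidden, c_visible, rng):
    """One block Gibbs step: v -> h using p(h|v), then h -> v using p(v|h)."""
    h = sample_layer(v, W, b_hidden, rng)
    W_T = [list(col) for col in zip(*W)]              # transpose for the reverse direction
    v_next = sample_layer(h, W_T, c_visible, rng)
    return v_next, h
```

Calling `gibbs_step` in a loop and recording `v_next` at every iteration gives the sequence of visible samples from which the distribution is estimated.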

The nodes and edges of the RBM correspond to neurons and synaptic connections. Thus, when we map different problems to an RBM, we can assign the visible nodes to represent physical variables (such as spins, direction, group assignment) and the hidden nodes to interactions between them (such as spin interactions).

2.2 Convolutional RBM (CRBM)

While RBMs are assumed to have fully-connected edges between the visible and hidden layers, CRBMs work with strides and convolution. CRBMs show translational invariance, where the pattern of weights is identical across different parts of the nodes. As the all-to-all connection of an RBM can be memory-heavy and compute-heavy, the CRBM helps relax the logic by using only a set of connections to fully represent the probabilities for block Gibbs sampling.

Figure 2.1: Pictorial representation of RBM and CRBM.

Figure 2.1 shows the structure of RBM and CRBM. As seen on the right of the figure, CRBMs have the same weights repeated every stride (in this case, stride equal to 1). The figure also notes periodicity, which means that when the stride goes out of bounds of the visible nodes, it wraps back to the first index of the hidden nodes (in this case, connecting $v_4$ with $h_1$). Periodicity can be turned on or off, depending on the problem formulation.

CRBMs can have multiple sets of weights. For example, as per Figure 2.1, the first set of weights can be $w_1 = (e_1, e_2) = (1, 2)$, while the second set of weights can be $w_2 = (e_1, e_2) = (3, 4)$. Each set of weights produces its own separate group of hidden nodes. From here on, we refer to these groups as convolution groups.

Energy and Probability Formulation

The following formulas are derived by converting the general RBM energy and probability equations to reflect the convolutional nature of CRBM.

Here, the notations are: $v_{ij}$ is the visible node at the $i$-th row and $j$-th column. $k$ represents the convolutional group, which corresponds to the $k$-th set of weights, also known as 'filters'. $W^k$ is the $k$-th filter. $\tilde{W}^k$ is the $k$-th filter, flipped in both horizontal and vertical axes. $h^k_{ij}$ in turn represents the hidden node at group $k$, $i$-th row and $j$-th column. $b$ is the hidden bias and $c$ is the visible bias. $\bullet$ is the element-wise product followed by summation: $A \bullet B = \operatorname{tr}(A^T B)$. The $*$ operator denotes convolution. $\sigma$ denotes the sigmoid operator [4].

$$P(h^k_{ij} = 1 \mid v) = \sigma\left((W^k * v)_{ij} + b\right) \tag{2.2}$$


The objective of our CRBM is to sample repeatedly until the energy reaches the ground-state solution. (The ground-state solution is also the output with the highest likelihood.) The probabilities are used to randomly sample each visible and hidden node as 0 or 1, thereby determining the next values of the nodes.

2.3 CRBM Computation Logic

The CRBM computation logic and sequence is illustrated in Figure 2.2. Note that the logic flows from visible nodes → hidden nodes → visible nodes, and repeats.

2.3.1. Visible nodes

The sampling starts with the initial state of visible nodes. In our setting, the visible layer is configured as a 2-dimensional array of binary nodes.

Figure 2.2 starts with visible nodes of size 3x3.

2.3.2. Wrapping

Wrapping is done to ensure periodicity is incorporated into the convolution logic. Assume that the filter size is MxM. If periodicity is 'on' in the column direction, the first M-1 columns are copied and appended after the last column. If periodicity is 'off' in the column direction, M-1 columns of zeros are appended instead. The same logic holds for the row direction.

Figure 2.2 notes the wrapping logic for a 2x2 filter with periodicity on in both the column and row directions. The wrapped nodes are denoted in orange.
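The wrapping step can be sketched in software as below. This is a minimal illustration and not the accelerator's RTL; the function name `wrap` and its arguments are hypothetical, and the visible layer is assumed to be a non-empty list of rows of bits.

```python
def wrap(visible, m, periodic_rows=True, periodic_cols=True):
    """Append M-1 wrapped (or zero) columns and rows to a 2D list of bits."""
    extra = m - 1
    rows = []
    for row in visible:
        # Periodic: copy the first M-1 columns after the last column; else pad zeros.
        tail = row[:extra] if periodic_cols else [0] * extra
        rows.append(list(row) + list(tail))
    width = len(rows[0])
    if periodic_rows:
        # Copy the first M-1 (already widened) rows below the last row.
        rows += [list(r) for r in rows[:extra]]
    else:
        rows += [[0] * width for _ in range(extra)]
    return rows
```

For a 3x3 visible layer and a 2x2 filter, the result is 4x4, which matches the dimensions used in the example of Figure 2.2.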

2.3.3. Convolution - Forward

Forward convolution denotes the convolution logic necessary for sampling hidden nodes from visible nodes (visible → hidden). Convolution here occurs as an element-wise multiplication of the filter with the visible nodes at the current position, followed by accumulation (MAC). This operation is conducted repeatedly with a stride, which moves the filter to the next respective location. The stride occurs in both the column and row directions, and the process is repeated until each direction's index is out of bounds.

The complete process mentioned above is identical for all different filters. The number of output groups is equal to the number of different filters.

Figure 2.2: CRBM Computation logic

Figure 2.2 illustrates the convolution logic for 3 different 2x2 filters with a stride of 2. For 4x4 visible nodes, this process creates a 2x2 result for each filter group.
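A sequential software sketch of the strided multiply-accumulate is given below for illustration; the hardware performs the same arithmetic in parallel. The names `convolve` and `convolve_all` are assumptions of this sketch, and the filter values in the usage note are arbitrary.

```python
def convolve(grid, filt, stride):
    """Slide an MxM filter over a 2D grid; multiply-accumulate at each position."""
    m = len(filt)
    n_rows, n_cols = len(grid), len(grid[0])
    out = []
    for r in range(0, n_rows - m + 1, stride):       # stride in the row direction
        out_row = []
        for c in range(0, n_cols - m + 1, stride):   # stride in the column direction
            acc = 0
            for i in range(m):
                for j in range(m):
                    acc += filt[i][j] * grid[r + i][c + j]
            out_row.append(acc)
        out.append(out_row)
    return out

def convolve_all(grid, filters, stride):
    """Apply every filter independently; one output group per filter."""
    return [convolve(grid, f, stride) for f in filters]
```

With a 4x4 wrapped input, 2x2 filters, and a stride of 2, each output group is 2x2, as in the figure.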

2.3.4. Probability and Sampling - Forward

The convolution result is sent to a sigmoid operator to obtain the probability P(h|v). The sigmoid is applied element-wise to each of the outputs of the convolution.

The sigmoid provides a probability value between 0 and 1, which is in turn used for random sampling. The sampler takes the probability as the likelihood of the result node being equal to 1. Then, the sampler's result, either 0 or 1, becomes the next value of the hidden node. In practice, this process is done by generating a random floating-point value between 0 and 1, comparing it to the sigmoid output, and setting the result value to 1 if the random number is less than the sigmoid output.

2.3.5. Hidden nodes

The sampled values will be the next hidden node values. With N different filters, there will be N groups of hidden nodes. All hidden node values are binary as well.

2.3.6. Zero Padding

We conduct a zero-padding technique to ensure that the resulting reverse sampling (hidden → visible) has the same dimension as the starting visible node dimension. That is, we insert zeros between the hidden nodes in all directions.

Similarly to the wrapping step, zero padding also includes copying the last columns and rows to the beginning column and row. If periodicity is on, we copy the hidden node values along with the padded zeros. If periodicity is off, zeros are simply added to the beginning column and row positions.
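One possible reading of the zero-padding step is sketched below. The exact layout is an assumption: hidden nodes are placed `stride` positions apart with zeros in between, and M-1 rows and columns are then prepended, copied from the end when periodicity is on. The function name `zero_pad` and the square hidden group are also assumptions of this sketch.

```python
def zero_pad(hidden, stride, m, periodic=True):
    """Dilate a square 2D hidden group with zeros, then prepend M-1 rows and columns."""
    n = len(hidden)
    size = (n - 1) * stride + 1
    grid = [[0] * size for _ in range(size)]
    for i in range(n):
        for j in range(n):
            grid[i * stride][j * stride] = hidden[i][j]   # zeros stay in between
    extra = m - 1
    if extra == 0:
        return grid
    # Prepend columns: last M-1 columns if periodic, zeros otherwise.
    rows = [(row[-extra:] if periodic else [0] * extra) + row for row in grid]
    # Prepend rows in the same way, using the already widened rows.
    if periodic:
        top = [list(r) for r in rows[-extra:]]
    else:
        top = [[0] * len(rows[0]) for _ in range(extra)]
    return top + rows
```

Under these assumptions, a 2x2 hidden group with stride 2 and a 2x2 filter becomes a 4x4 array, so a stride-1 convolution with a 2x2 filter returns a 3x3 result, the same size as the visible layer in Figure 2.2.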

2.3.7. Convolution - Reverse

The convolution logic here is similar to that of the convolution in the forward direction. The key difference here is that the filters applied are flipped in the horizontal and vertical directions. Moreover, the stride value is always equal to 1 in the reverse direction.
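Flipping a filter in both directions amounts to reversing the row order and reversing each row. A minimal sketch, with the hypothetical name `flip`:

```python
def flip(filt):
    """Flip a 2D filter in both the vertical and horizontal directions."""
    return [row[::-1] for row in filt[::-1]]
```

Applying `flip` twice returns the original filter, which is a convenient sanity check when preparing the flipped weights on the host.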

2.3.8. Accumulation

For N different convolution groups, there will be N different convolution outputs. This step accumulates all the node values from the convolution outputs, element-wise. The dimension of the output from this step is equal to that of the visible nodes.
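The element-wise accumulation over groups can be illustrated with a short sketch. The name `accumulate` is hypothetical, and all group outputs are assumed to have the same dimensions.

```python
def accumulate(group_outputs):
    """Element-wise sum of N equally sized 2D convolution outputs."""
    n_rows = len(group_outputs[0])
    n_cols = len(group_outputs[0][0])
    return [
        [sum(group[r][c] for group in group_outputs) for c in range(n_cols)]
        for r in range(n_rows)
    ]
```

The result has the same shape as a single group output, regardless of how many groups are summed.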

Figure 2.3: Shastry-Sutherland Magnetization Phases

2.3.10. Probability and Sampling - Reverse

Similar to the forward direction process, the sigmoid is applied to produce the probability, which is used for randomly sampling the next set of visible nodes. This step produces the next set of visible node values, which completes the full cycle.

2.4 Shastry-Sutherland Model Mapping

In our work, we map the classical Ising Shastry-Sutherland model onto the CRBM structure to solve a frustrated classical Hamiltonian. Our results demonstrate that the CRBM can be used to simulate any kind of translationally-symmetric classical Hamiltonian. The Shastry-Sutherland lattice has discrete translational symmetry, where a certain set of spin interactions is repeated elsewhere on the lattice. The Shastry-Sutherland model can be mapped to CRBM in the following way: the visible nodes represent physical variables, in this case the magnetic spins. The hidden nodes represent interactions between the spins.

To map the Shastry-Sutherland Ising model to the CRBM framework, we equate the physical lattice's Boltzmann distribution to the RBM's marginal distribution. The RBM weights are then mapped to be unique only up to the unit cell on the lattice, of size 3x3. Thus, the filter sizes are 3x3. The Shastry-Sutherland model contains 10 unique repeated interactions, leading to the formulation of 10 different filters.

We focused our experiment on 4 of the Shastry-Sutherland magnetization phases, as noted in Figure 2.3. Each node, mapped to a visible node of the CRBM, represents a magnetization spin. The empty circles are represented as 1, and filled circles are represented as 0. Different phase problems produce different filters and biases.

• AFM Phase: Anti-Ferromagnetic Phase. All non-diagonal neighboring nodes have opposite spins.

• FM Phase: Ferromagnetic Phase. All nodes have the same magnetization spin.

• 1/3 Fractional Phase: the rows of the lattice show a pattern of an FM phase row sandwiched between two AFM phase rows.

• Dimer Phase: certain diagonal sets of nodes are expected to have opposite spins of each other (marked in green boxes).

The detailed mapping of the Shastry-Sutherland model to CRBM will be presented in a forthcoming paper from the Salahuddin Group, in a work pioneered by Pratik Brahma.

Chapter 3

CRBM Hardware Accelerator

3.1 Background

The objective of the CRBM hardware accelerator is to significantly reduce the runtime of reaching the ground-state solution of CRBM.

The hardware design is implemented in RTL (Register Transfer Level) and mapped to a Field Programmable Gate Array (FPGA). We used the Virtex UltraScale+ FPGA device (VCU118), a product of Xilinx-AMD. This FPGA represents a cutting-edge solution in the field of FPGAs with 14nm/16nm FinFET process technology, dynamic power management, and integrated Gen3 x16 PCIe blocks. We use this FPGA jointly with the experiment server, which has an 11th Gen Intel Core i9-11900K @ 3.50GHz and 135 GB RAM. To program the FPGA, we use Xilinx's Vivado tools.

3.2 Architecture

The hardware architecture, as denoted in Figure 3.1, maps each step of CRBM into a respective hardware module. Note that there are modules corresponding to the steps described in Figure 2.2.

The hardware is pipelined with 2 stages: forward and reverse. The forward stage contains the logic of sampling from visible nodes → hidden nodes (stages 3.2.1 to 3.2.5). The reverse stage contains the logic of sampling from hidden nodes → visible nodes (stages 3.2.5 to 3.2.9).

3.2.1. Visible Node Registers

The 2D visible node layer is represented in a single register. As the node values are binary, each takes up a single bit in the register. This technique minimizes the LUT resource usage on the FPGA.

Figure 3.1: CRBM Hardware Accelerator Architecture

The dimension of the lattice is denoted as LxL, which means L rows and L columns of visible nodes, making a total of LxL visible nodes. Thus, there are LxL bits in the visible node register.
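To make the one-bit-per-node layout concrete, the sketch below packs an LxL grid into a single integer. The bit ordering (row-major, node (0, 0) in the least significant bit) and the function names are assumptions of this sketch; the ordering used in the RTL is not stated here.

```python
def pack_visible(grid):
    """Pack an L x L grid of bits into one integer, row-major, node (0,0) in bit 0."""
    size = len(grid)
    reg = 0
    for r in range(size):
        for c in range(size):
            reg |= (grid[r][c] & 1) << (r * size + c)
    return reg

def read_node(reg, size, r, c):
    """Read back the bit of node (r, c) from the packed register."""
    return (reg >> (r * size + c)) & 1
```

A 3x3 lattice therefore occupies 9 bits, and a lattice of size L occupies L*L bits.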

3.2.2. Wrapper

The wrapper module follows the logic of the wrapping technique noted in section 2.3.2. It takes in the visible node register and periodicity signal as inputs, and copies or zeros out the respective columns and rows accordingly.

3.2.3. Convoluter - Forward

The convoluter module takes in the wrapper output and the filters to conduct the convolution logic noted in section 2.3.3. The filter values are provided by the user's software via PCIe.

In this implementation, the convoluter takes advantage of spatial parallelism. It contains multiply-and-accumulate convolution logic in place for corresponding positions. The same logic is copied to other positions that are separated by a distance equal to the stride value in all directions. In summary, all convolution computation is contained in a single spatially parallelised combinational logic block.

3.2.4. Sigmoid and LFSR - Forward

The sigmoid modules are synthesized with the input and output bit count parameters, which determine the level of precision of the input and output. The input is the result of the convolution. The output is the corresponding sigmoid value. The sigmoid module internally contains a pre-coded LUT, which acts like a dictionary of keys and values, mapping inputs to outputs. The module selects the closest corresponding sigmoid value that was synthesized with the given precision parameters.
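A software model of such a lookup table might look like the sketch below. The signed fixed-point input format, the parameter `in_frac_bits`, and the function names are assumptions introduced for illustration; the report only states that input and output bit counts are synthesis parameters.

```python
import math

def build_sigmoid_lut(in_bits, out_bits, in_frac_bits):
    """Map every signed in_bits-wide fixed-point code to an out_bits-wide sigmoid code."""
    scale_out = (1 << out_bits) - 1
    lut = {}
    for code in range(-(1 << (in_bits - 1)), 1 << (in_bits - 1)):
        x = code / (1 << in_frac_bits)                 # fixed-point code -> real value
        lut[code] = round(scale_out / (1.0 + math.exp(-x)))
    return lut

def lookup(lut, x, in_bits, in_frac_bits):
    """Quantize x to the nearest representable input code (with clamping) and look it up."""
    code = round(x * (1 << in_frac_bits))
    lo, hi = -(1 << (in_bits - 1)), (1 << (in_bits - 1)) - 1
    return lut[max(lo, min(hi, code))]
```

Increasing `in_bits` or `out_bits` raises precision at the cost of a larger table, which is the trade-off the precision parameters control.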

The Linear Feedback Shift Register (LFSR) module is synthesized according to a set seed value. The internal register in the module, initialized with the seed value, is shuffled every cycle to produce randomized bits. The LFSR output is converted to a value between 0 and 1.

For N filter groups, there are N sigmoid modules and N LFSR modules. The sigmoid value is compared with the output of the LFSR module. If the sigmoid value is greater, the corresponding hidden node will contain value 1. Otherwise, it will contain value 0. This logic completes the first pipeline stage.
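The comparison of a sigmoid value against an LFSR output can be modelled as below. The register width and tap positions are assumptions: this sketch uses a common 16-bit Fibonacci LFSR with taps at bits 16, 14, 13, and 11, which is not necessarily the configuration used in the accelerator.

```python
def lfsr16_step(state):
    """Advance a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11) by one cycle."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)

def lfsr_period(seed):
    """Count the cycles until the register returns to its seed value."""
    state, count = lfsr16_step(seed), 1
    while state != seed:
        state = lfsr16_step(state)
        count += 1
    return count

def sample_bit(p_one, state):
    """Return (sampled bit, new LFSR state): the bit is 1 when p_one exceeds the LFSR value."""
    state = lfsr16_step(state)
    u = state / 65536.0                    # convert the register to a value in [0, 1)
    return (1 if p_one > u else 0), state
```

The seed must be non-zero, since an all-zero register never changes state.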

The Sigmoid and LFSR hardware modules were pioneered by our former researchers, Saavan Patel and Philip Canoza.

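3.2.5. Hidden Node Registers

For N filters, there are N hidden node groups produced. Each of them has a dimension of $\lfloor (L+1)/\text{stride} \rfloor \times \lfloor (L+1)/\text{stride} \rfloor$. Thus, the hidden node register contains a total of $N \times \lfloor (L+1)/\text{stride} \rfloor \times \lfloor (L+1)/\text{stride} \rfloor$ bits.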
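As a quick arithmetic check of the register size formula as stated, a short helper (with a hypothetical name) can evaluate it for a given configuration:

```python
def hidden_register_bits(lattice_size, stride, n_filters):
    """Total hidden register bits: N * floor((L+1)/stride)^2, per the formula above."""
    side = (lattice_size + 1) // stride    # floor division gives the group side length
    return n_filters * side * side
```

For the example of Figure 2.2 (L = 3, stride 2, 3 filters), each group is 2x2, giving 12 hidden bits in total.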

3.2.6. Zero Padder

The zero padder module implements the logic noted in section 2.3.6. The hardware keeps an array of zeros with empty slots for positions that take in hidden node values. The hidden node values are inserted in a spatially parallel manner. For N filters, there are N zero padder modules.

3.2.7. Convoluter - Reverse

The convoluter module in the reverse direction is the same module used in the forward direction (section 3.2.3). The flipped weights are inputs to this module, which are provided by the user software via PCIe. For N filters, there are N reverse convoluter modules.


3.2.8. Accumulator

The output of the convoluter module is accumulated in this module. As the accumulation is done element-wise, it is simple to create combinational logic that adds up the values for the same positions in N groups. Following the accumulator module, the N filter groups are aggregated into a single group.

Moreover, the visible bias is applied in this module. The bias values are provided by the user software. We provide an option to use an odd bias and an even bias, which allows different bias values to be applied to odd columns and even columns.
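The odd and even column bias option can be illustrated as follows. Whether "odd" and "even" refer to 0-based or 1-based column indices is not stated; this sketch assumes 0-based indices, and the function name is hypothetical.

```python
def add_visible_bias(accumulated, odd_bias, even_bias):
    """Add a per-column bias: odd_bias on odd 0-based columns, even_bias on even ones."""
    return [
        [value + (odd_bias if col % 2 else even_bias) for col, value in enumerate(row)]
        for row in accumulated
    ]
```

Setting both biases to the same value reduces this to a single uniform visible bias.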

3.2.9. Sigmoid and LFSR - Reverse

The sigmoid and LFSR modules used in the reverse direction are the same modules used in the forward direction (section 3.2.4). The output of these modules determines the next visible node values.

This module completes the second pipeline stage, and completes a full cycle of sampling.

3.3 Input and Output (I/O) and Programming Logic

The host machine and the FPGA communicate over the x16 PCIe. We implement the Input and Output (I/O) logic of the PCIe through an open source module named Xillybus. Xillybus provides both an FPGA IP core and a driver for the host PC's operating system. It provides customized bundles for different FPGA models.

Although our hardware need not communicate large data within each time step, the host machine and FPGA run on different clock frequencies, producing a clock domain crossing. Thus, we use a First-In-First-Out (FIFO) module to enable sequential communication between the host and FPGA.

The Xillybus driver creates 2 device files for the FPGA: one for writing and one for reading. The user software writes and reads the following values to and from the device files:

• Write (PC to FPGA): weights, flipped weights, biases, lattice sizes (visible layer dimensions), periodicity, and the clear-last-hidden-rows signal (some applications require clamping the last row of hidden nodes to zero)

• Read (from FPGA): visible node values of each cycle

After the host machine reads the visible node values, the user software calculates the energy of the nodes.

Figure 3.2: CRBM Hardware Accelerator complexities

3.4 Testing

Each hardware module went through behavioral testin

提交評論