
THE ENVIRONMENTAL, SOCIAL, AND GOVERNANCE EMPHASIS OF LEADING COMPANIES IN EAST ASIA AND SOUTHEAST ASIA UNVEILED BY DEEP LEARNING

Chao Li, Alexander Ryota Keeley, Shunsuke Managi, and Satoru Yamadera

NO. 791

July 2025

ADB ECONOMICS WORKING PAPER SERIES

ASIAN DEVELOPMENT BANK


The ADB Economics Working Paper Series presents research in progress to elicit comments and encourage debate on development issues in Asia and the Pacific. The views expressed are those of the authors and do not necessarily reflect the views and policies of ADB or its Board of Governors or the governments they represent.

Chao Li (li.chao.711@m.kyushu-u.ac.jp) is an assistant professor at Kyushu University and chief data scientist of aiESG, Inc. Alexander Ryota Keeley (keeley.ryota.alexander.416@m.kyushu-u.ac.jp) is an associate professor at Kyushu University and chief researcher of aiESG, Inc. Shunsuke Managi (managi@doc.kyushu-u.ac.jp) is a distinguished professor at Kyushu University and president of aiESG, Inc. Satoru Yamadera (satoru.yamadera@) is a former staff member of the Economic Research and Development Impact Department, Asian Development Bank.


Creative Commons Attribution 3.0 IGO license (CC BY 3.0 IGO)

© 2025 Asian Development Bank
6 ADB Avenue, Mandaluyong City, 1550 Metro Manila, Philippines
Tel +63 2 8632 4444; Fax +63 2 8636 2444

Some rights reserved. Published in 2025.
ISSN 2313-6537 (print), 2313-6545 (PDF)
Publication Stock No. WPS250262-2

DOI: 10.22617/WPS250262-2

The views expressed in this publication are those of the authors and do not necessarily reflect the views and policies of the Asian Development Bank (ADB) or its Board of Governors or the governments they represent.

ADB does not guarantee the accuracy of the data included in this publication and accepts no responsibility for any consequence of their use. The mention of specific companies or products of manufacturers does not imply that they are endorsed or recommended by ADB in preference to others of a similar nature that are not mentioned.

By making any designation of or reference to a particular territory or geographic area in this document, ADB does not intend to make any judgments as to the legal or other status of any territory or area.

This publication is available under the Creative Commons Attribution 3.0 IGO license (CC BY 3.0 IGO) /licenses/by/3.0/igo/. By using the content of this publication, you agree to be bound by the terms of this license. For attribution, translations, adaptations, and permissions, please read the provisions and terms of use at https:///terms-use#openaccess.

This CC license does not apply to non-ADB copyright materials in this publication. If the material is attributed to another source, please contact the copyright owner or publisher of that source for permission to reproduce it. ADB cannot be held liable for any claims that arise as a result of your use of the material.

Please contact pubsmarketing@ if you have questions or comments with respect to content, or if you wish to obtain copyright permission for your intended use that does not fall within these terms, or for permission to use the ADB logo.

Corrigenda to ADB publications may be found at /publications/corrigenda.

Note: ADB recognizes “China” as the People’s Republic of China and “South Korea” as the Republic of Korea.

ABSTRACT

Environmental, social, and governance (ESG) considerations are becoming increasingly vital in corporate decision-making, especially for global companies, and in evaluating corporate performance and investments. This study examines the ESG tendencies of the companies with the largest market values in eight East Asian and Southeast Asian countries through the analysis of 480 corporate reports published in 2023. Our findings reveal that among the various ESG topics, economics and governance risk were the most frequently mentioned in the corporate disclosure reports, though significant variations exist across the region.

Keywords: environmental, social, and governance; corporate report; pre-trained transformer; deep learning; East Asia and Southeast Asia

JEL codes: G30, M14, O16, Q56

Introduction

The increasing urgency of addressing global environmental challenges, social inequalities, and governance issues has brought environmental, social, and governance (ESG) considerations to the forefront of corporate and societal agendas (Rahman, Zahid, and Al-Faryan 2023; Zhou, Liu, and Luo 2022). With growing awareness of climate change (IPCC 2022), resource depletion (Wu, Palm-Forster, and Messer 2021), and societal demands for accountability (Alshehhi, Nobanee, and Khare 2018; Zhou, Liu, and Luo 2022), ESG disclosure mechanisms—such as those proposed by the Task Force on Climate-related Financial Disclosures and the Taskforce on Nature-related Financial Disclosures—have emerged as comprehensive frameworks to evaluate corporate performance beyond financial metrics. In recent years, the international community and governments around the world have paid increasing attention to issues such as greenhouse gas emissions (Gull et al. 2023), air pollution (Lelieveld et al. 2015), human rights (Schrempf-Stirling and Wettstein 2017), and work environments (Furman et al. 2019). Detailed, multifaceted, and relevant content has begun to appear in many key company reports (Li et al. 2024b; Mehra, Louka, and Zhang 2022), which often play an important role in conveying information to various stakeholders, including investors, customers, employees, and the general public. While ESG disclosures are becoming critical, the understanding of corporate priorities and the differences across regions remains underexplored, particularly in dynamic and diverse regions such as East Asia and Southeast Asia.

Corporate reports are essential tools for companies to communicate their strategies, priorities, and performance to stakeholders (Ramzan, Amin, and Abbas 2021; Stanton and Stanton 2002). Among these, annual and integrated reports serve as valuable resources, offering comprehensive insights into a company’s long-term commitments and operational focus (Ramzan, Amin, and Abbas 2021). If most of a report’s content relates to a certain ESG topic, it suggests that the creator of the report, specifically the firm, attaches great importance to this topic (Baier, Berninger, and Kiesel 2020; Li et al. 2024b). Building on textual analysis, interpreting reports and analyzing company tendencies is an active topic in current research (Baier, Berninger, and Kiesel 2020; Goloshchapova et al. 2019; Landrum and Ohsowski 2018; Li et al. 2024b). However, since these reports are long and require a certain level of background knowledge to interpret (Bodnaruk, Loughran, and McDonald 2015; Loughran and McDonald 2016), it is difficult for humans to conduct a large number of efficient and objective analyses. With the rapid development of science and technology in recent years, deep learning has made this large-scale text analysis efficient and accurate (Li et al. 2024a, 2024b). This study leverages natural language processing and deep learning techniques to uncover patterns in these reports, analyzing 480 documents from 293 leading companies in eight East Asian and Southeast Asian countries—the People’s Republic of China (PRC), Japan, the Republic of Korea, Indonesia, Malaysia, the Philippines, Singapore, and Thailand. These countries represent a diverse mix of developed and emerging economies with different regulatory environments, economic structures, and ESG challenges (Işık et al. 2024). The advanced Text Match Pre-Trained Transformer (TMPT) model is employed to assess the relationship between textual content in these reports and the predefined ESG topics, offering a robust and scalable approach to textual analyses.

The TMPT model represents a significant advancement in natural language processing applications for ESG analysis (Li et al. 2024a). Unlike traditional methods, such as keyword counting (Baier, Berninger, and Kiesel 2020; Lokuwaduge and Heenetigala 2017; Loughran, McDonald, and Yun 2009) or manual classification (Giles and Murphy 2016; Tilling and Tilt 2010), TMPT leverages the power of transformer-based architectures (Vaswani et al. 2017b) to understand the semantic relationships within textual data. This zero-shot learning model, trained on multilingual datasets including Wikipedia entries and academic abstracts, allows for objective and consistent analysis across diverse languages and contexts. By segmenting reports into smaller fragments and evaluating their relevance to specific ESG topics (Li et al. 2024b), the model provides a granular view of corporate focus areas. In previous studies, the earlier version of the TMPT model processed only English-language data and had relatively low accuracy (Li et al. 2024a). In this study, we retained the architecture of the TMPT model from previous studies but increased the model size and conducted larger-scale training to address the problems of the previous-generation model.

This study contributes to the growing literature on ESG by providing a detailed, data-driven analysis of corporate tendencies in East Asia and Southeast Asia. It highlights the potential of advanced artificial intelligence (AI) methodologies such as TMPT in uncovering nuanced patterns in unstructured textual data. Furthermore, the findings underscore the need for tailored strategies to address regional and industry-specific ESG challenges. As ESG considerations continue to shape corporate strategies and investor decisions, understanding these regional dynamics becomes increasingly critical. By bridging the gap between textual data analysis and ESG insights, this study not only enhances our understanding of corporate behavior and communication but also provides a foundation for future research on AI textual data analysis. It opens avenues for exploring temporal trends and cross-industry and cross-country comparisons, as well as the impact of regulatory changes on corporate ESG priorities. Ultimately, this research demonstrates the transformative potential of AI to shape sustainable business practices and inform policy decisions in a rapidly evolving global landscape.

Materials and Methodology

Materials

This study aims to investigate the ESG considerations of the top firms in eight East Asian and Southeast Asian countries towards 13 ESG-related topics based on the companies’ annual and integrated reports. The top companies in each country are defined in this paper as the listed companies with the largest market capitalization in the stock market of that country. The eight countries are the PRC, Japan, the Republic of Korea, Indonesia, Malaysia, the Philippines, Singapore, and Thailand. In the PRC, Japan, and the Republic of Korea, the top 50 companies by market capitalization are included in the study, while for the remaining countries—Indonesia, Malaysia, the Philippines, Singapore, and Thailand—the top 30 firms are studied. Therefore, in total, 300 companies comprise our sample.

The corporate disclosure report is a critical tool for communicating with the public and stakeholders, and companies’ emphasis can be detected through the language and textual contents in their reports (Baier, Berninger, and Kiesel 2020; Li et al. 2024b). Listed companies must publish an annual report that explains detailed information about their financial performance and activities over the preceding year, including the balance sheet, income statement, and cash flow statement, along with management’s analysis and discussions. The primary focus is on presenting the company’s financial health and operational outcomes, which is important because it discloses comprehensive and strategic content (Baier, Berninger, and Kiesel 2020; Li et al. 2024b). In addition, listed companies may publish an integrated report that goes beyond traditional financial reporting by combining financial data with nonfinancial information—such as their strategy, governance, and prospects with regard to the external environment—to offer a holistic view of how the company creates value over time. To build the report dataset, we focused on annual reports and integrated reports. We attempted to download reports in two languages for each company: one in English and one in the local language. In our dataset, 480 reports were obtained and analyzed. The ESG-related topics of interest, which are defined to be consistent with previous studies (Li et al. 2024a, 2024b), include human rights, governance risk, greenhouse gases, safety and health, mining consumption, community, domestic job creation, domestic reflux rate, production cost, water consumption, air pollution, economic ripple effect, and work environment.

Methodology

Basic Logic

Companies give greater consideration to ESG topics that they frequently refer to (or mention relevant phrases) in their reports than to those that are never mentioned (Baier, Berninger, and Kiesel 2020; Li et al. 2024b; Mehra, Louka, and Zhang 2022). Simply put, this analysis examines the extent to which a company’s reporting is relevant to a particular ESG topic. The most straightforward method is to count the number of times an expression of a topic appears (Baier, Berninger, and Kiesel 2020). Due to the diversity of expressions, however, simple appearance counts do not perform well. Hence, powerful AI models, such as the ESG Classification Pre-Trained Transformer (CPT) (Li et al. 2024b) and ESG Bidirectional Encoder Representations from Transformers (Mehra, Louka, and Zhang 2022), are gaining emphasis in both academia and industry for determining whether a piece of text content is related to ESG. Since an annual report is long and contains thousands of words, it is usually divided into multiple pieces (Li et al. 2024b). The CPT proposes a processing method: starting from the beginning of a report, it takes a certain number of words each time and then moves one word forward until it reaches the end of the report (Li et al. 2024b). The number of pieces obtained in this way is almost the same as the number of words in the report. Then, the CPT determines the relatedness between each piece of content and a certain ESG topic and averages the relatedness as the final score of an ESG topic in a certain report, as the sketch below illustrates. We largely adopt this process to conduct the textual analyses. In other words, our analysis assumes that the emphasis on an ESG topic in a report depends on the relevance of all text sections to that topic.
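For illustration, the windowing-and-averaging logic can be sketched as follows. This is a minimal schematic, not the CPT implementation: the window size and the generic relatedness function that stands in for a trained model are assumptions.

# Minimal sketch of the CPT-style sliding-window scoring described above.
# `relatedness` stands in for a trained model that scores one text window
# against one ESG topic; its name and signature are illustrative.
from typing import Callable, List

def topic_score(words: List[str], topic: str,
                relatedness: Callable[[str, str], float],
                window_size: int = 100) -> float:
    """Average the topic relatedness over all windows of a report."""
    if len(words) <= window_size:
        return relatedness(" ".join(words), topic)
    scores = []
    # Start at the beginning, take `window_size` words, then move one word
    # forward until the window reaches the end of the report, so the number
    # of windows is close to the number of words in the report.
    for start in range(len(words) - window_size + 1):
        window = " ".join(words[start:start + window_size])
        scores.append(relatedness(window, topic))
    return sum(scores) / len(scores)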

Data Preprocessing

Data preprocessing includes four steps: (i) text extraction, (ii) tokenization, (iii) conversion, and (iv) fragmentation. Text extraction is necessary because PDF files cannot be analyzed directly. The first step is to extract the textual content from the PDF documents—the output for a given report is character-string data. Second, each character string is tokenized as a list of tokens. Specifically, tokenization is the process of dividing text into discrete units (i.e., tokens). In this study, we use a bert-multilingual-cased tokenizer to complete the tokenization task (Devlin et al. 2018). The bert-multilingual-cased tokenizer employs sub-word tokenization technology. For instance, “rights” would be divided into “right” and “##s”. Technically, the tokenization output is a list of strings. Third, the tokens are converted into token IDs, which are a batch of integers, according to the tokenizer’s dictionary. It should be noted that current deep-learning models can only compute numbers rather than character-based tokens. Here, a single report is converted from PDF into an array of integers. The fourth step is fragmentation—that is, the array of integers is broken into several pieces of equal length. These fragments are the input of the TMPT model. Similarly, the 13 ESG-related topics also require data preprocessing, but fragmentation is not needed for these topics.
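A schematic implementation of the four steps might look as follows, assuming pypdf for the extraction step (the paper does not name its extraction tool) and the Hugging Face bert-base-multilingual-cased tokenizer named above; the fragment length of 512 matches the TMPT content-encoder input, and padding the final fragment is an assumption.

# Sketch of the four preprocessing steps: extraction, tokenization,
# conversion to token IDs, and fragmentation into equal-length pieces.
from pypdf import PdfReader
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

def preprocess(pdf_path: str, fragment_len: int = 512) -> list[list[int]]:
    # (i) Text extraction: PDF -> one character string.
    text = " ".join(page.extract_text() or "" for page in PdfReader(pdf_path).pages)
    # (ii) Tokenization: string -> sub-word tokens, e.g. "rights" -> "right", "##s".
    tokens = tokenizer.tokenize(text)
    # (iii) Conversion: tokens -> integer token IDs via the tokenizer dictionary.
    ids = tokenizer.convert_tokens_to_ids(tokens)
    # (iv) Fragmentation: the ID array is broken into equal-length pieces;
    # the last piece is padded so every fragment has the same length.
    fragments = [ids[i:i + fragment_len] for i in range(0, len(ids), fragment_len)]
    if fragments and len(fragments[-1]) < fragment_len:
        pad = tokenizer.pad_token_id or 0
        fragments[-1] += [pad] * (fragment_len - len(fragments[-1]))
    return fragments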

Text Match Pre-Trained Transformer

The TMPT is a deep-learning model used to estimate the textual relatedness between a piece of text and a topic (Li et al. 2024a). Technically, TMPT is a Siamese network with two encoders and a matcher, as shown in Figure 1. One encoder is for textual content from materials, while the other is responsible for the input topic. The two encoders’ core parts are stacked standard transformer blocks (Vaswani et al. 2017b). The content encoder’s input is longer than the topic encoder’s: the input length of the content encoder is 512 tokens, while that of the topic encoder is 16 tokens. The model’s output ranges from 0, representing no relevance between the two inputs, to 1, representing complete relatedness. TMPT is a zero-shot learning model trained on a large corpus of academic articles. Previous studies employed an English-oriented TMPT to analyze the relatedness between millions of news items and ESG-related topics efficiently and effectively (Li et al. 2024a). In this study, we use the advanced TMPT, strengthened by a large number of multilingual academic papers and Wikipedia entries. The advanced TMPT, which has 519,939,541 parameters, is larger than the English-oriented version with 138,843,049 parameters (Li et al. 2024a). The architecture of the two versions is the same, but the number of layers is different.

The content encoder consists of six components. First, the content input layer is the input receiver of the content encoder. Second, leveraging the bert-base-multilingual-cased (BERT) model, the content BERT embedding part positions vocabulary words in a lower-dimensional space, bringing semantically similar words closer together (Devlin et al. 2018; Vaswani et al. 2017a). This embedding layer works in a 768-dimensional semantic space, representing each encoded token as a 768-element vector. Since transformer blocks do not inherently handle positional information, the embedding layer also integrates positional data. As a result, the embedding layer produces a tensor of shape 512×768, which then passes through nine standard transformer blocks (Vaswani et al. 2017a) in the third part: the content dense transformer-block-stacking part. Figure 2 illustrates the architecture of a standard transformer block, composed of a self-attention mechanism and a feedforward neural network. Of the nine transformer blocks used, the initial six are taken directly from the BERT model (Devlin et al. 2018), while the remaining three are newly introduced and initialized within our own framework. Fourth, the content pooling and flattening part includes a global average pooling layer and a flattening layer. The pooling layer’s pooling size is 4, and after pooling, the tensor shape is 4×768. The flattening operation transforms the tensor into a one-dimensional tensor containing 3,072 elements. Fifth, the transformed tensor is routed through six content full residual connection modules, as depicted in Figure 2. Incorporating these residual connections enhances the TMPT model’s representational power. Moreover, by introducing residual paths instead of stacking dense layers directly, we mitigate the common issue of vanishing gradients often encountered in deep networks (He et al. 2021). Sixth, a dense layer then reduces the vector’s dimensionality from 3,072 to 768. Altogether, the content encoder contains 302,990,592 parameters.
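The shape flow just described can be sketched in TensorFlow/Keras as follows. This is a schematic reconstruction, not the authors’ released code: the attention head count, feedforward width, vocabulary size, and the internals of the residual module follow BERT-base conventions and are assumptions, and the stated 4×768 pooled shape corresponds in Keras terms to a pool_size of 128 over the 512 positions.

# Schematic sketch of the content encoder's shape flow: 512x768 embeddings
# -> 9 transformer blocks -> average pooling to 4x768 -> flatten to 3,072
# -> 6 residual modules -> dense to 768.
import tensorflow as tf
from tensorflow.keras import layers

D = 768  # width of the semantic space used throughout

class PositionalEmbedding(layers.Layer):
    """Adds learned positional information, which transformer blocks lack."""
    def __init__(self, length, dim, **kwargs):
        super().__init__(**kwargs)
        self.emb = layers.Embedding(length, dim)
        self.length = length
    def call(self, x):
        return x + self.emb(tf.range(self.length))

def transformer_block(x):
    # Standard block: self-attention plus a feedforward network, each with a
    # residual connection and layer norm (Vaswani et al. 2017). Head count
    # and feedforward width follow BERT-base defaults (an assumption here).
    attn = layers.MultiHeadAttention(num_heads=12, key_dim=D // 12)(x, x)
    x = layers.LayerNormalization()(x + attn)
    ffn = layers.Dense(4 * D, activation="relu")(x)
    return layers.LayerNormalization()(x + layers.Dense(D)(ffn))

def residual_module(x):
    # "Full residual connection" module: a dense transform added back onto
    # its input, mitigating vanishing gradients in deep stacks; the internal
    # layout is an assumption, as Figure 2 is not reproduced here.
    h = layers.Dropout(0.10)(layers.Dense(x.shape[-1], activation="relu")(x))
    return x + h

def content_encoder():
    ids = layers.Input(shape=(512,), dtype=tf.int32)   # 512-token fragments
    x = layers.Embedding(119_547, D)(ids)              # mBERT vocabulary size
    x = PositionalEmbedding(512, D)(x)
    for _ in range(9):                                 # 6 BERT + 3 new blocks
        x = transformer_block(x)
    x = layers.AveragePooling1D(pool_size=128)(x)      # 512x768 -> 4x768
    x = layers.Flatten()(x)                            # -> 3,072 elements
    for _ in range(6):                                 # residual modules
        x = residual_module(x)
    x = layers.Dense(D, activation="relu")(x)          # 3,072 -> 768
    return tf.keras.Model(ids, x, name="content_encoder")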

The topic encoder encodes a 16-token topic into a 768-dimension tensor, and it has five components. The topic input layer, topic BERT embedding part, and topic dense transformer-block-stacking part follow the same configuration as in the content encoder, differing only in the input length. In the topic pooling and flattening part, the pooling layer uses a pooling size of 1, resulting in an output with the shape 1×768. The subsequent flattening step then reshapes this output into a 768-element vector. Before and after the topic full residual connection part, the tensor’s shape remains unchanged. Consequently, the topic encoder does not require an additional dense layer for dimensionality adjustment. It is also important to note that the two encoders do not share parameters. The topic encoder comprises a total of 210,045,012 parameters.

The matcher is a simple six-part multilayer neural network. First, the reduction layer computes the element-wise difference between the two feature vectors, producing a 768-dimensional output. Next, this 768-dimensional tensor is processed by three connection blocks, as shown in Figure 3. Afterward, four successive dense layers gradually reduce the dimensionality until the final output is a single scalar value. In total, the matcher consists of 2,903,937 parameters.

In our model, all dropout layers use a 10% dropout rate to reduce overfitting (Srivastava et al. 2014). Except for the final dimension-reduction layer, every dense layer uses a rectified linear unit as its activation function; the final layer employs a sigmoid activation. Altogether, the TMPT model contains 519,939,541 trainable parameters. We employ a binary cross-entropy loss function for training because our model performs a binary classification task. We implement and train the model with TensorFlow 2.12.0; the model has a memory footprint of 1.92 GB. The model is trained on four NVIDIA A100 40 GB GPUs, and in light of GPU memory constraints and model size, we set the batch size to 64.
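Continuing the sketch above (and reusing its D, PositionalEmbedding, transformer_block, residual_module, and content_encoder definitions), the topic encoder, matcher, and training configuration could be assembled as follows. The number of topic residual modules, the matcher’s hidden widths, and the use of residual modules as stand-ins for the Figure 3 connection blocks are assumptions.

def topic_encoder():
    ids = layers.Input(shape=(16,), dtype=tf.int32)    # 16-token topics
    x = layers.Embedding(119_547, D)(ids)
    x = PositionalEmbedding(16, D)(x)
    for _ in range(9):                                 # same block stacking
        x = transformer_block(x)
    x = layers.AveragePooling1D(pool_size=16)(x)       # 16x768 -> 1x768
    x = layers.Flatten()(x)                            # -> 768 elements
    for _ in range(6):                                 # shape-preserving residuals
        x = residual_module(x)
    # No extra dense layer: the shape is already 768.
    return tf.keras.Model(ids, x, name="topic_encoder")

content_in = layers.Input(shape=(512,), dtype=tf.int32)
topic_in = layers.Input(shape=(16,), dtype=tf.int32)

# The two encoders do not share parameters: each is instantiated separately.
vec_c = content_encoder()(content_in)
vec_t = topic_encoder()(topic_in)

# Matcher: element-wise difference, three connection blocks, then four
# successive dense layers reducing to a single scalar in [0, 1].
x = layers.Subtract()([vec_c, vec_t])                  # reduction layer
for _ in range(3):
    x = residual_module(x)                             # connection blocks
for units in (256, 64, 16):                            # hidden widths assumed
    x = layers.Dense(units, activation="relu")(x)
score = layers.Dense(1, activation="sigmoid")(x)       # 0 = unrelated, 1 = related

tmpt = tf.keras.Model([content_in, topic_in], score)
# Binary cross-entropy fits the binary relevant/not-relevant training task.
tmpt.compile(optimizer="adam", loss="binary_crossentropy",
             metrics=["accuracy",
                      tf.keras.metrics.Precision(),
                      tf.keras.metrics.Recall()])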

Training Dataset for TMPT

A key capability of our model is zero-shot learning (Pourpanah et al. 2023), making it crucial to expose the model to abundant and diverse training data (Xu et al. 2020). To that end, we compile a dataset from two main sources: Wikipedia entries and academic abstracts paired with keywords. The Wikipedia dataset, made available through Hugging Face, is version 1.2.0 and encompasses content in 329 languages from the 1 September 2023 dump.¹

For academic data, we focus on abstracts and keywords in English, Chinese, and Japanese. English content is drawn from high-impact Elsevier journals, Chinese data from the China National Knowledge Infrastructure, and Japanese materials from J-STAGE. There are several reasons why Wikipedia entries and academic articles serve as excellent training materials for a multilanguage TMPT. First, they cover a vast array of topics. Second, these texts are generally coherent, well structured, rich in information, and of high quality. Third, the keywords associated with these texts act as concise, precise labels linking back to their abstracts or explanatory passages. Fourth, their lengths are typically suitable and thus practical for training.

We begin with about 75 million Wikipedia entries. Due to BERT tokenizer limitations (i.e., only 103 languages are supported), we discard entries in unsupported languages, leaving approximately 74 million usable entries. Each entry’s explanatory section is treated as the input paragraph, and its keywords as the input keyword. When an input paragraph and keyword originate from the same entry, we assign a label of 1; otherwise, the label is 0. To prevent the model from exploiting easy shortcuts, we omit the first 10 tokens of each paragraph since they often mirror the entry’s keyword. We also form negative instances by pairing each explanatory section with a random keyword in the same language but from a different entry, resulting in roughly 148 million total samples derived from Wikipedia. Additionally, we gather 1.47 million English articles, 128 thousand Chinese articles, and 11 thousand Japanese articles. Given that each article typically comes with several keywords, this subset yields about 11 million samples. Altogether, the combined dataset reaches about 159 million samples. We then split the entire dataset into training (98%) and testing (2%) sets. Notably, we reserve 2% of the academic articles in each language as untouched test data, ensuring that a portion of the data remains completely unseen by the model until the final evaluation phase.

¹ Hugging Face. Datasets – Graelo. https://huggingface.co/datasets/graelo/wikipedia (accessed 1 September 2023).
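For illustration, the pair-construction logic could be sketched as follows; the record structure, function name, and sampling details are schematic assumptions, while the labeling rules and the skip-10-tokens heuristic follow the description above.

# Illustrative construction of positive and negative (paragraph, keyword)
# training pairs from Wikipedia-style records.
import random

def build_pairs(entries, tokenizer, skip_tokens=10):
    """entries: list of dicts with 'text', 'keyword', and 'language' keys."""
    pairs = []
    for entry in entries:
        # Drop the first 10 tokens, which often mirror the entry's keyword,
        # so the model cannot exploit that shortcut.
        tokens = tokenizer.tokenize(entry["text"])[skip_tokens:]
        paragraph = tokenizer.convert_tokens_to_string(tokens)
        # Positive instance: paragraph and keyword from the same entry.
        pairs.append((paragraph, entry["keyword"], 1))
        # Negative instance: a random keyword in the same language but from
        # a different entry.
        same_lang = [e for e in entries
                     if e["language"] == entry["language"] and e is not entry]
        if same_lang:
            pairs.append((paragraph, random.choice(same_lang)["keyword"], 0))
    return pairs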

Accuracy Metrics

In this study, we mainly focus on three accuracy metrics: accuracy, precision, and recall. Accuracy is a measure of the overall correctness of the TMPT model across all classifications based on the testing sets. It represents the ratio of correct predictions to the total number of cases examined. Specifically, for the TMPT model, accuracy indicates the model’s effectiveness in correctly identifying fragments of text as relevant or not relevant to specific topics. High accuracy in this context suggests that the TMPT model is reliable in distinguishing between relevant and nonrelevant ESG content across different languages and corporate contexts.

Precision measures the accuracy of positive predictions, that is, judgments that a data pair is relevant. It is defined as the ratio of correctly predicted relevant data pairs to the total number of relevant predictions made. In other words, a high-precision model does not aggressively and greedily label data pairs as relevant. Specifically, precision reflects the TMPT model’s ability to accurately identify report fragments that are truly relevant to an ESG topic without mistakenly categorizing nonrelevant fragments as relevant. High precision ensures that the model’s assessments of ESG relevance are trustworthy, minimizing the risk of false alarms where nonrelevant content is flagged as significant.

Recall indicates the model’s ability to identify all actually relevant data pairs. It is calculated as the ratio of correctly predicted relevant data pairs to the total number of truly relevant data pairs. Here, recall assesses the TMPT model’s capacity to capture all fragments of text that are relevant to a particular ESG topic. A high recall rate is crucial for comprehensive ESG report analysis, ensuring that no pertinent information is overlooked, especially when assessing compliance with ESG standards or understanding the full scope of a company’s ESG initiatives.
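Writing TP, TN, FP, and FN for true positives, true negatives, false positives, and false negatives (with “relevant” as the positive class), the three metrics reduce to the standard definitions:

Accuracy  = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall    = TP / (TP + FN)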

Since the data sampled in the real world are not uniform across languages, the model does not perform exactly the same in every language. English has the most abundant data, so it usually performs better. The diversity of our test data ensures the general capabilities of TMPT. For the ESG-related investigation, TMPT’s actual ability should be better than the results on the testing sets suggest.

Environmental, Social, and Governance Emphasis Analyses

Each report is data-preprocessed, converting the PDF file into a batch of fragments of token IDs. The process can be expressed as follows:

Fragments_r = DPP(Report_r), (1)

where Report_r represents a certain report r, and DPP(·) denotes the data preprocessing procedure described above.
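To make the analysis concrete, a hypothetical end-to-end sketch combining equation (1) with TMPT scoring and the averaging logic from the Basic Logic section might look as follows; `preprocess` and `tmpt` refer to the sketches above, and `encode_topic` (turning a topic string into 16 token IDs) is an assumed helper, not part of the authors’ code.

# End-to-end sketch: preprocess a report into fragments (equation 1), score
# each fragment against each of the 13 topics with the trained TMPT, and
# average the relatedness per topic.
import numpy as np

TOPICS = ["human rights", "governance risk", "greenhouse gases",
          "safety and health", "mining consumption", "community",
          "domestic job creation", "domestic reflux rate", "production cost",
          "water consumption", "air pollution", "economic ripple effect",
          "work environment"]

def emphasis_scores(pdf_path: str) -> dict[str, float]:
    fragments = np.array(preprocess(pdf_path))          # Eq. (1): DPP(report)
    scores = {}
    for topic in TOPICS:
        # encode_topic is a hypothetical helper returning 16 token IDs.
        topic_ids = np.tile(encode_topic(topic), (len(fragments), 1))
        relatedness = tmpt.predict([fragments, topic_ids], verbose=0)
        # The report's emphasis on a topic is the mean relatedness of all
        # of its fragments to that topic.
        scores[topic] = float(relatedness.mean())
    return scores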
