Data Mining: Foreign-Language Translation Reference

What is Data Mining?

Simply stated, data mining refers to extracting or “mining” knowledge from large amounts of data. The term is actually a misnomer. Remember that the mining of gold from rocks or sand is referred to as gold mining rather than rock or sand mining. Thus, “data mining” should have been more appropriately named “knowledge mining from data”, which is unfortunately somewhat long. “Knowledge mining”, a shorter term, may not reflect the emphasis on mining from large amounts of data. Nevertheless, mining is a vivid term characterizing the process that finds a small set of precious nuggets from a great deal of raw material. Thus, such a misnomer which carries both “data” and “mining” became a popular choice. There are many other terms carrying a similar or slightly different meaning to data mining, such as knowledge mining from databases, knowledge extraction, data/pattern analysis, data archaeology, and data dredging.

Many people treat data mining as a synonym for another popularly used term, “Knowledge Discovery in Databases”, or KDD. Alternatively, others view data mining as simply an essential step in the process of knowledge discovery in databases. Knowledge discovery consists of an iterative sequence of the following steps:

· data cleaning: to remove noise or irrelevant data,
· data integration: where multiple data sources may be combined,
· data selection: where data relevant to the analysis task are retrieved from the database,
· data transformation: where data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations, for instance,
· data mining: an essential process where intelligent methods are applied in order to extract data patterns,
· pattern evaluation: to identify the truly interesting patterns representing knowledge based on some interestingness measures, and
· knowledge presentation: where visualization and knowledge representation techniques are used to present the mined knowledge to the user.

The data mining step may interact with the user or a knowledge base. The interesting patterns are presented to the user, and may be stored as new knowledge in the knowledge base. Note that according to this view, data mining is only one step in the entire process, albeit an essential one since it uncovers hidden patterns for evaluation.

We agree that data mining is a step in the knowledge discovery process. However, in industry, in media, and in the database research milieu, the term “data mining” is becoming more popular than the longer term of “knowledge discovery in databases”. Therefore, in this book, we choose to use the term “data mining”. We adopt a broad view of data mining functionality: data mining is the process of discovering interesting knowledge from large amounts of data stored either in databases, data warehouses, or other information repositories.

Based on this view, the architecture of a typical data mining system may have the following major components:

1. Database, data warehouse, or other information repository. This is one or a set of databases, data warehouses, spreadsheets, or other kinds of information repositories. Data cleaning and data integration techniques may be performed on the data.
2. Database or data warehouse server. The database or data warehouse server is responsible for fetching the relevant data, based on the user’s data mining request.
3. Knowledge base. This is the domain knowledge that is used to guide the search, or evaluate the interestingness of resulting patterns. Such knowledge can include concept hierarchies, used to organize attributes or attribute values into different levels of abstraction. Knowledge such as user beliefs, which can be used to assess a pattern’s interestingness based on its unexpectedness, may also be included. Other examples of domain knowledge are additional interestingness constraints or thresholds, and metadata (e.g., describing data from multiple heterogeneous sources).
4. Data mining engine. This is essential to the data mining system and ideally consists of a set of functional modules for tasks such as characterization, association analysis, classification, cluster analysis, and evolution and deviation analysis.
5. Pattern evaluation module. This component typically employs interestingness measures and interacts with the data mining modules so as to focus the search towards interesting patterns. It may access interestingness thresholds stored in the knowledge base. Alternatively, the pattern evaluation module may be integrated with the mining module, depending on the implementation of the data mining method used. For efficient data mining, it is highly recommended to push the evaluation of pattern interestingness as deep as possible into the mining process so as to confine the search to only the interesting patterns.
6. Graphical user interface. This module communicates between users and the data mining system, allowing the user to interact with the system by specifying a data mining query or task, providing information to help focus the search, and performing exploratory data mining based on the intermediate data mining results. In addition, this component allows the user to browse database and data warehouse schemas or data structures, evaluate mined patterns, and visualize the patterns in different forms.

From a data warehouse perspective, data mining can be viewed as an advanced stage of on-line analytical processing (OLAP). However, data mining goes far beyond the narrow scope of summarization-style analytical processing of data warehouse systems by incorporating more advanced techniques for data understanding.

While there may be many “data mining systems” on the market, not all of them can perform true data mining. A data analysis system that does not handle large amounts of data can at most be categorized as a machine learning system, a statistical data analysis tool, or an experimental system prototype. A system that can only perform data or information retrieval, including finding aggregate values, or that performs deductive query answering in large databases should be more appropriately categorized as either a database system, an information retrieval system, or a deductive database system.

Data mining involves an integration of techniques from multiple disciplines such as database technology, statistics, machine learning, high performance computing, pattern recognition, neural networks, data visualization, information retrieval, image and signal processing, and spatial data analysis. We adopt a database perspective in our presentation of data mining in this book. That is, emphasis is placed on efficient and scalable data mining techniques for large databases.

By performing data mining, interesting knowledge, regularities, or high-level information can be extracted from databases and viewed or browsed from different angles. The discovered knowledge can be applied to decision making, process control, information management, query processing, and so on. Therefore, data mining is considered as one of the most important frontiers in database systems and one of the most promising, new database applications in the information industry.

A classification of data mining systems

Data mining is an interdisciplinary field, the confluence of a set of disciplines, including database systems, statistics, machine learning, visualization, and information science. Moreover, depending on the data mining approach used, techniques from other disciplines may be applied, such as neural networks, fuzzy and/or rough set theory, knowledge representation, inductive logic programming, or high performance computing. Depending on the kinds of data to be mined or on the given data mining application, the data mining system may also integrate techniques from spatial data analysis, information retrieval, pattern recognition, image analysis, signal processing, computer graphics, Web technology, economics, or psychology.

Because of the diversity of disciplines contributing to data mining, data mining research is expected to generate a large variety of data mining systems. Therefore, it is necessary to provide a clear classification of data mining systems. Such a classification may help potential users distinguish data mining systems and identify those that best match their needs. Data mining systems can be categorized according to various criteria, as follows.

1) Classification according to the kinds of databases mined. A data mining system can be classified according to the kinds of databases mined. Database systems themselves can be classified according to different criteria (such as data models, or the types of data or applications involved), each of which may require its own data mining technique. Data mining systems can therefore be classified accordingly. For instance, if classifying according to data models, we may have a relational, transactional, object-oriented, object-relational, or data warehouse mining system. If classifying according to the special types of data handled, we may have a spatial, time-series, text, or multimedia data mining system, or a World-Wide Web mining system. Other system types include heterogeneous data mining systems, and legacy data mining systems.

2) Classification according to the kinds of knowledge mined. Data mining systems can be categorized according to the kinds of knowledge they mine, i.e., based on data mining functionalities, such as characterization, discrimination, association, classification, clustering, trend and evolution analysis, deviation analysis, similarity analysis, etc. A comprehensive data mining system usually provides multiple and/or integrated data mining functionalities. Moreover, data mining systems can also be distinguished based on the granularity or levels of abstraction of the knowledge mined, including generalized knowledge (at a high level of abstraction), primitive-level knowledge (at a raw data level), or knowledge at multiple levels (considering several levels of abstraction). An advanced data mining system should facilitate the discovery of knowledge at multiple levels of abstraction.

3) Classification according to the kinds of techniques utilized. Data mining systems can also be categorized according to the underlying data mining techniques employed. These techniques can be described according to the degree of user interaction involved (e.g., autonomous systems, interactive exploratory systems, query-driven systems), or the methods of data analysis employed (e.g., database-oriented or data warehouse-oriented techniques, machine learning, statistics, visualization, pattern recognition, neural networks, and so on). A sophisticated data mining system will often adopt multiple data mining techniques or work out an effective, integrated technique which combines the merits of a few individual approaches.
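
The knowledge discovery steps listed in the text (data cleaning, integration, selection, transformation, mining, pattern evaluation, and knowledge presentation) can be illustrated with a small sketch. This is not code from the book: the function names, the toy shopping-basket data, and the choice of pair-counting with a support threshold as the "mining" and "interestingness" steps are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def clean(records):
    """Data cleaning: normalize item names, drop blanks and empty records."""
    cleaned = []
    for rec in records:
        items = sorted({i.strip().lower() for i in rec if i and i.strip()})
        if items:
            cleaned.append(items)
    return cleaned


def integrate(*sources):
    """Data integration: combine several data sources into one list."""
    return [rec for src in sources for rec in src]


def select(records, relevant):
    """Data selection: keep only the items relevant to the analysis task."""
    kept = [[i for i in rec if i in relevant] for rec in records]
    return [rec for rec in kept if rec]


def transform(records):
    """Data transformation: put each record into a form suited to mining."""
    return [frozenset(rec) for rec in records]


def mine_pairs(transactions):
    """Data mining (illustrative): count how often item pairs co-occur."""
    counts = Counter()
    for t in transactions:
        counts.update(combinations(sorted(t), 2))
    return counts


def evaluate(counts, n_transactions, min_support):
    """Pattern evaluation: keep pairs whose support meets the threshold."""
    return {
        pair: c / n_transactions
        for pair, c in counts.items()
        if c / n_transactions >= min_support
    }


def present(patterns):
    """Knowledge presentation: render the patterns as readable lines."""
    ordered = sorted(patterns.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{a} + {b}: support {s:.2f}" for (a, b), s in ordered]


def run_pipeline(sources, relevant, min_support):
    """Run the steps in the order given in the text."""
    records = integrate(*(clean(src) for src in sources))
    transactions = transform(select(records, relevant))
    counts = mine_pairs(transactions)
    return evaluate(counts, len(transactions), min_support)


if __name__ == "__main__":
    # Hypothetical toy data standing in for two separate data sources.
    store_a = [["Milk", "bread "], ["milk", "Bread", "eggs"], []]
    store_b = [["bread", "milk", None], ["eggs", "tea"]]
    found = run_pipeline([store_a, store_b], {"milk", "bread", "eggs"}, 0.5)
    for line in present(found):
        print(line)
```

In this sketch the support threshold plays the role of an interestingness threshold. Applying it as a separate step after counting is the simplest arrangement; the text recommends pushing such checks into the mining step itself for efficiency.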
