




版權(quán)說明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請進行舉報或認領(lǐng)
文檔簡介
HowtoSelectYour
CloudDataWarehousePlatformStrategically
May2022
RichardWinter,CEO
WinterCorpLLC
TheExpertsinAnalyticDataManagementatScale
WC
RichardWinter
Presenter
RichardWinter,CEOandPrincipalConsultant,WinterCorp
?Careerfocusinanalyticdatamanagementatlargescale
?Pioneeringdeveloperofdatawarehousetechnologyandproducts
?Hasperformedmorethan50datawarehouseevaluations,oftenincludingbenchmarks,forleadingenterprises
?Advisertoindustryleadersinanalytics,userandvendor
?Skilledatdatawarehousestrategies,requirements,strategicdesign,engineeringanalysis,testdesignandevaluation
?Experiencedconsultantandtechnologyeducator,widelypublished
?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.
WC2
TheDataDrivenEnterpriseofthe2020s
MostCompaniesareIncreasingUseof:
1.Digital,realtimebusinessprocesses
2.Automateddecisionmaking(AI/ML)
3.Focusoncompetingwithanalytics
Alltrendsacceleratedbytheincreasingspeedofbusinesscompetition
?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.
WC3
SourceSystems
…
Orders
Report1
Products
Products
Report2
Suppliers
ReportN
Shipments
Cleanseddata,integrated
acrosstheenterprise
WeeklyLoads
TheDataWarehouseImpetus:1990s
Orders
CONSISTENT,ACCURATEREPORTING
DataWarehouse
Inventory
?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.
WC4
ANALYZING
WHYdidit
happen?
REPORTING
WHAT
happened?
A
STRATEGICINTELLIGENCE
AT
PREDICTING
WHATWILL
happen?
StagesofDataWarehouseEvolution
OPERATIONALINTELLIGENCE
OPERATIONALIZING
WHATIS
happeningnow?
ACTIVATING
MAKEithappen!
A
T
A
T
Batch
AdHoc
AnalyticsContinuousupdates, tacticalqueriesEventdriven
A
T
Source:DiagramadaptedfromBrobst&Rarey,2003 ?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.WWCC5
WC
?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.
LeadingEdge–200%CAGR
EB
5
4
3
2
Estimated
1
2010201420182022
Data
Warehouse
Growth
Source:WinterCorpPrimaryResearch–surveysandcustomerreports
WC6
WC
?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.
Key
Dimensions
ofData
Warehouse
Scale
WC7
Averagecommercialbuilding(20ksf)
BurjKhalifa,Dubai–TallestBuildingintheWorld
Average
single
family
home
(2500sf)
?2717feettall
?3.3millionsquarefeetinteriorspace
?57elevators
?62milesofpipesforplumbing(plus132milesforthefireemergencysystem)
?24kwindows
?62khpcoolingsystem
?1100xthesizeofasinglefamilyhome
?165xthesizeoftheaveragecommercialbuilding
?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.
WC8
SCALE+COMPLEXITY
Ifyouonlyrememberonethingtoday…
?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.
WC9
DataMartvsIntegratedDataWarehouse(IDW)
?Purposebuilt
?Onelargetable
?Dimensionsusuallysmall
?1-1or1-to-manyrelationships
?Commonpartitioningkeys
?Mostjoinsnotsodemanding
?User-visiblepre-joinscanbeok
StarSchema–DataMart
Store
PK
storeid
storename
street
city
state
zip
managerid
storeclass
storesize
salesstrategy
buildingtype
dateopened
dateclosed
numberofregisterscachementpopulation
SMSA
censustract
Transaction
PK
transid
transtype
date
time
productid
storeid
customerid
associateid
basketid
quantity
unitprice
discount
taxcategory
taxamount
extendedpricepaymentmethodaccountid
Product
PK
productid
namedescription
SKU
package
size
color
style
attr1-n
Customer
PK
customerid
name
street
city
state
zip
phone
maritalstatusfamilysizeagegroupincomegroupeducationallevelgenderbirthyear
attr1-n
?Manylargetableswithmultiplejoinpaths
?Many-to-manyrelationships
?Differentpartitioningkeys
?Joinsaremoredemanding
?User-visiblepre-joinsareaproblem
RelationalSchema-IDW
Organization
PK
FK1
name
street
city
state
zip
country
phone
fax
emailwebaddressindustrycodenumberofempbusinesssizestatus
startdate
enddate
orgtype
partyID
Person
PK
personID
FK1
FK2
name
street
city
state
zip
phone
gender
dob
incomelevelstatus
startdateenddatehouseholdIDpartyID
Household
PK
householdID
FK1
street
ctiy
state
zip
phone
householdsizeincomelevelpartyID
Party
PK
partyID
partytypestartdateenddate
Arrangement
PK
arrangementID
arrangementtypenamedescription
startdate
enddate
notes
Resource
PK
resourceID
FK1
resourcetypestartdataenddatenamedescriptionarrangementID
Location
PK
locationID
FK1
locationtype
name
street
city
state
zip
partyID
Product
PK
productID
FK2
productclassnamedescriptionproductstatusarrangementID
Conditions
PK
conditionID
FK1
conditiontypestartdateenddateamountarrangementID
Statement
PK
statement_id
FK1
balancepast_due_amountpastduedayseventID
Event
PK
eventID
FK1
eventtypeeventdateeventtimeeventamountaccountID
Relationship
PK
relationshipID
FK1
FK2
FK3
rel_typeaccountIDarrangementIDpartyID
Account
PK
accountID
accounttypestartdateenddateaccountstatus
orgID
?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.
WC10
IllustrativeDataWarehouseUses:FinancialServices
?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.
WC11
WC
12
?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.
TheDataRelationshipsinanIntegratedDataWarehouse
?Analysisofactualdatarelationshipsinanintegrateddatawarehouse,largeUScompany
?Dotsaretablesorviews
?Linesaredatarelationshipsthatactuallyoccurinqueries
?Diagramillustratesthatacombinatorialexplosionofjoinsdevelopsasthebusinessvalueofthedatarelationshipsisexploited
WC
IntegratedDataWarehouse
TYPICALCOREREQUIREMENTS
?Supportefficient(bigtable,bigtable)joinswithoutco-locationorpre-join(Toomanydifferentjoinsarerequired)
?Oftenlargerscale
?Supportcomplexqueriesalongavarietyofjoinpaths
?Effectivecomplexqueryoptimization
?Efficient,fasttacticalquery
?Managecomplex,mixedworkloads
?Supportmanyconcurrentoperations
?Supportupdatesthroughoutthedayaswellasanynightlybatchruns
?High,ifnotcontinuous,availabilityofup-to-datedata
?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.
WC13
JoinPerformanceatScale
?Challengingproblemiswhenthesetstobematchedarelarge
?Costandtimearemanytimesgreaterwhen:
—Techniquesarenotscalable
—Intermediateresultsspilltodisk
?Customerscanbemisledwhenthesamequeriesworkfineatsmallerscale
?Dominatestheperformancepictureinmanydatawarehouses
?Allpopulardatawarehouseplatformsperformsomejoinswell
?Manyperformthelargerandmorecomplexjoinspoorly
?Differencebetweenthebestandtheworstisoftenverylarge
?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.
WC14
JoinTestDatabase
?TPC-Hbenchmarkdatabase
?ScaleFactors:10,100,1000,10000
?AtSF1databaseis1GB
TableName
RowCount
RowSize
(Bytes)
EstimatedTableSize(MB,Uncompressed)
LineItem
6,000,000
121
725
Order
1,500,00
110
165
Part
200,00
120
24
Part-Supplier
800,000
143
114
?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.
WC15
Orders
X4
6millionrows
660MB
Part-Supplier
800Krows
114MB
N
Part
200Krows
24MB
4
N
N
1
1
DatabaseQuery–BigJoin
AtSF1,resultingjoinisatablewith:
?24millionrows
?450bytesperrow
?≈10.8GBtotalsize,uncompressed
Allsizesincreaseproportionallywithscalefactor
Line
Item
6millionrows
725MB
?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.
WC16
QueryPerformanceandCostvsDataSize
?Runonsmallvirtualdatawarehouse
?Twoservers,eachwith8virtualprocessors
DataSize
ResponseTime
(Seconds)
OnDemand
Cost
Comparedto
LinearGrowth
10GB
156
$0.09
-
100GB
1626
$0.90
104%
1TB
30,656
$17
196%
10TB
736,072
$409
454%
?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.
WC17
$17at1TB
(196%ofprojectedcost)
QueryCostisWorseThanLinearwithDataSize
POPULARCLOUDDATAWAREHOUSESERVICE
QueryCost($)
$409at10TB
Source:WinterCorpCloudDatabaseLabTestingonGCP
400
(472%ofprojectedcost)
320
240
ActualProjected
160
$0.90at100GB
Shouldbe$90at10TBGB
80
DataSize(GB)
100
1000
500
Shouldbe
$9at1TB
?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.
WC18
BigJoinExperiment:Interpretation
?Forthesystemtested,causeoftheperformanceproblemwasthatintermediatequeryresultsweresignificantlylargerthanSSDperserver
?Largeintermediateresultsoccurwith:
—Joinofmultiplebigtables—commonintheintegrateddatawarehouse
—Self-joinofalargetable—commoninpathanalytics,networkanalytics,IoT
?Specificquerypatternoccursinpractice,thoughinfrequently
?Sameproblemoccursinnconcurrentquerieseachneedcreate(1/n)intermediateresults—canoccurinawiderangeofusecases
?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.
WC19
15
?Independentconsultant
?Differentrequirement
ProductB
12
mightfavoradifferentproduct
9
WC
20
AnnualCloudChargesfortheSameDataWarehouse
WorkloadwithThreePopularCloudDWEngines
?Realcustomerdatabase&workload
$Millions
ProductA
6
Nooneproductis
alwaysbest!!!
ProductC
3
Source:BEZNextWhitePaper-WhichPlatformIsBestforYourCloudDataWarehouse
/#readthepaper
?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.
WC
DataWarehousingintheCloud
THEPROMISE
—Agility
—Elasticity
—Cost
—Scalability
—WorkloadManagement
Thesub-text:don’tworry,moreresourcesarealwaysavailable
?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.
WC21
DataWarehousingintheCloud
THEREALITY
?Datawarehouseengineeringchallengesarestillfundamental—Performance
—Cost
—DataAvailability
majorriskareas
hardtodeliveratenterprisescaleandcomplexity
?Resourcesnotfree
?Doublingtheresourcesoftendoesn’tdoubletheperformanceorthroughput
?Scalerisesfast
DatawarehouseproductsdiffergreatlyinperformanceandcostandtodaywearefocusingonBIGdifferences(usually>3x)
?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.
WC22
WWCC
MostWidelyUsedCloudDataWarehousePlatforms
Vendor
AmazonWeb
Services
(AWS)
Cloudera
Microsoft
Oracle
Snowflake
Teradata
Product/Service
Redshift
ClouderaData
Warehouse
AzureSynapseAnalytics
BigQuery
AutonomousDataWarehouse(ADW)
SnowflakeDataPlatform
Vantage
CloudAvailability
AWS
AWS,Azure,GCP
Azure
GCP
OracleCloud(AWSNotAs-a-Service)
AWS,Azure,GCP
AWS
GCPAnnounced
MicrosoftAzure
TeradataIntellicloud
OnPremisesAvailability
No
Yes,onHadoop
MicrosoftSQLServer
(Similarbutnotidentical)
No
Yes
Oracle
Cloud@Customer
No
Yes,optionallywithconsumptionpricing
Serverless
No
No
No
Yes
Twomodes:dedicatedandshared(~serverless)
ServerT-ShirtSizes
No
PricingModel
Default:ondemand
pricingbythesecond;
reservedinstance
discount(1-3yearterm);
payseparatelyfordata
storage
Priceperhourperserver
forsoftware;customer
payscloudprovider
separatelyforcompute
andstorage
Separatecomputeand
storage.Payfordata
stored,withflexible
scaling.Payforthe
computelevel,with
optiontopausecompute
entirely.Monthlypricing
(with3yearterm)
Default:ondemandpricing-
payperquerybasedondata
scanned;flatratepricingfor
reserved"queryslots"(a
blendedresourceunit);pay
separatelyfordatastorage
Unitpricingfor"shared"
publiccloudusefor
computeandstorage;
dedicatedhardwarewith
licenseportability
availableinpubliccloud
andonpremise;private
cloudonpremise
availableonsubscription
basis.
Separationofcompute
andstorage;"Snowflake
onDemand"persecond
pricing;reservedcapacity
availableonrequest;3
editions
Licenseportability;
consumptionpricing--
payonlyforcompleted
queries(cloudandon
prem);customernot
chargedforoverhead
operationssuchas
statistics;alsooffer
capacitypricing;
?2020,2021WinterCorpLLC,TyngsboroMA
23
WC
?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.
The
Datanolongerneedstobein
onecentralstore:cloud,onprem,multi-cloud,allofabove
Modern
Data
Warehouse
WC24
TheModernAnalyticDataPlatform
DataLake
Alltypesofdata
including“unstructured”
(largevideo,scans,etc.)
Managesalltypesofdataforanalyticuseincludingdatathatis/has:
?largeandvariableinsize
?relatively“unstructured”
?lowervaluedensity
?lessshareduse
?variedlevelsofcuration
?colder
Data&Requests
Data&Requests
Data&requestsflowinboth
directions
Boundariesareunclearand
changing
Maybeitwillsoonbeone
“dataplatform”–sometimes
positionedasthe“datacloud”
DataWarehouse
Manytypesofdata,mosthassomestructure
Managesmanytypesof
dataforanalyticusethat
ismost:
?intensivelyused
?widelyshared
?highlycurated
?integrated
?valuable
DataLakehousehassomecharacteristicsofbothlakeandwarehouse
?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.
WC25
Industry
Trends
?QuantifiedRequirementsare:
—DatabaseMacro
structure
Projected
Growth**
—Scale
—Workload(e.g.,queryclassesandfrequencies)
—Userpopulation
—ServiceLevels
Estimateinranges,but
estimate!!
DefiningQuantifiedArchitecturalRequirements
Business**Interests/Vision
CurrentNeeds
Projected
Requirements
(Quantified)
?2019-2022WinterCorpLLC,Tyngsboro,MA.AllRightsReserved.
WC26
DoaQuantifiedEvaluation,withMeasurement
WorkingfromyourEstimateofMacro-Requirements
1.Canthecandidateplatformsperformatyourtargetlevelsofscaleandcomplexity?
?Notonlyfeatures,butanengineeringassessment
?Requireproductionreferenceswithsimilarmacro-structureandscale
2.Criti
溫馨提示
- 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
- 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
- 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內(nèi)容里面會有圖紙預(yù)覽,若沒有圖紙預(yù)覽就沒有圖紙。
- 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
- 5. 人人文庫網(wǎng)僅提供信息存儲空間,僅對用戶上傳內(nèi)容的表現(xiàn)方式做保護處理,對用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對任何下載內(nèi)容負責(zé)。
- 6. 下載文件中如有侵權(quán)或不適當內(nèi)容,請與我們聯(lián)系,我們立即糾正。
- 7. 本站不保證下載資源的準確性、安全性和完整性, 同時也不承擔(dān)用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。
最新文檔
- 商業(yè)綜合體招商運營風(fēng)險分析及2025年社會穩(wěn)定風(fēng)險防范策略報告
- 新解讀《GB-T 39058 - 2020農(nóng)產(chǎn)品電子商務(wù)供應(yīng)鏈質(zhì)量管理規(guī)范》
- 中小學(xué)2025年《秋季軍訓(xùn)》工作方案 合計3份
- 2025年小學(xué)安全教育日考試題及答案
- 江蘇省高考數(shù)學(xué)二輪復(fù)習(xí) 專題三 解析幾何 3.2 大題考法-直線與圓課件-人教版高三全冊數(shù)學(xué)課件
- 高考數(shù)學(xué)一輪復(fù)習(xí) 9.9 用樣本估計總體課件 理-人教版高三全冊數(shù)學(xué)課件
- 2025年上海新高一英語自學(xué)專項提升:高中題型之語法填空考向探究(初高考點差異)
- 2026年高考語文一輪復(fù)習(xí)文學(xué)類文本教考銜接之小說閱讀
- 2025中考數(shù)學(xué)押題預(yù)測卷(湖北卷)含答案解析
- 2026年中考物理高頻考點復(fù)習(xí):簡單機械 講義(含練習(xí)題及答案)
- GB/T 6543-2025運輸包裝用單瓦楞紙箱和雙瓦楞紙箱
- DLT5210.1-2021電力建設(shè)施工質(zhì)量驗收規(guī)程第1部分-土建工程
- 建筑工程質(zhì)量策劃方案編制指導(dǎo)手冊 2025
- 2025年登高架設(shè)高處作業(yè)模擬考試100題及答案
- 機器人學(xué)導(dǎo)論 課件全套 王偉 第1-5章-緒論 -操作臂的控制方法
- 事業(yè)單位離崗?fù)损B(yǎng)政策
- 基礎(chǔ)教育改革理論與熱點 課件 第三章基礎(chǔ)教育課程改革
- 洪澇災(zāi)害后常見傳染病防控要點
- 2025年紙杯機市場分析現(xiàn)狀
- 新疆烏魯木齊市名校2025屆八下數(shù)學(xué)期末經(jīng)典試題含解析
- 《初中畢業(yè)班家長會課件中學(xué)銜接》
評論
0/150
提交評論