Concept
Regressionanalysisisamathematicalmodel.Whenthedependentvariableandtheindependentvariablehavealinearrelationship,itisaspeciallinearmodel.
Thesimplestcaseisone-variablelinearregression,whichconsistsofanindependentvariableandadependentvariablethatareroughlylinearlyrelated;themodelisY=a+bX+ε(Xistheindependentvariable,YisthecauseVariable,εisrandomerror).
Usuallyassumethatthemeanvalueoftherandomerroris0,andthevarianceisσ^2(σ^2﹥0,σ^2hasnothingtodowiththevalueofX).Ifitisfurtherassumedthattherandomerrorfollowsanormaldistribution,itiscalledanormallinearmodel.Generally,iftherearekindependentvariablesand1dependentvariable,thevalueofthedependentvariableisdividedintotwoparts:onepartisaffectedbytheindependentvariable,thatis,expressedasitsfunction,thefunctionformisknownandcontainsunknownparameters;theotherpartisdeterminedbyOtherunconsideredfactorsandrandomeffectsarerandomerrors.
Whenthefunctionisalinearfunctionwithunknownparameters,itiscalledalinearregressionanalysismodel;whenthefunctionisanonlinearfunctionwithunknownparameters,itiscalledanonlinearregressionanalysismodel.Whenthenumberofindependentvariablesisgreaterthan1,itiscalledmultipleregression,andwhenthenumberofdependentvariablesisgreaterthan1,itiscalledmultipleregression.
Regressionanalysiscontent
Themaincontentofregressionanalysisisasfollows:
①Startingfromasetofdata,determinethequantitativerelationshipbetweencertainvariables;Thatis,amathematicalmodelisestablishedandunknownparametersareestimated.Usuallytheleastsquaremethodisused.
②Testthetrustworthinessoftheserelations.
③Intherelationshipbetweenmultipleindependentvariablesaffectingadependentvariable,judgewhethertheindependentvariablehasasignificantimpact,andselectthesignificantimpactintothemodel,andeliminateinsignificantvariables.Stepwiseregression,forwardregression,andbackwardregressionareusuallyused.
④Usetherequiredrelationshiptopredictorcontrolacertainprocess.
Theapplicationofregressionanalysisisveryextensive,andtheuseofstatisticalsoftwarepackagescanmakevariousalgorithmsmoreconvenient.
Typesofregression
Themaintypesofregressionare:linearregression,curvilinearregression,binarylogisticregression,andmultiplelogisticregression.
Applicationofanalysis
Correlationanalysisstudiesthecorrelationbetweenphenomena,thedirectionandclosenessofcorrelation,andgenerallydoesnotdistinguishbetweenindependentvariablesordependentvariables.Regressionanalysisistoanalyzethespecificformsofcorrelationbetweenphenomena,determinethecausalrelationship,andusemathematicalmodelstoexpressthespecificrelationship.Forexample,fromthecorrelationanalysis,wecanknowthatthe"quality"and"usersatisfaction"variablesarecloselyrelated,butwhichvariablebetweenthesetwovariablesisaffectedbywhichvariable,andthedegreeofinfluence,requiresregressionanalysisMethodtodetermine.
Generallyspeaking,regressionanalysisistodeterminethecausalrelationshipbetweendependentvariablesandindependentvariables,establisharegressionmodel,andsolvethevariousparametersofthemodelbasedonthemeasureddata,andthenevaluatewhethertheregressionmodelisItcanfitthemeasureddatawell;ifitcanfitwell,furtherpredictionscanbemadebasedontheindependentvariables.
Forexample,ifyouwanttostudythecausalrelationshipbetweenqualityandusersatisfaction,inapracticalsense,productqualitywillaffectusersatisfaction,sosetusersatisfactionasthedependentvariableandrecorditasY;Qualityistheindependentvariable,denotedasX.AccordingtothescatterplotinFigure8-3,thefollowinglinearrelationshipcanbeestablished:
Y=A+BX+§
where:AandBareundeterminedparameters,andAisregressionTheinterceptofthestraightline;Bistheslopeoftheregressionline,whichrepresentstheaveragechangeofYwhenXchangesbyoneunit;§istherandomerrortermthatdependsonusersatisfaction.
LinearregressioncanbeeasilyimplementedintheSPSSsoftware.Theregressionequationisasfollows:
y=0.857+0.836xTheinterceptoftheregressionlineontheyaxisis0.857andtheslopeis0.836,Thatis,foreveryonepointimprovementinquality,usersatisfactionincreasesby0.836pointsonaverage;inotherwords,thecontributionofeveryonepointimprovementinqualitytousersatisfactionis0.836points.
Theexampleshownaboveisasimplelinearregressionproblemofoneindependentvariable.Duringdataanalysis,thiscanalsobeextendedtomultipleregressionofmultipleindependentvariables.PleaserefertothespecificregressionprocessandmeaningRefertorelevantstatisticsbooks.Inaddition,intheSPSSresultoutput,R2,FtestvalueandTtestvaluecanalsobereported.R2isalsocalledthecoefficientofdeterminationoftheequation,whichindicatesthedegreeofinterpretationofthevariableXtoYintheequation.ThevalueofR2isbetween0and1.Thecloserto1,thestrongertheinterpretationabilityofXtoYintheequation.R2isusuallymultipliedby100%toexpressthepercentageofYchangeexplainedbytheregressionequation.TheFtestisoutputthroughtheanalysisofvariancetable,andthesignificancelevelisusedtotestwhetherthelinearrelationshipoftheregressionequationissignificant.Generallyspeaking,significancelevelsbelow0.05aremeaningful.WhentheFtestpasses,itmeansthatatleastoneoftheregressioncoefficientsintheequationissignificant,butnotallregressioncoefficientsaresignificant,soaTtestisneededtoverifythesignificanceoftheregressioncoefficients.Similarly,theTtestcanbedeterminedbythesignificanceleveloralook-uptable.Intheexampleshownabove,themeaningofeachparameterisshowninTable1-1.
Table1-1linearregressionequationtest
index | Value | Saliencylevel | Meaning |
R | 0.89 | "Quality"explains89%ofthedegreeofchangein"UserSatisfaction" | |
F | 276.82 | 0.001 | Thelinearrelationshipoftheregressionequationissignificant |
T | 16.64 | 0.001 | Thecoefficientoftheregressionequationissignificant |