Introduction
"Text Mining (English Edition)" is a landmark work in the field of text mining, written by world-renowned authorities. It is well suited to researchers and practitioners in text mining and information retrieval, and it can also serve as a textbook for graduate courses in data mining and knowledge discovery for students of computer science and related majors at colleges and universities.
About the Authors
Ronen Feldman is a pioneer in machine learning, data mining, and unstructured data management. He is a Senior Lecturer in the Department of Mathematics and Computer Science at Bar-Ilan University, Israel, and Director of its Data Mining Laboratory, as well as co-founder and Chairman of ClearForest, a company that mainly develops next-generation text mining applications for enterprises and government agencies. He is now also an Associate Professor at New York University's Stern School of Business.
James Sanger is a venture capitalist and a recognized industry expert in business data solutions, Internet applications, and IT security products. He co-founded ABS Ventures in 1982. Prior to this, he was managing director of DB Capital New York. He received a bachelor's degree from the University of Pennsylvania and also graduated from the University of Oxford and the University of Liverpool. He is a member of the IEEE and the American Association for Artificial Intelligence (AAAI).
Media Recommendations
"...I bought this book. It is definitely a reference book worth having."
- L. Venkata Subramaniam, IBM India Research Laboratory
"An introduction to text mining written by the leading experts in the field. This book is very well written and combines the theory and practice of text mining seamlessly. Suitable for researchers and practitioners... I highly recommend this book to anyone who has no background in computational linguistics and wants to delve into the field of text mining."
- Rada Mihalcea, University of North Texas
Text mining has become an exciting new field of research. Written by world-renowned authorities, this book explains core text mining and link detection algorithms and techniques, introduces advanced preprocessing techniques, and addresses knowledge representation and visualization methods. It also discusses how these technologies are applied in practice, balancing the theory and the practice of text mining.
Contents
I. Introduction to Text Mining 1
I.1 Defining Text Mining 1
I.2 General Architecture of Text Mining Systems 13
II. Core Text Mining Operations 19
II.1 Core Text Mining Operations 19
II.2 Using Background Knowledge for Text Mining 41
II.3 Text Mining Query Languages 51
III. Text Mining Preprocessing Techniques 57
III.1 Task-Oriented Approaches 58
III.2 Further Reading 62
IV. Categorization 64
IV.1 Applications of Text Categorization 65
IV.2 Definition of the Problem 66
IV.3 Document Representation 68
IV.4 Knowledge Engineering Approach to TC 70
IV.5 Machine Learning Approach to TC 70
IV.6 Using Unlabeled Data to Improve Classification 78
IV.7 Evaluation of Text Classifiers 79
IV.8 Citations and Notes 80
V. Clustering 82
V.1 Clustering Tasks in Text Analysis 82
V.2 The General Clustering Problem 84
V.3 Clustering Algorithms 85
V.4 Clustering of Textual Data 88
V.5 Citations and Notes 92
VI. Information Extraction 94
VI.1 Introduction to Information Extraction 94
VI.2 Historical Evolution of IE: The Message Understanding Conferences and Tipster 96
VI.3 IE Examples 101
VI.4 Architecture of IE Systems 104
VI.5 Anaphora Resolution 109
VI.6 Inductive Algorithms for IE 119
VI.7 Structural IE 122
VI.8 Further Reading 129
VII. Probabilistic Models for Information Extraction 131
VII.1 Hidden Markov Models 131
VII.2 Stochastic Context-Free Grammars 137
VII.3 Maximal Entropy Modeling 138
VII.4 Maximal Entropy Markov Models 140
VII.5 Conditional Random Fields 142
VII.6 Further Reading 145
VIII. Preprocessing Applications Using Probabilistic and Hybrid Approaches 146
VIII.1 Applications of HMM to Textual Analysis 146
VIII.2 Using MEMM for Information Extraction 152
VIII.3 Applications of CRFs to Textual Analysis 153
VIII.4 TEG: Using SCFG Rules for Hybrid Statistical–Knowledge-Based IE 155
VIII.5 Bootstrapping 166
VIII.6 Further Reading 175
IX. Presentation-Layer Considerations for Browsing and Query Refinement 177
IX.1 Browsing 177
IX.2 Accessing Constraints and Simple Specification Filters at the Presentation Layer 185
IX.3 Accessing the Underlying Query Language 186
IX.4 Citations and Notes 187
X. Visualization Approaches 189
X.1 Introduction 189
X.2 Architectural Considerations 192
X.3 Common Visualization Approaches for Text Mining 194
X.4 Visualization Techniques in Link Analysis 225
X.5 Real-World Example: The Document Explorer System 235
XI. Link Analysis 244
XI.1 Preliminaries 244
XI.2 Automatic Layout of Networks 246
XI.3 Paths and Cycles in Graphs 250
XI.4 Centrality 251
XI.5 Partitioning of Networks 259
XI.6 Pattern Matching in Networks 272
XI.7 Software Packages for Link Analysis 273
XI.8 Citations and Notes 274
XII. Text Mining Applications 275
XII.1 General Considerations 276
XII.2 Corporate Finance: Mining Industry Literature for Business Intelligence 281
XII.3 A “Horizontal” Text Mining Application: Patent Analysis Solution Leveraging a Commercial Text Analytics Platform 297
XII.4 Life Sciences Research: Mining Biological Pathway Information with GeneWays 309
Appendix A: DIAL: A Dedicated Information Extraction Language for Text Mining 317
A.1 What Is the DIAL Language? 317
A.2 Information Extraction in the DIAL Environment 318
A.3 Text Tokenization 320
A.4 Concept and Rule Structure 320
A.5 Pattern Matching 322
A.6 Pattern Elements 323
A.7 Rule Constraints 327
A.8 Concept Guards 328
A.9 Complete DIAL Examples 329
Bibliography 337
Index 391