Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage

Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage is a book by Zdravko Markov and Daniel T. Larose, published by Wiley Blackwell in 2007.

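Basic information

  • Title: Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage
  • Authors: Zdravko Markov, Daniel T. Larose
  • Publisher: Wiley Blackwell
  • Publication date: April 1, 2007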

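Book information

Chinese title: Web數據挖掘:挖掘Web內容模式、結構和用途
Authors: Zdravko Markov, Daniel T. Larose
Category: Internet
Resource format: PDF
Edition: Text version
Publisher: Wiley Blackwell
Release date: April 1, 2007
Region: United States
Language: English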

About the book

Summary

This book introduces the reader to methods of data mining on the web, including uncovering patterns in web content (classification, clustering, language processing), structure (graphs, hubs, metrics), and usage (modeling, sequence analysis, performance).

Table of contents

  • PREFACE
    PART I: WEB STRUCTURE MINING
    1 INFORMATION RETRIEVAL AND WEB SEARCH
    Web Challenges
    Web Search Engines
    Topic Directories
    Semantic Web
    Crawling the Web
    Web Basics
    Web Crawlers
    Indexing and Keyword Search
    Document Representation
    Implementation Considerations
    Relevance Ranking
    Advanced Text Search
    Using the HTML Structure in Keyword Search
    Evaluating Search Quality
    Similarity Search
    Cosine Similarity
    Jaccard Similarity
    Document Resemblance
    References
    Exercises
    2 HYPERLINK-BASED RANKING
    Introduction
    Social Networks Analysis
    PageRank
    Authorities and Hubs
    Link-Based Similarity Search
    Enhanced Techniques for Page Ranking
    References
    Exercises
    PART II: WEB CONTENT MINING
    3 CLUSTERING
    Introduction
    Hierarchical Agglomerative Clustering
    k-Means Clustering
    Probability-Based Clustering
    Finite Mixture Problem
    Classification Problem
    Clustering Problem
    Collaborative Filtering (Recommender Systems)
    References
    Exercises
    4 EVALUATING CLUSTERING
    Approaches to Evaluating Clustering
    Similarity-Based Criterion Functions
    Probabilistic Criterion Functions
    MDL-Based Model and Feature Evaluation
    Minimum Description Length Principle
    MDL-Based Model Evaluation
    Feature Selection
    Classes-to-Clusters Evaluation
    Precision, Recall, and F-Measure
    Entropy
    References
    Exercises
    5 CLASSIFICATION
    General Setting and Evaluation Techniques
    Nearest-Neighbor Algorithm
    Feature Selection
    Naive Bayes Algorithm
    Numerical Approaches
    Relational Learning
    References
    Exercises
    PART III: WEB USAGE MINING
    6 INTRODUCTION TO WEB USAGE MINING
    Definition of Web Usage Mining
    Cross-Industry Standard Process for Data Mining
    Clickstream Analysis
    Web Server Log Files
    Remote Host Field
    Date/Time Field
    HTTP Request Field
    Status Code Field
    Transfer Volume (Bytes) Field
    Common Log Format
    Identification Field
    Authuser Field
    Extended Common Log Format
    Referrer Field
    User Agent Field
    Example of a Web Log Record
    Microsoft IIS Log Format
    Auxiliary Information
    References
    Exercises
    7 PREPROCESSING FOR WEB USAGE MINING
    Need for Preprocessing the Data
    Data Cleaning and Filtering
    Page Extension Exploration and Filtering
    De-Spidering the Web Log File
    User Identification
    Session Identification
    Path Completion
    Directories and the Basket Transformation
    Further Data Preprocessing Steps
    References
    Exercises
    8 EXPLORATORY DATA ANALYSIS FOR WEB USAGE MINING
    Introduction
    Number of Visit Actions
    Session Duration
    Relationship between Visit Actions and Session Duration
    Average Time per Page
    Duration for Individual Pages
    References
    Exercises
    9 MODELING FOR WEB USAGE MINING: CLUSTERING, ASSOCIATION, AND CLASSIFICATION
    Introduction
    Modeling Methodology
    Definition of Clustering
    The BIRCH Clustering Algorithm
    Affinity Analysis and the A Priori Algorithm
    Discretizing the Numerical Variables: Binning
    Applying the A Priori Algorithm to the CCSU Web Log Data
    Classification and Regression Trees
    The C4.5 Algorithm
    References
    Exercises
    INDEX
