
Automated data collection with R : a practical guide to web scraping and text mining / Simon Munzert, Christian Rubba, Peter Meißner, Dominic Nyhuis

Resource type: Book (Online)
Language: English
Publisher: Hoboken : Wiley, 2014
Description: Online resource (xxii, 452 pages)
ISBN:
  • 9781118834787
  • 9781322236414
  • 1322236410
Additional physical formats: 9781322236414 | 9781118834817 | Also published as: Automated data collection with R. Print edition, Chichester : Wiley, 2015. xxii, 452 pages
DDC classification:
  • 006.312
  • 006.3/12
RVK: ST 250
LOC classification:
  • QA76.9 .D343 M865 2014
Summary: A hands-on guide to web scraping and text mining for both beginners and experienced users of R. Introduces fundamental concepts of the main architecture of the web and databases and covers HTTP, HTML, XML, JSON, and SQL. Provides basic techniques to query web documents and data sets (XPath and regular expressions). An extensive set of exercises is presented to guide the reader through each technique. Explores both supervised and unsupervised techniques as well as advanced techniques such as data scraping and text management. Case studies are featured throughout, along with examples for each technique presented. R code and solutions to exercises featured in the book are provided on a supporting website.
Summary: Intro -- Automated Data Collection with R -- Contents -- Preface -- What you won't learn from reading this book -- Why R? -- Recommended reading to get started with R -- Typographic conventions -- The book's website -- Disclaimer -- Acknowledgments -- 1 Introduction -- 1.1 Case study: World Heritage Sites in Danger -- 1.2 Some remarks on web data quality -- 1.3 Technologies for disseminating, extracting, and storing web data -- 1.3.1 Technologies for disseminating content on the Web -- 1.3.2 Technologies for information extraction from web documents -- 1.3.3 Technologies for data storage -- 1.4 Structure of the book -- Part One A Primer on Web and Data Technologies -- 2 HTML -- 2.1 Browser presentation and source code -- 2.2 Syntax rules -- 2.2.1 Tags, elements, and attributes -- 2.2.2 Tree structure -- 2.2.3 Comments -- 2.2.4 Reserved and special characters -- 2.2.5 Document type definition -- 2.2.6 Spaces and line breaks -- 2.3 Tags and attributes -- 2.3.1 The anchor tag -- 2.3.2 The metadata tag -- 2.3.3 The external reference tag -- 2.3.4 Emphasizing tags , , -- 2.3.5 The paragraphs tag -- 2.3.6 Heading tags , , , -- 2.3.7 Listing content with , , and -- 2.3.8 The organizational tags and -- 2.3.9 The tag and its companions -- 2.3.10 The foreign script tag -- 2.3.11 Table tags , , , and -- 2.4 Parsing -- 2.4.1 What is parsing? -- 2.4.2 Discarding nodes -- 2.4.3 Extracting information in the building process -- Summary -- Further reading -- Problems -- 3 XML and JSON -- 3.1 A short example XML document -- 3.2 XML syntax rules -- 3.2.1 Elements and attributes -- 3.2.2 XML structure -- 3.2.3 Naming and special characters -- 3.2.4 Comments and character data -- 3.2.5 XML syntax summary -- 3.3 When is an XML document well formed or valid?
PPN: 816332533
Package identifier: ZDB-26-MYL | ZDB-30-PAD | ZDB-30-PQE
