Automated data collection with R : a practical guide to web scraping and text mining / Simon Munzert, Christian Rubba, Peter Meißner, Dominic Nyhuis

By:

Munzert, Simon, 1985- [author]

Contributor(s):

Resource type: Book (Online)
Language: English
Publisher: Hoboken : Wiley, 2014
Description: Online resource (XXII, 452 pages)
ISBN:

9781118834787
9781322236414
1322236410

Subject(s):

Additional physical formats: 9781322236414 | 9781118834817 | Also published as: Automated data collection with R. Print edition. Chichester : Wiley, 2015. xxii, 452 pages

DDC classification:

006.312
006.3/12

RVK: ST 250

LOC classification:

QA76.9 .D343 M865 2014

Online resources:

Access within the KIT network

Summary: A hands-on guide to web scraping and text mining for both beginners and experienced users of R. Introduces fundamental concepts of the main architecture of the web and databases and covers HTTP, HTML, XML, JSON, and SQL. Provides basic techniques to query web documents and data sets (XPath and regular expressions). An extensive set of exercises is presented to guide the reader through each technique. Explores both supervised and unsupervised techniques as well as advanced techniques such as data scraping and text management. Case studies are featured throughout along with examples for each technique presented. R code and solutions to exercises featured in the book are provided on a supporting website.

Summary: Intro -- Automated Data Collection with R -- Contents -- Preface -- What you won't learn from reading this book -- Why R? -- Recommended reading to get started with R -- Typographic conventions -- The book's website -- Disclaimer -- Acknowledgments -- 1 Introduction -- 1.1 Case study: World Heritage Sites in Danger -- 1.2 Some remarks on web data quality -- 1.3 Technologies for disseminating, extracting, and storing web data -- 1.3.1 Technologies for disseminating content on the Web -- 1.3.2 Technologies for information extraction from web documents -- 1.3.3 Technologies for data storage -- 1.4 Structure of the book -- Part One A Primer on Web and Data Technologies -- 2 HTML -- 2.1 Browser presentation and source code -- 2.2 Syntax rules -- 2.2.1 Tags, elements, and attributes -- 2.2.2 Tree structure -- 2.2.3 Comments -- 2.2.4 Reserved and special characters -- 2.2.5 Document type definition -- 2.2.6 Spaces and line breaks -- 2.3 Tags and attributes -- 2.3.1 The anchor tag -- 2.3.2 The metadata tag -- 2.3.3 The external reference tag -- 2.3.4 Emphasizing tags , , -- 2.3.5 The paragraphs tag -- 2.3.6 Heading tags , , , -- 2.3.7 Listing content with , , and -- 2.3.8 The organizational tags and -- 2.3.9 The tag and its companions -- 2.3.10 The foreign script tag -- 2.3.11 Table tags , , , and -- 2.4 Parsing -- 2.4.1 What is parsing? -- 2.4.2 Discarding nodes -- 2.4.3 Extracting information in the building process -- Summary -- Further reading -- Problems -- 3 XML and JSON -- 3.1 A short example XML document -- 3.2 XML syntax rules -- 3.2.1 Elements and attributes -- 3.2.2 XML structure -- 3.2.3 Naming and special characters -- 3.2.4 Comments and character data -- 3.2.5 XML syntax summary -- 3.3 When is an XML document well formed or valid?

PPN: 816332533

Package identifier: ZDB-26-MYL | ZDB-30-PAD | ZDB-30-PQE

Holdings ( 0 )

No physical items for this record
