Citation-based Plagiarism Detection : Detecting Disguised and Cross-language Plagiarism using Citation Pattern Analysis / by Bela Gipp

Resource type: Book (Online)
Language: English
Series: SpringerLink Bücher
Publisher: Wiesbaden ; s.l. : Springer Fachmedien Wiesbaden, 2014
Description: Online resource (XXVI, 350 p., 70 illus.)
ISBN:
  • 9783658063948
Additional physical formats: 9783658063931 | Also published as: Citation-based plagiarism detection. Print edition. Wiesbaden : Springer Vieweg, 2014. XXVI, 350 p.
DDC classification:
  • 006.42
  • 006 23
RVK: SR 8500 | ST 515
LOC classification:
  • QA75.5-76.95
DOI: 10.1007/978-3-658-06394-8
Contents:
Acknowledgements; Contents; List of Tables; List of Figures; Glossary; Abstract; Kurzfassung; 1 Introduction; 1.1 Problem Setting; 1.2 Motivation; 1.3 Research Objective; 1.4 Thesis Outline; 2 Plagiarism Detection; 2.1 Academic Plagiarism; 2.1.1 Definition; 2.1.2 Forms of Academic Plagiarism; 2.1.3 Prevalence of Plagiarism in the Academic Environment; 2.2 Plagiarism Detection Approaches; 2.2.1 Generic Detection Approach; 2.2.2 Overview of Plagiarism Detection Approaches; 2.2.3 Fingerprinting; 2.2.4 Term Occurrence Analysis; 2.2.5 Stylometry; 2.2.6 Cross-Language Plagiarism Detection
2.3 Plagiarism Detection Systems; 2.3.1 Evaluations of PDS; 2.3.2 Technical Weaknesses of PDS; 2.4 Conclusion; 3 Citation-based Document Similarity; 3.1 Terminology; 3.1.1 Citation vs. Reference; 3.1.2 Similarity vs. Relatedness; 3.1.3 Dimensions of Similarity: Lexical, Semantic, Structural; 3.2 Citation-based Similarity Measures; 3.2.1 Direct Citation; 3.2.2 Bibliographic Coupling; 3.2.3 Co-citation; 3.2.4 Amsler; 3.2.5 Co-citation Proximity-based Methods; 3.3 Conclusion; 4 Citation-based Plagiarism Detection; 4.1 Concept; 4.1.1 Citing Behavior; 4.2 Citation Characteristics Considered
4.2.1 Bibliographic Coupling Strength; 4.2.2 Probability of Citation Co-occurrence; 4.2.3 Order and Proximity of Citations; 4.3 Challenges to Citation Pattern Identification; 4.3.1 Unknown Pattern Constituents; 4.3.2 Transpositions; 4.3.3 Scaling; 4.3.4 Insertions or Substitutions of Citations; 4.4 Design of Citation-based Detection Algorithms; 4.4.1 Bibliographic Coupling (BC); 4.4.2 Longest Common Citation Sequence (LCCS); 4.4.3 Greedy Citation Tiling (GCT); 4.4.4 Citation Chunking (Cit-Chunk); 4.5 Projected Suitability of CbPD Algorithms for Plagiarism Forms
4.6 Assessment of Identified Citation Patterns; 4.6.1 Citing Frequency-Score (CF-Score); 4.6.2 Continuity-Score (Cont.-Score); 4.7 Conclusion; 5 Prototype: CitePlag; 5.1 Document Parser; 5.2 Database; 5.2.1 Consolidation of Reference Identifiers; 5.3 Detector; 5.4 Frontend; 5.5 Conclusion; 6 Quantitative and Qualitative Evaluation; 6.1 Methodology; 6.1.1 Test Collection Requirements; 6.1.2 Test Collection Challenges; 6.1.3 GuttenPlag Wiki; 6.1.4 VroniPlag Wiki; 6.1.5 PubMed Central OAS; 6.1.6 Summary and Comparison of Test Collections; 6.2 Evaluation using GuttenPlag Wiki
6.3 Evaluation using VroniPlag Wiki; 6.3.1 Evaluation: Random Sample of Sources; 6.3.2 Evaluation: Translated Plagiarism; 6.3.3 Evaluation: Plagiarism Case Heun; 6.3.4 Conclusion VroniPlag Wiki; 6.4 Evaluation using PubMed Central OAS; 6.4.1 Methodology; 6.4.2 Results; 6.4.3 Conclusion of PMC OAS Evaluation; 6.5 Conclusion of Evaluations; 7 Summary & Future Work; 7.1 Summary; 7.2 Contributions; 7.3 Future Work; 7.3.1 General Research Need; 7.3.2 Improvements to Detection Accuracy; 7.3.3 Additional Applications; 7.3.4 Further Evaluations; References; Appendix; A Preliminary PMC OAS Corpus Analysis
Summary: Plagiarism is a problem with far-reaching consequences for the sciences. However, even today’s best software-based systems can reliably identify only copy-and-paste plagiarism. Disguised forms of plagiarism, including paraphrased text, cross-language plagiarism, and structural and idea plagiarism, often remain undetected. As a result, a large percentage of scientific plagiarism goes undetected. Bela Gipp provides an overview of the state of the art in plagiarism detection and analyzes why these approaches fail to detect disguised plagiarism. The author proposes Citation-based Plagiarism Detection to address this shortcoming. Unlike character-based approaches, it does not rely on text comparisons alone, but analyzes citation patterns within documents to form a language-independent "semantic fingerprint" for similarity assessment. The practicability of Citation-based Plagiarism Detection was demonstrated by its ability to identify plagiarism in scientific publications that had previously not been machine-detectable.
Contents:
  • Current state of plagiarism detection approaches and systems
  • Citation-based Plagiarism Detection
Target Groups:
  • Readers interested in the problem of plagiarism in the sciences
  • Faculty and students from all disciplines, but especially computer science
The Author: Bela Gipp is a postdoctoral researcher at the University of California, Berkeley.
PPN: 1658604237
Package identifier: ZDB-2-SCS
No physical items for this record