Gross plagiarism is easy to spot and most people agree it’s wrong, so it’s relatively easy to deal with. But while stealing somebody else’s paper and pretending it’s your own is obvious misconduct, it’s surprisingly hard to define exactly what plagiarism is, especially for more minor offences. It would be helpful if we could agree a definition of plagiarism (or a classification of its different types) so that editors (and teachers) could decide how to handle each case. Editors now have access to powerful text-matching software (such as CrossCheck or even a simple Google™ search), so it’s easy to discover the percentage of text in one document that matches text in another (or several others). But it’s much more difficult to know what those numbers mean. In fact, one editor I know says that the numbers are meaningless (although she admits that the tools are helpful for flagging up possible problems and then looking for large matches).
I agree that it wouldn’t be helpful to rule that, say, anything above 50% matching text was major plagiarism, anything from 20-49% constituted minor plagiarism, and anything below 20% was simply chance. While the percentages are helpful, they are only one aspect that should be considered. It’s also important to realise that data and figures can be plagiarised but won’t be picked up by text-matching software. Similarly, if work is translated and then appropriated, the words won’t match but this is clearly a form of plagiarism. And it is also possible to plagiarise somebody’s theory or analytical framework but express it in different words and claim it as your own, so the definition needs to cover more than simply identical text.
Another problem is that software can spot identical strings of words but can’t distinguish between common terms and sparks of original genius. To illustrate the problem, if you Google the phrase “p<0.005 was considered statistically significant” you’ll find 588,000 documents that contain it, and a further search finds 410,000 stating that research was “performed according to the Declaration of Helsinki.” Nobody knows who first used these strings and probably nobody cares, but other, shorter strings such as “the winter of our discontent” (Shakespeare) or “the end of the beginning” (Churchill) are clearly quotations which ought to be attributed.
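For readers who want to see why a raw percentage can mislead, here is a minimal sketch in Python. It is not how CrossCheck or any real tool works; it simply assumes that "matching text" means shared runs of consecutive words, and the function names, the sample sentences, and the `ignore` list of stock phrases are all invented for illustration.

```python
import re


def word_ngrams(text, n=3):
    """Return the set of n-word sequences found in the text (case-insensitive)."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def percent_matching(submission, source, n=3, ignore=()):
    """Percentage of the submission's n-word sequences that also occur in source.

    `ignore` is a hypothetical list of stock phrases (assumed here, not taken
    from any real tool) whose word sequences are excluded before counting.
    """
    sub = word_ngrams(submission, n)
    for phrase in ignore:
        sub -= word_ngrams(phrase, n)
    if not sub:
        return 0.0
    return 100.0 * len(sub & word_ngrams(source, n)) / len(sub)


# Two invented sentences that share only a stock ethics statement.
submission = ("Research was performed according to the Declaration of Helsinki. "
              "We found novel results in mice.")
source = ("The study was performed according to the Declaration of Helsinki. "
          "Outcomes differed in rats.")
stock = ["performed according to the Declaration of Helsinki"]

print(round(percent_matching(submission, source), 1))        # 46.2
print(round(percent_matching(submission, source, ignore=stock), 1))  # 12.5
```

The two invented texts have nothing original in common, yet the raw score is about 46%; once the stock phrase is set aside it drops to 12.5%. No such list could tell the software that a short Shakespeare quotation deserves attribution, which is why the numbers still need a human reader.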
So, it’s hard to define plagiarism, but COPE (the Committee on Publication Ethics) wants to try, and would appreciate your help. We’ve produced a discussion document (available at: http://www.publicationethics.org/resources/discussion-documents) and would like comments from anyone who’s interested (researchers, authors, students, academics, editors, and readers).
Conflict of interest: I wrote the COPE discussion document on plagiarism and am chair of COPE … so this is blatant advertising, but it isn’t plagiarised!
Liz Wager PhD is a freelance medical writer, editor, and trainer. She is the current chair of the Committee on Publication Ethics (COPE).