Having been falsely accused of plagiarism while in school, I worry about
any tool that might come up with false positives.  In my case, I worked
extremely hard on a French paper, was much more verbose than normal, did
some research to figure out some tenses that we had not learned in class,
and the end result was an accusation of plagiarism.  I lost pretty much all
of my respect for that teacher, and did not do a whole lot of extra work on
assignments after that.

Please be certain before accusing.  For a small assignment, given the
specification, I would not be surprised to find 2 (or more) students with
similar designs.  I remember a conference talk on n-version software, where
the author (the name escapes me) said that they needed to have (I think) 8
independent implementations before a significant number of design errors
would be caught.  Not only were the designs similar, they made the same
mistakes!

Statistics and probability only work with large numbers.  Trying to catch a
single cheater using a tool that says "The probability that these 2
programs were written by different people is 0.0001" is not valid.  Over
the long term, a tool of this type might let you state with confidence
what percentage of the students were cheating, but you cannot confidently
state which ones did it.
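
To put rough numbers on the false-positive worry, here is a small sketch
in Python (my choice of language; nothing in this thread uses it).  It
assumes the tool compares every pair of submissions within each
assignment, that the quoted 0.0001 is the chance of wrongly flagging an
innocent pair, and that the comparisons are independent.  None of those
assumptions is established here, and the function names are made up for
illustration.  The 200 students and 5 assignments are the figures from
the original question quoted below.

```python
from math import comb


def expected_false_flags(students, assignments, false_positive_rate):
    """Return (number of pairwise comparisons, expected innocent pairs flagged).

    Assumes every pair of students is compared once per assignment.
    """
    pairs_per_assignment = comb(students, 2)
    comparisons = pairs_per_assignment * assignments
    return comparisons, comparisons * false_positive_rate


def prob_at_least_one_false_flag(comparisons, false_positive_rate):
    """Chance that at least one innocent pair is flagged, assuming independence."""
    return 1.0 - (1.0 - false_positive_rate) ** comparisons


# Figures from the quoted question; 0.0001 is the hypothetical tool output.
comparisons, expected = expected_false_flags(200, 5, 0.0001)
print(comparisons)  # total pairwise comparisons over the course
print(expected)     # expected number of innocent pairs flagged
print(prob_at_least_one_false_flag(comparisons, 0.0001))
```

Under those assumptions there are 99,500 comparisons and roughly ten
innocent pairs flagged over the course, so even a very small per-pair
error rate does not make any single flag safe to act on by itself.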

Roger Racine

At 02:58 PM 7/24/2002 , Carlisle Martin C Dr USAFA/DFCS wrote:
>See, e.g.:
>
>http://www.wired.com/news/topstories/0,1287,10464,00.html
>
>Martin C. Carlisle, PhD
>Associate Professor and Advisor in Charge
>Department of Computer Science
>United States Air Force Academy
>
>
>
>-----Original Message-----
>From: Alan Barnes
>Sent: Wednesday, July 17, 2002 10:38 AM
>Subject: Plagiarism Detection
>
>
>Can anyone recommend any software to assist in plagiarism detection in Ada 95
>programs?  We have around 200 students doing initial programming courses
>each submitting 5 programming assignments (a few hundred lines of code each)
>over the course of 2 semesters.
>
>In the last two years plagiarism has become a major problem.  Generally the
>attempts to disguise the copying are not very sophisticated and rarely go
>beyond altering comments and identifiers and changing the order of
>declarations.
>
>Occasionally a "pick and mix" strategy is used where students copy
>subprograms selected randomly from several different sources. This is
>typical in exercises of the form: given a package spec. for an ADT xxxx,
>write and test a suitable package body.
>
>Thanks
>Alan Barnes
>
>--
>Dr. Alan Barnes, MInstP, FRAS
>Lecturer in Computer Science
>Computer Science          Telephone: +44 121 359 3611 Ext. 4663
>Aston University
>Aston Triangle            Fax:     +44 121 333 6215
>Birmingham B4 7ET         WWW:     http://www.cs.aston.ac.uk/~barnesa
>U. K.

Roger Racine
Draper Laboratory, MS 31
555 Technology Sq.
Cambridge, MA 02139, USA
617-258-2489
617-258-3939 Fax