Wednesday, 10 March 2010

Irrelevant Asides

Moans and Whines

Why can't people put dates on web pages? It's infuriating to find an interesting web site, try to track down the author to praise him/her/it or whatever, and then discover that you can't find her/it/him, because it/he/she wrote the site five years ago and has since moved to another ISP/host/university/company/planet. Dating a page isn't hard, especially if you use intelligent tools like Emacs Wiki Mode (thanks again, JohnWiegley).

Winamp

Am I the only person in the world who dislikes Winamp? In fact it's more than dislike: I think it has one of the worst user interfaces ever created, and I regard myself as something of an expert in that field, having created some absolutely screamingly awful things myself, not to mention having had to use many of M$'s worst offerings!

Compilers

Came across yet another reason to dislike M$. I accidentally added a dot before a constant name in an Err.Raise statement in a VB6 program. The compiler silently accepted this, and it wasn't until I ran the program on a machine that tripped the error that the fault showed up. Of course it was in the error-handling code, so instead of showing my nice pop-up dialog and logging the error to the log file, the program simply crashed with M$'s unhelpful 'does not support this method' message. I had to instrument half the program (well, it seemed like that!) before I tracked it down.

I know that unit tests might have helped, but really this is a simple piece of static analysis that is part of the compiler anyway!

And getting enough coverage to trip all the errors in unit tests would be a major job, both to create and to maintain.

Automated Plagiarism Detection

I have been searching for tools to help maintain medium-sized software projects for some time. In a large program created by a geographically, culturally, and educationally diverse programming team, one problem that recurs over and over is code that is copied from place to place to solve several similar problems. This is a problem because if the original copy of the algorithm has a bug then the copies quite likely have it too; unfortunately the author of the original may not know that the code has been copied, or by whom, so it is difficult for him or her to fix all the copies. This is just the standard argument for parametric procedures. So it would be good to have a method that lets us scour the code for duplicates and near duplicates, but when I ask Google to find pages matching 'code similarity', many of the hits are academics complaining that their students present copies of someone else's work as their own, and then crowing that they have created a wonderful new tool to detect such plagiarism. The tool is, of course, never open source and is only available to bona fide academics, just in case the students could figure out how it works and thus defeat it. The only open source code that I know of for this purpose is my own implementation of the DotPlot 'algorithm'.
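For readers who haven't met it, the dot plot idea can be sketched roughly as follows. This is a minimal illustration under my own assumptions, not the implementation mentioned above: it treats each whitespace-normalised line as a token, puts a dot at (i, j) wherever line i of one file equals line j of the other, and reports diagonal runs of dots, which is how a copied block shows up on a dot plot. The function names (`normalise`, `dot_plot`, `diagonal_runs`) and the `min_len` threshold are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def normalise(lines: List[str]) -> List[str]:
    # Collapse runs of whitespace so that re-indented copies still match.
    return [" ".join(line.split()) for line in lines]


def dot_plot(a: List[str], b: List[str]) -> Set[Tuple[int, int]]:
    # A dot at (i, j) means line i of a is identical to line j of b.
    # Blank lines are skipped because they would match almost everything.
    index: Dict[str, List[int]] = {}
    for j, line in enumerate(b):
        if line:
            index.setdefault(line, []).append(j)
    return {(i, j) for i, line in enumerate(a) for j in index.get(line, [])}


def diagonal_runs(dots: Set[Tuple[int, int]],
                  min_len: int = 3) -> List[Tuple[int, int, int]]:
    # A copied block appears as a diagonal: (i, j), (i+1, j+1), ...
    # Returns (start_in_a, start_in_b, length) for runs of at least min_len.
    runs = []
    for i, j in sorted(dots):
        if (i - 1, j - 1) in dots:
            continue  # not the start of a diagonal run
        length = 1
        while (i + length, j + length) in dots:
            length += 1
        if length >= min_len:
            runs.append((i, j, length))
    return runs


if __name__ == "__main__":
    a = ["int f(int x) {", "  int y = x * 2;", "  y += 1;", "  return y;", "}"]
    b = ["// helper", "int g(int x) {", "int y = x * 2;", "y += 1;", "return y;", "}"]
    print(diagonal_runs(dot_plot(normalise(a), normalise(b))))  # [(1, 2, 4)]
```

In the example the renamed function header doesn't match, but the four-line body shows up as a single diagonal even though the copy has been re-indented. Requiring a minimum run length is one simple way to ignore isolated matches such as lone closing braces; near duplicates with edited lines would need a fuzzier comparison than plain string equality.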

This page is my counterblast to automated plagiarism detection.

It seems to me that the justification for the development of automated plagiarism detectors rests on faulty foundations. The argument seems to go something like this:

  • plagiarism is a form of cheating,
  • cheating is a way of gaining monetary or other advantage with less effort than doing proper work, that is, it is unfair,
  • those responsible for awarding degrees, certificates, etc. are only willing to look at the end result of a process,

therefore we must have automated methods of detecting plagiarism.

I hope I haven't created a straw man here. Anyway, the argument is arrant drivel, at least as far as software development is concerned. All good programmers plagiarise other programmers' work for the simple and honourable reason that one can see further by standing on the shoulders of giants (to paraphrase Newton). Even standing on the shoulders of a pygmy can help you see over the tall grass. It seems to me that the real driving force behind automated plagiarism detection in computer science courses is that the instructors are following a mechanical process where the only requirement to pass the course is to produce a finished piece (no doubt several pieces) of work that performs according to the specification.

Well, in the real world of software development there is almost never a finished program, only one that is temporarily stable enough to use while the developer takes a breather and tries to map his way to the next version. Software development is a process; no amount of examination of the final result will tell you how good the developer was at dealing with incomplete and contradictory specifications, totally unreasonable deadlines, and customers who change the rules halfway through. Plagiarism detection would be unnecessary if the avowed purpose of a computer science course were to produce students who could cope with those difficulties, because the grades awarded would depend more on the students' behaviour and less on the actual text of the final program.

One classic way of achieving some part of this is to conduct a viva voce examination, known in the software development industry as a code review. In an academic context this could be done in the old-fashioned IBM style, where the author is required not merely to explain the code but to defend the decisions that produced it. This means that the student would have to maintain a log book containing notes showing the development of the solution to the programming problem.

Another way of expressing this is to say that computer science courses should be more like physics, where the actual answer is less important than the log showing the method that was used to arrive at it. A physics student is not penalised for repeating one of Faraday's experiments; quite the reverse, he would be praised if he succeeded in performing it as well as the great man. On the other hand, even if he made a much more accurate determination of the force due to the charge on two plates than Faraday did, no one would be particularly interested unless the method were different: it is the process that counts.
