Here is a list of useful programs that implement functions for the CATY toolbox. Most of them include their source under the GPL. I encourage possible implementers to look at them for reference only.
Do get an impression of how they work!
Do your own implementation!
Do not copy any code!
Chgrep
'chgrep' searches the input files (or standard input if no files are named) for oldpattern and changes each match to newpattern (grep doesn't support this). You can use .lock files (or another extension). It is useful in (but not limited to) mail servers.
Learn more about it at:
Get sources at:
http://www.bmk.bicom.pl/chgrep/chgrep-1.2.2.tgz
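To give an impression of the search-and-replace idea described above, here is a minimal sketch in Python. It is not chgrep's code and does not claim to reproduce its locking protocol or options. The function names replace_in_text and replace_in_file, the lock_ext parameter, and the choice of an exclusively created "<file>.lock" file are all assumptions made for illustration.

```python
import os
import re


def replace_in_text(text, old_pattern, new_pattern):
    """Return (new_text, count) with every regex match of old_pattern replaced."""
    return re.subn(old_pattern, new_pattern, text)


def replace_in_file(path, old_pattern, new_pattern, lock_ext=".lock"):
    """Rewrite one file in place, guarded by a hypothetical '<path><lock_ext>' file."""
    lock_path = path + lock_ext
    # O_EXCL makes creation fail if the lock file already exists,
    # so two cooperating processes cannot both edit the file at once.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        with open(path, "r", encoding="utf-8") as handle:
            text = handle.read()
        new_text, count = replace_in_text(text, old_pattern, new_pattern)
        if count:
            with open(path, "w", encoding="utf-8") as handle:
                handle.write(new_text)
        return count
    finally:
        # Always release the lock, even if reading or writing failed.
        os.close(fd)
        os.remove(lock_path)
```

If the lock file already exists, os.open raises FileExistsError and the file is left untouched; a real tool would probably wait and retry instead.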
Meld
'Meld' is a GNOME 2 diff and merge tool. It lets you edit files in place (diffs update dynamically), and a middle column shows detailed changes and allows merges. Diff browsing is user-friendly: the margins show the location of changes, and a tabbed interface lets you open multiple diffs at once.
Learn more about it at:
Get sources at:
http://prdownloads.sourceforge.net/meld/meld-0.9.1.tgz?use_mirror=twtelecom
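The core of any diff tool is locating the regions that differ between two versions of a file. The following sketch shows one way to do that with Python's standard difflib module. It is not how Meld is implemented; the function name changed_regions is hypothetical.

```python
import difflib


def changed_regions(old_lines, new_lines):
    """List the non-matching regions between two sequences of lines.

    Each entry is (tag, i1, i2, j1, j2): old_lines[i1:i2] corresponds to
    new_lines[j1:j2], and tag is 'replace', 'delete' or 'insert'.
    """
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]
```

A viewer could use these index ranges to mark changes in the margin or to offer a per-region merge action.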
Bow
Bow: A Toolkit for Statistical Language Modeling, Text Retrieval, Classification and Clustering
Learn more about it at:
http://www-2.cs.cmu.edu/~mccallum/bow/
Get sources at:
http://www-2.cs.cmu.edu/~mccallum/bow/src/
The library source distribution currently provides three executable programs based on the library.
- Rainbow is an executable program that does document classification. While mostly designed for classification by naive Bayes, it also provides TFIDF/Rocchio, Probabilistic Indexing and K-nearest neighbor.
- Arrow is an executable program that does document retrieval. It currently only performs simple TFIDF-based retrieval.
- Crossbow is an executable program that does document clustering (and also classification).
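As a rough illustration of the naive Bayes classification mentioned for Rainbow, here is a minimal multinomial naive Bayes sketch with add-one smoothing. It is written from the textbook definition and is not derived from the Bow sources; the function names train and classify and the model layout are assumptions.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """Build a model from an iterable of (label, list_of_tokens) pairs."""
    doc_counts = Counter()              # number of documents per label
    word_counts = defaultdict(Counter)  # token frequencies per label
    vocabulary = set()
    for label, tokens in labeled_docs:
        doc_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return doc_counts, word_counts, vocabulary


def classify(model, tokens):
    """Return the label with the highest log posterior for the given tokens."""
    doc_counts, word_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # log prior
        for token in tokens:
            if token not in vocabulary:
                continue  # words never seen in training carry no evidence
            # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
            score += math.log((word_counts[label][token] + 1)
                              / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Working in log space keeps the product of many small probabilities from underflowing on long documents.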
JPlag
JPlag is a system that finds similarities among multiple sets of source code files. This way it can detect software plagiarism. JPlag does not merely compare bytes of text, but is aware of programming language syntax and program structure and hence is robust against many kinds of attempts to disguise similarities between plagiarized files. JPlag currently supports Java, C, C++, Scheme, and natural language text.
Learn more about it at:
http://www.ipd.uka.de/jplag/
Get it at:
http://wwwipd.ira.uka.de:2222/user.cgi
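To show why comparing tokens rather than bytes resists simple disguises such as renaming, here is a small sketch. It is not JPlag's algorithm: it swaps in Python's standard tokenize and difflib modules, works on Python source only (a language not in the list above), and the names normalized_tokens and similarity are hypothetical.

```python
import difflib
import io
import keyword
import tokenize

# Token types that carry layout or comments rather than program structure.
_SKIPPED = (tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
            tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER)


def normalized_tokens(source):
    """Tokenize Python source, abstracting away names and literal values."""
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIPPED:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            result.append("ID")   # every identifier looks the same
        elif tok.type == tokenize.NUMBER:
            result.append("NUM")
        elif tok.type == tokenize.STRING:
            result.append("STR")
        else:
            result.append(tok.string)  # keywords and operators kept as-is
    return result


def similarity(source_a, source_b):
    """Return a ratio in [0, 1] of how much the two token streams overlap."""
    a = normalized_tokens(source_a)
    b = normalized_tokens(source_b)
    return difflib.SequenceMatcher(a=a, b=b, autojunk=False).ratio()
```

Two files that differ only in identifier names, comments and whitespace produce identical token streams under this normalization, so their similarity is 1.0 even though a byte comparison would report many differences.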
--
BernhardGroehl - 28 Dec 2003