Simulation & Tools Group
This group is exploring the
use of various simulation techniques for research and education, and the
development of independent software tools that solve problems in
non-computer-science domains.
Project: Snitch!
Spotting & Neutralizing Internet Theft by CHeaters
Purpose: Create an
application that scans student technical research papers to detect instances
of plagiarism from the Internet.
Researchers: Tom Way
Research Alumni:
Sebastian Niezgoda, Joseph Bruno, Purushotham Ch
Description:
Snitch is a Java application that scans the text of a student paper,
identifies passages that might be plagiarized, searches the Internet for
web sites that contain those passages, and finally presents an HTML version
of the original student paper with embedded links to any matching material.
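The final reporting step described above can be sketched as follows. This is only an illustration, not Snitch's actual code: the class name ReportSketch, the method toHtml, and the idea of passing the flagged sentences in as a map from sentence text to source URL are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of an HTML report with links to suspected sources. */
final class ReportSketch {

    /** Escapes the characters that are significant in HTML text and attributes. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /**
     * Renders the paper sentence by sentence. A sentence that appears as a key
     * in sourceBySentence (assumed to come from an earlier search step) is
     * wrapped in a link to the matching web page.
     */
    static String toHtml(List<String> sentences, Map<String, String> sourceBySentence) {
        StringBuilder html = new StringBuilder("<html><body><p>");
        for (String sentence : sentences) {
            String url = sourceBySentence.get(sentence);
            if (url != null) {
                html.append("<a href=\"").append(escape(url)).append("\">")
                    .append(escape(sentence)).append("</a>");
            } else {
                html.append(escape(sentence));
            }
            html.append(' ');
        }
        return html.append("</p></body></html>").toString();
    }

    public static void main(String[] args) {
        List<String> paper = Arrays.asList("My own sentence.", "A copied sentence.");
        Map<String, String> hits =
                Collections.singletonMap("A copied sentence.", "http://example.com/source");
        System.out.println(toHtml(paper, hits));
    }
}
```

Working sentence by sentence keeps the linking step independent of how the matches were found, so the search back end can change without touching the report code.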
Tools
Resources
References
- Automatic Conceptual Analysis for Plagiarism Detection - Heinz
  Dreher, Issues in Informing Science and Information Technology, Volume 4,
  2007.
- Plagiarism Detection Software - Marlin Thomas, E-Leader Bangkok,
  2008.
- Detecting and Tracing Plagiarized Documents by Reconstruction
  Plagiarism-Evolution Tree - Chang-Keon Ryu, Hyong-Jun Kim, Seung-Hyun
  Ji, Gyun Woo, and Hwan-Gue Cho, 2008.
- Technical Review of Plagiarism Detection Software Report - Joanna
  Bull, Carol Collins, Elisabeth Coughlin, Dale Sharp, Technical Report,
  Joint Information Systems Committee, Computer Assisted Assessment
  Centre, University of Luton, Bedfordshire, UK, 2001.
- Issues Raised by Use of Turnitin Plagiarism Detection Software -
  online article at Cyberdash.com.
- Comparison of some plagiarism detection software - on a German blog,
  2007. A version of the comparison study for 2008 is underway.
Current Tasks
- Develop a Java class that performs Flesch-Kincaid Grade Level analysis
  of textual input. The test application should let the user select a text
  file, open it, analyze it, and display appropriate statistics, such as the
  grade level of each sentence and paragraph and counts of characters,
  words, sentences, and paragraphs.
- Create an example program that converts a Microsoft Word document into a
  text document.
- Create an example program that converts a PDF document into a text
  document.
- Locate as many search APIs as possible and design examples of how to use
  them in a Java program.
- Determine whether there is a way to use the newer Google web application
  API. Google is phasing out the SOAP API, so it will no longer be feasible
  to make Google searches from a standard Java application. In other words,
  could Snitch be made into a web-based application rather than a
  stand-alone application?
- Explore use of the Java 2 BreakIterator class for managing input
  tokenization.
- Investigate using MOSS to create a user interface for programming
  project plagiarism detection.
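Two of the tasks above, the Flesch-Kincaid analysis and the BreakIterator tokenization, fit together naturally. The sketch below is one possible starting point rather than the project's implementation. It uses java.text.BreakIterator to split sentences and words and applies the standard Flesch-Kincaid Grade Level formula, 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59. The class name ReadabilitySketch and the vowel-group syllable estimate are assumptions made for the example; the syllable count in particular is a rough heuristic.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch: Flesch-Kincaid Grade Level with BreakIterator tokenization. */
final class ReadabilitySketch {

    /** Splits text into trimmed, non-empty sentences. */
    static List<String> sentences(String text) {
        List<String> result = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String s = text.substring(start, end).trim();
            if (!s.isEmpty()) result.add(s);
        }
        return result;
    }

    /** Splits text into words, skipping whitespace and punctuation segments. */
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        BreakIterator it = BreakIterator.getWordInstance(Locale.US);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String w = text.substring(start, end);
            if (Character.isLetterOrDigit(w.charAt(0))) result.add(w);
        }
        return result;
    }

    /** Heuristic (assumption): count vowel groups and drop a silent final 'e'. */
    static int syllables(String word) {
        String w = word.toLowerCase(Locale.US);
        int count = 0;
        boolean prevVowel = false;
        for (int i = 0; i < w.length(); i++) {
            boolean vowel = "aeiouy".indexOf(w.charAt(i)) >= 0;
            if (vowel && !prevVowel) count++;
            prevVowel = vowel;
        }
        if (w.endsWith("e") && !w.endsWith("le") && count > 1) count--;
        return Math.max(count, 1);
    }

    /** Applies the Flesch-Kincaid Grade Level formula; returns 0.0 for empty input. */
    static double gradeLevel(String text) {
        List<String> ws = words(text);
        int sentenceCount = sentences(text).size();
        if (ws.isEmpty() || sentenceCount == 0) return 0.0;
        int syllableCount = 0;
        for (String w : ws) syllableCount += syllables(w);
        return 0.39 * ws.size() / sentenceCount
                + 11.8 * syllableCount / ws.size()
                - 15.59;
    }

    public static void main(String[] args) {
        String sample = "The cat sat on the mat. It was warm.";
        System.out.printf("sentences=%d words=%d grade=%.2f%n",
                sentences(sample).size(), words(sample).size(), gradeLevel(sample));
    }
}
```

The same gradeLevel method can be called on a single sentence or a single paragraph to produce the per-sentence and per-paragraph statistics the task asks for.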
Project Plan
- Download and install Jigloo to help with UI design and development.
- Refine the user interface: better design, better functionality, cleaner
  appearance, buttons of the same size.
- Create better HTML report generation and add a viewer.
- Refactor code as needed, make it more object-oriented, and improve the
  windowing approach (document->paragraphs->sentences->words).
- Find a way to handle Word doc and PDF input files.
- Devise a better candidate selection algorithm and write up a
  specification for it; see whether the Flesch scale or other analysis
  techniques are applicable.
- Research approaches to plagiarism detection, both automated and manual.
- Alpha release candidate goal: Summer 2008.
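The windowing approach named in the plan (document->paragraphs->sentences->words) could be modeled as nested lists, as in the sketch below. It is an assumption-laden illustration, not the project's design: the class name DocumentModelSketch, the rule that a blank line separates paragraphs, and the use of java.text.BreakIterator for the inner levels are all choices made for the example.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of a document -> paragraphs -> sentences -> words model. */
final class DocumentModelSketch {

    /** Returns paragraphs, each a list of sentences, each a list of words. */
    static List<List<List<String>>> parse(String document) {
        List<List<List<String>>> paragraphs = new ArrayList<>();
        // Assumption: paragraphs are separated by at least one blank line.
        for (String para : document.split("\\n\\s*\\n")) {
            if (para.trim().isEmpty()) continue;
            List<List<String>> sentences = new ArrayList<>();
            for (String sentence : segments(para, BreakIterator.getSentenceInstance(Locale.US))) {
                List<String> words = new ArrayList<>();
                for (String token : segments(sentence, BreakIterator.getWordInstance(Locale.US))) {
                    // Keep only tokens that start with a letter or digit (drops punctuation).
                    if (Character.isLetterOrDigit(token.charAt(0))) words.add(token);
                }
                if (!words.isEmpty()) sentences.add(words);
            }
            paragraphs.add(sentences);
        }
        return paragraphs;
    }

    /** Collects the trimmed, non-empty segments that the iterator finds in text. */
    static List<String> segments(String text, BreakIterator it) {
        List<String> out = new ArrayList<>();
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String piece = text.substring(start, end).trim();
            if (!piece.isEmpty()) out.add(piece);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parse("One two. Three four five.\n\nSix."));
    }
}
```

A candidate selection step could then walk this structure at whichever level it needs, for example scoring each sentence and sending only the highest-scoring ones to a search engine.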
Project: Algorithms & Data Structures for Business Analysis
Purpose: Develop the
theory and framework for a proprietary business analysis approach
Researchers: Tom Way,
Mike Peterson (Univ. of Delaware)
Description:
We have
developed a k-layers, massively interconnected data structure and analysis
framework for use in Dr. Peterson's organizational culture research. This
technology has been implemented in a software tool that provides a flexible and
powerful means to manipulate large data sets, enabling a sophisticated,
concept-cluster-based, stimulus-response analysis. The analysis algorithm
and data structure significantly improve upon early analysis methods, making it
possible to conduct the complex task in a matter of hours rather than days or
weeks.
Current plans are to fully develop the software prototype tool, and to refine
the data structures and algorithms used in the analysis to improve the tool's
efficiency.
Tasks:
- Identify salient technical innovations from the project
- Prepare write-up of the technical aspects
Updated: 10/01/09
actlab.csc.villanova.edu