Note: links to assignments and class presentations are
on the syllabus page.
Syllabus
Description:
The World Wide Web has changed the nature of the problems
people face in dealing with information. Online access
to documents is now largely taken for granted. This
has stimulated a number of technical approaches for
handling large amounts of text, as people look for
ways to cope with the flood of information now available.
Several such technologies are grouped under the general
term of text mining. Text mining tools use a combination
of statistics, natural language processing, and other
artificial intelligence techniques to classify, categorize
and summarize documents, and to extract information
from the documents into a usable form such as a
semantic net or database.
This course will be a seminar in applying text mining
tools. We will cover a basic introduction to the field
of text mining, followed by hands-on experience with
several kinds of text mining tools. We will install and apply
tools in the areas of basic natural language processing,
categorization, clustering, summarization and information
extraction.
The class will have three components.
For the first component, I will present basic concepts and topics in the area, and assign homework intended to reinforce or add insight into these concepts.
For the second component, we will explore one or more specific text mining applications, starting with GATE.
For the third component, we will have presentations on additional text mining tools, applications and projects. These will include presentations or demos from vendors, tools that interest me, and presentations by students.
Requirements and Grading
Academic Integrity
Student Questionnaire
I am usually on campus only to teach my class; I can meet with you before or after class, or by arrangement at other times. Email is the best way to reach me.
Prerequisites: 8301 (Design and Analysis of Algorithms) and one of: 8520, Special Topics--9010 Web Mining, or permission of instructor. Students should also feel comfortable downloading and installing software from the web.