CSC 9010: Text Mining Applications
Spring, 2012
Paula Matuszek
Adjunct Professor, Villanova
E-mail: or
Phone: (610) 647-9789

Description: The internet has changed the nature of problems people face in dealing with information. Online access to documents is now largely taken for granted; the amount of information available in text form is massive and still expanding. This has created an entirely new problem: how to deal with a flood of documents relevant to some question.

Search tools such as Google have become increasingly sophisticated at retrieving an appropriate document or piece of information. However, search alone cannot deal with knowledge that is spread across a large corpus of documents. Other tools are needed for that; several such technologies are grouped under the general term text mining. Text mining tools draw on techniques from natural language processing, data mining, machine learning, and other areas of AI. They are applied to large corpora of documents for tasks such as:

This course will be a seminar in applying text mining tools. We will cover a basic introduction to the field of text mining, followed by hands-on experience with several kinds of text mining tools. We will install and apply tools in areas such as basic natural language processing, categorization, clustering, summarization and information extraction.

The class will have three components.

Fundamentals of Predictive Text Mining. Sholom M. Weiss, Nitin Indurkhya, Tong Zhang. Springer, 2010.
ISBN-10: 1849962251
ISBN-13: 978-1849962254

Course links:
NLTK and Python code snippets
Requirements and Grading
Academic Integrity
Student Questionnaire

I will be on campus primarily to teach class; I can meet with you before or after class, or by arrangement at other times. Email is the best way to reach me.

Some interesting links