Lab 7 - Designing a Classifier
Evolution and Learning in Computational and Robotic Agents
MSE 2400 Dr. Tom Way
Introduction
- In this lab project, you will use the Python programming language to
create text classifiers. You will have the opportunity to develop your own
approach and even to modify the code if you like. This lab will give you a
chance to reinforce what you have learned about classification and apply it
in a hands-on way.
- Problems? If you have issues with Python, such as error messages about
missing packages, functions or corpora, see the class
Resources page for tips and ideas on
fixing problems.
- For this lab, you can work alone or with a partner.
Worth
What to Hand In
- A single, completed copy of this lab handout with all the names of your
research team (of one to three people)
- An email to the instructor with your improved gender2.py program
included as an attachment.
Due
- At the end of the lab session designated for this work, or at a time
mutually agreed to by you and the instructor.
Lab Steps
Part 1 - Create a Basic Gender Classifier
- Open the IDLE Python editor, type in the following program, and save it
as "gender1.py":
def gender_of(name):
    # make name all lowercase
    name = name.lower()
    # check last letter of name
    if name[-1] in ['a', 'e', 'i']:
        return 'female'
    elif name[-1] in ['k', 'n', 'o', 'r', 's', 't']:
        return 'male'
    else:
        return 'unknown'

# Main program
while True:
    # Have the user type in a name
    name = input('Enter a name (or exit)>')
    name = name.strip()
    # Exit if that's what they want
    if name.lower() == 'exit':
        break
    # Otherwise, print the name and gender
    else:
        print('{} is {}'.format(name, gender_of(name)))
- Run the program and try it out a number of times. Look over the code and
try to understand what it is doing. (Hint: it is looking at a specific
letter in each name and using it to tell whether it is a male or female
name.) Write down some notes on how you think it is trying to determine the
gender of a name.
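For instance, here is how the rule plays out on a few sample names. These are only illustrative, and you can try them yourself in the IDLE shell after running gender1.py and typing exit to leave its loop:

>>> gender_of('Anna')     # ends in 'a'
'female'
>>> gender_of('Mark')     # ends in 'k'
'male'
>>> gender_of('Bill')     # ends in 'l', which is in neither list
'unknown'
>>> gender_of('Mike')     # ends in 'e', so the rule guesses wrong here
'female'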
Part 2 - Improving the Basic Gender
Classifier
- Using the same program, see if you can
determine why it does poorly sometimes and how you might make it better. Think about
the names of people you know and what features of those names could be used
to tell whether it is a male or female name. Do some quick research online
about names if you like. Then, jot down a few notes about how
you think the program could be more accurate:
- Write a brief hypothesis about how the accuracy of the program will change
if you incorporate the ideas you came up with.
- Improve the program with one or more of your ideas by adding to or
modifying the code. For assistance, refer to one of the online tutorials we
looked at in a previous lab (Computer
Science Circles)
or do a Google search for easy Python tutorials (be sure to look for "Python
3" or "Python 3.4" rather than "Python 2" or "Python 2.7" as there are
slight differences between the older and newer versions).
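As one hedged example of the kind of change you might try (the two-letter ending lists below are guesses for illustration, not a definitive set), the rule could check the last two letters before falling back to the original single-letter test:

def gender_of(name):
    # make name all lowercase
    name = name.lower()
    # check common two-letter endings first (illustrative guesses only)
    if name[-2:] in ['yn', 'ia', 'na', 'la', 'sa', 'ie']:
        return 'female'
    elif name[-2:] in ['rd', 'ck', 'io', 'us', 'an']:
        return 'male'
    # otherwise fall back to the single-letter rule from gender1.py
    elif name[-1] in ['a', 'e', 'i']:
        return 'female'
    elif name[-1] in ['k', 'n', 'o', 'r', 's', 't']:
        return 'male'
    else:
        return 'unknown'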
- How did your improved version do compared to the original? Write down
whether or not your hypothesis was correct, including anything you noticed
about the accuracy getting better or worse and why you think it happened.
- Demonstrate your new version to the instructor or TA and
have them initial here: ____________
Part 3 - Using Naïve Bayes to Classify
Gender
- Save the gender2.py program to a folder where
you can find it. Then open it in IDLE.
- Briefly review the code, and notice it is more complex than the previous
gender classification code. Read through the code, looking at the comments
to get a better understanding of what it is doing.
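You do not need to write any of this yourself, but it may help to know the general shape of the approach. The sketch below is not gender2.py (its variable names, feature function, and sizes will differ); it only shows how an NLTK Naïve Bayes gender classifier is typically built:

import random
import nltk
from nltk.corpus import names

# label every name in the NLTK names corpus with its gender, then shuffle
labeled_names = ([(n, 'male') for n in names.words('male.txt')] +
                 [(n, 'female') for n in names.words('female.txt')])
random.shuffle(labeled_names)

def gender_features(name):
    # a single, very simple feature: the last letter of the name
    return {'last_letter': name[-1].lower()}

TRAIN_SIZE = 500   # illustrative values; gender2.py sets its own
TEST_SIZE = 500

featuresets = [(gender_features(n), g) for (n, g) in labeled_names]
train_set = featuresets[:TRAIN_SIZE]
test_set = featuresets[TRAIN_SIZE:TRAIN_SIZE + TEST_SIZE]

classifier = nltk.NaiveBayesClassifier.train(train_set)
print('Classifier Accuracy:', nltk.classify.accuracy(classifier, test_set))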
- Run the program and observe its behavior. In particular, make a note
here about the initial information: number of male and female names and the
Classifier Accuracy:
- Continue testing the program by trying some of the same names you used
in Part 2 (above) and write down how the accuracy of this version compares.
Does it do a better job at classifying names by gender?
- Try modifying the TRAIN_SIZE and TEST_SIZE to be larger or smaller, and
run the program again. Do this a few times with different sizes and compare
the Classifier Accuracy of each, and create a list below of the TRAIN_SIZE,
TEST_SIZE and Classifier Accuracy for each of your tests.
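If you would rather not re-run the program by hand for every combination, the stand-alone sketch below shows one way to automate the comparison. It rebuilds a simple classifier directly from the NLTK names corpus rather than reusing gender2.py's own code, so treat it as an illustration of the experiment, not as part of the assigned program:

import random
import nltk
from nltk.corpus import names

# build one shuffled list of labeled names, then reuse it for every split
labeled_names = ([(n, 'male') for n in names.words('male.txt')] +
                 [(n, 'female') for n in names.words('female.txt')])
random.shuffle(labeled_names)
featuresets = [({'last_letter': n[-1].lower()}, g) for (n, g) in labeled_names]

# try several train/test splits and report the accuracy of each
for train_size, test_size in [(100, 100), (500, 500), (2000, 500), (5000, 1000)]:
    train_set = featuresets[:train_size]
    test_set = featuresets[train_size:train_size + test_size]
    classifier = nltk.NaiveBayesClassifier.train(train_set)
    accuracy = nltk.classify.accuracy(classifier, test_set)
    print('TRAIN_SIZE={}  TEST_SIZE={}  Accuracy={:.3f}'.format(
        train_size, test_size, accuracy))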
- What did you observe about the effects, if any, on Classifier Accuracy
of changing the TRAIN_SIZE and TEST_SIZE? If you saw an effect, why do you
think it happened?
Part 4 - Improving the Naïve Bayes Gender
Classifier
- Write notes about how you might be able to improve the Classifier
Accuracy of this classifier. Use a combination of common sense, careful
thinking and analysis of what is happening, and information you might glean
by looking over sections 1.1 and 1.2 of
Chapter 6 of the NLTK Book.
- Modify the program, trying one or more of your ideas. You will probably
need to experiment and try a few things to see what works, what helps, and
what has no impact on accuracy.
Focusing on selecting the best features to use for classification is an
excellent approach. See if you can add to the gender_features
function. There are ideas in
section 1.2 of the NLTK Book and in the comment section all the way at
the end of the gender2.py program.
You might also decide to set the TRAIN_SIZE and TEST_SIZE to
whatever values you found worked best in the previous experiment.
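As a hedged starting point, a richer feature function along the lines of section 1.2 of the NLTK Book might look like the sketch below; which of these features are actually worth keeping is exactly what your experiments should tell you:

import string

def gender_features(name):
    name = name.lower()
    features = {}
    features['first_letter'] = name[0]
    features['last_letter'] = name[-1]
    features['last_two'] = name[-2:]
    # per-letter counts and presence, as suggested in section 1.2 of the NLTK Book
    for letter in string.ascii_lowercase:
        features['count({})'.format(letter)] = name.count(letter)
        features['has({})'.format(letter)] = (letter in name)
    return features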
- Test out a few of your ideas, and keep a record (a list) below of what
you tried and how it changed the accuracy. Try to get the best accuracy you
can. Your list should have a brief Description of the modification
and the Classifier Accuracy that resulted from the modification.
- What was the approach you discovered that produced the best Classifier
Accuracy, and why do you think it worked? Be prepared to share what you
discovered during a class discussion.
- Demonstrate your new version to the instructor or TA and
have them initial here: ____________
Part 5 - Building a Sentiment Analysis
Classifier for Tweets
- Download and save the tweet_classify.py
program.
- Run the program and observe its behavior, and note its Accuracy
here: ____________
- Experiment by changing the TWEET_PCT a number of times, anywhere
between 0.0 and 100.0, and try to determine the best percentage of tweets to
use for training. List the results of your experiments here, with the
TWEET_PCT you used and the resulting Accuracy.
- Using the best TWEET_PCT you found, run the program again and record
information about the top 5 Most Informative Features exactly as they are
displayed. You will use these for comparison later. For more details,
uncomment the call to show_tweet_results.
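For orientation only, the sketch below shows the general shape of this kind of sentiment classifier. It is not tweet_classify.py (that program has its own tweets list, TWEET_PCT value, and helper functions such as show_tweet_results); the example tweets and the 75.0 split here are made up for illustration:

import nltk

# a tiny illustrative list of (text, label) pairs; the real list is longer
tweets = [
    ('I love this phone, best purchase ever', 'positive'),
    ('worst customer service I have ever had', 'negative'),
    ('such a great day, feeling happy', 'positive'),
    ('this traffic is awful and I am late again', 'negative'),
]

def tweet_features(text):
    # simple bag-of-words presence features
    return {word: True for word in text.lower().split()}

TWEET_PCT = 75.0   # percentage of tweets used for training (illustrative)

featuresets = [(tweet_features(text), label) for (text, label) in tweets]
cutoff = int(len(featuresets) * TWEET_PCT / 100.0)
train_set, test_set = featuresets[:cutoff], featuresets[cutoff:]

classifier = nltk.NaiveBayesClassifier.train(train_set)
print('Accuracy:', nltk.classify.accuracy(classifier, test_set))
classifier.show_most_informative_features(5)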
Part 6 - Improving the Tweet Data
- Look through the list of training and testing tweets, called tweets. Think about any experience you have with Twitter
and writing or reading tweets, and edit the list of tweets to be more
reflective of what real tweets are like. Change and add enough good tweets,
some positive and some negative,
so the list has at least 25 tweets in it.
Demonstrate your enhanced version to the instructor or TA and get initials: ____________
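The exact data format is whatever tweet_classify.py already uses, but if (as is common) the tweets variable is a list of (text, label) pairs, adding your own entries looks something like this (the tweets below are invented examples):

tweets = [
    ('so excited for the weekend, this is going to be great', 'positive'),
    ('my flight got cancelled again, worst airline ever', 'negative'),
    ('just got tickets to the show, cannot wait', 'positive'),
    ('this homework is taking forever and I am exhausted', 'negative'),
    # ... keep adding pairs until the list has at least 25 tweets
]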
- Make sure TWEET_PCT is set to the best value you found before,
run the program again with this new data, and record the new Accuracy.
- If your Accuracy went down, repeat the experiment above to find
the best TWEET_PCT for the new data and leave it set at that value, recording
the results of these additional experiments here (list TWEET_PCT and Accuracy).
- Again, record information about the top 5 Most Informative Features
exactly as they are displayed. Compare with earlier important features and
note any changes you observe.
Part 7 - Adding Data Automatically
- Now, you're going to add more training data automatically. Find the
section in the code that is "commented out". In IDLE, highlight the entire
section, starting with "### START HERE" all the way down to, and including,
"### END HERE". Once highlighted, go to the menu and choose
Format->Uncomment Region. The section of code will now be uncommented and
ready to run.
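The contents of that commented-out section belong to tweet_classify.py, so the details may differ, but extra labeled training data from NLTK's movie_reviews corpus is typically pulled in along these lines, with MORE controlling how many extra documents are added:

from nltk.corpus import movie_reviews

MORE = 100   # illustrative; number of extra training documents per category

# take the first MORE positive and first MORE negative movie reviews,
# label them, and collect them in the same (text, label) format as the tweets
extra = []
for label, category in [('positive', 'pos'), ('negative', 'neg')]:
    for fileid in movie_reviews.fileids(category)[:MORE]:
        extra.append((movie_reviews.raw(fileid), label))

# these pairs would then be turned into featuresets and appended to the
# training data, just as the tweets are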
- Run the program again and observe the additional output. Note any change
in the Accuracy reported here:
- Because the additional training (and testing) data come from movie
reviews, their content will be somewhat different from tweets. Don't be
surprised if the results are worse rather than significantly better. Now,
experiment with at least 5 different values of the MORE variable, which is
the number of additional items of training data that are added. Raise or
lower that number (try values like 10, 50, 100, 200, 500, etc.) and see if
you can find a "best" value for MORE. Record your results here, listing the
value for MORE and the corresponding Accuracy.
- What was the best value you found for MORE and what were the 5
Most Informative Features this time?
- Overall, how do you think the classifier does at reporting sentiment?
How accurate is it, and do you feel that is good enough? Why or why not?
- Do you think this tool could be useful if you needed to evaluate a
continuous stream of tweets in real time? Why or why not? Where do you think
it does well and where does it fall short?
- Finally, demonstrate your final version to the instructor or TA and get
initials: ____________