Installing Python 3.4 & NLTK 3.0
- Instructions -
Installing NLTK and Python (follow these, step-by-step)
- Windows
- Install Python 3.4 (here)
- Install Numpy (here)
- Open a Command Prompt (Look for it in the Start menu under All
Programs->Accessories)
- Change to the Python scripts directory by typing in the Command
Prompt:
- Install NLTK by typing in the Command Prompt:
- Install BeautifulSoup by typing:
- pip install beautifulsoup4
- Mac
- Install Python 3.4 (here)
- Install Easy Setup by saving ez_setup.py someplace easy to find, then
double-click on the file to run it.
- Open a Terminal window (Look for it in Applications in the
Finder)
- Install Pip by typing in the Terminal window:
- Install Numpy (optional) by typing:
- sudo pip3 install -U numpy
- Install NLTK, by typing:
- sudo pip3 install -U nltk
- Install BeautifulSoup by typing:
- sudo pip3 install -U BeautifulSoup4
- Make sure BeautifulSoup installed correctly by typing the following
into the IDLE Shell:
- from bs4 import BeautifulSoup
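The same import check can be extended to every package in these instructions. The following is a minimal sketch, not part of the original instructions: the function name missing_packages and the REQUIRED list are illustrative assumptions, and it relies only on the standard library's importlib.util.find_spec (available from Python 3.4).

```python
import importlib.util

# Import names for the packages installed above. The BeautifulSoup
# package is installed as "beautifulsoup4" but imported as "bs4".
REQUIRED = ["nltk", "bs4", "numpy"]


def missing_packages(names):
    """Return the names that the running Python cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All packages found.")
```

If a package is reported as not installed, repeat the matching install step above and run the check again.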
- NLTK Problems
- If you encounter error messages when running a Python program
and the error mentions some package in NLTK, you can fix the problem
by downloading the package. Type the following in an open IDLE
shell:
- import nltk
- nltk.download()
- Once the user interface pops up,
find the missing module and download it.
- Missing Corpus
- If an error message says that names or movie_reviews is missing, use
the same NLTK import approach as above. Type the following into IDLE:
- import nltk
- nltk.download()
- Click on the Corpora tab, and find the missing corpus and download
it.
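If the pop-up window is inconvenient, the same downloads can probably be requested by name. This sketch assumes that nltk.download accepts a package identifier such as names, movie_reviews, or punkt and returns True on success; the helper name ensure_nltk_resources and its downloader parameter are hypothetical, added only so the logic can be tried without a network connection.

```python
def ensure_nltk_resources(names, downloader=None):
    """Request each named NLTK package; return the ones that failed.

    downloader is a hypothetical hook for testing. When it is omitted,
    nltk.download is used, which needs NLTK installed and a network
    connection.
    """
    if downloader is None:
        import nltk  # imported here so the function can be tested without NLTK
        downloader = nltk.download
    failed = []
    for name in names:
        # Assumption: the downloader returns a true value on success.
        if not downloader(name):
            failed.append(name)
    return failed


# Typical use in an IDLE shell (uncomment to run):
# print(ensure_nltk_resources(["names", "movie_reviews"]))
```

An empty list means every requested package was downloaded or was already up to date.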
- Textblob Missing
- If you see an error message about textblob not being found, fix it at
the command line.
- Windows
- Open a command prompt, and type the following:
- cd C:\Python34\Scripts
- pip install -U textblob
- cd ..
- python -m textblob.download_corpora
- Mac
- Open a Terminal window (Look for it in Applications in the
Finder), and type the following:
- sudo pip install -U textblob
- sudo python -m textblob.download_corpora
- If you get a textblob not found error message when running
the last command, try this in IDLE:
- import nltk
- nltk.download()
- Click on the Models tab of the window that pops up and
download punkt (Punkt Tokenizer Models).
- If you still get a textblob not found error message, or
similar, try this in the Terminal window:
- sudo -H pip3 install -U textblob
- sudo -H python3 -m textblob.download_corpora
- Fun fact: You might need to do this because Macs often come with
Python 2.7 pre-installed, so you have to use pip3 (for Python 3.x)
rather than just pip to install things in the right place.
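To see which Python is actually running, and so which pip to use, a short check like the one below may help. It is a sketch under the assumption that pip is available as a module for that interpreter (python -m pip); the function name pip_install_command is illustrative, and the script only prints the command rather than running it.

```python
import sys


def pip_install_command(package):
    """Build a pip command tied to the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", "-U", package]


if __name__ == "__main__":
    # A version starting with 2 means this is the pre-installed Python 2.x.
    print("Python version:", ".".join(str(n) for n in sys.version_info[:3]))
    print("Interpreter:", sys.executable)
    print("Install command:", " ".join(pip_install_command("textblob")))
```

Running the script from IDLE for Python 3.4 should show a 3.x version; pasting the printed command into the Terminal installs the package for that same interpreter.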
Python
NLTK
Finch Robot
- The Finch - creator's website, with lots of source code examples in
many languages
Fourth Paradigm Software Tools - Student Discoveries from HW 2
- SPSS - is a data analysis tool I used for one of my classes. It is
very convenient when looking at how stats interact with others. (Brad)
- Fantasy Football Nerd - allows you to enter player names in a
simulated trade, and the website processes the players' respective
stats and projections in order to calculate whether or not you should
perform this trade for the betterment of your team. (Kyle)
- Body Mass Index (BMI) calculator - By inputting one's data such as
height and weight, the online calculator is able to evaluate the inputs
and calculate one's BMI. (Emilia)
- Sentiment Analyzer - analyzes the sentiment of English text on a scale
of -100 to +100 and, as the website likes to advertise, it's free!
(Sharon)
- Natural Language Toolkit - a
program that needs to be installed, but involves using natural language
processing to build programs to use with human language data. (Morgan)
- BigQuery - a low-cost analytics database tool that allows you to focus
on analyzing data to find meaningful insights and is used by a wide
spectrum of organizations. (Emily)
- Splunk - organizes data into an easily accessible index that is
perfect for searching specific "text strings" or phrases, and also
allows you to write code in any language. (Emma)
- Hadoop - is a software
library that allows for the organization of data from different clusters
of computers. (Jake)
- Meltwater - assesses the tone of the commentary as a representation
for a brand, and also helps me discover what my target audience is
attracted to based on the world of online data. (Eunice)
- AutoSummarizer - is a form of Natural Language Processing that
summarizes articles of longform text. It is an example of automatic
summarization, and its parent company publishes its API and algorithm
for implementation. It is an online tool that generates a summary from
the original source or text it was copied from. (Nick, Mike)
- TweetStats - conducts some sentiment analysis on your Twitter account
and provides other information about tweets and related statistics.
(Charlie)
- Box and Whisker Plot - is a graphical method of displaying variation
in a set of data that can also provide additional detail while allowing
multiple sets of data to be displayed in the same graph. (Olivia)
- SETI@Home - allows anyone to use their idle computers to process data
from radio telescopes, looking for possible narrow-bandwidth signals
from space in the effort to detect extraterrestrial life. (Kinjal)
- Affectiva - judges the
emotional response of participants to visual stimulus by employing
sophisticated computer vision algorithms. (Kendall)
- Google Analytics -
is a tool used by marketers and website owners to track, report, and
describe all website traffic. (Daniela)
- Terracotta - searches for
the desired data by using many resources which include data sheets,
analyst reports, and white pages. It also has a wide array of industrial
resources which include financial services, eCommerce, Media and
Entertainment, and more. (Xavier)
- Finds Unusual Words in Arbitrary Texts - analyzes text and determines
if a word is unusual. The author has a list of what he deems common
words, and if a word does not match one of the common words from the
list it is deemed "unusual." (James)
- Quora - a data-intensive website that allows you to search through a
huge database to find the answer to the question you might have.
(Jordan)
- SkyTree - assists users in
building more effective models using a series of algorithms that break
down massive data sets. (Tim)
- Teleport Flock - A tool
for groups whose members may be spread throughout different states or
countries; uses their locations and returns the most efficient city for
a meeting. Returns travel costs, total distances travelled, hotel
information, and more in a nicely formatted layout. (John)
- Text Analyzer - allows you to find the most frequent phrases and
frequencies of words. Non-English language texts are supported. It also
counts number of words, characters, sentences and syllables. Also
calculates lexical density. (Will)
- Grammarly - is a tool that
checks your spelling and grammar across multiple online applications.
Grammarly gives you weekly grammar/spelling statistics -- tracking
progress and providing suggestions for improvements. (Dan)
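
The unusual-word finder described in the list above (compare each word against a list of common words) can be sketched in a few lines of Python. This is only an illustration of the idea, not the tool's actual code: the COMMON_WORDS set is a tiny made-up stand-in for the author's list, and the function name unusual_words is hypothetical.

```python
import re

# A tiny stand-in for the author's common-word list (an assumption;
# the real tool's list is not reproduced here).
COMMON_WORDS = {"the", "a", "is", "of", "and", "to", "in",
                "cat", "sat", "on", "mat"}


def unusual_words(text, common=COMMON_WORDS):
    """Return, in sorted order, the words in text not in the common list."""
    # Lowercase the text and keep runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    return sorted(set(words) - set(common))


if __name__ == "__main__":
    print(unusual_words("The cat sat on the ziggurat."))
```

With a realistic common-word list, such as one built from an NLTK corpus, the same comparison would flag far fewer everyday words.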