
Wednesday, September 19, 2012

Natural Language Processing with Python

NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning. Natural Language Processing with Python provides a practical introduction to programming for language processing. Written by the creators of NLTK, it guides the reader through the fundamentals of writing Python programs, working with corpora, categorizing text, analyzing linguistic structure, and more.

Let's start with the steps to install NLTK and other utilities.

Steps:
  1. Install Python: http://www.python.org/download/releases/2.7.3/
  2. Install NumPy (optional): http://sourceforge.net/projects/numpy/files/NumPy/1.6.2/numpy-1.6.2-win32-superpack-python2.7.exe
  3. Install NLTK: http://pypi.python.org/pypi/nltk
  4. Install PyYAML: http://pyyaml.org/wiki/PyYAML
Now if you type the command import nltk at the IDLE shell, the nltk module is loaded and the cursor moves to the next line without any error.
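To confirm the installation from a script instead of IDLE, you can import the module and print its version. This is only a minimal sketch; the print call is written with parentheses so that it runs on Python 2.7 as well as on newer Python versions.

```python
# Confirm that NLTK is importable and report which version was installed.
import nltk

installed_version = nltk.__version__
print("NLTK version: " + installed_version)
```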

Power of NLTK:
1. Tokenizer: This returns the list of tokens present in the sentence you pass to the tokenizer as a parameter.
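As a minimal sketch of the tokenizer step, the snippet below uses wordpunct_tokenize, a regular-expression tokenizer that needs no downloaded data. That choice and the sample sentence are assumptions made for illustration; nltk.word_tokenize is the more common entry point, but it depends on a tokenizer model fetched through nltk.download().

```python
from nltk.tokenize import wordpunct_tokenize

# Sample sentence chosen for illustration only.
sentence = "Mary saw Bob in the park."

# Split the sentence into word and punctuation tokens.
tokens = wordpunct_tokenize(sentence)
print(tokens)
# ['Mary', 'saw', 'Bob', 'in', 'the', 'park', '.']
```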




2. Part of Speech tagger: This tags the tokens within a sentence with appropriate part-of-speech tags like NN (noun), VB (verb), JJ (adjective), etc.

It is possible that the tagger fails with an error reporting that a required NLTK resource was not found.

You can resolve this with the command nltk.download(). This will bring up the NLTK downloader.

Select the "book" entry and hit the download button. This will start downloading the packages associated with the book. In this step we are actually downloading the corpora for NLTK.
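If you prefer to skip the downloader window, the same collection can probably be fetched from code. The sketch below is an assumption-laden illustration: is_installed and fetch_book_collection are hypothetical helper names, and "corpora/brown" is just one example resource path to check. nltk.data.find raises LookupError when a resource is missing, and that is what the helper relies on.

```python
import nltk

def is_installed(resource_path):
    """Return True if NLTK can locate the resource on its data path."""
    try:
        nltk.data.find(resource_path)
        return True
    except LookupError:
        # NLTK raises LookupError when the resource is not installed.
        return False

def fetch_book_collection():
    """Download the "book" collection without opening the downloader window."""
    return nltk.download("book")

# Example check for a single corpus (hypothetical choice of resource).
print(is_installed("corpora/brown"))
# To fetch everything the book needs, call: fetch_book_collection()
```

The download is deliberately left as a function you call yourself, since it needs network access and can take a while.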

Once you have downloaded the corpora, rerun the same command and you will see the sentence being tagged with nouns, verbs, adjectives, etc.
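The tagging step can be sketched roughly as follows. The helper name tag_tokens and the sample tokens are assumptions for illustration. The helper wraps nltk.pos_tag so that a missing tagger model produces a hint instead of a traceback. The exact tags you get depend on the tagger model shipped with your NLTK version, so no output is shown here.

```python
import nltk

def tag_tokens(tokens):
    """Return (token, tag) pairs, or None if the tagger data is not installed."""
    try:
        return nltk.pos_tag(tokens)
    except LookupError:
        # Raised by NLTK when the tagger model has not been downloaded yet.
        print("Tagger data not found; run nltk.download() first.")
        return None

tagged = tag_tokens(["Mary", "saw", "Bob"])
if tagged is not None:
    print(tagged)
```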

3. Parse Tree using Context Free Grammar (CFG)
Now let's see the power of the recursive descent parser in recognizing a sentence according to a context free grammar. Suppose we have the following context free grammar:
S -> NP VP
VP -> V NP | V NP PP
PP -> P NP
V -> "saw" | "ate" | "walked"
NP -> "John" | "Mary" | "Bob" | Det N | Det N PP
Det -> "a" | "an" | "the" | "my"
N -> "man" | "dog" | "cat" | "telescope" | "park"
P -> "in" | "on" | "by" | "with"

A sample sentence that can be produced by this grammar is:
Mary saw Bob

Let's see how it works:


- We start by specifying the grammar
- We split the sentence into tokens on " " (blank space)
- We create an instance of the recursive descent parser using this grammar
- We create a tree by calling the parse method of the parser
- As a result we get the parse tree for the sentence
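The steps above can be sketched as below. Note the assumption: this uses the NLTK 3 API (CFG.fromstring, and parse returning an iterator of trees), which may differ from the function names in the NLTK release available when this post was written. The grammar and sentence are the ones given above.

```python
from nltk import CFG
from nltk.parse import RecursiveDescentParser

# Step 1: specify the grammar.
grammar = CFG.fromstring("""
S -> NP VP
VP -> V NP | V NP PP
PP -> P NP
V -> "saw" | "ate" | "walked"
NP -> "John" | "Mary" | "Bob" | Det N | Det N PP
Det -> "a" | "an" | "the" | "my"
N -> "man" | "dog" | "cat" | "telescope" | "park"
P -> "in" | "on" | "by" | "with"
""")

# Step 2: split the sentence into tokens on blank space.
tokens = "Mary saw Bob".split(" ")

# Step 3: create a recursive descent parser for this grammar.
parser = RecursiveDescentParser(grammar)

# Steps 4 and 5: parse the tokens and print every tree the parser finds.
trees = list(parser.parse(tokens))
for tree in trees:
    print(tree)
# (S (NP Mary) (VP (V saw) (NP Bob)))
```

A recursive descent parser cannot handle left-recursive rules, but this grammar has none, so the parser terminates and finds a single tree for the sample sentence.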


I hope this post throws some light on the natural language processing capabilities of Python.