
Monday, January 7, 2013

WikipediaHI: Offline Wikipedia in Hindi !!





Last week I spent some time working on the WikipediaHI activity for the Sugar desktop environment. I must say it is one of the most awesome activities I have come across. The best part is that it serves content offline: even if you don't have the internet connection that is otherwise required to access Wikipedia, the WikipediaHI activity will still serve your purpose.

Many developers and contributors work collaboratively on such awesome stuff, and they continuously inspire you to take up new things and create something that others around the world can use. The Sugar developers and contributors are the epitome of such a group.

I came across a few such developers; Anish Mangal and Gonzalo Odiard are two whose contributions to Sugar are significant. I took up the task of creating WikipediaHI using the freely available Wikipedia dump for Hindi. I followed the steps specified on this page [hosted by Gonzalo] for creating a Wikipedia activity in your own language.

I will quickly explain the steps I took to create WikipediaHI:

1) Downloaded the Wikipedia dump file for Hindi:
http://dumps.wikimedia.org/hiwiki/20121225/hiwiki-20121225-pages-articles.xml.bz2
NOTE: Make sure you pick the latest valid file from http://dumps.wikimedia.org/hiwiki/ . This location lists the dumps by date. Pick the latest dump and proceed further.

and downloaded WikipediaBase from this link
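
The dump URL in step 1 appears to follow a fixed pattern built from the language code and the dump date. Here is a minimal Python sketch of that pattern. The function name dump_url is my own, and the pattern is inferred only from the single URL shown above, so verify it against the listing page before relying on it:

```python
def dump_url(lang: str, date: str) -> str:
    """Build the pages-articles dump URL for a language code and dump date.

    The pattern is inferred from the one Hindi URL in this post; it is an
    assumption, not something documented here.
    """
    wiki = f"{lang}wiki"
    return (
        f"http://dumps.wikimedia.org/{wiki}/{date}/"
        f"{wiki}-{date}-pages-articles.xml.bz2"
    )


if __name__ == "__main__":
    # Reproduces the URL used in step 1.
    print(dump_url("hi", "20121225"))
```

Swapping in a newer date from the listing should give the matching file name for the later steps.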

2) Created a "hi" directory (for Hindi) under the WikipediaBase directory and moved the downloaded dump into it.

3) Extracted contents of this file using:
bzip2 -d hiwiki-20121225-pages-articles.xml.bz2
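
If bzip2 is not available on your machine, the same extraction can be sketched with Python's standard bz2 module. The helper name decompress_bz2 is hypothetical. Unlike bzip2 -d, this version keeps the original .bz2 file in place:

```python
import bz2
import shutil
from pathlib import Path


def decompress_bz2(src: str) -> Path:
    """Decompress a .bz2 file next to itself and return the new path."""
    src_path = Path(src)
    if src_path.suffix != ".bz2":
        raise ValueError(f"expected a .bz2 file, got {src_path.name}")
    # with_suffix("") drops only the trailing ".bz2", so "x.xml.bz2" -> "x.xml"
    dst_path = src_path.with_suffix("")
    # Copy in 1 MiB chunks so a large dump is never held in memory at once.
    with bz2.open(src_path, "rb") as fin, open(dst_path, "wb") as fout:
        shutil.copyfileobj(fin, fout, length=1024 * 1024)
    return dst_path
```

For example, decompress_bz2("hiwiki-20121225-pages-articles.xml.bz2") would write hiwiki-20121225-pages-articles.xml in the same directory.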

4) Processed the dump using page parser:
../tools2/pages_parser.py

This operation generates these files:
hiwiki-20121225-pages-articles.xml.links
hiwiki-20121225-pages-articles.xml.page_templates
hiwiki-20121225-pages-articles.xml.redirects
hiwiki-20121225-pages-articles.xml.templates
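
Before moving on, it can help to confirm that the parser really produced its output. The sketch below is a generic check, not part of the Wikipedia activity tools. The helper name missing_files is hypothetical, and you pass in the file names you expect, copied from the listing above:

```python
from pathlib import Path
from typing import Iterable, List


def missing_files(directory: str, expected: Iterable[str]) -> List[str]:
    """Return the expected file names that are absent or empty in directory."""
    base = Path(directory)
    missing = []
    for name in expected:
        path = base / name
        # An empty output file usually means the step failed part-way.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    dump = "hiwiki-20121225-pages-articles.xml"
    print(missing_files(".", [dump + ".links", dump + ".page_templates"]))
```

An empty list means every file you named is present and non-empty.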

5) Then you can include selected articles, or all articles, from this dump in your activity by using this command:
../tools2/make_selection.py
* Make sure you have favorites.txt and blacklist.txt filled with appropriate keywords.

To include all articles, use this command instead:
../tools2/make_selection.py --all

6) Then proceed to create the index for these articles:
../tools2/create_index.py

7) To test the index created in the previous step, use this command:
../tools2/test_index.py

8) The next step is to expand the templates of the articles:
cd ..
./tools2/expandtemplates.py hi

9) Go back to the hi directory and re-create the index:
cd hi
mv hiwiki-20121225-pages-articles.xml.processed_expanded hiwiki-20121225-pages-articles.xml.processed
../tools2/create_index.py --delete_all

10) Download the images for the articles you selected:
cd hi
../tools2/download_images.py

If you want to download the images for all the pages you selected in the previous steps:
../tools2/download_images.py --all

11) Create files specific to language:
(a) activity/activity.info.lang : activity info file for your language's activity
(b) activity/activity-wikipedia-lang.svg : activity icon for your language
(c) activity_lang.py : activity file for your language
(d) static/about_lang.html : about page for Wikipedia in your language
(e) static/index_lang.html : index page for Wikipedia in your language. This is the page displayed when the activity is launched, so it is important to know which articles are included in search.db (generated when the index is created) before you create the index page.
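
The five names above can be generated from the language code. This sketch assumes that "lang" in each name is a placeholder to be replaced by the code ("hi" for Hindi); that reading is my assumption, and the function name language_files is hypothetical:

```python
from typing import List


def language_files(lang: str) -> List[str]:
    """List the language-specific files from step 11 for one language code.

    Assumes the literal "lang" in the listed names stands for the code.
    """
    return [
        f"activity/activity.info.{lang}",
        f"activity/activity-wikipedia-{lang}.svg",
        f"activity_{lang}.py",
        f"static/about_{lang}.html",
        f"static/index_{lang}.html",
    ]


if __name__ == "__main__":
    for name in language_files("hi"):
        print(name)
```

Under that assumption, the Hindi activity needs activity_hi.py, static/index_hi.html and so on.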


12) Create the XO file for wikipedia in your language:
./setup_new_wiki.py hi/hiwiki-20121225-pages-articles.xml
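
Steps 4 to 12 can be chained in a small driver script so that a failed step stops the run. The sketch below only wraps the commands listed above. The working directory for each step is my reading of the cd commands in the post, running step 12 from the WikipediaBase directory is an assumption, and the names STEPS and run_steps are hypothetical. It uses the --all variant of make_selection.py and the plain variant of download_images.py:

```python
import subprocess
from pathlib import Path

DUMP = "hiwiki-20121225-pages-articles.xml"

# (directory relative to WikipediaBase, command), in the order used above.
STEPS = [
    ("hi", ["../tools2/pages_parser.py"]),
    ("hi", ["../tools2/make_selection.py", "--all"]),
    ("hi", ["../tools2/create_index.py"]),
    ("hi", ["../tools2/test_index.py"]),
    (".", ["./tools2/expandtemplates.py", "hi"]),
    ("hi", ["mv", DUMP + ".processed_expanded", DUMP + ".processed"]),
    ("hi", ["../tools2/create_index.py", "--delete_all"]),
    ("hi", ["../tools2/download_images.py"]),
    (".", ["./setup_new_wiki.py", "hi/" + DUMP]),  # assumed to run from the base dir
]


def run_steps(base_dir, steps=STEPS, dry_run=True):
    """Run each command in its directory; stop at the first failure.

    With dry_run=True the commands are only printed, not executed.
    Returns the list of (working directory, command) pairs visited.
    """
    base = Path(base_dir)
    visited = []
    for rel_dir, cmd in steps:
        cwd = base / rel_dir
        visited.append((str(cwd), cmd))
        if dry_run:
            print(f"[{cwd}] {' '.join(cmd)}")
        else:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(cmd, cwd=cwd, check=True)
    return visited


if __name__ == "__main__":
    run_steps("WikipediaBase", dry_run=True)
```

Start with dry_run=True to review the plan, then switch to dry_run=False once the dump is extracted and favorites.txt and blacklist.txt are in place.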

I went through the search.db file to identify the articles present in it and created the index page accordingly.
This gave me the idea of writing a script that uses search.db to generate the index page (in part or in whole) used as the activity's home page. [Stay tuned for the next blog post on this idea.]
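
As a rough illustration of that idea, here is a sketch that turns a list of article titles into a simple index page. It does not read search.db, because the file's format is not described in this post, so you would have to supply the titles yourself. The function name build_index_html and the /wiki/<title> link format are hypothetical and should be adjusted to whatever the activity actually expects:

```python
import html
from urllib.parse import quote


def build_index_html(titles, heading="WikipediaHI"):
    """Return a minimal HTML page with one link per article title."""
    items = []
    for title in sorted(titles):
        # Hypothetical link format: spaces become underscores and the rest
        # is percent-encoded as UTF-8.
        href = "/wiki/" + quote(title.replace(" ", "_"))
        items.append(f'    <li><a href="{href}">{html.escape(title)}</a></li>')
    return "\n".join(
        [
            "<!DOCTYPE html>",
            "<html>",
            f'<head><meta charset="utf-8"><title>{html.escape(heading)}</title></head>',
            "<body>",
            f"  <h1>{html.escape(heading)}</h1>",
            "  <ul>",
            *items,
            "  </ul>",
            "</body>",
            "</html>",
            "",
        ]
    )


if __name__ == "__main__":
    print(build_index_html(["भारत", "New Delhi"]))
```

The output could serve as a starting point for static/index_lang.html and then be edited by hand.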

Here you go, you can see WikipediaHI.

On launching it, you can see the index page listing the articles you can view offline using WikipediaHI.

If you want to play with WikipediaHI, you can download it: WikipediaHI-35.xo

I must thank Gonzalo for his amazing help and guidance in getting this done. I have to mention here that Wikipedia changed the XML format of its dumps, which resulted in an error when I was creating the index. Gonzalo helped me get it resolved.
Thanks to Anish, who motivated me to pick this up and guided me to complete it.

Thanks guys !! :D
