My Weblog

Blog about programming and math

SPOJ EMAIL ID

I finally solved EMAIL ID using Python, though it was very hard for me to switch from Haskell to Python. While doing this problem I learned quite a lot about regular expressions and found some cool sites such as pythontutor and Problem Solving with Python. Here is the accepted code in Python.


import re

# An address: one alphanumeric character, then at least four more characters
# from [a-zA-Z0-9._], then '@', an alphanumeric domain and a known suffix.
EMAIL = re.compile(r"[a-zA-Z0-9][a-zA-Z0-9._]{4,}@[a-zA-Z0-9]+\.(?:com|edu|org|co\.in)")

if __name__ == "__main__":
    n = int(raw_input())
    for c in xrange(1, n + 1):
        emails = EMAIL.findall(raw_input())
        print 'Case #' + str(c) + ': ' + str(len(emails))
        for e in emails:
            print e
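
To see what the pattern accepts, here is a small sketch in Python 3 (the accepted solution is Python 2). The pattern is the one from the solution; the helper name `find_emails` and the sample line are made up for illustration and are not from the judge's data.

```python
import re

# Same pattern as in the accepted solution, written as a raw string.
EMAIL = re.compile(r"[a-zA-Z0-9][a-zA-Z0-9._]{4,}@[a-zA-Z0-9]+\.(?:com|edu|org|co\.in)")


def find_emails(line):
    """Return every non-overlapping match of EMAIL in line, left to right."""
    return EMAIL.findall(line)


if __name__ == "__main__":
    # Hypothetical sample line, not taken from the problem's test data.
    sample = "write to alice.smith@example.com or bob@example.org today"
    print(find_emails(sample))
```

On this sample only `alice.smith@example.com` is reported: `bob@example.org` is skipped because the pattern wants at least five characters before the `@`.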





March 11, 2013 | Posted in Programming, python

Converting Wikipedia HTML files to PDF

I want to convert HTML pages from Wikipedia to PDF for offline reading. After a bit of searching, I found that Wikipedia itself provides a link in the left sidebar [ Print/export ] of every article to convert it to PDF. After a couple of clicks we can download the PDF, but I wanted to write a Haskell script for it. This script finds the rendering URL. Fetching the rendering URL from the script returns only empty tags, while copying and pasting the same URL into a web browser generates the PDF file. Asking on Haskell-cafe revealed that the link is generated by JavaScript, so I would have to script an actual browser to generate the PDF from this code. Technically this is still an unfinished project 😦 but it was the first time I played with some sort of web programming.

import Network.HTTP
import Text.HTML.TagSoup
import Data.Maybe
 
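-- Return the absolute rendering URL if this tag is the "Download as PDF" link.
-- Takes the value of the tag's first attribute as the relative link.
parseHelp :: Tag String -> Maybe String
parseHelp ( TagOpen _ y )
   | any ( \( _ , b ) -> b == "Download a PDF version of this wiki page" ) y
       = Just $ "http://en.wikipedia.org" ++ snd ( y !! 0 )
parseHelp _ = Nothing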
 
 
parse :: [ Tag String ] -> Maybe String
parse [] = Nothing 
parse ( x : xs ) 
   | isTagOpen x = case parseHelp x of 
                         Just s -> Just s 
                         Nothing -> parse xs
   | otherwise = parse xs
 
 
main = do 
        x <- getLine 
        tags_1 <-  fmap parseTags $ getResponseBody =<< simpleHTTP ( getRequest x ) --open url
        let lst =  head . sections ( ~== "<div class=portal id=p-coll-print_export>" ) $ tags_1
            url =  fromJust . parse $ lst  --rendering url
        putStrLn url
        tags_2 <-  fmap parseTags $ getResponseBody =<< simpleHTTP ( getRequest url )
        print tags_2
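
As a cross-check of the link-finding step only (not the PDF download), the same idea can be sketched in Python 3 with just the standard library. This is an illustration under assumptions, not a port of the Haskell script: the names `PdfLinkFinder` and `rendering_url` are made up, the sample fragment and its href are hypothetical, the href is looked up by name instead of taking the first attribute, and the search is not restricted to the print/export box.

```python
from html.parser import HTMLParser

PDF_TITLE = "Download a PDF version of this wiki page"
BASE = "http://en.wikipedia.org"


class PdfLinkFinder(HTMLParser):
    """Remember the href of the first tag whose title is PDF_TITLE."""

    def __init__(self):
        super().__init__()
        self.url = None

    def handle_starttag(self, tag, attrs):
        if self.url is not None:
            return  # keep only the first hit
        attrs = dict(attrs)
        href = attrs.get("href")
        if attrs.get("title") == PDF_TITLE and href:
            self.url = BASE + href


def rendering_url(html):
    """Return the absolute rendering URL, or None if no such link exists."""
    finder = PdfLinkFinder()
    finder.feed(html)
    return finder.url


if __name__ == "__main__":
    # Hypothetical fragment shaped like the sidebar link; the href is made up.
    sample = ('<div id="p-coll-print_export">'
              '<a href="/w/index.php?title=Special:Book" '
              'title="Download a PDF version of this wiki page">Download as PDF</a>'
              '</div>')
    print(rendering_url(sample))
```

Like the Haskell version, this only gets as far as the rendering URL; it does nothing about the JavaScript step that actually produces the PDF.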
 

My second choice was obviously Python, and it finished the job perfectly. Here is the Python script for this purpose; in fact it can convert any HTML page to PDF. It's like opening an HTML file in a web browser and printing it to a PDF file.

import sys
from PyQt4.QtCore import *
from PyQt4.QtGui import *
from PyQt4.QtWebKit import *

#http://www.rkblog.rk.edu.pl/w/p/webkit-pyqt-rendering-web-pages/
#http://pastebin.com/xunfQ959
#http://bharatikunal.wordpress.com/2010/01/31/converting-html-to-pdf-with-python-and-qt/
#http://www.riverbankcomputing.com/pipermail/pyqt/2009-January/021592.html

def convertFile( ):
    # Called once the page has finished loading: print it to the PDF printer.
    web.print_( printer )
    print "done"
    QApplication.exit()


if __name__=="__main__":
    url = raw_input("enter url:")
    filename = raw_input("enter file name:")
    app = QApplication( sys.argv )
    web = QWebView()
    web.load(QUrl( url ))
    #web.show()
    # Set up a printer that writes an A4 PDF file instead of printing on paper.
    printer = QPrinter( QPrinter.HighResolution )
    printer.setPageSize( QPrinter.A4 )
    printer.setOutputFormat( QPrinter.PdfFormat )
    printer.setOutputFileName(  filename + ".pdf" )
    # Convert only after the page has loaded.
    QObject.connect( web ,  SIGNAL("loadFinished(bool)"), convertFile  )
    sys.exit(app.exec_())

September 9, 2011 | Posted in Programming