SPOJ EMAIL ID
Finally solved EMAIL ID using Python, though switching from Haskell to Python was very hard for me. While working on this problem, I learned quite a lot about regular expressions and found some cool sites like pythontutor and Problem Solving with Python. Accepted code in Python (Python 2):
    import re

    if __name__ == "__main__":
        n = int(raw_input())
        c = 1
        while c <= n:
            # Collect every e-mail-like token on the current input line.
            email = re.findall("[a-zA-Z0-9][a-zA-Z0-9._]{4,}@[a-zA-Z0-9]+\.(?:com|edu|org|co\.in)", raw_input())
            t = len(email)
            print 'Case #' + str(c) + ': ' + str(t)
            for i in xrange(t):
                print email[i]
            c += 1
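To see what the pattern accepts, here is a minimal Python 3 sketch that applies the same regular expression to a single line. The helper name `find_emails` and the sample addresses are made up for illustration; they are not part of the accepted solution.

```python
import re

# Same pattern as in the accepted solution, written as a raw string.
# Local part: one alphanumeric character followed by at least four
# characters from [a-zA-Z0-9._]; domain: alphanumerics plus one of the
# listed suffixes.
EMAIL_RE = re.compile(
    r"[a-zA-Z0-9][a-zA-Z0-9._]{4,}@[a-zA-Z0-9]+\.(?:com|edu|org|co\.in)"
)


def find_emails(line):
    """Return every substring of `line` that matches EMAIL_RE (hypothetical helper)."""
    return EMAIL_RE.findall(line)


# Hypothetical sample input, not taken from the SPOJ problem.
sample = "write to alice.smith@example.com or bob@site.org, also carol_99@mail.co.in"

if __name__ == "__main__":
    for address in find_emails(sample):
        print(address)
```

In this sample, `bob@site.org` is skipped because its local part has fewer than five characters, while the other two addresses match.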
Converting Wikipedia HTML files to PDF
I want to convert Wikipedia HTML pages to PDF for offline reading. After a bit of searching, I found that Wikipedia itself provides a link in the left sidebar [Print/export] of every article to convert it into a PDF. After a couple of clicks we can download the PDF, but I wanted to write a Haskell script. This script generates the rendering URL. Fetching the rendering URL from the script returns empty tags, while copying and pasting the same URL into a web browser generates the PDF file. Asking on Haskell-cafe revealed that the link is generated by JavaScript and that I would have to script an actual browser to generate the PDF from this code. Technically this is still an unfinished project 😦 but it was the first time I played with some sort of web programming.
    import Network.HTTP
    import Text.HTML.TagSoup
    import Data.Maybe

    parseHelp :: Tag String -> Maybe String
    parseHelp ( TagOpen _ y ) =
      if any ( \( a , b ) -> b == "Download a PDF version of this wiki page" ) y
        then Just $ "http://en.wikipedia.org" ++ snd ( y !! 0 )
        else Nothing

    parse :: [ Tag String ] -> Maybe String
    parse [] = Nothing
    parse ( x : xs )
      | isTagOpen x = case parseHelp x of
          Just s -> Just s
          Nothing -> parse xs
      | otherwise = parse xs

    main = do
      x <- getLine
      tags_1 <- fmap parseTags $ getResponseBody =<< simpleHTTP ( getRequest x ) --open url
      let lst = head . sections ( ~== "<div class=portal id=p-coll-print_export>" ) $ tags_1
          url = fromJust . parse $ lst --rendering url
      putStrLn url
      tags_2 <- fmap parseTags $ getResponseBody =<< simpleHTTP ( getRequest url )
      print tags_2
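For comparison, the link-extraction step of the Haskell script can be sketched in Python 3 with the standard library's `html.parser`. This is only an illustration under assumptions: the class `PdfLinkFinder`, the function `find_pdf_link`, and the sample HTML (including its `href` value) are hypothetical, and the sketch reads the `href` attribute by name instead of taking the first attribute as the Haskell code does.

```python
from html.parser import HTMLParser

# Title attribute the Haskell script searches for.
PDF_TITLE = "Download a PDF version of this wiki page"


class PdfLinkFinder(HTMLParser):
    """Remember the href of the first tag whose title equals PDF_TITLE."""

    def __init__(self):
        super().__init__()
        self.link = None

    def handle_starttag(self, tag, attrs):
        if self.link is not None:
            return  # keep only the first match
        attr_map = dict(attrs)
        if attr_map.get("title") == PDF_TITLE and "href" in attr_map:
            self.link = "http://en.wikipedia.org" + attr_map["href"]


def find_pdf_link(html):
    """Return the absolute rendering URL, or None if no such link exists."""
    finder = PdfLinkFinder()
    finder.feed(html)
    return finder.link


# Made-up snippet that only mimics the sidebar markup; the href is hypothetical.
SAMPLE = (
    '<div class="portal" id="p-coll-print_export"><ul><li>'
    '<a href="/hypothetical/render?page=Haskell" '
    'title="Download a PDF version of this wiki page">Download as PDF</a>'
    "</li></ul></div>"
)

if __name__ == "__main__":
    print(find_pdf_link(SAMPLE))
```

Like the Haskell version, this only recovers the rendering URL; it does not run the JavaScript that produces the PDF.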
My second choice was obviously Python, and it finished the job perfectly. Here is the Python script for this purpose; in fact, it can convert any HTML file to PDF. It's like opening an HTML file in a web browser and printing it to a PDF file.
    import sys
    from PyQt4.QtCore import *
    from PyQt4.QtGui import *
    from PyQt4.QtWebKit import *

    #http://www.rkblog.rk.edu.pl/w/p/webkit-pyqt-rendering-web-pages/
    #http://pastebin.com/xunfQ959
    #http://bharatikunal.wordpress.com/2010/01/31/converting-html-to-pdf-with-python-and-qt/
    #http://www.riverbankcomputing.com/pipermail/pyqt/2009-January/021592.html

    def convertFile():
        # Called once the page has finished loading: print it to the PDF printer.
        web.print_(printer)
        print "done"
        QApplication.exit()

    if __name__ == "__main__":
        url = raw_input("enter url:")
        filename = raw_input("enter file name:")
        app = QApplication(sys.argv)
        web = QWebView()
        web.load(QUrl(url))
        #web.show()
        printer = QPrinter(QPrinter.HighResolution)
        printer.setPageSize(QPrinter.A4)
        printer.setOutputFormat(QPrinter.PdfFormat)
        printer.setOutputFileName(filename + ".pdf")
        QObject.connect(web, SIGNAL("loadFinished(bool)"), convertFile)
        sys.exit(app.exec_())