My Weblog

Blog about programming and math

Parsing Email ID

Now a days I am exploring one of the most awesome feature of Haskell, parsing. There are lot of parsing libraries but Parsec is the popular one. This parsing code is written for SPOJ EMAIL ID problem ( Unfortunately it’s getting time limit exceed. I have seen couple of python solution accepted so I am hopeful that there must be another algorithm probably using regular expressions to solve the problem ) but you can build more sophisticated email-id parser by adding more functionality. Tony Morris excellent parsing tutorial is must read.

import Data.List
import qualified Text.Parsec.ByteString as PB
import Text.Parsec.Prim
import Text.Parsec.Char
import Text.Parsec.Combinator
import qualified Data.ByteString.Char8 as BS
import Control.Applicative hiding ( ( <|> ) , many ) 

validChars :: PB.Parser Char
validChars  = alphaNum <|> oneOf "._" 

dontCare :: PB.Parser Char 
dontCare = oneOf "~!@#$%^&*()<>?,."

{--
emailAddress :: PB.Parser  String
emailAddress = do 
             _ <- many dontCare
             fi <- alphaNum
             se <- validChars
             th <- validChars
             fo <- validChars
             ft <- validChars
             restAddr <- many validChars
             let addr = fi : se : th : fo : ft : restAddr 
             char '@'
             dom <- many1 alphaNum 
             rest <- try ( string ".com" <|> string ".org"  
                  <|>  string ".edu" ) <|> try ( string ".co.in" )
             _ <- many dontCare
             return $  addr ++ (  '@': dom ++ rest ) 
   
--} 
          
emailAddress :: PB.Parser String
emailAddress = conCatfun <$> ( many dontCare *> alphaNum ) <*> validChars <*> 
               validChars <*> validChars <*> validChars <*> many alphaNum  <*> 
               ( char '@' *> many1 alphaNum ) <*> ( try ( string ".com" <|> 
               string ".org" <|>  string ".edu" ) <|> try ( string ".co.in" ) 
               <* many dontCare ) where 
                conCatfun a b c d e f dom rest = 
                       ( a : b : c : d : e : f ) ++ ( '@' : dom ) ++ rest 


collectEmail :: BS.ByteString -> String
collectEmail email = case  parse emailAddress "" email of
                        Right addr -> addr 
                        Left err ->  "" 

process :: ( Int , [ String ] ) -> BS.ByteString
process ( k , xs ) = ( BS.pack "Case " ) `BS.append` ( BS.pack . show $ k ) 
          `BS.append` ( BS.pack ": " ) `BS.append` ( BS.pack . show . length $ xs ) 
          `BS.append` ( BS.pack "\n" ) `BS.append` ( BS.pack  
          (  unlines . filter ( not . null ) $  xs ) )

main = BS.interact $ BS.concat .  map process . zip [ 1.. ] . map ( map collectEmail . 
       BS.words ) . tail . BS.lines

Mukeshs-MacBook-Pro:Haskell mukeshtiwari$ cat t.txt 
2
svm11@gmail.com
svm11@gmail.com svm12@gmail.co.in  ~!@#$%^&*()<>?svm12@gmail.co.in~!@#$%^&*()
Mukeshs-MacBook-Pro:Haskell mukeshtiwari$ ./Spoj_11105 < t.txt
Case 1: 1
svm11@gmail.com
Case 2: 2
svm11@gmail.com
svm12@gmail.co.in
svm12@gmail.co.in

Update

I tried again to solve this problem using regular expression. Following the tutorial, I wrote this code but I got compiler error because of old version of ghc on SPOJ ( GHC-6.10.4 ). It’s working fine on my system but I still have to test if it is correct and fast enough to get accepted.

import Data.List
import Text.Regex.Posix
import qualified Data.ByteString.Char8 as BS

pat :: BS.ByteString
pat = BS.pack "[^~!@#$%^&*()<>?,.]*[a-zA-Z0-9][a-zA-Z0-9._][a-zA-Z0-9._][a-zA-Z0-9._][a-zA-Z0-9._][a-zA-Z0-9._]*@[a-zA-Z0-9]+.(com|edu|org|co.in)[^~!@#$%^&*()<>?,.a-zA-Z0-9]*"

collectEmail :: BS.ByteString -> BS.ByteString
collectEmail email = ( =~ ) email pat 

process :: ( Int , [ BS.ByteString ] ) -> BS.ByteString
process ( k , xs ) = ( BS.pack "Case " ) `BS.append` ( BS.pack . show $ k ) 
          `BS.append` ( BS.pack ": " ) `BS.append` ( BS.pack . show . length $ xs ) 
          `BS.append` ( BS.pack "\n" ) `BS.append` ( BS.unlines xs )


main = BS.interact $ BS.concat .  map process . zip [ 1 .. ] . 
       map ( filter ( not . BS.null ) . map collectEmail . BS.words ) . 
       tail . BS.lines 
Mukeshs-MacBook-Pro:SPOJ mukeshtiwari$ cat t.txt 
2
svm11@gmail.com
svm11@gmail.com svm12@gmail.co.in %&^%&%&%&%&^%&^%&^%&^mukeshtiwari.iiitm@gmail.com%&%^&^%&%&%&%&^%&%&^%&^%&^% %$%$#%#%#%#%$#%&%&%&tiwa@gmail.com
Mukeshs-MacBook-Pro:SPOJ mukeshtiwari$ ./Spoj_11105 < t.txt 
Case 1: 1
svm11@gmail.com
Case 2: 3
svm11@gmail.com
svm12@gmail.co.in
mukeshtiwari.iiitm@gmail.com
Advertisements

March 8, 2013 Posted by | Haskell, Programming | , , , | Leave a comment

Parsing with Applicative Functors

While reading Write Yourself a Scheme in 48 Hours, I thought of converting the monadic codes of parsing into applicative style. Learn you a Haskell for Great Good has excellent explanation about functors and applicative functors. Real World Haskell has excellent chapter on using applicative functors for parsing. Below is my attempt to convert the monadic codes of parsing into applicative style for Scheme parsing. On parsing note, see the excellent tutorial of Tony Morris and Albert Y. C. Lie.

import Control.Applicative hiding ( many , ( <|> )  )
import Text.ParserCombinators.Parsec hiding ( spaces )

symbol :: Parser Char
symbol =  oneOf "!$%&|+-*/:<=?>@^_~#"

spaces :: Parser ()
spaces = skipMany space 

parseString :: Parser LispVal
parseString = String <$> ( char '"' *>  x <*  char '"' ) where 
            x = many ( noneOf "\"" ) 
--This is not completely applicative style because I have to take the result for testing conditions.
parseAtom :: Parser LispVal
parseAtom = do 
       atom <-  ( : ) <$> ( letter <|> symbol ) <*> many ( letter <|> digit <|> symbol )
       return $ case atom of 
                "#t" -> Bool True
                "#f" -> Bool False
                _ -> Atom atom


parseNumber :: Parser LispVal
parseNumber = Number . read <$> many1 digit


parseList :: Parser LispVal
parseList = List <$> sepBy parseExpr space 

parseDottedList :: Parser LispVal
parseDottedList = DottedList <$> ( endBy parseExpr space ) <*> ( char '.' *> spaces *> parseExpr )


parseQuoted :: Parser LispVal
parseQuoted =  f  <$>   ( char '\'' *> parseExpr ) where 
              f x =   List [ Atom "quote" , x ] 

parseExpr :: Parser LispVal
parseExpr =  parseAtom 
         <|> parseString
         <|> parseNumber 
         <|> parseQuoted
         <|> ( char '(' *> spaces *> ( ( try parseList ) <|> parseDottedList ) <* spaces  <* char ')' )

Complete code on github ( Apart from applicative parsing every other code is from Scheme parsing tutorial ). If you have any comments or suggestion then please let me know.

January 18, 2013 Posted by | Haskell, Programming | , , , | Leave a comment