My Weblog

Blog about programming and math

Parsing Email ID

Now a days I am exploring one of the most awesome feature of Haskell, parsing. There are lot of parsing libraries but Parsec is the popular one. This parsing code is written for SPOJ EMAIL ID problem ( Unfortunately it’s getting time limit exceed. I have seen couple of python solution accepted so I am hopeful that there must be another algorithm probably using regular expressions to solve the problem ) but you can build more sophisticated email-id parser by adding more functionality. Tony Morris excellent parsing tutorial is must read.

import Data.List
import qualified Text.Parsec.ByteString as PB
import Text.Parsec.Prim
import Text.Parsec.Char
import Text.Parsec.Combinator
import qualified Data.ByteString.Char8 as BS
import Control.Applicative hiding ( ( <|> ) , many ) 

validChars :: PB.Parser Char
validChars  = alphaNum <|> oneOf "._" 

dontCare :: PB.Parser Char 
dontCare = oneOf "~!@#$%^&*()<>?,."

{--
emailAddress :: PB.Parser  String
emailAddress = do 
             _ <- many dontCare
             fi <- alphaNum
             se <- validChars
             th <- validChars
             fo <- validChars
             ft <- validChars
             restAddr <- many validChars
             let addr = fi : se : th : fo : ft : restAddr 
             char '@'
             dom <- many1 alphaNum 
             rest <- try ( string ".com" <|> string ".org"  
                  <|>  string ".edu" ) <|> try ( string ".co.in" )
             _ <- many dontCare
             return $  addr ++ (  '@': dom ++ rest ) 
   
--} 
          
emailAddress :: PB.Parser String
emailAddress = conCatfun <$> ( many dontCare *> alphaNum ) <*> validChars <*> 
               validChars <*> validChars <*> validChars <*> many alphaNum  <*> 
               ( char '@' *> many1 alphaNum ) <*> ( try ( string ".com" <|> 
               string ".org" <|>  string ".edu" ) <|> try ( string ".co.in" ) 
               <* many dontCare ) where 
                conCatfun a b c d e f dom rest = 
                       ( a : b : c : d : e : f ) ++ ( '@' : dom ) ++ rest 


collectEmail :: BS.ByteString -> String
collectEmail email = case  parse emailAddress "" email of
                        Right addr -> addr 
                        Left err ->  "" 

process :: ( Int , [ String ] ) -> BS.ByteString
process ( k , xs ) = ( BS.pack "Case " ) `BS.append` ( BS.pack . show $ k ) 
          `BS.append` ( BS.pack ": " ) `BS.append` ( BS.pack . show . length $ xs ) 
          `BS.append` ( BS.pack "\n" ) `BS.append` ( BS.pack  
          (  unlines . filter ( not . null ) $  xs ) )

main = BS.interact $ BS.concat .  map process . zip [ 1.. ] . map ( map collectEmail . 
       BS.words ) . tail . BS.lines

Mukeshs-MacBook-Pro:Haskell mukeshtiwari$ cat t.txt 
2
svm11@gmail.com
svm11@gmail.com svm12@gmail.co.in  ~!@#$%^&*()<>?svm12@gmail.co.in~!@#$%^&*()
Mukeshs-MacBook-Pro:Haskell mukeshtiwari$ ./Spoj_11105 < t.txt
Case 1: 1
svm11@gmail.com
Case 2: 2
svm11@gmail.com
svm12@gmail.co.in
svm12@gmail.co.in

Update

I tried again to solve this problem using regular expression. Following the tutorial, I wrote this code but I got compiler error because of old version of ghc on SPOJ ( GHC-6.10.4 ). It’s working fine on my system but I still have to test if it is correct and fast enough to get accepted.

import Data.List
import Text.Regex.Posix
import qualified Data.ByteString.Char8 as BS

pat :: BS.ByteString
pat = BS.pack "[^~!@#$%^&*()<>?,.]*[a-zA-Z0-9][a-zA-Z0-9._][a-zA-Z0-9._][a-zA-Z0-9._][a-zA-Z0-9._][a-zA-Z0-9._]*@[a-zA-Z0-9]+.(com|edu|org|co.in)[^~!@#$%^&*()<>?,.a-zA-Z0-9]*"

collectEmail :: BS.ByteString -> BS.ByteString
collectEmail email = ( =~ ) email pat 

process :: ( Int , [ BS.ByteString ] ) -> BS.ByteString
process ( k , xs ) = ( BS.pack "Case " ) `BS.append` ( BS.pack . show $ k ) 
          `BS.append` ( BS.pack ": " ) `BS.append` ( BS.pack . show . length $ xs ) 
          `BS.append` ( BS.pack "\n" ) `BS.append` ( BS.unlines xs )


main = BS.interact $ BS.concat .  map process . zip [ 1 .. ] . 
       map ( filter ( not . BS.null ) . map collectEmail . BS.words ) . 
       tail . BS.lines 
Mukeshs-MacBook-Pro:SPOJ mukeshtiwari$ cat t.txt 
2
svm11@gmail.com
svm11@gmail.com svm12@gmail.co.in %&^%&%&%&%&^%&^%&^%&^mukeshtiwari.iiitm@gmail.com%&%^&^%&%&%&%&^%&%&^%&^%&^% %$%$#%#%#%#%$#%&%&%&tiwa@gmail.com
Mukeshs-MacBook-Pro:SPOJ mukeshtiwari$ ./Spoj_11105 < t.txt 
Case 1: 1
svm11@gmail.com
Case 2: 3
svm11@gmail.com
svm12@gmail.co.in
mukeshtiwari.iiitm@gmail.com

March 8, 2013 Posted by | Haskell, Programming | , , , | Leave a comment