Parsing Email ID
Now a days I am exploring one of the most awesome feature of Haskell, parsing. There are lot of parsing libraries but Parsec is the popular one. This parsing code is written for SPOJ EMAIL ID problem ( Unfortunately it’s getting time limit exceed. I have seen couple of python solution accepted so I am hopeful that there must be another algorithm probably using regular expressions to solve the problem ) but you can build more sophisticated email-id parser by adding more functionality. Tony Morris excellent parsing tutorial is must read.
import Data.List import qualified Text.Parsec.ByteString as PB import Text.Parsec.Prim import Text.Parsec.Char import Text.Parsec.Combinator import qualified Data.ByteString.Char8 as BS import Control.Applicative hiding ( ( <|> ) , many ) validChars :: PB.Parser Char validChars = alphaNum <|> oneOf "._" dontCare :: PB.Parser Char dontCare = oneOf "~!@#$%^&*()<>?,." {-- emailAddress :: PB.Parser String emailAddress = do _ <- many dontCare fi <- alphaNum se <- validChars th <- validChars fo <- validChars ft <- validChars restAddr <- many validChars let addr = fi : se : th : fo : ft : restAddr char '@' dom <- many1 alphaNum rest <- try ( string ".com" <|> string ".org" <|> string ".edu" ) <|> try ( string ".co.in" ) _ <- many dontCare return $ addr ++ ( '@': dom ++ rest ) --} emailAddress :: PB.Parser String emailAddress = conCatfun <$> ( many dontCare *> alphaNum ) <*> validChars <*> validChars <*> validChars <*> validChars <*> many alphaNum <*> ( char '@' *> many1 alphaNum ) <*> ( try ( string ".com" <|> string ".org" <|> string ".edu" ) <|> try ( string ".co.in" ) <* many dontCare ) where conCatfun a b c d e f dom rest = ( a : b : c : d : e : f ) ++ ( '@' : dom ) ++ rest collectEmail :: BS.ByteString -> String collectEmail email = case parse emailAddress "" email of Right addr -> addr Left err -> "" process :: ( Int , [ String ] ) -> BS.ByteString process ( k , xs ) = ( BS.pack "Case " ) `BS.append` ( BS.pack . show $ k ) `BS.append` ( BS.pack ": " ) `BS.append` ( BS.pack . show . length $ xs ) `BS.append` ( BS.pack "\n" ) `BS.append` ( BS.pack ( unlines . filter ( not . null ) $ xs ) ) main = BS.interact $ BS.concat . map process . zip [ 1.. ] . map ( map collectEmail . BS.words ) . tail . BS.lines
Mukeshs-MacBook-Pro:Haskell mukeshtiwari$ cat t.txt 2 svm11@gmail.com svm11@gmail.com svm12@gmail.co.in ~!@#$%^&*()<>?svm12@gmail.co.in~!@#$%^&*() Mukeshs-MacBook-Pro:Haskell mukeshtiwari$ ./Spoj_11105 < t.txt Case 1: 1 svm11@gmail.com Case 2: 2 svm11@gmail.com svm12@gmail.co.in svm12@gmail.co.in
Update
I tried again to solve this problem using regular expression. Following the tutorial, I wrote this code but I got compiler error because of old version of ghc on SPOJ ( GHC-6.10.4 ). It’s working fine on my system but I still have to test if it is correct and fast enough to get accepted.
import Data.List import Text.Regex.Posix import qualified Data.ByteString.Char8 as BS pat :: BS.ByteString pat = BS.pack "[^~!@#$%^&*()<>?,.]*[a-zA-Z0-9][a-zA-Z0-9._][a-zA-Z0-9._][a-zA-Z0-9._][a-zA-Z0-9._][a-zA-Z0-9._]*@[a-zA-Z0-9]+.(com|edu|org|co.in)[^~!@#$%^&*()<>?,.a-zA-Z0-9]*" collectEmail :: BS.ByteString -> BS.ByteString collectEmail email = ( =~ ) email pat process :: ( Int , [ BS.ByteString ] ) -> BS.ByteString process ( k , xs ) = ( BS.pack "Case " ) `BS.append` ( BS.pack . show $ k ) `BS.append` ( BS.pack ": " ) `BS.append` ( BS.pack . show . length $ xs ) `BS.append` ( BS.pack "\n" ) `BS.append` ( BS.unlines xs ) main = BS.interact $ BS.concat . map process . zip [ 1 .. ] . map ( filter ( not . BS.null ) . map collectEmail . BS.words ) . tail . BS.lines
Mukeshs-MacBook-Pro:SPOJ mukeshtiwari$ cat t.txt 2 svm11@gmail.com svm11@gmail.com svm12@gmail.co.in %&^%&%&%&%&^%&^%&^%&^mukeshtiwari.iiitm@gmail.com%&%^&^%&%&%&%&^%&%&^%&^%&^% %$%$#%#%#%#%$#%&%&%&tiwa@gmail.com Mukeshs-MacBook-Pro:SPOJ mukeshtiwari$ ./Spoj_11105 < t.txt Case 1: 1 svm11@gmail.com Case 2: 3 svm11@gmail.com svm12@gmail.co.in mukeshtiwari.iiitm@gmail.com