2013 March 08 « My Weblog

Parsing Email ID

These days I am exploring one of the most awesome features of Haskell: parsing. There are a lot of parsing libraries, but Parsec is the popular one. This parsing code is written for the SPOJ EMAIL ID problem. (Unfortunately it gets Time Limit Exceeded. I have seen a couple of Python solutions accepted, so I am hopeful that there is another algorithm, probably using regular expressions, that solves the problem.) You can build a more sophisticated email-id parser by adding more functionality. Tony Morris's excellent parsing tutorial is a must-read.

import Data.List
import qualified Text.Parsec.ByteString as PB
import Text.Parsec.Prim
import Text.Parsec.Char
import Text.Parsec.Combinator
import qualified Data.ByteString.Char8 as BS
import Control.Applicative hiding ( ( <|> ) , many ) 

validChars :: PB.Parser Char
validChars  = alphaNum <|> oneOf "._" 

dontCare :: PB.Parser Char 
dontCare = oneOf "~!@#$%^&*()<>?,."

{--
emailAddress :: PB.Parser  String
emailAddress = do 
             _ <- many dontCare
             fi <- alphaNum
             se <- validChars
             th <- validChars
             fo <- validChars
             ft <- validChars
             restAddr <- many validChars
             let addr = fi : se : th : fo : ft : restAddr 
             char '@'
             dom <- many1 alphaNum 
             rest <- try ( string ".com" ) <|> try ( string ".org" )  
                  <|> try ( string ".edu" ) <|> string ".co.in"
             _ <- many dontCare
             return $  addr ++ (  '@': dom ++ rest ) 
   
--} 
          
emailAddress :: PB.Parser String
emailAddress = conCatfun <$> ( many dontCare *> alphaNum ) <*> validChars <*> 
               validChars <*> validChars <*> validChars <*> many validChars <*> 
               ( char '@' *> many1 alphaNum ) <*> ( suffix <* many dontCare ) where 
                -- every alternative gets its own try: string consumes input on a 
                -- partial match, so ".org" would fail once ".com" has eaten the '.'
                suffix = try ( string ".com" ) <|> try ( string ".org" ) 
                         <|> try ( string ".edu" ) <|> string ".co.in" 
                conCatfun a b c d e f dom rest = 
                       ( a : b : c : d : e : f ) ++ ( '@' : dom ) ++ rest 


collectEmail :: BS.ByteString -> String
collectEmail email = case  parse emailAddress "" email of
                        Right addr -> addr 
                        Left err ->  "" 

process :: ( Int , [ String ] ) -> BS.ByteString
process ( k , xs ) = ( BS.pack "Case " ) `BS.append` ( BS.pack . show $ k ) 
          `BS.append` ( BS.pack ": " ) `BS.append` ( BS.pack . show . length $ found ) 
          `BS.append` ( BS.pack "\n" ) `BS.append` ( BS.pack ( unlines found ) )
          where 
            -- a failed parse yields "", so count and print only the non-empty results
            found = filter ( not . null ) xs

main = BS.interact $ BS.concat .  map process . zip [ 1.. ] . map ( map collectEmail . 
       BS.words ) . tail . BS.lines

Mukeshs-MacBook-Pro:Haskell mukeshtiwari$ cat t.txt 
2
svm11@gmail.com
svm11@gmail.com svm12@gmail.co.in  ~!@#$%^&*()<>?svm12@gmail.co.in~!@#$%^&*()
Mukeshs-MacBook-Pro:Haskell mukeshtiwari$ ./Spoj_11105 < t.txt
Case 1: 1
svm11@gmail.com
Case 2: 2
svm11@gmail.com
svm12@gmail.co.in
svm12@gmail.co.in
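
One detail of Parsec that matters for a suffix list like .com / .org / .edu / .co.in: string consumes input as soon as its first character matches, and <|> only tries its right-hand side when the left-hand side failed without consuming anything. Here is a minimal sketch of that behaviour. It uses Text.Parsec.String instead of the ByteString parser to keep it short, and suffixNoTry, suffixWithTry and accepts are names made up for this illustration.

```haskell
import Text.Parsec (parse, string, try, (<|>))
import Text.Parsec.String (Parser)

-- No try on the first branch: string ".com" consumes the '.' before it
-- fails on 'o', so <|> never runs the ".org" branch.
suffixNoTry :: Parser String
suffixNoTry = string ".com" <|> string ".org"

-- With try, a failed branch rewinds the input before the next one runs.
suffixWithTry :: Parser String
suffixWithTry = try (string ".com") <|> string ".org"

-- Hypothetical helper: True when the parser accepts the input.
accepts :: Parser String -> String -> Bool
accepts p s = either (const False) (const True) (parse p "" s)
```

In GHCi, accepts suffixNoTry ".org" should give False while accepts suffixWithTry ".org" gives True. Both accept ".com".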

Update

I tried again to solve this problem, this time using regular expressions. Following the tutorial, I wrote this code, but I got a compiler error because of the old version of GHC on SPOJ (GHC-6.10.4). It works fine on my system, but I still have to test whether it is correct and fast enough to get accepted.

import Data.List
import Text.Regex.Posix
import qualified Data.ByteString.Char8 as BS

pat :: BS.ByteString
-- the dots in front of the suffixes are escaped so that they match only a literal '.'
pat = BS.pack "[^~!@#$%^&*()<>?,.]*[a-zA-Z0-9][a-zA-Z0-9._][a-zA-Z0-9._][a-zA-Z0-9._][a-zA-Z0-9._][a-zA-Z0-9._]*@[a-zA-Z0-9]+\\.(com|edu|org|co\\.in)[^~!@#$%^&*()<>?,.a-zA-Z0-9]*"

collectEmail :: BS.ByteString -> BS.ByteString
collectEmail email = ( =~ ) email pat 

process :: ( Int , [ BS.ByteString ] ) -> BS.ByteString
process ( k , xs ) = ( BS.pack "Case " ) `BS.append` ( BS.pack . show $ k ) 
          `BS.append` ( BS.pack ": " ) `BS.append` ( BS.pack . show . length $ xs ) 
          `BS.append` ( BS.pack "\n" ) `BS.append` ( BS.unlines xs )


main = BS.interact $ BS.concat .  map process . zip [ 1 .. ] . 
       map ( filter ( not . BS.null ) . map collectEmail . BS.words ) . 
       tail . BS.lines

Mukeshs-MacBook-Pro:SPOJ mukeshtiwari$ cat t.txt 
2
svm11@gmail.com
svm11@gmail.com svm12@gmail.co.in %&^%&%&%&%&^%&^%&^%&^mukeshtiwari.iiitm@gmail.com%&%^&^%&%&%&%&^%&%&^%&^%&^% %$%$#%#%#%#%$#%&%&%&tiwa@gmail.com
Mukeshs-MacBook-Pro:SPOJ mukeshtiwari$ ./Spoj_11105 < t.txt 
Case 1: 1
svm11@gmail.com
Case 2: 3
svm11@gmail.com
svm12@gmail.co.in
mukeshtiwari.iiitm@gmail.com
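
The same rules can also be checked with plain list functions from base, which avoids depending on a regex or parsing library at all. This is only a sketch under assumptions: it mirrors the rules the Parsec parser encodes (skip leading punctuation, at least five local-part characters starting with an alphanumeric one, an alphanumeric domain, then one of the four suffixes), extractEmail and its helpers are hypothetical names, and it has not been run against the SPOJ judge.

```haskell
import Data.Char (isAlphaNum)
import Data.List (find, isPrefixOf)

-- Characters allowed in the local part.
isValidChar :: Char -> Bool
isValidChar c = isAlphaNum c || c `elem` "._"

-- Punctuation that may precede an address and is skipped.
isDontCare :: Char -> Bool
isDontCare c = c `elem` "~!@#$%^&*()<>?,."

-- Hypothetical helper: pull an address out of one whitespace-free word,
-- or return Nothing when the word does not fit the rules.
extractEmail :: String -> Maybe String
extractEmail word =
  case span isValidChar (dropWhile isDontCare word) of
    (local@(c : _), '@' : afterAt)
      | isAlphaNum c && length local >= 5 ->
          case span isAlphaNum afterAt of
            (dom@(_ : _), rest) -> do
              -- first suffix that the remaining input starts with
              suffix <- find (`isPrefixOf` rest) [".com", ".org", ".edu", ".co.in"]
              return (local ++ "@" ++ dom ++ suffix)
            _ -> Nothing
    _ -> Nothing
```

For example, extractEmail "svm11@gmail.com" gives Just "svm11@gmail.com", and extractEmail "tiwa@gmail.com" gives Nothing because the local part has only four characters.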

March 8, 2013 Posted by tiwari_mukesh | Haskell, Programming | Haskell, Parsec, Parsing, SPOJ

About

Hello all, my name is Mukesh Tiwari and I graduated from IIITM Gwalior in 2009. I am passionate about programming and mathematics. I love solving problems on SPOJ, UVA, Topcoder and Project Euler. I am curious about functional languages, especially Haskell, and it is one of the most excellent languages I have encountered after Python. I am also looking for a challenging functional programming job, especially one related to Haskell. You can reach me at mukeshtiwariDOTiiitmATgmailDOTcom. Replace DOT with . and AT with @.
