My Weblog

Blog about programming and math

While Interpreter

Now a days I am trying to learn about static program analysis and started reading the Principal of Program Analysis. In order to understand the concepts, the book introduces a small programming language While.

I thought about writing interpreter for While to improve my Haskell skills. The very first task is in compilation/interpretation is breaking the source code into keywords, identifiers, operators, numbers and other tokens, known as lexical analysis. These tokens are passed to syntax analysis phase to combine the tokens into well formed expressions, statements and programs according to the grammar. The output of syntax analysis is abstract syntax tree which gives structural representation of of the input. Once we have abstract syntax tree, we can interpret or manipulate it for code generation. If you are interested in code generation then see Stephen Dielh blog.

The grammar for while program is

Grammar for expression
a  ::= x | n | - a | a opa a
b  ::= true | false | not b | b opb b | a opr a
opa ::= + | - | * | /
opb ::= and | or
opr ::= > | < 

Grammar for statements
S  ::= x := a | skip | S1; S2 | ( S ) | if b then S1 else S2 | while b do S 

The very first task is to define abstract syntax tree and Haskell algebraic data type makes this task almost trivial.

module AST ( Opa (..), Opb (..), Opr (..), AExpr (.. ) , BExpr ( .. ), 
             Stmt ( .. )  ) where



data Opa = Add
         | Sub
         | Mul
         | Div
         deriving Show

data Opb = And 
         | Or 
         deriving Show

data Opr = Greater 
         | Less 
         deriving Show


data AExpr = Var String
           | Num Integer
           | Neg AExpr
           | ABin Opa AExpr AExpr
           deriving Show

data BExpr = Con Bool
           | Not BExpr
           | BBin Opb BExpr BExpr
           | AL Opr AExpr AExpr
           deriving Show

data Stmt = List [ Stmt ]
          | Assing AExpr  AExpr
          | If BExpr Stmt Stmt
          | While BExpr Stmt
          | Skip
          deriving Show

The next task is design the lexical analyzer. We will use Parsec for this purpose.

module Lexer where
import Text.Parsec
import qualified Text.Parsec.Token as T
import Text.Parsec.Language ( emptyDef )
import Text.Parsec.String ( Parser )


lexer :: T.TokenParser ()
lexer = T.makeTokenParser emptyDef 
            {
                T.commentStart = "{-"
              , T.commentEnd = "-}"
              , T.reservedOpNames = [ "+", "-", "*", "/", ":=", ">", "<",
                                      "not", "and", "or" ]
              , T.reservedNames = [ "if", "then", "else", "while", "do", "skip",
                                  "true", "false" ]

            }


identifier :: Parser String
identifier = T.identifier lexer

whiteSpace :: Parser ()
whiteSpace = T.whiteSpace lexer

reserved :: String -> Parser ()
reserved = T.reserved lexer

reservedOp :: String -> Parser ()
reservedOp = T.reservedOp lexer

parens :: Parser a -> Parser a
parens = T.parens lexer

integer :: Parser Integer
integer = T.integer lexer

semi :: Parser String
semi =  T.semi lexer

semiSep :: Parser a -> Parser [ a ]
semiSep = T.semiSep lexer

Now lexical analyzer will assist the parser for creating the abstract syntax tree. A while program is list of statements and statement consists of expressions. We can build parse the expression by Parsec expression builder by giving the table of operators and associativity.

module Parser ( whileParser ) where

import Text.Parsec
import Text.Parsec.Expr
import Text.Parsec.String ( Parser )
import Control.Applicative  hiding ( (<|>) )
import Lexer
import AST

aTable = [  [   Prefix ( Neg <$ reservedOp "-" )                ]  
          , [   Infix  ( ABin Mul <$ reservedOp "*" ) AssocLeft
              , Infix  ( ABin Div <$ reservedOp "/" ) AssocLeft ]
          , [   Infix  ( ABin Add <$ reservedOp "+" ) AssocLeft
              , Infix  ( ABin Sub <$ reservedOp "-" ) AssocLeft ]
         ]



bTable = [  [  Prefix ( Not  <$ reservedOp "not" )               ]
          , [  Infix  ( BBin And <$ reservedOp "and" ) AssocLeft ] 
          , [  Infix  ( BBin Or  <$ reservedOp "or"  ) AssocLeft ]
         ] 
      
aExpression :: Parser AExpr
aExpression = buildExpressionParser aTable aTerm where 
         aTerm =  parens aExpression 
              <|> Var <$> identifier
              <|> Num <$> integer


bExpression :: Parser BExpr
bExpression = buildExpressionParser bTable bTerm where 
         bTerm =  parens bExpression 
              <|> (  Con True   <$ reserved "true"  ) 
              <|> (  Con False  <$ reserved "false" )
              <|> try (  AL Greater <$>  ( aExpression  <* reserved ">" ) 
                                    <*>    aExpression )
              <|> (  AL Less    <$>  ( aExpression  <* reserved "<" ) 
                                <*>    aExpression ) 


  

whileParser :: Parser Stmt
whileParser = whiteSpace *> stmtParser <* eof where 
            stmtParser :: Parser Stmt
            stmtParser =  parens stmtParser 
                      <|> List <$> sepBy stmtOne semi
            stmtOne :: Parser Stmt
            stmtOne =  ( Assing <$> ( Var <$> identifier ) 
                                <*> ( reserved ":=" *> aExpression ) )
                   <|> ( If <$> ( reserved "if" *> bExpression <* reserved "then" ) 
                            <*>   stmtParser 
                            <*> ( reserved "else" *> stmtParser ) )
                   <|> ( While <$> ( reserved "while" *> bExpression <*  reserved "do" ) 
                               <*>   stmtParser )
                   <|> ( Skip <$ reserved "nop" ) 

We have abstract syntax tree so we can interpret our program. You can think of a program as collection of commands which manipulates some memory location.

module Interpreter ( evalProgram )  where

import qualified Data.Map as M
import AST

type Store = M.Map String Integer

evalA :: AExpr -> Store -> Integer
evalA ( Var v ) s  =  M.findWithDefault 0 v s
evalA ( Num n ) _ = n
evalA ( Neg e ) s = - ( evalA e  s ) 
evalA ( ABin Add e1 e2 ) s = evalA e1 s + evalA e2 s
evalA ( ABin Sub e1 e2 ) s = evalA e1 s - evalA e2 s
evalA ( ABin Mul e1 e2 ) s = evalA e1 s * evalA e2 s
evalA ( ABin Div e1 e2 ) s = div ( evalA e1 s ) ( evalA e2 s )


evalB :: BExpr -> Store -> Bool
evalB ( Con b ) _ = b 
evalB ( Not e ) s = not ( evalB e s )
evalB ( BBin And e1 e2 ) s = ( && ) ( evalB e1 s )  ( evalB e2 s ) 
evalB ( BBin Or  e1 e2 ) s = ( || ) ( evalB e1 s )  ( evalB e2 s )
evalB ( AL Greater e1 e2 ) s = ( evalA e1 s ) > (  evalA e2 s )
evalB ( AL Less e1 e2 ) s = ( evalA e1 s ) < ( evalA e2 s )    

interpret :: Stmt -> Store -> Store
interpret ( Assing ( Var v ) expr ) s =  M.insert v ( evalA expr s )  s
interpret ( List [] ) s = s
interpret ( List ( x : xs ) ) s = interpret ( List xs ) ( interpret x s )
interpret ( If e st1 st2 ) s 
          | evalB e s = interpret st1 s
          | otherwise = interpret st2 s
interpret ( While e st ) s 
    | not t = s   
    | otherwise =  interpret ( While e st ) w 
    where
     t = evalB e s
     w = interpret st s

evalProgram :: Stmt -> Store
evalProgram st = interpret st M.empty

Now every thing is complete so let’s add main and write some while program.

module Main where
import System.Environment
import Text.Parsec
import Parser
import Interpreter




main = do 
     ( file : _ ) <- getArgs
     program <- readFile file
     case parse whileParser "" program of 
        Left e -> print e >> fail "Parse Error"
        Right ast -> print  ( evalProgram  ast )

While program to compute the greatest common divisor

Mukeshs-MacBook-Pro:whileinterpreter mukeshtiwari$ cat Gcd.while 
a := 10 ;
b := 100  ;
while ( b > 0 ) do 
 (
	t := b ;
    	b := a - ( a / b ) * b ;
	a := t 
)

Factorial program

Mukeshs-MacBook-Pro:whileinterpreter mukeshtiwari$ cat Fact.while 
x := 10 ;
y := x ;
z := 1 ;
while ( y > 1 )
  do 
   ( 
      z := z * y ;
      y := y - 1 
   );
y := 0 

Since our language doesn’t have IO instruction so we will have to see which variable store the result. In gcd program, the variable t stores the result and variable z stores the factorial of number.

Mukeshs-MacBook-Pro:src mukeshtiwari$ ghc -fforce-recomp Main.hs
[1 of 5] Compiling AST              ( AST.hs, AST.o )
[2 of 5] Compiling Lexer            ( Lexer.hs, Lexer.o )
[3 of 5] Compiling Interpreter      ( Interpreter.hs, Interpreter.o )
[4 of 5] Compiling Parser           ( Parser.hs, Parser.o )
[5 of 5] Compiling Main             ( Main.hs, Main.o )
Linking Main ...
Mukeshs-MacBook-Pro:src mukeshtiwari$ ./Main ../Gcd.while 
fromList [("a",10),("b",0),("t",10)]
Mukeshs-MacBook-Pro:src mukeshtiwari$ ./Main ../Fact.while 
fromList [("x",10),("y",0),("z",3628800)]

I am not expert in either Haskell or compiler so if you have any comments then please let me know. The complete source code on github.

Advertisements

January 30, 2014 Posted by | Haskell, Programming | , , , , , | Leave a comment

SPOJ EMAIL ID

Finally solved EMAIL ID using python though it was very hard for me to switch from Haskell to python. While doing this problem, I learned quite a lot about regular expressions, found some cool site like pythontutor and problem solving with python . Accepted code in python.


import re

if __name__ == "__main__":
    n = int ( raw_input() )
    c = 1
    while c <= n :
        email =  re.findall ( "[a-zA-Z0-9][a-zA-Z0-9._]{4,}@[a-zA-Z0-9]+\.(?:com|edu|org|co\.in)", raw_input() )  
        t = len ( email )
        print 'Case #' + str ( c ) + ': ' + str ( t )
        for i in xrange ( t ) : print email[i]
        c += 1





March 11, 2013 Posted by | Programming, python | , , , | Leave a comment

Parsing Email ID

Now a days I am exploring one of the most awesome feature of Haskell, parsing. There are lot of parsing libraries but Parsec is the popular one. This parsing code is written for SPOJ EMAIL ID problem ( Unfortunately it’s getting time limit exceed. I have seen couple of python solution accepted so I am hopeful that there must be another algorithm probably using regular expressions to solve the problem ) but you can build more sophisticated email-id parser by adding more functionality. Tony Morris excellent parsing tutorial is must read.

import Data.List
import qualified Text.Parsec.ByteString as PB
import Text.Parsec.Prim
import Text.Parsec.Char
import Text.Parsec.Combinator
import qualified Data.ByteString.Char8 as BS
import Control.Applicative hiding ( ( <|> ) , many ) 

validChars :: PB.Parser Char
validChars  = alphaNum <|> oneOf "._" 

dontCare :: PB.Parser Char 
dontCare = oneOf "~!@#$%^&*()<>?,."

{--
emailAddress :: PB.Parser  String
emailAddress = do 
             _ <- many dontCare
             fi <- alphaNum
             se <- validChars
             th <- validChars
             fo <- validChars
             ft <- validChars
             restAddr <- many validChars
             let addr = fi : se : th : fo : ft : restAddr 
             char '@'
             dom <- many1 alphaNum 
             rest <- try ( string ".com" <|> string ".org"  
                  <|>  string ".edu" ) <|> try ( string ".co.in" )
             _ <- many dontCare
             return $  addr ++ (  '@': dom ++ rest ) 
   
--} 
          
emailAddress :: PB.Parser String
emailAddress = conCatfun <$> ( many dontCare *> alphaNum ) <*> validChars <*> 
               validChars <*> validChars <*> validChars <*> many alphaNum  <*> 
               ( char '@' *> many1 alphaNum ) <*> ( try ( string ".com" <|> 
               string ".org" <|>  string ".edu" ) <|> try ( string ".co.in" ) 
               <* many dontCare ) where 
                conCatfun a b c d e f dom rest = 
                       ( a : b : c : d : e : f ) ++ ( '@' : dom ) ++ rest 


collectEmail :: BS.ByteString -> String
collectEmail email = case  parse emailAddress "" email of
                        Right addr -> addr 
                        Left err ->  "" 

process :: ( Int , [ String ] ) -> BS.ByteString
process ( k , xs ) = ( BS.pack "Case " ) `BS.append` ( BS.pack . show $ k ) 
          `BS.append` ( BS.pack ": " ) `BS.append` ( BS.pack . show . length $ xs ) 
          `BS.append` ( BS.pack "\n" ) `BS.append` ( BS.pack  
          (  unlines . filter ( not . null ) $  xs ) )

main = BS.interact $ BS.concat .  map process . zip [ 1.. ] . map ( map collectEmail . 
       BS.words ) . tail . BS.lines

Mukeshs-MacBook-Pro:Haskell mukeshtiwari$ cat t.txt 
2
svm11@gmail.com
svm11@gmail.com svm12@gmail.co.in  ~!@#$%^&*()<>?svm12@gmail.co.in~!@#$%^&*()
Mukeshs-MacBook-Pro:Haskell mukeshtiwari$ ./Spoj_11105 < t.txt
Case 1: 1
svm11@gmail.com
Case 2: 2
svm11@gmail.com
svm12@gmail.co.in
svm12@gmail.co.in

Update

I tried again to solve this problem using regular expression. Following the tutorial, I wrote this code but I got compiler error because of old version of ghc on SPOJ ( GHC-6.10.4 ). It’s working fine on my system but I still have to test if it is correct and fast enough to get accepted.

import Data.List
import Text.Regex.Posix
import qualified Data.ByteString.Char8 as BS

pat :: BS.ByteString
pat = BS.pack "[^~!@#$%^&*()<>?,.]*[a-zA-Z0-9][a-zA-Z0-9._][a-zA-Z0-9._][a-zA-Z0-9._][a-zA-Z0-9._][a-zA-Z0-9._]*@[a-zA-Z0-9]+.(com|edu|org|co.in)[^~!@#$%^&*()<>?,.a-zA-Z0-9]*"

collectEmail :: BS.ByteString -> BS.ByteString
collectEmail email = ( =~ ) email pat 

process :: ( Int , [ BS.ByteString ] ) -> BS.ByteString
process ( k , xs ) = ( BS.pack "Case " ) `BS.append` ( BS.pack . show $ k ) 
          `BS.append` ( BS.pack ": " ) `BS.append` ( BS.pack . show . length $ xs ) 
          `BS.append` ( BS.pack "\n" ) `BS.append` ( BS.unlines xs )


main = BS.interact $ BS.concat .  map process . zip [ 1 .. ] . 
       map ( filter ( not . BS.null ) . map collectEmail . BS.words ) . 
       tail . BS.lines 
Mukeshs-MacBook-Pro:SPOJ mukeshtiwari$ cat t.txt 
2
svm11@gmail.com
svm11@gmail.com svm12@gmail.co.in %&^%&%&%&%&^%&^%&^%&^mukeshtiwari.iiitm@gmail.com%&%^&^%&%&%&%&^%&%&^%&^%&^% %$%$#%#%#%#%$#%&%&%&tiwa@gmail.com
Mukeshs-MacBook-Pro:SPOJ mukeshtiwari$ ./Spoj_11105 < t.txt 
Case 1: 1
svm11@gmail.com
Case 2: 3
svm11@gmail.com
svm12@gmail.co.in
mukeshtiwari.iiitm@gmail.com

March 8, 2013 Posted by | Haskell, Programming | , , , | Leave a comment

Parsing with Applicative Functors

While reading Write Yourself a Scheme in 48 Hours, I thought of converting the monadic codes of parsing into applicative style. Learn you a Haskell for Great Good has excellent explanation about functors and applicative functors. Real World Haskell has excellent chapter on using applicative functors for parsing. Below is my attempt to convert the monadic codes of parsing into applicative style for Scheme parsing. On parsing note, see the excellent tutorial of Tony Morris and Albert Y. C. Lie.

import Control.Applicative hiding ( many , ( <|> )  )
import Text.ParserCombinators.Parsec hiding ( spaces )

symbol :: Parser Char
symbol =  oneOf "!$%&|+-*/:<=?>@^_~#"

spaces :: Parser ()
spaces = skipMany space 

parseString :: Parser LispVal
parseString = String <$> ( char '"' *>  x <*  char '"' ) where 
            x = many ( noneOf "\"" ) 
--This is not completely applicative style because I have to take the result for testing conditions.
parseAtom :: Parser LispVal
parseAtom = do 
       atom <-  ( : ) <$> ( letter <|> symbol ) <*> many ( letter <|> digit <|> symbol )
       return $ case atom of 
                "#t" -> Bool True
                "#f" -> Bool False
                _ -> Atom atom


parseNumber :: Parser LispVal
parseNumber = Number . read <$> many1 digit


parseList :: Parser LispVal
parseList = List <$> sepBy parseExpr space 

parseDottedList :: Parser LispVal
parseDottedList = DottedList <$> ( endBy parseExpr space ) <*> ( char '.' *> spaces *> parseExpr )


parseQuoted :: Parser LispVal
parseQuoted =  f  <$>   ( char '\'' *> parseExpr ) where 
              f x =   List [ Atom "quote" , x ] 

parseExpr :: Parser LispVal
parseExpr =  parseAtom 
         <|> parseString
         <|> parseNumber 
         <|> parseQuoted
         <|> ( char '(' *> spaces *> ( ( try parseList ) <|> parseDottedList ) <* spaces  <* char ')' )

Complete code on github ( Apart from applicative parsing every other code is from Scheme parsing tutorial ). If you have any comments or suggestion then please let me know.

January 18, 2013 Posted by | Haskell, Programming | , , , | Leave a comment