My Weblog

Blog about programming and math

Plagiarism detection

I was going through this plagiarism detector and thought of writing a Haskell program that replaces some of the words in an input text with their synonyms. I think the general problem belongs to machine learning and natural language processing, because substituting a synonym can sometimes change the meaning of a sentence. This program is quite simple: it just looks each word up in a dictionary of synonyms. Here is what my dictionary file “dict.txt”, input file “input.txt” and output file “output.txt” look like.
dict file

capricious  fickle
happiness ecstasy

input file

Hi how are you doing . why happiness is capricious . 

output file

 Hi how are you doing . why ecstasy is fickle .

Haskell source code

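import System.Environment ( getArgs )
import Data.Map ( Map )
import qualified Data.Map as Map
type Dictionary = Map String  String

-- Walk the word list and replace every word found in the dictionary with its synonym.
-- ret accumulates the output; each word is prefixed with a space, so the result starts with one.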

helpChange :: [String] -> String -> Dictionary -> String 
helpChange [] ret _ = ret
helpChange ( x : xs ) ret dict = 
	case  Map.lookup x dict of 
		Nothing  -> helpChange xs (  ret ++ " " ++ x  ) dict 
		Just str -> helpChange xs (  ret ++ " " ++ str) dict   

 
changeWord :: String -> Dictionary -> String 
changeWord str dict =  final where 
	tmpstr = words str 
	final = helpChange tmpstr "" dict 
	
				
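-- Build a two-way dictionary: each pair "a b" maps a to b and b to a.
-- Lines that do not contain exactly two words (blank lines, for example) are skipped.
creatDic :: String -> Dictionary
creatDic  str = finalDict where 
	list = [ ( a , b ) | [ a , b ] <- map words $ lines str ]
	tmpDict = foldl ( \dict ( a , b ) -> Map.insert a b dict ) Map.empty list
	finalDict = foldl ( \dict ( a , b ) -> Map.insert b a dict ) tmpDict list 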
	

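main :: IO ()
main = do
	args <- getArgs
	case args of
		[ input , output ] -> do
			inpStr <- readFile input
			tmpdict <- readFile "dict.txt"
			let dict = creatDic  tmpdict
			let final = changeWord inpStr dict
			writeFile output final
		_ -> putStrLn "Usage: pass the input file and the output file as the two arguments"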
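
For comparison, here is a more compact sketch of the same two steps, building the two-way dictionary and swapping the words. The names buildDict and swapWords are made up for this example and are not part of the program above. It assumes the same "dict.txt" format of one pair per line, and it differs from the program above in one small way: unwords does not put a space in front of the first word.

```haskell
import Data.Map ( Map )
import qualified Data.Map as Map

type Dictionary = Map String String

-- Two-way dictionary from "a b" lines; lines without exactly two words
-- are ignored. Map.fromList keeps the last value for a repeated key,
-- so the reversed pairs win over the forward ones on a clash.
buildDict :: String -> Dictionary
buildDict str = Map.fromList ( forward ++ backward ) where
    forward  = [ ( a , b ) | [ a , b ] <- map words ( lines str ) ]
    backward = [ ( b , a ) | ( a , b ) <- forward ]

-- Replace every whitespace-separated word that has a dictionary entry;
-- words without an entry are kept as they are.
swapWords :: Dictionary -> String -> String
swapWords dict = unwords . map ( \w -> Map.findWithDefault w w dict ) . words
```

With the dictionary file shown earlier, swapWords turns "why happiness is capricious ." into "why ecstasy is fickle .".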

If you run this program, make sure every full stop ( . ) has a space before and after it. Otherwise the words function keeps the full stop attached to the preceding word: the program looks up [ end. ] instead of [ end ], so even if [ end ] is in the dictionary the lookup returns Nothing and the word is not replaced. Also see this post.
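
One possible way around the spacing requirement, as a rough sketch only: instead of splitting on whitespace, split the text into runs of letters and runs of everything else, and look up only the runs of letters. The names tokens and replaceSynonyms are invented for this example. It assumes that a "word" is simply a maximal run of alphabetic characters, so a word with an apostrophe or hyphen is split into pieces, and the lookup is still case-sensitive.

```haskell
import Data.Char ( isAlpha )
import Data.Function ( on )
import Data.List ( groupBy )
import Data.Map ( Map )
import qualified Data.Map as Map

type Dictionary = Map String String

-- Split the text into maximal runs of letters and maximal runs of
-- non-letters (spaces, punctuation, digits): "end." gives ["end", "."].
tokens :: String -> [String]
tokens = groupBy ( (==) `on` isAlpha )

-- Look up only the runs of letters; every other run is copied through
-- unchanged, so the original spacing and punctuation are preserved.
replaceSynonyms :: Dictionary -> String -> String
replaceSynonyms dict = concatMap swap . tokens where
    swap tok@( c : _ )
        | isAlpha c = Map.findWithDefault tok tok dict
    swap tok = tok
```

With a dictionary containing the two pairs from "dict.txt", replaceSynonyms turns "why happiness is capricious." into "why ecstasy is fickle." even though the full stop touches the last word.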

October 4, 2011 - Posted by | Programming
