Plagiarism detection
I was going through this plagiarism detector and thought of writing a Haskell program that replaces some of the words in an input text with their synonyms. I think this problem belongs to machine learning and natural language processing, because sometimes substituting a synonym changes the context of a statement. This program is quite simple and just uses a dictionary of synonyms. Here is what my dictionary file “dict.txt”, input file “input.txt” and output file “output.txt” look like.
dict file
capricious fickle
happiness ecstasy
input file
Hi how are you doing . why happiness is capricious .
output file
Hi how are you doing . why ecstasy is fickle .
Haskell source code
import System.Environment (getArgs)
import Data.Map (Map)
import qualified Data.Map as Map

type Dictionary = Map String String

-- Replace a single word by its synonym if the dictionary has one,
-- otherwise keep the word unchanged.
helpChange :: Dictionary -> String -> String
helpChange dict w = Map.findWithDefault w w dict

-- Replace every word of the input text that appears in the dictionary.
-- Note: words/unwords collapse all whitespace, so line breaks in the
-- input are not preserved in the output.
changeWord :: String -> Dictionary -> String
changeWord str dict = unwords (map (helpChange dict) (words str))

-- Build the dictionary from lines of the form "word synonym".
-- Each pair is inserted in both directions, so either word is replaced
-- by the other. Lines that do not contain exactly two words are skipped.
creatDic :: String -> Dictionary
creatDic str = Map.fromList (pairs ++ map swap pairs)
  where
    pairs = [ (a, b) | [a, b] <- map words (lines str) ]
    swap (a, b) = (b, a)

main :: IO ()
main = do
  [input, output] <- getArgs
  inpStr <- readFile input
  tmpdict <- readFile "dict.txt"
  let dict = creatDic tmpdict
  writeFile output (changeWord inpStr dict)
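Since the substitution itself is pure, the dict/input/output example can be reproduced without reading or writing any files. The sketch below is a minimal illustration, assuming the two synonym pairs from “dict.txt” are hard-coded; `demoDict` and `replaceAll` are hypothetical names, not part of the program above. It also shows a side effect of storing each pair in both directions: running the replacement twice gives back the original sentence (this only holds while no word appears in more than one pair).

```haskell
import qualified Data.Map as Map

-- Hypothetical demo dictionary: the two pairs from dict.txt, stored in
-- both directions just like the program's dictionary.
demoDict :: Map.Map String String
demoDict = Map.fromList (pairs ++ [ (b, a) | (a, b) <- pairs ])
  where
    pairs = [("capricious", "fickle"), ("happiness", "ecstasy")]

-- Replace every word that has an entry in the dictionary; leave the rest alone.
replaceAll :: Map.Map String String -> String -> String
replaceAll dict = unwords . map (\w -> Map.findWithDefault w w dict) . words

main :: IO ()
main = do
  let input = "Hi how are you doing . why happiness is capricious ."
  -- Prints: Hi how are you doing . why ecstasy is fickle .
  putStrLn (replaceAll demoDict input)
  -- Applying the bidirectional replacement twice restores the input.
  putStrLn (replaceAll demoDict (replaceAll demoDict input))
```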
If you run this program, make sure every full stop ( . ) has a space before and after it. Otherwise the words function will attach the full stop to the neighbouring word, producing a token such as [ end. ] instead of [ end ]. Even if [ end ] is present in the dictionary, the lookup of [ end. ] returns Nothing, so the word is left unchanged. Also see this post.
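One way to avoid padding every full stop with spaces is to split trailing punctuation off each token before the lookup and re-attach it afterwards. The following is a sketch of that alternative, not the original program's behaviour: it assumes only trailing punctuation (as classified by `Data.Char.isPunctuation`) needs handling, and `splitTrailing`, `replaceToken` and `replaceText` are hypothetical helper names.

```haskell
import Data.Char (isPunctuation)
import qualified Data.Map as Map

type Dictionary = Map.Map String String

-- Split a token such as "capricious." into ("capricious", ".") so the core
-- word can be looked up without its trailing punctuation.
splitTrailing :: String -> (String, String)
splitTrailing tok = (reverse core, reverse punct)
  where
    (punct, core) = span isPunctuation (reverse tok)

-- Replace the core of a token and re-attach whatever punctuation followed it.
-- A token made only of punctuation (e.g. ".") has an empty core and is kept as-is.
replaceToken :: Dictionary -> String -> String
replaceToken dict tok = Map.findWithDefault core core dict ++ punct
  where
    (core, punct) = splitTrailing tok

-- Apply the replacement to every whitespace-separated token.
replaceText :: Dictionary -> String -> String
replaceText dict = unwords . map (replaceToken dict) . words

main :: IO ()
main = do
  let dict = Map.fromList [ ("capricious", "fickle"), ("fickle", "capricious")
                          , ("happiness", "ecstasy"), ("ecstasy", "happiness") ]
  -- Prints: Hi how are you doing. why ecstasy is fickle.
  putStrLn (replaceText dict "Hi how are you doing. why happiness is capricious.")
```

Leading punctuation, such as an opening bracket in "(happiness", would still defeat the lookup with this sketch; handling it would need a symmetric split at the front of the token.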