Counting Inversions
First week of algorithm course programming assignment is to compute the number of inversion count of given array. We have a given array A[0…n – 1] of n distinct positive integers and we have to count all the pairs ( i , j ) such that i < j and A[i] > A[j].
1. Brute Force
Brute force idea is very simple. Run two loops and keep counting the respective pairs for which i < j and A[i] > A[j]. The complexity of brute force is O( n2 ).
bruteForce :: [ Int ] -> Int bruteForce [] = 0 bruteForce [_] = 0 bruteForce ( x : xs ) = ( length . filter ( x > ) $ xs ) + bruteForce xs *Main> bruteForce [ 3 , 1 , 2] Loading package array-0.4.0.1 ... linking ... done. Loading package deepseq-1.3.0.1 ... linking ... done. Loading package bytestring-0.10.0.0 ... linking ... done. 2 *Main> bruteForce [ 2,3,8,6,1] 5
2. Divide and Conquer
We can modify the merge sort to compute the inversion count. The idea is to compute the inversion count of during the merge phase. Inversion count of array C resulting from merging two sorted array A and B is sum of inversion count of A, inversion count of B and their cross inversion. The inversion count of single element is 0 ( base case ). The only trick is compute the cross inversion. What will be the inversion count of following array if we merge them ?
A = [ 1 , 2 , 3 ]
B = [ 4 , 5 , 6 ]
Clearly 0 because all element of A is less than the first element of B. What is the inversion count of C = [ 4 , 5 , 6 ] and D = [ 1 , 2 , 3 ] ? First element of D ( 1 ) is less than first element of C ( 4 ) so we have total three pairs ( 4 , 1 ) ( 5 , 1 ) ( 6 , 1). Similarly for 2 and 3 ( total 9 inversion count ). It reveals that if we have any element y in D is less than certain element x in C then all the element of A after x will also be greater than y and we will have those many cross inversion counts. More clear explanation. Wrote a quick Haskell code and got the answer.
import Data.List import Data.Maybe ( fromJust ) import qualified Data.ByteString.Lazy.Char8 as BS inversionCnt :: [ Int ] -> [ Int ] -> [ Int ] -> Int -> ( Int , [ Int ] ) inversionCnt [] ys ret cnt = ( cnt , reverse ret ++ ys ) inversionCnt xs [] ret cnt = ( cnt , reverse ret ++ xs ) inversionCnt u@( x : xs ) v@( y : ys ) ret cnt | x <= y = inversionCnt xs v ( x : ret ) cnt | otherwise = inversionCnt u ys ( y : ret ) ( cnt + length u ) merge :: ( Int , [ Int ] ) -> ( Int , [ Int ] ) -> ( Int , [ Int ] ) merge ( cnt_1 , xs ) ( cnt_2 , ys ) = ( cnt_1 + cnt_2 + cnt , ret ) where ( cnt , ret ) = inversionCnt xs ys [] 0 mergeSort :: [ Int ] -> ( Int , [ Int ] ) mergeSort [ x ] = ( 0 , [ x ] ) mergeSort xs = merge ( mergeSort xs' ) ( mergeSort ys' ) where ( xs' , ys' ) = splitAt ( div ( length xs ) 2 ) xs main = BS.interact $ ( \x -> BS.pack $ show x ++ "\n" ) . fst . mergeSort . map ( fst . fromJust . BS.readInt :: BS.ByteString -> Int ) . BS.lines Mukeshs-MacBook-Pro:Puzzles mukeshtiwari$ ghc -fforce-recomp -O2 Inversion.hs [1 of 1] Compiling Main ( Inversion.hs, Inversion.o ) Linking Inversion ... Mukeshs-MacBook-Pro:Puzzles mukeshtiwari$ time ./Inversion < IntegerArray.txt 2407905288 real 0m8.002s user 0m7.881s sys 0m0.038s
I tried to compare this solution with GeeksforGeeks‘s solution written in C and it was a complete surprise to me.
Mukeshs-MacBook-Pro:Puzzles mukeshtiwari$ time ./a.out < IntegerArray.txt Number of inversions are 2407905288 real 0m0.039s user 0m0.036s sys 0m0.002s
Almost 200 times faster! My Haskell intuition told me that I am doing certainly some wrong. I started analyzing the code and got the point. In inversionCnt function, I am doing extra work for computing the length of list u ( O ( n ) ) so rather than doing this extra work, I can pass the length.
import Data.List import Data.Maybe ( fromJust ) import qualified Data.ByteString.Lazy.Char8 as BS inversionCnt :: [ Int ] -> [ Int ] -> [ Int ] -> Int -> Int -> ( Int , [ Int ] ) inversionCnt [] ys ret _ cnt = ( cnt , reverse ret ++ ys ) inversionCnt xs [] ret _ cnt = ( cnt , reverse ret ++ xs ) inversionCnt u@( x : xs ) v@( y : ys ) ret n cnt | x <= y = inversionCnt xs v ( x : ret ) ( pred n ) cnt | otherwise = inversionCnt u ys ( y : ret ) n ( cnt + n ) merge :: ( Int , [ Int ] ) -> ( Int , [ Int ] ) -> ( Int , [ Int ] ) merge ( cnt_1 , xs ) ( cnt_2 , ys ) = ( cnt_1 + cnt_2 + cnt , ret ) where ( cnt , ret ) = inversionCnt xs ys [] ( length xs ) 0 mergeSort :: [ Int ] -> ( Int , [ Int ] ) mergeSort [ x ] = ( 0 , [ x ] ) mergeSort xs = merge ( mergeSort xs' ) ( mergeSort ys' ) where ( xs' , ys' ) = splitAt ( div ( length xs ) 2 ) xs main = BS.interact $ ( \x -> BS.pack $ show x ++ "\n" ) . fst . mergeSort . map ( fst . fromJust . BS.readInt :: BS.ByteString -> Int ) . BS.lines Mukeshs-MacBook-Pro:Puzzles mukeshtiwari$ time ./Inversion < IntegerArray.txt 2407905288 real 0m0.539s user 0m0.428s sys 0m0.031s
Now 10 times slower and still lot of room for improvement but for now I am happy with this :). You can also try to solve INVCNT. Run this code on Ideone.
I was wondering whether there is way to structure this code so that we don’t have to do that many pattern matching. mergesort is function from [Int] -> [Int], extra inversion count which we need to carry can thought of logging. This is what I come up with
import Control.Monad
import Control.Monad.Writer
import Data.Monoid
merge :: [Int] -> [Int] -> Writer (Sum Int) [Int]
merge [] ys = return ys
merge xs [] = return xs
merge (x:xs) (y:ys)
| x <= y = do { temp <- merge xs (y:ys); return (x:temp) }
| otherwise = do
temp Writer (Sum Int) [Int]
mergeSort [] = return []
mergeSort [x] = return [x]
mergeSort xs = do
let (lx, rx) = splitAt (div (length xs) 2) xs
slx <- mergeSort lx
srx <- mergeSort rx
merge slx srx
Comment by divyanshu ranjan (@rdivyanshu) | July 29, 2013 |
bad formatting : http://ideone.com/YfP9cI
Comment by divyanshu ranjan (@rdivyanshu) | July 29, 2013 |