My Weblog

Blog about programming and math

Counting Inversions

First week of algorithm course programming assignment is to compute the number of inversion count of given array. We have a given array A[0…n – 1] of n distinct positive integers and we have to count all the pairs ( i , j ) such that i < j and A[i] > A[j].
1. Brute Force
Brute force idea is very simple. Run two loops and keep counting the respective pairs for which i < j and A[i] > A[j]. The complexity of brute force is O( n2 ).

bruteForce :: [ Int ] -> Int
bruteForce [] = 0
bruteForce [_] = 0
bruteForce ( x : xs ) = ( length . filter ( x > ) $ xs ) + bruteForce xs

*Main> bruteForce [ 3 , 1 , 2]
Loading package array-0.4.0.1 ... linking ... done.
Loading package deepseq-1.3.0.1 ... linking ... done.
Loading package bytestring-0.10.0.0 ... linking ... done.
2
*Main> bruteForce [ 2,3,8,6,1]
5

2. Divide and Conquer
We can modify the merge sort to compute the inversion count. The idea is to compute the inversion count of during the merge phase. Inversion count of array C resulting from merging two sorted array A and B is sum of inversion count of A, inversion count of B and their cross inversion. The inversion count of single element is 0 ( base case ). The only trick is compute the cross inversion. What will be the inversion count of following array if we merge them ?
A = [ 1 , 2 , 3 ]
B = [ 4 , 5 , 6 ]
Clearly 0 because all element of A is less than the first element of B. What is the inversion count of C = [ 4 , 5 , 6 ] and D = [ 1 , 2 , 3 ] ? First element of D ( 1 ) is less than first element of C ( 4 ) so we have total three pairs ( 4 , 1 ) ( 5 , 1 ) ( 6 , 1). Similarly for 2 and 3 ( total 9 inversion count ). It reveals that if we have any element y in D is less than certain element x in C then all the element of A after x will also be greater than y and we will have those many cross inversion counts. More clear explanation. Wrote a quick Haskell code and got the answer.

import Data.List
import Data.Maybe ( fromJust )
import qualified Data.ByteString.Lazy.Char8 as BS

inversionCnt :: [ Int ] -> [ Int ] -> [ Int ] -> Int -> ( Int , [ Int ] )
inversionCnt [] ys  ret  cnt = ( cnt , reverse ret ++ ys ) 
inversionCnt xs []  ret  cnt = ( cnt , reverse ret ++ xs )
inversionCnt u@( x : xs ) v@( y : ys )  ret  cnt
      | x <= y = inversionCnt xs v  (  x : ret   )   cnt
      | otherwise = inversionCnt u ys  ( y : ret )  ( cnt + length u )  

merge ::  ( Int , [ Int ] ) -> ( Int , [ Int ] ) -> ( Int , [ Int ] )
merge ( cnt_1 , xs ) ( cnt_2 , ys ) = ( cnt_1 + cnt_2 + cnt , ret ) where
        ( cnt , ret ) = inversionCnt xs ys []   0

mergeSort ::  [ Int ] -> ( Int , [ Int ] )
mergeSort [ x ] = ( 0 , [ x ] )
mergeSort  xs   = merge ( mergeSort xs' ) ( mergeSort ys' ) where
           ( xs' , ys' ) = splitAt ( div ( length xs ) 2 ) xs
          
main =  BS.interact $ ( \x -> BS.pack $ show x ++ "\n" ) . fst 
        .  mergeSort . map ( fst . fromJust .  BS.readInt :: BS.ByteString -> Int  ) 
        . BS.lines 


Mukeshs-MacBook-Pro:Puzzles mukeshtiwari$ ghc  -fforce-recomp -O2  Inversion.hs 
[1 of 1] Compiling Main             ( Inversion.hs, Inversion.o )
Linking Inversion ...
Mukeshs-MacBook-Pro:Puzzles mukeshtiwari$ time ./Inversion <  IntegerArray.txt 
2407905288

real	0m8.002s
user	0m7.881s
sys	0m0.038s

I tried to compare this solution with GeeksforGeeks‘s solution written in C and it was a complete surprise to me.

Mukeshs-MacBook-Pro:Puzzles mukeshtiwari$ time ./a.out <  IntegerArray.txt 
 Number of inversions are 2407905288 

real	0m0.039s
user	0m0.036s
sys	0m0.002s

Almost 200 times faster! My Haskell intuition told me that I am doing certainly some wrong. I started analyzing the code and got the point. In inversionCnt function, I am doing extra work for computing the length of list u ( O ( n ) ) so rather than doing this extra work, I can pass the length.

import Data.List
import Data.Maybe ( fromJust )
import qualified Data.ByteString.Lazy.Char8 as BS


inversionCnt :: [ Int ] -> [ Int ] -> [ Int ] -> Int -> Int -> ( Int , [ Int ] )
inversionCnt [] ys  ret _ cnt = ( cnt , reverse ret ++ ys ) 
inversionCnt xs []  ret _ cnt = ( cnt , reverse ret ++ xs )
inversionCnt u@( x : xs ) v@( y : ys )  ret n cnt
      | x <= y = inversionCnt xs v  (  x : ret   )  ( pred n ) cnt
      | otherwise = inversionCnt u ys  ( y : ret ) n ( cnt + n )  

merge ::  ( Int , [ Int ] ) -> ( Int , [ Int ] ) -> ( Int , [ Int ] )
merge ( cnt_1 , xs ) ( cnt_2 , ys ) = ( cnt_1 + cnt_2 + cnt , ret ) where
        ( cnt , ret ) = inversionCnt xs ys [] ( length xs )  0

mergeSort ::  [ Int ] -> ( Int , [ Int ] )
mergeSort [ x ] = ( 0 , [ x ] )
mergeSort  xs   = merge ( mergeSort xs' ) ( mergeSort ys' ) where
           ( xs' , ys' ) = splitAt ( div ( length xs ) 2 ) xs
          
main =  BS.interact $ ( \x -> BS.pack $ show x ++ "\n" ) . fst 
        .  mergeSort . map ( fst . fromJust .  BS.readInt :: BS.ByteString -> Int  ) 
        . BS.lines 

Mukeshs-MacBook-Pro:Puzzles mukeshtiwari$ time ./Inversion <  IntegerArray.txt 
2407905288

real	0m0.539s
user	0m0.428s
sys	0m0.031s

Now 10 times slower and still lot of room for improvement but for now I am happy with this :). You can also try to solve INVCNT. Run this code on Ideone.

July 4, 2013 - Posted by | Haskell, Programming | , , , ,

2 Comments »

  1. I was wondering whether there is way to structure this code so that we don’t have to do that many pattern matching. mergesort is function from [Int] -> [Int], extra inversion count which we need to carry can thought of logging. This is what I come up with

    import Control.Monad
    import Control.Monad.Writer
    import Data.Monoid

    merge :: [Int] -> [Int] -> Writer (Sum Int) [Int]
    merge [] ys = return ys
    merge xs [] = return xs
    merge (x:xs) (y:ys)
    | x <= y = do { temp <- merge xs (y:ys); return (x:temp) }
    | otherwise = do
    temp Writer (Sum Int) [Int]
    mergeSort [] = return []
    mergeSort [x] = return [x]
    mergeSort xs = do
    let (lx, rx) = splitAt (div (length xs) 2) xs
    slx <- mergeSort lx
    srx <- mergeSort rx
    merge slx srx

    Comment by divyanshu ranjan (@rdivyanshu) | July 29, 2013 | Reply

  2. bad formatting : http://ideone.com/YfP9cI

    Comment by divyanshu ranjan (@rdivyanshu) | July 29, 2013 | Reply


Leave a reply to divyanshu ranjan (@rdivyanshu) Cancel reply