Saturday, December 12, 2009

Today i decided to test out a compression algorithm which i had in my mind for a few months.
Let us assume we have a binary file (picture file for eg) 2kb gif file.

The hex data being...

A9 B0 C1 D2 D3 F5 ED 34 ..... blah blah (2048 bytes long).

Let us treat this as a huge number ( when converted to decimal should be around 3000 digits length approx)

if we try to write this huge number in the format.. X^ Y + R

where 1 < X < N
&
1 < Y < N

for eg.. 34^3430 + 394

we just need this expression to reproduce the picture file.

The memory size to hold the expression is just 11 bytes for a 2048 byte pic file.

The idea looked convincing for me.

I searched the net and found one forum thread discussing this topic and it had information on why it won't work.

I decided to test it out myself. It just took me an hour to realize why this algorithm won't work out.

N = X^Y + R

On a few sample runs i found the best and worst case scenarios for this algorithm.

Best case would be,

when X=Y ( or approx equal) and R =0(or very small OR comparable with X and Y);

Worst case would be,
when N can be represented as M + 2*sqroot(M). where M is an integer.

Generally,

when N is represented as X^Y + R

and X and Y are greater than 1,

then the following holds good
1 < X < sqroot(N)
1 < Y < log2(N)
1 < R < 2*sqroot(N)


For worst case scenario.

X =~ sqroot(N)
Y = 2
R =~ 2*sqroot(N)

Generally if the number is n digit long.. then its sqroot will be n/2 digits long.

total expression length is (n/2 + 1 + n/2)
so the expression itself will be as big as the number.
The complexity is O(n)


For Best case scenario:

X =~ Log(N)
Y =~ Log(N)
R =~ X

Total expression length is Log(Log N) + Log(Log N) + Log(N)

so the complexity is (Log(Log N)) which is like 99.9% compression. :)

But the problem is , The probability that a bigger file falls near the best case scenario is very very very less.

Since X ^ Y are often separated by miles in the number line , especially when X and Y are large numbers.

Needless to mention that even for a super computer this algorithm would take years to encode bigger files.

Case closed. :)

No comments: