I know there are algorithms like gzip to compress files, but I am wondering if there are alternatives for my use case. I have created a P2P browser network that stores data in LocalStorage (which is limited to roughly 5 MB per origin). Here is the process:
- Turn file into an ArrayBuffer
- Split the bytes into chunks of 50 KB
- Upload chunks to the network (peers get the data and store it in their LocalStorage)
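The splitting step above can be sketched in a few lines, assuming the file has already been read into a `Uint8Array` (the function name is mine):

```javascript
// Split a byte array into fixed-size chunks (the last chunk may be shorter).
function splitIntoChunks(bytes, chunkSize = 50 * 1024) {
  const chunks = [];
  for (let offset = 0; offset < bytes.length; offset += chunkSize) {
    // subarray returns a view into the same buffer, so no bytes are copied here.
    chunks.push(bytes.subarray(offset, offset + chunkSize));
  }
  return chunks;
}
```

Each chunk can then be serialized (e.g. base64) before being handed to peers for LocalStorage.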
The problem is that I had to increase the quota/limit for LocalStorage for file uploads to even work.
Currently I am thinking of an algorithm that scans the byte array chunk by chunk. Assume the array has length N: the algorithm would start by examining chunks of length N, then N-1, then N-2, and so on. If it found a long run of zeros of length L, for example, it would store that run as the pair 0×L (value × run length) instead of L literal zero bytes; a run of 43 ones would likewise become 1×43. Storing these compressed runs in order would allow a lot of compression with varied chunk sizes. I don't know how well this would work, though, since bytes tend to have varied values, and patterns may repeat with more complexity than just the same value over and over.
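The run-collapsing idea above can be sketched as a simple byte-level encoder/decoder pair (names are mine; runs are capped at 255 so each pair fits in two bytes):

```javascript
// Encode a byte array as (value, runLength) pairs.
function rleEncode(bytes) {
  const out = [];
  let i = 0;
  while (i < bytes.length) {
    const value = bytes[i];
    let run = 1;
    while (i + run < bytes.length && bytes[i + run] === value && run < 255) run++;
    out.push(value, run);
    i += run;
  }
  return Uint8Array.from(out);
}

// Expand the (value, runLength) pairs back into the original bytes.
function rleDecode(encoded) {
  const out = [];
  for (let i = 0; i < encoded.length; i += 2) {
    const value = encoded[i];
    const run = encoded[i + 1];
    for (let k = 0; k < run; k++) out.push(value);
  }
  return Uint8Array.from(out);
}
```

One thing to watch out for: on data with no runs at all, this scheme *doubles* the size (every byte becomes a two-byte pair), so it only pays off when long runs are common.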
Any pointers to simple compression algorithms would be great! I would also like to discuss ways a new algorithm could be invented (because it’s fun!).
You also might want to take into consideration the computational overhead of compressing and decompressing each chunk instead of the entire file.
True, compressing text is relatively easy compared to videos, images and the like.
What you described is known as RLE (Run-Length Encoding). However, it is not the most efficient compression algorithm out there. It worked well with simple images using 16-colour palettes… zip and family are based on the LZ family of algorithms (named after the initials of the creators, Lempel and Ziv). Take a look at LZ77.
Then just follow the links
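To give a flavour of the LZ77 idea: instead of runs of a single value, it encodes *back-references* into a sliding window of recent data. A toy sketch (real implementations add hash tables and bit-packing; everything here is simplified and the names are mine):

```javascript
// Toy LZ77: emit (offset, length, nextByte) triples over a sliding window.
function lz77Compress(bytes, windowSize = 255) {
  const out = [];
  let i = 0;
  while (i < bytes.length) {
    let bestLen = 0, bestOff = 0;
    const start = Math.max(0, i - windowSize);
    for (let j = start; j < i; j++) {
      let len = 0;
      // Matches may run past position i (overlapping copies are allowed).
      while (i + len < bytes.length - 1 && bytes[j + len] === bytes[i + len]) len++;
      if (len > bestLen) { bestLen = len; bestOff = i - j; }
    }
    out.push([bestOff, bestLen, bytes[i + bestLen]]);
    i += bestLen + 1;
  }
  return out;
}

// Rebuild the bytes by replaying each back-reference, then the literal byte.
function lz77Decompress(triples) {
  const out = [];
  for (const [off, len, next] of triples) {
    const start = out.length - off;
    for (let k = 0; k < len; k++) out.push(out[start + k]);
    out.push(next);
  }
  return Uint8Array.from(out);
}
```

Note that RLE falls out as a special case: a run of identical bytes is just a back-reference with offset 1.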
Yay thanks for the advice, it’s reassuring to come up with ideas that already exist because it means I am on the right track/way of thinking. I’ll take a look into LZ77!
Was going to recommend LZMA but you already got it. Nice
One of the things I am thinking about for the run-length encoding algorithm is to look for patterns that are more complex than just a repeated single-byte sequence. Think of how humans recognize 102030 or 123 or 2468 as patterns. You only need to remember the first number in the sequence and the pattern that follows from that.
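One concrete way to capture that "first number plus the pattern" idea is delta encoding: store the first byte, then only the differences between neighbours. This is just a sketch of that direction, not what the post commits to:

```javascript
// Delta-encode: each output byte is the difference from the previous byte (mod 256).
// A sequence like 2,4,6,8 becomes 2,2,2,2 — a run that RLE could then collapse.
function deltaEncode(bytes) {
  const out = new Uint8Array(bytes.length);
  let prev = 0;
  for (let i = 0; i < bytes.length; i++) {
    out[i] = (bytes[i] - prev) & 0xff;
    prev = bytes[i];
  }
  return out;
}

// Rebuild the original bytes by accumulating the differences.
function deltaDecode(deltas) {
  const out = new Uint8Array(deltas.length);
  let prev = 0;
  for (let i = 0; i < deltas.length; i++) {
    prev = (prev + deltas[i]) & 0xff;
    out[i] = prev;
  }
  return out;
}
```

Delta encoding doesn't shrink anything by itself; it transforms arithmetic progressions into constant runs that a second pass can compress.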
Sure… that is the idea behind LZ … but follow your path and you will learn way more
dang I’m getting ahead of myself
I found this useful API for compressing and decompressing streams. Turning whatever byte array you have into a ReadableStream will allow you to use this native browser API to compress data! May be helpful to those who read this post: Compression Streams API - Web APIs | MDN
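A small usage sketch of that API (gzip here; it needs a browser or Node 18+, and the helper names are mine):

```javascript
// Compress a Uint8Array with the native Compression Streams API.
// Blob.stream() gives us the ReadableStream to pipe through.
async function gzipCompress(bytes) {
  const stream = new Blob([bytes]).stream().pipeThrough(new CompressionStream("gzip"));
  return new Uint8Array(await new Response(stream).arrayBuffer());
}

// The mirror image, using DecompressionStream.
async function gzipDecompress(bytes) {
  const stream = new Blob([bytes]).stream().pipeThrough(new DecompressionStream("gzip"));
  return new Uint8Array(await new Response(stream).arrayBuffer());
}
```

Since this is native gzip (DEFLATE, i.e. the LZ77 + Huffman combination discussed above), it will almost certainly beat a hand-rolled RLE on real files, with zero extra JS shipped.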