Compression for P2P Networks

prometheus · November 13, 2022, 10:00pm

I know there are algorithms like gzip to compress files, but I am wondering if there are alternatives to compressing files for my use case. I have created a P2P browser network that stores data in LocalStorage (which has a limit of 5mb of storage). Here is the process:

Turn file into an ArrayBuffer
Split bytes into chunks of 50kb
Upload chunks to the network (peers get the data and store it in their LocalStorage)

The problem is that I had to increase the quota/limit for LocalStorage for file uploads to even work.

Currently I am thinking of an algorithm that looks at the byte array and then examines the array by chunks. Assume the array is of length N, the algo would start off by examining chunks with length N, then N-1, then N-2 and so on. If there was a large chunk of 0s of length L, for example, it would store this chunk as 0xL instead of the L number of 0s. If there was a chunk of 1s of length 43, for example, it would do the same, making it 1x43. Putting these compressed chunks in order would allow for lots of compression with varied chunk sizes. I don’t know how well this would work though since bytes tend to have different values and patterns may repeat themselves with more complexity than just repeating the same value over and over.

Any pointers to simple compression algorithms would be great! I would also like to discuss ways a new algorithm could be invented (because it’s fun!).

messede · November 14, 2022, 1:07pm

You also might want to take into consideration the computational overhead of compressing and decompressing each chunk instead of the entire file.

prometheus · November 14, 2022, 1:59pm

true, compressing text is relatively easy compared to videos, images and the like.

0x00pf · November 14, 2022, 2:56pm

What you described is known as RLE (Run-Lenght Encoding). However it is not the most efficient compression algorithm out there. Worked well with simple images when using 16 coulors palettes … zip and family are based on LZ family of algorithms (the initials of the creators). Take a look to LZ77.

Then just follow the links

prometheus · November 14, 2022, 2:58pm

Yay thanks for the advice, it’s reassuring to come up with ideas that already exist because it means I am on the right track/way of thinking. I’ll take a look into LZ77!

c0z · November 14, 2022, 3:00pm

Was going to recommend LZMA but you already got it. Nice

prometheus · November 14, 2022, 3:01pm

One of the things I am thinking about for the run-length encoding algorithm is to look for patterns that are more complex than just a repeated single-byte sequence. Think of how humans recognize 102030 or 123 or 2468 as patterns. You only need to remember the first number in the sequence and the pattern that follows from that.

0x00pf · November 14, 2022, 3:18pm

Sure… that is the idea behind LZ … but follow your path and you will learn way more

prometheus · November 14, 2022, 3:20pm

dang I’m getting ahead of myself

prometheus · November 14, 2022, 4:31pm

I found this useful API for compressing and decompressing streams. Turning whatever byte array you have into a ReadableStream will allow you to use this native browser API to compress data! May be helpful to those who read this post: Compression Streams API - Web APIs | MDN

system · March 15, 2023, 2:00pm

This topic was automatically closed after 121 days. New replies are no longer allowed.