#Continued Fraction Data Encoding
I’m learning to program and some time ago while thinking about a new project I came up with an idea to use continued fraction as an method to encode data that would produce unique result each time even if the input was always the same.
I would like to hear your thoughts about this idea and implementation of it. I suspect that it may be an unoriginal idea but I don’t know of any sources that mention it.
Installing required development library (command bellow should work most of the time) sudo apt install libgmp-dev
##Idea
Interpret bytes of data as a big number which will be a nominator then add random denominator and write it in the continued fraction form. Example
It is simple example and it doesn’t work in this form for arbitrary binary files.
##Implementation
To make it work for any binary file two things were required:
Append random prefix to nominator to preserve any leading zero digits. I appended 20 digit long random prefix and postfix to nominator.
Calculate and preserve greatest common denominator for nominator and denominator because when decoding from the continued fraction format, the fraction is in the simplest form
When nominator is being generated during the encoding phase in encoder_generate_number() function then the numbers that are the part of it are zero padded 3 digits long (000, 001, …, 255). It allows to distinguish where they begin and end in the sequence.
Data is encoded in chunks of 1000 bytes long and saved in format “0 x y 0 x y …” where 0 marks the beginning of the encoded sequence of bytes, x is greatest common denominator and y is encoded sequence.
Here is this project implemented in C https://github.com/Rit-Onri-Momari/continued-fraction-codec
To be able to always have different output when encoding any input, even if it is always the same.
It is a form of encryption.
To obfuscate data. Also, it is very easy to extend the way the output looks to fit particular requirements.
To be able to encode binary files in plain text format.
First and second were the goals when I came up with this idea.
For example when taking into consideration 2 and 3 it is easily possible modify the algorithm to produce output that is a string of random uppercase letters “ADSVZUWEIR…”, and when encoding take 11 letters, 10 to represent digits and one for separator between them and the rest would be just a filler to be ignored during decoding. The output each time would be completely different in length and structure, but very easy to decode.
What do you mean by assigning numbers to words and then creating paragraphs from data?
Could you please provide small example, to help me get the idea?
Lets say I pull a load of words from the english dictionary and assign them each an index in an array. 0 could be “hello”, 1 could be “world”, 2 could be “goodbye” etc.
So 0 1 2 could be “hello world goodbye”, it makes it readable and less suspecting.
It would be possible to combine this into above algorithm but it would be kinda hard and would complicate and slow the whole process. The dictionary the would have to be used during encoding and decoding so it would have to be distributed with the encoded message.
It would certainly be an expensive process. But for creeping past NGFW, this over HTTP could do the trick for data exfiltration. It depends how extreme the firewall is you’re up against.
It cannot be a RNG. Indeed, randomness applied to number generation is a vast and complex field. It could maybe be considered as a seeded PRNG but I doubt about its chance to pass the NIST standard
Ooh right I missed that part
As encryption it’s not really useful as there isn’t really a way of decrypting it, as a hash its not random enough.
For obfuscation its the same thing, it might be useful when using a non-random number though, especially when the output is in binary format (0 1 0 == 0x000000010000). It could be used to bypass certain filters, however most filters would still consider it binary data (if using it as binary).
It could also be used is steganography.
For each coordinate shift one number of the output and add it to the coordinates value.
What do you mean that there is no way of decrypting it?
Could you provide an example how it could be used as an binary? (I have one method in mind, wonder if it is the same as yours) This encoding is meant to be used in the textual form, so it is possible to encode binary data in text. Just like for example in base64.
I think you are right about steganography because the output and the form of it makes it easy to hide it in some other data. Could you please provide more details about your idea and some example? What do you mean by coordinate?
If you mean this by crypter: What is a crypter? how it work's | Way To Hackintosh
Then I think you could have use of the property of this encoding that the output of the encoding is always unique even if you always encode the same data. If you ever have some shareable experience of real use cases of this encoding please feel free to write.
@Noswis
I think you are right.
But I think I’ve made a mistake drawing the idea without drawing the implementation idea that has things that make it work.
Random number have to be added to the nominator and denominator as described in Implementation section.
Append random prefix to nominator to preserve any leading zero digits. I appended 20 digit long random prefix and postfix to nominator
@oaktree
What do you mean? If you mean the code is unoptimized then it is wery likely.
Well when using a random denominator each time, I don’t see how you would decrypt it unless ofcourse the denominator is the key.
My binary comment was when using a non-ascii output (binary one)
With crypter I meant, have part of the binary encrypted and unencrypt it on the go, perhaps use a smart ascii-shellcode like decryptor to decrypt the numbers so the whole binary looks like some garbled text and a lot of numbers.
then again you could write an ascii stub (executable code where each byte is a printable character) and just use it to base64 decrypt. For more info: https://nets.ec/Ascii_shellcode
The random denominator (and nominator check out the Implementation section) doesn’t affect the decryption process. It is actually the foundation of this encoding.
Check out https://en.wikipedia.org/wiki/Continued_fraction for details.
In short this works this way:
Mathematically 7/8 can be written as [0; 1, 7] so if we have only [0; 1, 7] or 7/8 we know what number we are dealing with. This means that if we change denominator in any way for example 7/32 that can be written as [0; 4, 1, 1, 3] we retain our nominator where our data resides but the way it is written is completely different. And there infinite ways of doing it.
There are some quirks to make it work on binary data but they are described in the Implementation section.
I’ve implemented encoder and decoder in C that you can check out on GitHub link provided in the article. You can use it to decode the examples provided there or just play with it. Just keep in mind, that the size of encoded data for this version is about 12.7 times larger that input data (in bytes).