[TuT] Substitution Cracking [Breaking Ciphertexts]
Though not common to encounter custom encodings, it’s still a good idea to have at least a basic notion of what they are all about. We could easily say that most failures to get access to a server are due to data concealment. Most developers won’t bother to create custom ciphers and would just apply some of the well-known message digest algorithms or none at all. So in this tutorial, I’ll go through the following:
Substitution basics
- To get an idea what the tutorial will represent
Homophonic & Polygraphic substitutions
- Basic, yet important to know
Recovery methods and techniques
- Frequency analysis, shiftings and others
Now let’s get on with the definitions. Substitution is basically replacing plaintext letters, pairs of letters or groups of letters with ciphertext ones. Let’s take an example with the English alphabet which is actually used most of the time for substitutions:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
And make our substitution ciphertext values. They are all corresponding to each other as follows and therefore need not to exceed the number of plaintext values.
U L Z J D M X Y E A Q S B N I W P T V O R F C K G H
So imagine we need to encipher the plaintext ‘Keeper’. Following our substitution, it would look like that:
Qddwdt
A pretty basic PHP-based example of this would be the following:
<?php
error_reporting(0);
$input = strtoupper('Keeper');
$storing = array(
'A','B','C','D','E','F',
'G','H','I','J','K','L',
'M','N','O','P','Q','R',
'S','T','U','V','W','X','Y','Z'
);
$storing2 = array(
'U','L','Z','J','D','M',
'X','Y','E','A','Q','S',
'B','N','I','W','P','T',
'V','O','R','F','C','K','G','H'
);
print str_replace($storing,$storing2,$input);
?>
Homophonic substitution:
In homophonic substitutions, plaintext letters correspond to more than one ciphertext value. Let’s take an example of the following substitution scheme:
..
K = JGP
E = LXZ
P = HFY
R = OYB
...
So then our ciphertext of ‘Keeper’ would look like:
JGPLXZLXZHFYLXZOYB
Basic addition of strings, nothing awkward. As well as that, we could also kinda obfuscate the ciphertext values, mixing and merging uppercase, lowercase and reverse values. Suppose we have the following substitution where we include mixed strings:
A = kNP
B = TMq
..
A . B = 'kNP'+'TMq'
In that way, frequency analysis, vowel trowels, pairs of letters and anything related to decoding the ciphertext (or reversing the function) would be pretty much hindered. There could be made many additions to that concept as long as it maps more than one ciphertext value to a single letter from the alphabet (not necessary to be an alphabet at all).
Polygraphic substitution:
While homophonic substitution is just assigning more than one value, the polygraphic one maps entire groups/patterns of letters. For example, till now we had a single letter matching a corresponding another (basic substitution), single letter matching a string of other letters and now a whole pattern of letters grouped together are being represented by another group of ciphertext values. As you might guess this is getting even more secure for the encoding structure.
Suppose we have the following scheme:
ABC = ION
DEF = LKX
GHI = XBT
..
What the idea here would be is that the concatenation of the ciphertexts is still applicable. So there is nothing different except the process of grouping and mapping the values.
ABC (+) DEF = 'ION' + 'LKX'
..
Let’s see a reversible example with a polygraphic substitution in PHP:
<?php
error_reporting(0);
$input = strtoupper('Keeper');
$storing = array(
'ABC','DEF',
'GHI','JKL',
'MNO','PQR',
'STU','VWX','YZ'
);
$storing2 = array(
'ULZ','JDM',
'XYE','AQS',
'BNI','WPT',
'VOR','FCK','GH'
);
print strrev(str_replace($storing,$storing2,$input));
?>
Okay, so now that we have the basis and we know what the cracking is gonna be all about, let’s proceed with the different types of analysis and see how we can go about decoding custom ciphertexts (encodings). The idea here is to reverse the function for the enciphering. One of methods that is mostly being implemented in such occasions is the frequency analysis. Suppose we have the following ciphertext. Just a bunch of scrambled, words and sequence of letters that make no sense at first sight.
odawMcA :zznM dld nzK .moP zd nchska qlxlm eeiw zot ydpy gmipylnoC ,zot gmiyqdncyzo ho ydpy ci lniqk tN .lwid sooe tlpy ydpw yom ,smipy amd tdc tlpy ydpw tj kdwhfj ixalpdw vk cmxn vj koqlp nB .kwxawdj ot cmxn vj koqlp nQ .kdoncva o sj i lkP
Let’s put this on a frequency analysis scale:
Here we can check the repentance of each letter. Based on that we can make suggestions on our own substitution scheme and apply it to the ciphertext. For instance, we could map our own values and code a script to make the comparison studies. The best attack against any encoding is the polymorphic permutation where we can have different outputs on every looping of the ciphertext correspondence scheme. The idea of the permutation in addition to the polymorphism is that it generates logical sequence instead of scrambled possible values.
Now let’s do that in PHP to see how it’s being processed:
<?php
..
$wrdlst = 'dictionary.txt';
$output = 'Your cipher schemes and processes';
foreach($wrdlst as $line){
if(preg_match_all($output, $wrdlst, $matches)){
return str_replace($line, $output, $wrdlst);
}
}
?>
That’s just a module for a comparison script with a designated wordlist.
Now let’s see a frequency analysis of the repeated letters in the ciphertext:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Repeats in the ciphertext as follows and corresponds to above:
0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1
The thing that we are doing in that situation is to generate large amounts of wordlists that would handle permutations as stated above and compare whether the generations match any logical English words. If not, we ought to apply another method. Another notion for that is to check the length of the ciphertext and try to divide it into equally allocated parts of strings and then proceed with the polymorphic permutations.
Conclusion:
Tried to keep things simple as always. After all, I’m just introducing members to those techniques and not marking up the way of their own contemplation. I’ll for sure make part two of that tutorial since I barely got into the substitution cracking techniques. For the next part of the series I’ll start on this, I’ll code a custom decipher class and implement it against an actual stream cipher. Thanks for reading!