The biggest problem with substitution ciphers (those that replace each letter with a particular other letter or symbol) is that they are vulnerable to frequency analysis. In any language, some letters are more common than others. By matching the most common symbols in the ciphertext with the letters you know to be most common, you can begin deciphering the message. Likewise, you can use rules like ‘a rare letter that almost always appears to the left of one specific more common letter is probably a Q.’ What is needed to strengthen such ciphers is a language in which words have no such ‘personality.’ Here is how to do it:
First, take all the short words (fewer than three letters) and assign each one a random three-letter code. Lengthening very short words further strengthens this approach because short words are the most vulnerable to frequency analysis; a single letter sitting with spaces on either side is probably ‘a’ or ‘I.’ Using three-letter groups and 26 letters, you can assign 26 × 26 × 26 = 17,576 words. Now, take as many words from the whole language as you want to be able to use. For the sake of completeness, let’s use the entire Oxford English Dictionary. The 456,976 possible four-letter groups more than suffice to cover every word in it, leaving some space for technical terms that we may want to encrypt but which might not be included. If we need even more possibilities, there are 11,881,376 five-letter combinations.
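As a rough illustration of the arithmetic and of the code assignment described above, here is a minimal Python sketch. The function names (`code_space`, `index_to_code`, `build_codebook`) and the tiny word list are my own assumptions for the example, not part of any existing library. The seeded `random.Random` is used only so the example is repeatable; since the dictionary is meant to be public anyway, the randomness here is not what keeps the message secret.

```python
import random
import string

ALPHABET = string.ascii_lowercase  # the 26 letters assumed in the text


def code_space(length: int) -> int:
    """Number of distinct letter groups of the given length."""
    return len(ALPHABET) ** length


def index_to_code(index: int, length: int) -> str:
    """Write an integer as a fixed-length base-26 letter group."""
    letters = []
    for _ in range(length):
        index, remainder = divmod(index, len(ALPHABET))
        letters.append(ALPHABET[remainder])
    return "".join(reversed(letters))


def build_codebook(words, length, seed=None):
    """Assign each distinct word a unique, randomly chosen code (hypothetical helper)."""
    unique_words = sorted(set(words))
    if len(unique_words) > code_space(length):
        raise ValueError("too many words for codes of this length")
    rng = random.Random(seed)
    # sample() draws without replacement, so no two words share a code.
    indices = rng.sample(range(code_space(length)), len(unique_words))
    return {word: index_to_code(i, length) for word, i in zip(unique_words, indices)}


print(code_space(3), code_space(4), code_space(5))
# prints: 17576 456976 11881376

# Short words get three-letter codes; longer words would use length=4.
short_words = build_codebook(["a", "i", "of", "to", "we"], length=3, seed=2024)
print(short_words)
```

Drawing code indices without replacement is one simple way to guarantee that every word gets a distinct code; a full dictionary would be handled the same way, just with a longer word list and `length=4`.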
This approach is cryptographically valuable for a number of reasons. Since the codes representing words are random collections of letters, the letter frequencies in a ‘translated’ message are also close to random. You no longer need to worry that some English letters are more common than others. Just as important, there are none of the ‘Q’ type rules by which to later attack the substitution cipher. The dictionary of equivalencies would not need to be secret; indeed, it should be widely available. Having the dictionary does not make encrypted messages more vulnerable, since they will have passed through a substitution cipher with a secret key before being distributed. The result is fundamentally more robust to the cryptanalysis of substitution ciphers than a message enciphered from standard English would be.
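To make the two-stage process concrete, here is a small Python sketch: words are first swapped for their public codes, and the coded text is then passed through a secret substitution cipher. The three-entry `CODEBOOK`, its particular codes, and the helper names are all hypothetical, chosen only for illustration. The seeded shuffle is there for repeatability; a real key would need to come from a proper source of randomness.

```python
import random
import string

# Hypothetical public dictionary: short words map to three-letter codes,
# longer words to four-letter codes.
CODEBOOK = {"attack": "qzkv", "at": "xjw", "dawn": "pmbr"}
DECODEBOOK = {code: word for word, code in CODEBOOK.items()}


def make_substitution_key(seed):
    """Return a shuffled alphabet to use as the secret substitution key."""
    shuffled = list(string.ascii_lowercase)
    random.Random(seed).shuffle(shuffled)  # seeded only for a repeatable demo
    return "".join(shuffled)


def encrypt(message, key):
    """Replace each word with its code, then apply the substitution cipher."""
    coded = " ".join(CODEBOOK[word] for word in message.lower().split())
    return coded.translate(str.maketrans(string.ascii_lowercase, key))


def decrypt(ciphertext, key):
    """Undo the substitution cipher, then look each code up in the dictionary."""
    coded = ciphertext.translate(str.maketrans(key, string.ascii_lowercase))
    return " ".join(DECODEBOOK[code] for code in coded.split())


key = make_substitution_key(seed=42)
ciphertext = encrypt("attack at dawn", key)
print(ciphertext)
print(decrypt(ciphertext, key))  # prints: attack at dawn
```

Only `key` has to stay secret in this sketch; the codebook can be published, which matches the point above about the dictionary being widely available.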
In the era of modern algorithms like AES, I doubt there is any need for the above system. Still, I wonder if there are any historical examples of this approach being used. If you have a computer to do the code-for-word and word-for-code substitutions, it would be quite a low-effort way to increase security.