Besides getting through the effort of designing and building your software application, distributing it as commercial or shareware posses new challenges, when it comes to protecting your work.
Usually you will either distribute a trial version that comes with certain limitations that can be lifted by providing a serial key, or you would have a demo version, non unlock-able, and the full version as a separate software. Either you choose, your work can still be compromised, and turned into a complete freeware by a malicious cracker.
The weak spots
Let’s assume you opt to distribute a trial version of your software, that does some of the job, but not all, and the user needs to purchase and enter a serial key to get the software unlocked and fully functional. You do that to allow the user to test the app, but at the same time to constrain him to purchase the full app to cover some of the time you put in making it.
The entire logic / code is contained in your trial version application, it just needs a key (an alphanumeric combination) to go as fully registered , fully functional software. When you distribute it over the Internet, you loose all control over it. Your entire code is there, your hours of work and effort, protected only by a license key.
It will get cracked. Based on its popularity anywhere between a few hours and a few days. The weak spots are exploited to either: create a keygen or crack the code.
We assumed our trial version is protected by a serial key. When the software starts, it checks if a key has been provided. If not it runs in trial mode and disables some of the software’s functionality (to allow only demo but not full functionality). Now when the user tries to register and enters the key, what do you do? You generate a key in memory, based on some mechanism that you’ve implemented and you consider it bullet-proof, and then you compare it against the user input. Right? Well… this is the easy way to implement software registration, but it is also very wrong.
Considering the security aspects , this implementation is so easy to compromise, it will take an attacker only a few minutes to analyze your binary code, and extract the serial key algorithm. Because your code contains it, you use it to generate a key in memory, before comparing it to the user’s input, remember? So having your algorithm, the attacker will simply build a separate application, that creates free serial key registration keys for your application. It basically turns it into freeware, and sorry to say, you will never get any revenue to compensate the hard work you put into creating your software. Then all the cracker needs to do , is to distribute your application’s installer packed with his serial key generator. This is how a keygen is created.
Cracking the code
There is another way to compromise your shareware’s security. Remember your registration scheme? If the user entered a wrong key, you say “Invalid serial key”. IF the user entered the correct key, you say “Thanks for registering, all limitations have been removed”.
Now all the cracker needs to do is to locate that IF . Then by changing only one byte, it will turn the IF statement to something like if (TRUE) … . So is the key ok? TRUE! Is the software registered? TRUE. With only one byte changed and your code’s integrity compromised, the cracker has turned your shareware into… freeware. Again your work is in vain. This is what cracking the code refers to, in just a few words it is altering your software’s code integrity, to make it think it is a legitimate, registered, full copy and work without any trial version limitations.
This sounds terrible, is there anything I can do?
Well what you need to do, is to fix the two critical problems:
1. make sure your serial key algorithm is NOT included in your application.
2. make sure your code can’t be compromised, by making it check it’s own integrity
Easy to say, more difficult to put into practice. But certainly not impossible, as I took some time to locate these weak spots and after doing some research on possible solutions, I finally came up with something very close to a bullet proof shareware security solution.
The serial key vulnerability and one way of solving it
The typical serial key implementation can be iterated in the following steps:
1. the computer / mobile device has a certain ID, be it related to hardware, or to the user of the software, like the login name or the user’s email address. We want to bind the serial key algorithm to this ID (unique or not) to make sure the key will only unlock ONE shareware copy (or at least to limit the number of copies that can be activated with a single key). Without this particular feature, compromising the software is trivial: a malicious user (doesn’t even have to be a cracker), releases your global key to the public domain: now anyone could use it to register their copies. This case is trivial so it is not worth discussing.
So we use the unique (or pseudo unique) ID to generate the key. This operation is usually a type of scrambling the unique ID into a regulated form. If the ID can be an email address or variable length, than the serial key algorithm is usually a kind of hashing algorithm.
Example: let’s say the unique ID is the user’s email address: firstname.lastname@example.org and the serial key algorithm will simply compute the MD5 out of that text: email@example.com —> MD5 —> b58c6f14d292556214bd64909bcdb118 . This standard 256bit sequence is our key (or we could just take the first few characters of it).
2. Now a user wants to purchase your software on your website. He enters his email in the purchase form, pays, and receives his key (as per the example, the MD5 of the provided email address). He then takes the key and enters it in the appropriate field of the software he just payed for.
3. The software checked the device ID of that particular system. To stick to the example, it will be the email address configured for that computer/mobile device, the buyer’s email address. The license key is computed ( the MD5 as per the example) and checked against the user input. If the texts match, the software is registered and the limitations removed.
An attacker will analyze the application code, see the algorithm to generate the key from the given unique ID, and simply create a new application to do just that. For free. The keygen is ready and your application compromised.
As I already said, the solution is NOT to include the serial key mechanism in your distributed application. Thanks to asymmetric encryption, this is possible.
I will keep it simple:
1. The device has a unique/pseudo unique ID that we want to use for the reasons explained above.
2. We generate a pair of keys,a private one and a public one when we design the application. The public one is included in your application. The private one is not.
3. When the user purchases a serial key on your webpage, he will provide the unique ID (his email address to keep a parallel to the vulnerability exampled above). We use the secret private key to encrypt the unique ID provided. The encrypted message IS THE SERIAL KEY and the user receives that. Of course, we can structure it to a given standard length, using hyphens and alphanumeric characters or whatever for is appropriate.
4. The user then goes to our shareware software. In the registration screen he enters the key he just purchased. This time, we go a different route: we decrypt the key using our public key, embedded in the application. If they key is correct, we get the unique ID and we check that against the one computed on the device our software is running on.
As you see, there is no key generating algorithm included. There is nothing a cracker can do, to build a keygen, because he doesn’t have our private key. Problem solved. My personal choice was RSA. Of course, the cracker can still alter the binary code of our application, to make it think it is already registered, bypassing the key mechanism completely:
The code integrity vulnerability and how to protect against code changes
So now we have a bullet proof registration scheme, and no cracker in this world will ever be able to make a keygen, unless he is good at cryptanalysis and willing to crack our private/public key pair (this will probably never happen). The remaining issue is our code can still be changed, even without having the source code, by directly modifying the binary code with the help of diss-assemblers and other specialized tools. By doing so, an attacker can modify bits of our code’s logic, changing it completely. And the target is usually the part of code where we check the trial version / full version conditions. A simply byte is all it takes to make our checks futile, and have the code running as full version. This type of attack is called Patching and refers to altering the code integrity.
What can we do?
An approach is to check the code integrity. If you are a software developer you should know what a hash is. Using a simple hashing mechanism, we can compute a signature of our code. When the code is run, the first thing it does would be to make a self-check, and see if its own hash signature is the expected one. Will this work? Let’s take an example:
1. We compile our final code, before preparing to release our application. The HASH is 72f463e25b46f536af8e0fb24243bf13 (just a random example). What the code does, is when it starts, it generates its own HASH and checks if it is 72f463e25b46f536af8e0fb24243bf13. If it’s not, it means our code has been compromised so we stop the code. If the hash checks, we are not modified and can proceed running the application as normal.
2. So we take the hash 72f463e25b46f536af8e0fb24243bf13, add it to our code where we check the self generated hash with the expected one and recompile.
3. The problem is that by adding the hash and recompiling, we have changed the code’s hash. If we take the new hash, add it in the code and recompile, we go back to step 2.
It is a vicious circle!
To solve this problem, I had to find a different approach. It wasn’t easy, but at the end it was all about answering a simple question: “What does a programmer do, that a cracker will never do?” . And there is a very reasonable answer to that. The answer is: the programmer will write all the code, put a lot of time and effort into the application details, and build it one brick after the other. The cracker will never do any of that. He is just for a quick hit and run attack.
To put this into an example: imagine the tiny details for putting a button to some exact coordinates, making it to an exact given size, aligning the elements with a given, precise padding, selecting a color of a given code, and many other details behind the application, not necessarily related to the UI. The programmer invests the time to build all the details.
So the approach to save the vulnerability is to link the application tiny details together with the hashing code. Let’s take an example and for the sake of simplicity we assume we have a very simple application that contains just an interface with a given button, located at x,y and of size w,h. We know that the final code will have a hash sequence or a given length, but we can assume it at least 32Bytes long as in the case of MD5. We can assume we are using MD5 just to fix all the details of this example.
As the hashing sequence is long, we can assume with good probability that it will contains a large set of numbers. The following steps are crucial to be understood correctly:
1. We finalize all aspects of the application we want to protect, have it in the final stage just before distributing it over the Internet
2. We identify some crucial variables, that if not set correctly will make the application totally useless: UI size parameters, array dimensions, allocation sizes (nice one, if corrupted, the application will crash), algorithm related parameters and so on. The more we select, the stronger the protection .
3. We know that once we generate the hashing sequence, we will have a set of numbers, containing probably all the values we identified at 2. If this is a problem, we can either: 3a) use a longer hashing sequence 3b) use the hashing sequence to seed a pseudo random linear generator, that will put out plenty of values. Either we choose, it is important to define a standard. My approach was to generate a hashing sequence of 256 8bit values (that is 256bytes of hashing data).
4. We have identified the crucial application parameters at 2. We have a set of numbers at 3, containing amongst other data the numeric values we need for our parameters. We can’t pre-program them in our code, because of the vicious circle issue presented above. But what we can do is to build an array, to identify all relevant values in the hashing array.
We know our button must be at x,y and of size w,h. After compiling the code our hashing looks something like: a,b,c,x,w,e,y,q,a,h. All the values are there, x,y,w,h . We need to build the index table and instruct the initialization of our protection parameters to take the values as follows: x=hash , y=hash , w=hash, h=hash .
To sum up, the application will compute it’s own hash sequence at startup, then load the EXTERNAL index table, then init the crucial parameters with the corresponding values HASH_SEQUENCE[INDEX_TABLE]. To make things easier we’ll need a tool to compute the index table automatically. We know the parameters we need, the tool will just find their positions in the hashing sequence of our application’s final code. If the code is altered in any way (a cracker’s attack), the vital parameters will be the wrong values, as the hash sequence changes (because of the code change) and the index table now points to the wrong values. The buttons coordinates, but all the other protected parameters would get unpredictable wrong values. So the only way to have the application running (the cracker also wants that), is to leave it unmodified.
If we recompile, we will need to do all this security process all over again, as our hashing sequence will change and the indexes need to be updated.
A cracker will now need to guess the crucial parameters, either by analyzing the code (but this is hard, as a cracker will not invest time in calculating all the requirements like the developer did), or by dumping the memory and identifying the values at runtime. Either path is complicated, and the protection increases with the number of parameters used.
I remember when I developed this approach, feeling pleased for finally closing the vicious circles and having a better way to protect my work.
Then this bullet proof technique stood in place for years , protecting my own applications, when the average cracking period was of only a few days after my apps were released. I didn’t put any patents on it yet, and I am releasing it for free use to the public domain hoping it will prove useful to my fellow software developers.
This article has 3 Comments
Good paper, sincerely you will do more hard the work of the crackers/reversers.
But i think that you are wrong about some method, i explain you:
-To take a md5 sum of your code is not needed to get a sum of ALL your code, only little bits are required, your “validation” function or only some parts of code NOT ALL CODE, so you don´t need to do a sum of your hash hardcoded entering in a “crazy loop”.
-One good cracker (aka reverser) for sure will analyze and study your code for a long time if is required.
-One reverser “probably” will be a good developer in many languages like asm, c , c++, java, python and different cpu architectures.
-One reverser will put a memory break point in your HASH or your index table and will recalculate the index table with the correct values in runtime if is needed.
-To do that probably this reverser will injects code in your application and patch it with this method (there are many ways to inject own code in your code).
And remember one think, when your trick/protection is broken is broken for ever, the reverser will remember your method for ever , probably he will save the documentation in a paper and if the protection is hard, he will spend time writing a paper about that protection and how break it 😉
The reversing is nothing about breaking shareware or making free apps, is about REVERSING, the word means it all.
There are many teams in internet that learn together code and how break it.
Thanks for the valuable feedback.
E foarte fain si didactic scris articolul “Software security in Shareware”. Felicitari!
Cred insa ca la capitolul “The code integrity vulnerability and how to protect against code changes” ramane totusi o vulnerabilitate: spui ca aplicatia si calculeaza hash sequence-ul pentru byte-code-ul propriu , foloseste tabelul de indecsi stocat extern (fisier text banuiesc) si initializeaza variabilele si componentele UI critice cu acele valori. Conceptul este interesant, dar cred ca tocmai la pasul de generare a hash-ului ar putea fi mânărită aplicatia.
Nu am lucrat in ASM, dar banuiesc ca se pot identifica care sunt instructiunile care furnizeaza hash sequence-ul catre algoritmul de populare (ala bazat pe external index table) a valorilor. In acel loc se poate altera codul binar ca in loc sa furnizeze hash-ul calculat pe baza binary-code-ului, sa furnizeze tot timpul aceeasi valore hardcodata la nivel binar.
De exemplu rulezi pas cu pas aplicatia originala intr-un debugger ASM, vezi ce valoare are hash-ul calculat si apoi alterezi codul ca sa furnizeze tot timpul acea valoare catre algoritmul de populare. Dupa ce algoritmul de verificare a integritatii binare (byte-code-ul) a fost compromis, poti linistit altera codul binar mai departe pentru a scurtcircuita aplicatia in celalalt punct de interes major pentru un cracker: locul in care se verifica daca aplicatia ruleaza in mod trial sau full version si activeaza feature-urile corespunzatoare.
Acum nu stiu la ce fel de aplicatii te refereai. Daca e vorba de aplicatii Android, banuiesc ca acolo hash sequence-ul poate fi calculat extern, de catre OS. Dar pe Windows insa cred ca ar putea fi sparta protectia de integritate a codului binar urmand pasii pe care i-am explicat mai sus.
Ce parere ai?