As developers, we all know that no matter how thoroughly we test our code, odd behaviours will still be there, waiting until it reaches production to bother the users of our products.
Recently my team and I have been fighting some really strange bugs, all because we decided to migrate one of our application servers to a new machine with a newer OS and, of course, newer versions of all the third-party components we use. We use lame to encode the audio files users upload into our system. This process worked fine on the old server, but on the new server the same command line was producing very poor quality files. We debugged the entire code path and it was doing exactly the same thing on both servers. We then reviewed the release notes of the lame versions involved and, with no luck, couldn't find anything that explained the error. After a long process of collective thinking and testing, we noticed that our code was passing a very large number (6 digits) as the bps parameter. The older version of lame still produced good quality with that value, while the newer version produced poor quality. Finally we had an idea of where the issue was.
We went back to our code. All the parameters looked right and all the lines were in place, but one of the guys noticed a slight difference in one line: the difference between =~ and = ~ was causing the error.
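To see why one stray space matters: in languages such as Perl and Ruby, =~ is the pattern-match operator, while = ~ is read as an assignment followed by the unary bitwise NOT operator. Both forms are valid syntax, so nothing fails loudly. The post does not say which language our code was in, so the snippet below is only an illustrative sketch in Python, where the match is spelled re.fullmatch and ~ is also bitwise NOT. The variable names are made up, and it does not reproduce the exact 6-digit value we saw.

```python
import re

bitrate_arg = "128"  # hypothetical value, for illustration only

# Intended behaviour: test the value against a pattern
# (the job "=~" does in Perl or Ruby).
matched = re.fullmatch(r"\d+", bitrate_arg) is not None

# Accidental behaviour: assign the bitwise complement of the number
# (what "= ~" means). This is legal code and raises no error.
mangled = ~ int(bitrate_arg)

print(matched)
print(mangled)
```

The second form silently yields a different number instead of a match result, which is why the bug only showed up in the output quality.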
I wanted to share this example because all of us end up losing time on this type of issue when we are responsible for a product with real users. How can you prevent it? Write as many tests as you can, although even that won't make your code bulletproof.
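One cheap guard that would have caught this particular bug is validating the value before it ever reaches the encoder. Here is a minimal sketch in Python, assuming the parameter is the bitrate passed through lame's -b option, which is expressed in kbps. The function name build_lame_args, the file names, and the accepted 8 to 320 range are assumptions for illustration, not our real code.

```python
# Assumed plausible range for an MP3 bitrate in kbps.
MIN_KBPS = 8
MAX_KBPS = 320


def build_lame_args(src, dst, bitrate_kbps):
    """Build the lame command line, refusing implausible bitrates."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(bitrate_kbps, bool) or not isinstance(bitrate_kbps, int):
        raise ValueError(f"bitrate must be an integer, got {bitrate_kbps!r}")
    if not MIN_KBPS <= bitrate_kbps <= MAX_KBPS:
        raise ValueError(f"bitrate out of range: {bitrate_kbps} kbps")
    return ["lame", "-b", str(bitrate_kbps), src, dst]


def rejects(bitrate):
    """Return True if the guard refuses the given bitrate."""
    try:
        build_lame_args("in.wav", "out.mp3", bitrate)
    except ValueError:
        return True
    return False
```

With a check like this, a 6-digit value fails immediately on both servers instead of depending on how each lame version happens to handle it.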