Static Code Analysis
Static code analysis, also called source code analysis, is a method of examining a program for defects without running the code. It helps programmers follow standard coding guidelines and build an understanding of the code structure.
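To make "without running the code" concrete, here is a minimal sketch using Python's standard `ast` module. The snippet being analyzed and the rule (flag direct calls to `eval`) are assumptions chosen for illustration; real analyzers apply many such rules.

```python
import ast

# Hypothetical snippet to analyze; it is only parsed, never executed.
SOURCE = """\
def load(expr):
    return eval(expr)

def safe(x):
    return x + 1
"""

def find_eval_calls(source):
    """Return the line numbers of direct eval(...) calls in the source text."""
    tree = ast.parse(source)
    lines = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "eval"):
            lines.append(node.lineno)
    return sorted(lines)

print(find_eval_calls(SOURCE))  # [2]
```

The check works on the syntax tree alone, so it reports the problem even though `load` is never called.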
Static analysis tools are a lifesaver for programmers who have to deliver high-quality code in a short period of time. Developers have to meet quality standards to pass a sprint, and static code analysis helps them do that. The technique identifies weaknesses in the code that might become a problem for the developers or QA later in the SDLC (software development lifecycle). While developers can perform manual reviews with peers and seniors, automated tools that quickly analyze the source code can be game-changing.
Static code analysis, unlike dynamic analysis, is performed at an early stage of the development lifecycle. It begins before testing, and in companies that follow DevOps practices it is scheduled around the Create step.
Types of Static Analysis
There are several techniques for performing static code analysis to scan source code for potential weaknesses. More often than not, these techniques are derived from compiler technology.
Data Flow Analysis
Data flow analysis is a technique that, without executing the program, collects information about how data would move through it at run time and uses that information to identify potential issues in the code. The following three terms are used in data flow analysis, and we’ll look at each in detail:
- Basic Block (Code)
- Control Flow Graph (Data flow)
- Control Flow Path (Data path)
Basic block (Code)
A basic block is a sequence of consecutive code statements. Control enters at the first statement and leaves at the last, with no branching in or out along the way except possibly at the last instruction of the block. This confined form makes a basic block easy to analyze.
Example:
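A minimal sketch of splitting a program into basic blocks, assuming a toy instruction format in which jumps are written as `goto N` (N is an instruction index). The program and the helper names `jump_target` and `basic_blocks` are hypothetical. A new block starts at the first instruction, at every jump target, and right after every jump.

```python
import re

# Hypothetical toy program; each entry is one instruction, indexed from 0.
PROGRAM = [
    "i = 0",               # 0
    "if i >= 10 goto 5",   # 1
    "total = total + i",   # 2
    "i = i + 1",           # 3
    "goto 1",              # 4
    "print total",         # 5
]

def jump_target(instr):
    """Return the index an instruction jumps to, or None if it is not a jump."""
    match = re.search(r"goto (\d+)$", instr)
    return int(match.group(1)) if match else None

def basic_blocks(program):
    """Group instruction indices into basic blocks."""
    if not program:
        return []
    leaders = {0}  # the first instruction always starts a block
    for idx, instr in enumerate(program):
        target = jump_target(instr)
        if target is not None:
            leaders.add(target)            # a jump target starts a block
            if idx + 1 < len(program):
                leaders.add(idx + 1)       # so does the instruction after a jump
    starts = sorted(leaders)
    ends = starts[1:] + [len(program)]
    return [list(range(start, end)) for start, end in zip(starts, ends)]

print(basic_blocks(PROGRAM))  # [[0], [1], [2, 3, 4], [5]]
```

Within each resulting block, control can only enter at the first instruction and leave at the last, which matches the definition above.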
Control Flow Graph
Control flow analysis is a technique for determining the control flow of a program. Its result is expressed as a control flow graph (CFG).
Control Flow Path
A control flow path is a route that a program can take through the control flow graph while being executed. The graph usually contains an entry block, where control enters the graph, and an exit block, through which all control leaves.
A basic example of a control flow path:
Each node in a control flow graph represents a block, and the arrows depict the paths from one block to another. A node is marked as the “entry” block if it has only outgoing edges, whereas a node with only incoming edges is marked as the “exit” block.
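As a rough sketch, a control flow graph can be stored as a mapping from each block to its successors. The block names `B1` to `B5` and the helper functions below are hypothetical. The code finds the entry and exit blocks by looking at their edges and then lists every control flow path between them.

```python
# Hypothetical CFG: an if/else diamond. Keys are blocks, values are successors.
CFG = {
    "B1": ["B2"],
    "B2": ["B3", "B4"],
    "B3": ["B5"],
    "B4": ["B5"],
    "B5": [],
}

def entry_blocks(cfg):
    """Blocks with no incoming edge."""
    targets = {succ for succs in cfg.values() for succ in succs}
    return [node for node in cfg if node not in targets]

def exit_blocks(cfg):
    """Blocks with no outgoing edge."""
    return [node for node, succs in cfg.items() if not succs]

def control_flow_paths(cfg, node, exits, path=()):
    """List the paths from node to any exit block, visiting each block at most once."""
    path = path + (node,)
    if node in exits:
        return [list(path)]
    paths = []
    for succ in cfg[node]:
        if succ not in path:  # skip back edges so loops do not recurse forever
            paths.extend(control_flow_paths(cfg, succ, exits, path))
    return paths

entry = entry_blocks(CFG)[0]
exits = exit_blocks(CFG)
print(entry, exits)                            # B1 ['B5']
print(control_flow_paths(CFG, entry, exits))
# [['B1', 'B2', 'B3', 'B5'], ['B1', 'B2', 'B4', 'B5']]
```

Skipping already visited blocks keeps the enumeration finite. A real analyzer would need a more careful treatment of loops.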
Taint Analysis
Taint analysis identifies vulnerabilities caused by untrusted variables in the source code of a program. A variable is “tainted” when it holds data from an untrusted source, such as user input. The analysis detects and alerts developers about any irregular flow of information that could affect all or part of the system. Potentially vulnerable functions are called sinks; if a tainted variable reaches a sink without being sanitized, it is flagged as a vulnerability.
Some languages have shipped with a built-in taint checking mechanism that can be enabled under certain conditions: Perl has its taint mode, and older versions of Ruby had one as well, although it has since been removed.
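The source, sanitizer, and sink idea can be sketched with Python's `ast` module. Everything named here is an assumption for illustration: `input` is treated as the only taint source, and `sanitize` and `run_query` are hypothetical functions standing in for a sanitizer and a sink. The sketch only handles straight-line, top-level statements.

```python
import ast

SOURCES = {"input"}        # calls that return untrusted data
SANITIZERS = {"sanitize"}  # hypothetical sanitizer
SINKS = {"run_query"}      # hypothetical vulnerable function

# Hypothetical code under analysis; parsed only, never executed.
CODE = """\
name = input()
query = "SELECT * FROM users WHERE name = '" + name + "'"
run_query(query)
clean = sanitize(name)
run_query(clean)
"""

def call_name(node):
    """Return the function name for a simple call like f(...), else None."""
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        return node.func.id
    return None

def is_tainted(expr, tainted):
    """True if the expression may carry data from a taint source."""
    if call_name(expr) in SANITIZERS:
        return False
    if call_name(expr) in SOURCES:
        return True
    if isinstance(expr, ast.Name):
        return expr.id in tainted
    return any(is_tainted(child, tainted) for child in ast.iter_child_nodes(expr))

def find_tainted_sinks(source):
    """Return line numbers where a tainted value reaches a sink."""
    tainted, findings = set(), []
    for stmt in ast.parse(source).body:
        if isinstance(stmt, ast.Assign) and isinstance(stmt.targets[0], ast.Name):
            target = stmt.targets[0].id
            if is_tainted(stmt.value, tainted):
                tainted.add(target)
            else:
                tainted.discard(target)
        elif isinstance(stmt, ast.Expr) and call_name(stmt.value) in SINKS:
            if any(is_tainted(arg, tainted) for arg in stmt.value.args):
                findings.append(stmt.lineno)
    return findings

print(find_tainted_sinks(CODE))  # [3]
```

Only the first `run_query` call is flagged. The second receives a value that passed through the sanitizer.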
Lexical Analysis
Lexical analysis, also known as tokenization, is the process of transforming a sequence of characters into a sequence of lexical tokens. This makes the source code easier to understand and more flexible to manipulate.
A tool that performs lexical analysis is called a tokenizer, lexer, or scanner. Let’s look at an example of pre-tokenized code and the tokens created from it by lexical analysis:
Pre-tokenized Source Code:
Example of Tokens Created:
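As a rough sketch of both steps, the regex-based tokenizer below takes a one-line expression as the pre-tokenized source and prints the tokens created from it. The token names and the tiny grammar are assumptions made for this illustration, not those of any particular language.

```python
import re

# Hypothetical token kinds for a tiny expression language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP", r"\s+"),      # whitespace is dropped
    ("MISMATCH", r"."),    # anything else is an error
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Turn a character sequence into a list of (kind, value) tokens."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        kind, value = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise ValueError(f"unexpected character {value!r}")
        tokens.append((kind, value))
    return tokens

# Pre-tokenized source code:
source = "total = price * (qty + 2)"

# Tokens created:
for token in tokenize(source):
    print(token)
```

Each token pairs a category with the exact text it covers, which is the form later analysis stages usually work with.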
How to choose the best static analysis tool?
There are quite a few things you need to consider before making the final decision on your static code analysis tool.
Programming language
Different analysis tools are designed for different languages, so it’s essential to choose a tool that supports yours. For instance, your project could use multiple languages like PHP, Java, and C++. An analyzer designed only for C++ would not serve you well if C++ is not the major language in your project. A small project, up to about 10,000 lines of code, can also be checked by manual review.
Integration
In an organization, the team responsible for configuring development tools may have already chosen the tools for a project. Choosing an analyzer that integrates seamlessly with those tools will make your life much easier and help the whole SDLC (software development lifecycle) go smoothly.
Most IDEs provide plugins that make this integration seamless, which saves time and team effort on further configuration. If that is not the case, you might have to deal with tedious tasks such as figuring out how to integrate the analyzer with your process and how to create, test, and support importing and exporting through its API.
False positives
Many tools report false positives when checking a program, and this can lead to serious mistrust of and frustration with the tool. The tool analyzes the code but reports numerous false positives, resulting in an inefficient analysis that eventually undermines the efforts of the development team. Look for a tool that has a low false-positive rate as a selling point. Test it in your own environment and make sure it works with your application development.
Analyzers provide numerous mechanisms to suppress insignificant warnings, and false positives can be marked so that the code can be analyzed without all the fuss. Nevertheless, configuring and interacting with the tool to simplify the checks is a chore that developers will have to go through.
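Suppression mechanisms differ from tool to tool, but many work roughly like the sketch below. The marker comment `# analyzer: ignore`, the sample findings, and the helper `filter_suppressed` are all hypothetical and do not reflect any specific tool's syntax.

```python
SUPPRESS_MARKER = "# analyzer: ignore"  # hypothetical marker, not a real tool's syntax

# Hypothetical code under analysis.
SOURCE = """\
password = read_config("db_password")
print(len(password))  # analyzer: ignore
"""

# Hypothetical findings as (line number, message) pairs.
FINDINGS = [
    (1, "possible hard-coded credential"),
    (2, "sensitive value passed to print"),
]

def filter_suppressed(source, findings):
    """Drop findings on lines that the developer marked as false positives."""
    lines = source.splitlines()
    return [(lineno, message) for lineno, message in findings
            if SUPPRESS_MARKER not in lines[lineno - 1]]

print(filter_suppressed(SOURCE, FINDINGS))
# [(1, 'possible hard-coded credential')]
```

Marking a line this way keeps the decision next to the code, so reviewers can see why a warning was silenced.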
Documentation and Support
While choosing the best static code analysis tools, developers must look for appropriate documentation and support for the tool, so that it’s easier for them to integrate the tool into their development process. Things to look for in documentation include examples of troubleshooting the errors in addition to the generic configuration manual.
But regardless of how detailed the documentation is, it cannot solve all your problems. A tool should therefore have expert support, with high-quality representatives who can help solve issues in real time and get you past the finish line. Watch for red flags in support: when they start passing your problem from one manager to another, you know it’s going to take a month to solve. Be on the lookout for such signs and deal with them hands-on.
Coverage of Security and Safety standards
One of the most significant things is to check the code against the major security and safety standards, which greatly reduces risk when developing software. If you know beforehand that your project must meet certain standards, then it’s pertinent to study the features of each tool. Evaluate and document how it covers the coding standards, the different types of errors, and the depth of analysis.
Benefits and Drawbacks of Static Analysis
Like every other technique, static code analysis has its strengths and weaknesses when it comes to analyzing the vulnerabilities in a program. Let’s look at each of them in detail.
Benefits
- It evaluates the complete code in an application which in turn improves the code quality
- Automated tools provide a faster and more efficient alternative to manual code reviews
- Human factors and error can be reduced by using automated tools
- Static code analysis tools allow the developers to go into more depth while debugging code
- Static code analysis is scalable to a great extent and can also be run repeatedly
- It works wonders for things that automatic tools can find easily and with high confidence, for example, SQL Injection flaws, buffer overflows, etc.
- It can be run in an offline development environment as well
While there are many benefits of static code analysis, it comes with a few drawbacks. The following points should be taken into consideration by companies when using this technique:
Drawbacks
- A tool might tell you there is a defect, but may not be able to tell what the defect is
- False positives are a possibility when automated tools are run
- Static code analysis can be more time-consuming than its counterparts
- Numerous security vulnerabilities are hard to find and cannot be detected by automated tools. Even though the tools are improving, it can still be hard to automatically find issues such as authentication problems, access control issues, and a few others.
- Libraries (system and third party) could be skipped while analyzing the program.
Static Analysis vs Dynamic Analysis
We’ve talked a lot about static code analysis. As the name suggests, “static” code analysis debugs a program while it is static (not executed). But how is it different from “dynamic” analysis?
- Both techniques are used for identifying vulnerabilities and defects. What differentiates the two is “where” each technique is executed in the development lifecycle.
- Dynamic code analysis looks for defects while and after a program is run, for example during unit testing or similar steps, unlike static code analysis, which focuses on defects before the program is executed. It’s quite possible that unit testing misses a few defects that static code analysis tools could have identified.
- Dynamic testing focuses on errors that show up at execution time. The two techniques used together are often referred to as “Glass-box testing”.
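The point about unit tests missing a defect can be sketched as follows. The `average` function, the planted bug, and the single division-by-zero rule are assumptions made for illustration. The dynamic check executes the code on a typical input and passes, while the static check inspects the syntax tree and finds the bug in the branch that the test never reaches.

```python
import ast

# Hypothetical code under test, with a bug hidden in a rarely used branch.
SOURCE = """\
def average(values):
    if not values:
        return sum(values) / 0
    return sum(values) / len(values)
"""

# Dynamic analysis: run the code and test the common case only.
namespace = {}
exec(SOURCE, namespace)
dynamic_check_passed = namespace["average"]([2, 4]) == 3

# Static analysis: inspect the syntax tree without running anything.
def division_by_zero_lines(source):
    """Return line numbers of divisions by a literal zero."""
    return [node.lineno for node in ast.walk(ast.parse(source))
            if isinstance(node, ast.BinOp)
            and isinstance(node.op, ast.Div)
            and isinstance(node.right, ast.Constant)
            and node.right.value == 0]

print(dynamic_check_passed)            # True: the test never hits the bug
print(division_by_zero_lines(SOURCE))  # [3]
```

The reverse also holds: a test with an empty list would expose the bug at run time, which is why the two techniques complement each other.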
Final Thoughts
Experienced and mature software companies assess their software for security flaws and vulnerabilities at each step of the software development lifecycle, from design to post-release analytics and testing. As systems grow bigger and more complex, software reliability and clean coding practices are a must for optimized performance and efficiency. Static code analysis is one technique that can improve the overall structure of the source code, and the source code is one of the most important parts of a program. Integrating static analysis tools can significantly reduce the errors and defects in an application.
While other techniques yield good results in the same or later cycles of development, a balanced combination of static code analysis with those techniques can give your software company the best of both worlds. It’s very important to choose the right static analysis tools for your development process according to the programming languages your team works with. Static code analysis tools can help protect your organization from security vulnerabilities and improve overall software security. Code review is a fair method of achieving these standards of software security, but automated tools identify coding errors without much human effort and reduce the chances of human error as well.