Optus Data Breach: How proactive API visibility could have prevented it
Introduction
On 22 September 2022, Australia's second-largest telecom company, Optus, announced that it had suffered a significant data breach. Although the full details of the breach are not public, reports indicate that it was caused by human error. In this blog post, we look at what happened and how organizations can take steps to avoid such a breach.
Cause
Optus exposed its customer identity database through an Application Programming Interface (API) intended for use by internal systems. This is common practice in large organizations, where access controls on various databases change frequently. However, in this case, the API was publicly reachable without authentication. This means anyone could access that data via the API endpoint <span class="code">https://www.optus.com.au/mcssapi/rp-webapp-9-common/customer-management/contact-person/{contactId}?lo=en_US&sc=SS</span>.
This is a classic enumeration vulnerability: the next user's information can be obtained by querying <span class="code">contactId + 1</span>. The attackers iterated over the <span class="code">contactId</span> to collect the personal data of over 2 million Optus users.
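To make the enumeration problem concrete, here is a minimal sketch in Java, the language of the sample repository used later in this post. Everything in it is hypothetical: the `ContactStore` class, its method names, and the sample records are illustrative assumptions and not Optus's code. It models an unauthenticated lookup next to one that checks the caller.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory stand-in for a contact lookup API (not Optus's code).
class ContactStore {
    // contactId -> personal record; sequential IDs make the key space guessable.
    private final Map<Long, String> records = new HashMap<>();

    ContactStore() {
        records.put(1000L, "alice@example.com");
        records.put(1001L, "bob@example.com");
        records.put(1002L, "carol@example.com");
    }

    // Vulnerable: returns any record to any caller.
    Optional<String> getUnauthenticated(long contactId) {
        return Optional.ofNullable(records.get(contactId));
    }

    // Safer: the caller may only read the record tied to their own identity.
    Optional<String> getAuthorized(long callerContactId, long contactId) {
        if (callerContactId != contactId) {
            return Optional.empty(); // a real API would answer 403 or 404 here
        }
        return Optional.ofNullable(records.get(contactId));
    }

    // Simulates an attacker walking the ID space one contactId at a time.
    static List<String> enumerate(ContactStore store, long start, int attempts) {
        List<String> leaked = new ArrayList<>();
        for (long id = start; id < start + attempts; id++) {
            store.getUnauthenticated(id).ifPresent(leaked::add);
        }
        return leaked;
    }

    public static void main(String[] args) {
        ContactStore store = new ContactStore();
        // All three stored records leak through the unauthenticated lookup.
        System.out.println(enumerate(store, 1000L, 10));
        // Caller 1000 asking for record 1001 gets nothing back.
        System.out.println(store.getAuthorized(1000L, 1001L));
    }
}
```

In a real service the caller's identity would come from an authenticated session or token rather than a method argument; the point of the sketch is only that the vulnerable path never asks who is calling.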
How could it have been avoided?
At the current speed of software development, it is increasingly difficult to keep a check on which applications have access to customer data. Restricting access and verifying changes manually can slow developers down and hold back an organization's pace of innovation. Also, with hundreds of code commits daily, it is humanly impossible for data security and privacy engineers to keep track of every change and identify privacy vulnerabilities.
Some form of automation is required to help organizations deal with this issue. Organizations need real-time visibility of the data flows across multiple applications and databases. They need to know which applications or repositories have access to which data and how it is being exposed to the world. Using assessments or manual threat modeling techniques to acquire this information is slow and often incomplete.
To solve this, organizations must move their assessments closer to the developer workflows. With shift-left processes, organizations can monitor, identify and fix such data security vulnerabilities. This includes integrating tools and workflows into the CI/CD pipelines to automate risk detection and avoid time-consuming assessments. It also helps prevent last-minute surprises from the privacy or security teams and gives teams clear visibility of the constraints.
Imagine if Optus had been able to see that a new test network was accessing its customer identity database and that this information was exposed via a public API, and if all of this had been flagged as soon as the developer committed the changes. This would have raised red flags in the application security team, and they could have moved swiftly to secure the code.
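As an illustration of what such a commit-time check could look like, here is a minimal sketch in Java. It is not a Privado feature or API: the `ExposureGate` class, the `Endpoint` shape, and the route names are all assumptions made for this example. The idea is to compare the endpoints found in the current commit with a known baseline and fail the pipeline step when a new endpoint reads personal data without requiring authentication.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical commit-time gate; the Endpoint shape is an assumption, not a Privado type.
class ExposureGate {
    static final class Endpoint {
        final String route;
        final boolean readsPersonalData;
        final boolean requiresAuth;

        Endpoint(String route, boolean readsPersonalData, boolean requiresAuth) {
            this.route = route;
            this.readsPersonalData = readsPersonalData;
            this.requiresAuth = requiresAuth;
        }
    }

    // Returns routes that are new since the baseline, read personal data, and lack auth.
    static List<String> newRiskyRoutes(Set<String> baselineRoutes, List<Endpoint> current) {
        List<String> risky = new ArrayList<>();
        for (Endpoint e : current) {
            boolean isNew = !baselineRoutes.contains(e.route);
            if (isNew && e.readsPersonalData && !e.requiresAuth) {
                risky.add(e.route);
            }
        }
        return risky;
    }

    public static void main(String[] args) {
        Set<String> baseline = new HashSet<>(Arrays.asList("/login"));
        List<Endpoint> current = Arrays.asList(
            new Endpoint("/login", true, true),
            new Endpoint("/contact-person/{contactId}", true, false),
            new Endpoint("/health", false, false));

        List<String> risky = newRiskyRoutes(baseline, current);
        System.out.println("New unauthenticated routes reading personal data: " + risky);
        if (!risky.isEmpty()) {
            System.exit(1); // a non-zero exit code fails the CI step
        }
    }
}
```

In practice the list of endpoints would come from a scanner's output rather than being hard-coded; the sketch only shows the comparison that turns that output into a pass or fail signal.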
Detecting APIs using Privado open source
To detect real-time data flows and minimize application security issues, we can use Privado, an open-source static code scanning tool that helps developers and organizations monitor, detect and resolve data security issues at the time of commit.
For this example, we will take a repository that processes user data. Let's assume we have a legacy repository, BankingSystem-Backend, that handles private customer data and already has its security configuration in place. We first scan the repository with Privado to find the existing APIs that process user data. To scan the project, we follow these steps:
Step 1 - Clone the Required Repositories
git clone https://github.com/saurabh-sudo/BankingSystem-Backend
git clone https://github.com/Privado-Inc/privado
Step 2 - Install Privado
curl -o- https://raw.githubusercontent.com/Privado-Inc/privado-cli/main/install.sh | bash
Step 3 - Run the scan
privado scan BankingSystem-Backend
After the scan is complete, we can view the results at <span class="code">BankingSystem-Backend/.privado/privado.json</span>. To interpret the results, you can check the following guide. Under the "Collections" section of the result, we see that the Privado scan has detected two APIs that collect user data. However, there are a few more APIs in the app that also handle data, though not necessarily user data. One such example is shown below:
@GetMapping("/getById/{id}")
public ResponseEntity getById(@PathVariable Long id) throws Exception {
    try {
        Account acc = accountDao.findById(id).get();
        return ResponseEntity.ok(acc);
    } catch (Exception e) {
        System.out.println("Error is " + e);
        return new ResponseEntity(HttpStatus.BAD_REQUEST);
    }
}
By default, the Privado scan does not detect this endpoint, since it is unclear whether the data is linked to a user. Suppose we come across the snippet above while browsing the code and decide to start tracking any data that passes through the Account DAO, since data can flow in and out of the database here. To do this, we can define a simple rule in Privado that detects APIs exposing database entries through the DAO's <span class="code">findById()</span> method, which fetches a particular entry from the database, and marks those calls as sinks. We place the following rule in the Privado project's <span class="code">rules/sinks/storages/jdbc/java.yaml</span> file:
sinks:
  - id: Storages.SpringFramework.AccountDao.Read
    name: AccountDao JDBC Connector
    patterns:
      - "(?i)com.common.BankData.dao.AccountDao.*findById.*"
    tags:
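To see what such a pattern would and would not match, the sketch below evaluates it with Java's own regular expression engine. It assumes the pattern is applied to a fully qualified method name; how Privado evaluates rule patterns internally is not shown in this post, so treat that as an assumption. The `SinkPatternCheck` class and the sample method names are hypothetical. The sketch also shows that `(?i)` is the inline case-insensitive flag, while the look-alike `(i?)` is only an optional literal "i" and leaves matching case-sensitive.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Sketch: how a sink rule pattern behaves as a plain Java regular expression.
class SinkPatternCheck {
    // (?i) switches on case-insensitive matching for the whole pattern.
    static final Pattern RULE =
        Pattern.compile("(?i)com.common.BankData.dao.AccountDao.*findById.*");

    // (i?) is a group holding an optional literal "i"; matching stays case-sensitive.
    static final Pattern LOOKALIKE =
        Pattern.compile("(i?)com.common.BankData.dao.AccountDao.*findById.*");

    // True when the whole method name matches the pattern.
    static boolean matches(Pattern pattern, String methodName) {
        return pattern.matcher(methodName).matches();
    }

    public static void main(String[] args) {
        String exact = "com.common.BankData.dao.AccountDao.findById";
        String lower = exact.toLowerCase(Locale.ROOT);

        System.out.println(matches(RULE, exact));      // true
        System.out.println(matches(RULE, lower));      // true
        System.out.println(matches(LOOKALIKE, exact)); // true
        System.out.println(matches(LOOKALIKE, lower)); // false
        System.out.println(matches(RULE, "com.common.BankData.dao.AccountDao.save")); // false
    }
}
```

Note that the unescaped dots in the pattern match any character, so the rule is slightly broader than the package path suggests; escaping them as `\\.` would tighten it.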
Now, to pass these custom rules to the scan, we run <span class="code">privado scan BankingSystem-Backend -c <path/to/privado></span>. On scanning the repository with our custom rules, we can see that Privado detects the new API in the code and maps out the data flow in the application. The screenshot below displays the data flow of the user's passport number, highlighted by the red box.
It also generates a line-by-line code analysis that can be viewed by developers to understand the flow of information inside the code. Note that this process does not solve the problem itself. However, it helps improve visibility around data flow in an organization and collaboration between developers and data security engineers. This is particularly useful in large organizations, where hundreds of code changes are made daily, and new repositories are added to the codebase.
You can check out the tool yourself on GitHub. Feel free to post any issues or contribute to the project.
Anuj Agrawal is a Developer Relations Engineer at Privado