Legal protection in the era of big data

By Diana van Hout

The societal desire for control, certainty, and consistency is greater than ever. In a variety of scientific fields, researchers are trying to predict and influence human behavior by using big data. Tax administrations also use big data to make better predictions of non-compliance and to target their audits more effectively. Undoubtedly, big data has many advantages, but in this blog I will focus on its risks, the consequences for taxpayers’ rights, and the ethical concerns it raises for the future of legal protection for taxpayers. I will also underline the importance of redesigning the framework of legal protection in order to safeguard taxpayers’ rights in the near future.

1. Big data

Big data refers to the process of gathering large quantities of information, structured and otherwise, and converting that information into usable values. Three relevant processes can be distinguished regarding big data: collection, analysis, and usage. Many definitions of big data also refer to the three Vs: volume, velocity, and variety. In my opinion, two further Vs are becoming increasingly relevant in tax law: veracity and value. After all, the data used by the tax administration has to be accurate, verifiable, and reliable, as I will discuss later.

2. Privacy

The availability of data has increased exponentially in the past few years. A growing number of day-to-day actions take place digitally, so registering those actions has become easier. Search engines such as Google and social media such as Facebook reveal much about a person’s life, including their friends and search history. At the same time, storing gathered information is becoming cheaper and easier. Furthermore, legislation facilitating the systematic collection of data by governments, and its exchange among national and foreign government entities, has expanded over time. The recurring justification is that this data collection will make society safer and fairer, for example because it is used to detect tax abuse. There is, however, no way for citizens to extricate themselves from the registration of their personal information. Participation in today’s society requires the use of smartphones and computers, and citizens will often consent to data collection because they believe they have nothing to hide. As data collection penetrates ever further into the private lives of citizens, an increasingly complete picture of their lives is formed.

The question remains, however: to what extent may the tax administration collect taxpayers’ personal data, and under which circumstances? I argue that a connection must be made between technical capabilities and the precision of the law that governs their use. As technology allows data to be collected in more invasive ways, the law must be more precise about which types of information may be collected and how that information may be used for tax purposes. The law should draw a clear boundary that forces tax administrations to be more conscious about their data collection, especially because mass surveillance often yields few results. Perhaps the best way to reduce the collection of taxpayers’ personal data would be to make taxation less reliant on sensitive information, although this could result in a tax system that ignores the taxpayer’s ability to pay.

3. Data mining and profiling

Data mining refers to combining information by means of algorithms, mostly with the goal of discovering patterns in data sets. These patterns can be used to define profiles of taxpayers and to predict which tax returns carry the highest risk of errors (predictive modelling). For example, data science in the Netherlands has shown that recently divorced taxpayers are more likely to submit an incorrect tax return. Profiling can be very effective, since audits can then be concentrated on the group of taxpayers with the highest risk of non-compliance, but it also introduces a risk of prejudice and unequal treatment of these taxpayers. They may notice that they are audited more frequently than other taxpayers, which can have two effects. The first is a chilling effect: taxpayers may, for example, refrain from claiming certain tax deductions to avoid an examination by the tax administration. The second is that profiling might cause a self-fulfilling prophecy, leading taxpayers to exhibit exactly the behavior that is expected of them.

From the government’s perspective, profiling might cause tunnel vision and blind trust in the outcome of the system (value). Examining the tax returns of specific categories of taxpayers more frequently will probably result in more corrections, which will reinforce the perception of this category of taxpayers as non-compliant. Moreover, it can lead to a tax administration that has become blind to new types of tax avoidance, because big data does not provide information or insights into new or future situations unless these give rise to new patterns.
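The reinforcing effect of frequent examination can be illustrated with a small simulation. This is a hypothetical sketch, not a model of any tax administration’s actual system: two groups of taxpayers are given the same assumed error rate, but one group (labelled "profiled") is audited five times as often. All numbers (group size, error rate, audit rates) are invented for illustration.

```python
import random


def simulate_audits(n_per_group=10_000, error_rate=0.10, audit_rates=None, seed=42):
    """Hypothetical illustration: every group has the SAME true error rate,
    but groups are audited at different rates. Returns, per group, how many
    audits took place and how many corrections those audits produced."""
    if audit_rates is None:
        # Assumed audit rates: the profiled group is audited five times as often.
        audit_rates = {"profiled": 0.50, "other": 0.10}
    rng = random.Random(seed)
    results = {}
    for group, audit_rate in audit_rates.items():
        audits = corrections = 0
        for _ in range(n_per_group):
            has_error = rng.random() < error_rate  # identical for every group
            if rng.random() < audit_rate:          # audit probability differs by group
                audits += 1
                if has_error:
                    corrections += 1               # only audited errors are ever found
        results[group] = {"audits": audits, "corrections": corrections}
    return results


if __name__ == "__main__":
    for group, r in simulate_audits().items():
        per_audit = r["corrections"] / r["audits"] if r["audits"] else 0.0
        print(f"{group}: {r['corrections']} corrections from {r['audits']} audits "
              f"({per_audit:.1%} per audit)")
```

Judged by raw correction counts, the profiled group appears roughly five times as non-compliant, even though the simulation gave both groups the same error rate; only the correction rate per audit reveals that the difference stems from the audit strategy rather than from the taxpayers.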

4. Function creep

Data from different data sets are sometimes collected and linked. Function creep is the use of data sets beyond the purpose for which they were originally intended. The risk of accidental or false connections is inherent in both data mining and function creep. When enough data is collected, some variables will always appear to be connected purely by chance, and false assumptions can be drawn from them. Furthermore, correlation does not imply causation.
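The claim that, with enough data, some variables will always appear connected can be demonstrated with purely random numbers. The sketch below generates variables that are independent by construction (random noise, not real tax data) and lists every pair that nonetheless exceeds a correlation threshold. The threshold of 0.3 and the number of variables and observations are arbitrary assumptions chosen for illustration.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spurious_pairs(n_vars=100, n_obs=50, threshold=0.3, seed=1):
    """Generate n_vars mutually independent random variables and return
    every pair (i, j, r) whose absolute correlation exceeds the threshold.
    Any pair returned is a false connection by construction."""
    rng = random.Random(seed)
    data = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    hits = []
    for i, j in itertools.combinations(range(n_vars), 2):
        r = pearson(data[i], data[j])
        if abs(r) > threshold:
            hits.append((i, j, r))
    return hits


if __name__ == "__main__":
    n_vars = 100
    hits = spurious_pairs(n_vars=n_vars)
    print(f"{len(hits)} of {math.comb(n_vars, 2)} pairs exceed |r| > 0.3")
    if hits:
        i, j, r = max(hits, key=lambda h: abs(h[2]))
        print(f"strongest chance 'connection': variables {i} and {j}, r = {r:.2f}")
```

Linking more data sets increases the number of pairs that can be compared, and with it the number of chance findings of this kind, which is why a correlation found through data mining needs independent verification before it informs a decision about a taxpayer.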

The tax administration collects a great deal of information that is useful to other government agencies. For this reason, the law provides for the exchange of information between agencies, and such provisions will likely be expanded in the future. A clear advantage of this exchange is that a more complete picture of a person can be formed. The downside is that it becomes more difficult to check the veracity of the underlying information. This may lead to false conclusions, as happened in the robo-debt scandal in Australia. In addition, function creep may create a gap in the system of legal protection. This risk is exemplified by a recent case before the Dutch Supreme Court in which a citizen’s data was collected by the police for law enforcement purposes. The law requires that such information be deleted as soon as possible in light of the purpose for which it was collected. Through interagency data exchange, however, the data ended up with the tax administration, which is allowed to keep all data for seven years. Legal protection apparently does not travel with the information; in this way, big data can alter citizens’ legal protection.

At the same time, connecting data sets magnifies the risk of data leaks. Large collections of data and the growing number of data exchanges expose taxpayers to a much greater risk if information is compromised. Data protection is not up to standard in every country, which matters when information is exchanged internationally. These risks must therefore be kept in mind when new legislation is enacted and when new bilateral or multilateral information exchange treaties are ratified.

5. Transparency

As quantum computing develops, auditors may be able to analyze data even more powerfully, which will make their conclusions more complex and harder to verify. Additionally, artificial intelligence techniques such as machine learning make it harder to scrutinize how a computer learns and how it arrives at its conclusions. Transparency is therefore essential to verify the value and veracity of big data: tax administrations have to be transparent about the correlations and causal connections in the data they use and link. In this regard, I conclude that visualization has to be added to the aforementioned Vs of big data, because a lack of transparency would lead to distrust of the tax administration. After all, the tax administration is able to collect extraordinarily detailed information about taxpayers, yet it is not transparent about how it uses all that information. This does not mean that the tax administration has to reveal all its examination strategies, but a more transparent approach would also stimulate corporate compliance.

6. Conclusion

The use of big data by the tax administration means that new choices have to be made regarding the legal protection of taxpayers. This development calls for fundamental reflection on the future of legal protection in tax matters. Big data will undoubtedly improve efficiency in taxation, but this does not justify accepting every possible application of big data, since some applications can seriously jeopardize taxpayers’ rights.