EY refers to the global organization, and may refer to one or more, of the member firms of Ernst & Young Global Limited, each of which is a separate legal entity. Ernst & Young Global Limited, a UK company limited by guarantee, does not provide services to clients.
How privacy-preserving record linkage works
Privacy-preserving record linkage relies on three complementary methods.
First, identity attributes are transformed at the source. Names, addresses, identifiers and contact information are converted into encrypted and de-identified representations before they ever leave an agency’s control. Multiple representations of the same attribute can coexist, each optimized for a different type of matching scenario: high-precision identifiers, imprecise free-text fields or phonetic similarities. While deterministic, irreversible encryption forms the base, it is not enough for the problem space on its own; the encrypted values of “David” and “Dave” may be very far apart, which prevents matching by similarity. That’s where locality-sensitive hashing comes in: identity attributes are encoded such that the vector distance between encoded values is roughly proportional to the difference between the inputs.
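To make the two kinds of representation concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the method of any particular product: the exact-match token is a keyed hash (HMAC-SHA256), and the similarity-preserving encoding uses MinHash over character bigrams, which is one well-known locality-sensitive hashing family. The key, the function names and the number of hash functions are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be held only by the source agency.
SECRET_KEY = b"agency-local-secret"
NUM_HASHES = 64  # assumed signature length; longer signatures give smoother estimates


def exact_token(value: str, key: bytes = SECRET_KEY) -> str:
    """Deterministic, one-way token for exact matching (e.g. identifiers)."""
    normalized = value.strip().lower()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def bigrams(value: str) -> set:
    """Character bigrams of the normalized value, padded at both ends."""
    padded = f"_{value.strip().lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def minhash_signature(value: str, key: bytes = SECRET_KEY) -> list:
    """Keyed MinHash signature: similar inputs share many signature positions."""
    grams = bigrams(value)
    signature = []
    for i in range(NUM_HASHES):
        salt = key + i.to_bytes(2, "big")  # one keyed hash function per position
        signature.append(min(
            int.from_bytes(
                hmac.new(salt, gram.encode("utf-8"), hashlib.sha256).digest()[:8],
                "big",
            )
            for gram in grams
        ))
    return signature


def similarity(sig_a: list, sig_b: list) -> float:
    """Fraction of agreeing positions; estimates bigram-set (Jaccard) similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

With this sketch, `exact_token("David")` and `exact_token("Dave")` are unrelated strings, while `similarity(minhash_signature("David"), minhash_signature("Dave"))` is noticeably higher than the same comparison against an unrelated name. Both representations can be computed at the source and shipped side by side.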
Second, matching occurs without decryption. Using vector algebra and symbolic logic, systems can evaluate whether two protected records are likely to refer to the same individual. Because the process relies on deductive rules rather than probabilistic inference, the results are repeatable, explainable and auditable — qualities that are especially important in justice environments.
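A rough sketch of what rule-based matching over protected values could look like follows. The record fields, the rule names and the 0.5 threshold are assumptions made for illustration; the point is only that each rule compares tokens and encodings directly, never the underlying names or identifiers, and that the same inputs always produce the same answer.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class ProtectedRecord:
    """Hypothetical record shape: only tokens and encodings, no raw PII."""
    record_id: str
    id_token: str                      # deterministic one-way token ("" if missing)
    dob_token: str                     # deterministic one-way token ("" if missing)
    name_signature: Tuple[int, ...]    # similarity-preserving encoding


def signature_similarity(a: Tuple[int, ...], b: Tuple[int, ...]) -> float:
    """Fraction of positions on which two encodings agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


Predicate = Callable[[ProtectedRecord, ProtectedRecord], bool]

# Each rule is a named, deterministic predicate (names are illustrative).
RULES: List[Tuple[str, Predicate]] = [
    ("same_id_token",
     lambda a, b: bool(a.id_token) and a.id_token == b.id_token),
    ("same_dob_and_similar_name",
     lambda a, b: bool(a.dob_token) and a.dob_token == b.dob_token
     and signature_similarity(a.name_signature, b.name_signature) >= 0.5),
]


def evaluate(a: ProtectedRecord, b: ProtectedRecord) -> List[str]:
    """Return the names of every rule that links the two records."""
    return [name for name, predicate in RULES if predicate(a, b)]


r1 = ProtectedRecord("r1", "a1", "d1", (1, 2, 3, 4))
r2 = ProtectedRecord("r2", "", "d1", (1, 2, 3, 9))
r3 = ProtectedRecord("r3", "a1", "d2", (7, 8, 9, 0))
```

Here `evaluate(r1, r2)` reports the date-of-birth-plus-similar-name rule, `evaluate(r1, r3)` reports the identifier rule, and `evaluate(r2, r3)` returns an empty list. Because the output is the list of rules that fired, the explanation for a match comes for free.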
Third, relationships are modeled as graphs that exhibit the transitive property. If record A links to B, and B links to C, the system infers that A links to C, thus detecting broader identity clusters even when individual records are sparse or incomplete. Crucially, the logic used to establish each link remains transparent and is inscribed on the edges of the graph.
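The inference from A-B and B-C to A-C can be sketched with a union-find (disjoint-set) structure that also keeps the rule behind every edge. This is one common way to compute such clusters, offered as an assumption rather than a description of a specific system; the class name, method names and rule labels are hypothetical.

```python
from collections import defaultdict


class IdentityGraph:
    """Links between protected records, with the justifying rule kept on each edge."""

    def __init__(self):
        self.parent = {}   # union-find forest over record ids
        self.edges = []    # (record_a, record_b, rule_name) for audit

    def _find(self, node):
        """Return the cluster representative for a node (with path halving)."""
        self.parent.setdefault(node, node)
        while self.parent[node] != node:
            self.parent[node] = self.parent[self.parent[node]]
            node = self.parent[node]
        return node

    def link(self, a, b, rule_name):
        """Record a link and merge the two clusters it connects."""
        self.edges.append((a, b, rule_name))
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self.parent[root_a] = root_b

    def same_identity(self, a, b):
        """True if a chain of links connects the two records."""
        return self._find(a) == self._find(b)

    def clusters(self):
        """Group all known records into identity clusters."""
        groups = defaultdict(set)
        for node in list(self.parent):
            groups[self._find(node)].add(node)
        return list(groups.values())


graph = IdentityGraph()
graph.link("A", "B", "same_id_token")
graph.link("B", "C", "same_dob_and_similar_name")
graph.link("D", "E", "same_id_token")
```

After these three links, `graph.same_identity("A", "C")` is true even though no rule compared A and C directly, and `graph.edges` still shows which rule justified each individual step in the chain.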
Why determinism and explainability matter
Many modern data systems lean heavily on probabilistic models and machine learning. While powerful, these approaches can introduce opacity, drift and challenges around explainability. In the language of mathematics, any approach that uses inductive logic — drawing general conclusions from specific instances — can never be “truth preserving.” And truth is an important word in justice.
In contrast, deterministic, rule-based linkage is “truth preserving” and provides repeatable, explainable results. Analysts can add rules as they learn more about the data. They can author different rulesets for different matching confidence levels. Every matching decision can be traced back to the exact predicates that produced it. This matters not just for technical teams but also for governance, legal review and public trust.
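As a small illustration of tiered rulesets and traceable decisions, consider the sketch below. The confidence levels, field names and rule composition are invented for the example; a real deployment would define its own.

```python
from typing import Callable, Dict, List, NamedTuple, Optional, Tuple

Record = Dict[str, str]                      # field name -> protected token
Predicate = Callable[[Record, Record], bool]


def same(field: str) -> Predicate:
    """Predicate: both records carry the same non-empty token for a field."""
    def predicate(a: Record, b: Record) -> bool:
        return bool(a.get(field)) and a.get(field) == b.get(field)
    predicate.__name__ = f"same({field})"
    return predicate


# Rulesets ordered from strictest to loosest. Within a level, a rule is a
# list of predicates that must all hold. All names here are hypothetical.
RULESETS: Dict[str, List[List[Predicate]]] = {
    "high": [
        [same("id_token")],
        [same("name_token"), same("dob_token"), same("address_token")],
    ],
    "medium": [
        [same("name_token"), same("dob_token")],
    ],
}


class Decision(NamedTuple):
    level: Optional[str]           # confidence level of the first rule that fired
    predicates: Tuple[str, ...]    # the exact predicates behind the decision


def decide(a: Record, b: Record) -> Decision:
    """Return the strictest level at which the records match, with its trace."""
    for level, rules in RULESETS.items():
        for rule in rules:
            if all(predicate(a, b) for predicate in rule):
                return Decision(level, tuple(p.__name__ for p in rule))
    return Decision(None, ())


rec_a = {"id_token": "t1", "name_token": "n1", "dob_token": "d1"}
rec_b = {"id_token": "t1", "name_token": "n2", "dob_token": "d9"}
rec_c = {"name_token": "n1", "dob_token": "d1"}
```

In this example `decide(rec_a, rec_b)` matches at the "high" level on the identifier token alone, while `decide(rec_a, rec_c)` matches only at "medium" and lists the name and date-of-birth predicates as its justification. Adding a rule means appending to a list, and the returned trace is what a reviewer or auditor would inspect.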
Unlocking value without increasing risk
When identity linkage no longer requires sharing raw PII, new possibilities emerge.
Agencies can generate cross-jurisdictional metrics without exposing personal identities. Situational awareness can be enhanced through event-based notifications that do not require the exposure of identity information. Third-party analytics and cloud environments can be leveraged without expanding breach risk or liability.
Perhaps most importantly, agencies retain control over their own confidentiality posture. Different organizations can apply different privacy transformations to the same types of data, without needing to coordinate or disclose their methods. Intelligence is shared, but identity remains protected.
A shift in mindset
Privacy-preserving record linkage is more than a technical solution — it represents a shift in how institutions think about data collaboration. In an era of accelerating policy change, rising public scrutiny and expanding analytical ambition, that shift may prove essential. The future of justice data is not about choosing between insight and privacy. It is about designing systems that deliver both — by design, not by exception.