March 28, 2023

What are PII, PCI, and PHI?

As customer data is collected at more and more digital touchpoints, protecting personal information is becoming a priority for businesses worldwide.

Patricia Thaine
Founder, Chairwoman, Thought Leader

In an increasingly digital world where customer data is collected at every touchpoint, understanding what types of information require protection is one of the most foundational questions in data privacy. Three acronyms come up repeatedly across compliance frameworks, legal discussions, and enterprise security conversations: Personally Identifiable Information (PII), Payment Card Industry data (PCI), and Protected Health Information (PHI).

Each refers to a distinct category of sensitive personal information, each has its own regulatory origin, and each carries specific obligations for organizations that handle it. Depending on your industry, you may be subject to rules governing one, two, or all three of these data types simultaneously.

This article explains each type, how the three relate to one another, and what the regulatory landscape looks like across jurisdictions. It is intended both for privacy professionals who need a reliable reference and for business leaders who want to understand what compliance actually requires of their organization.

What Is PII (Personally Identifiable Information)?

PII is the broadest of the three categories. It refers to any information that can be used to distinguish or trace an individual's identity, either on its own or in combination with other data. While the term is widely used across industries and regulatory frameworks, it does not originate from a single federal statute in the United States. Its most commonly cited definition comes from the Office of Management and Budget (OMB) Memorandum M-07-16, which defines PII as:

"information which can be used to distinguish or trace an individual's identity, such as their name, social security number, biometric records, etc. alone, or when combined with other personal or identifying information which is linked or linkable to a specific individual."

In practice, PII includes a wide range of data: names, dates of birth, mailing addresses, telephone numbers, Social Security numbers, email addresses, zip codes, account numbers, license numbers, vehicle identifiers, URLs, static IP addresses, biometric identifiers such as fingerprints, and photographic facial images. Critically, PII also captures any information where it is "reasonably foreseeable" that it will be linked with other data to identify an individual. This makes PII an expansive concept, and intentionally so.
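To make the distinction between direct identifiers and linkable information concrete, here is a minimal Python sketch. It assumes two illustrative regular expressions (an email pattern and a U.S. Social Security number pattern); the pattern set and the function name `find_direct_identifiers` are hypothetical and far from exhaustive.

```python
import re

# Illustrative patterns only (assumptions for this sketch); real formats vary widely.
PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_direct_identifiers(text):
    """Return (label, matched_text) pairs for every pattern hit, in pattern order."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group()))
    return hits


direct = find_direct_identifiers("Contact jane.doe@example.com, SSN 123-45-6789.")
# Quasi-identifiers such as a birth year plus a neighbourhood are not caught,
# even though they may be linkable to a specific person.
linkable = find_direct_identifiers("Born in 1985, lives near the old mill on Elm Street.")
```

The second call returns no hits even though the sentence may well identify someone once combined with other data, which is exactly the "linked or linkable" part of the definition that simple patterns cannot see.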

It is worth noting that while PII is a U.S.-origin term, equivalent concepts exist in other jurisdictions. The California Consumer Privacy Act (CCPA) and Canada's PIPEDA use the term "personal information," while the EU's General Data Protection Regulation (GDPR) and the proposed New York Privacy Act use "personal data." The underlying principle is the same: information that relates to an identifiable natural person deserves protection.

Because PII is defined so broadly, it functions as an umbrella category. Both PHI and PCI, discussed below, fall within the scope of PII. However, their sensitivity and the nature of the potential harm from their misuse are significant enough to warrant their own dedicated regulatory frameworks.

What Is PHI (Protected Health Information)?

PHI is a subcategory of PII that refers specifically to individually identifiable health information. It is defined and protected under the U.S. Health Insurance Portability and Accountability Act (HIPAA), specifically under the HIPAA Privacy Rule.

The formal HIPAA definition of PHI contains five elements. First, PHI is information that is created or received by a covered entity (such as a healthcare provider, health plan, or healthcare clearinghouse) or a business associate. Second, the content must relate to an individual's past, present, or future physical or mental health, the provision of healthcare, or payment for healthcare. Third, the information must identify or be reasonably likely to identify the individual. Fourth, the information must be transmitted or maintained in any form, whether electronic, paper, or oral. Fifth, the information must not fall within one of the defined exclusions, such as certain employment records.
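The five elements read naturally as a checklist in which every condition must hold. The following Python sketch encodes that reasoning; the `RecordFacts` class and its field names are hypothetical, and answering each question for a real record is a legal judgment that code like this cannot make.

```python
from dataclasses import dataclass


@dataclass
class RecordFacts:
    """Hypothetical yes/no answers to the five elements described above."""
    held_by_covered_entity_or_associate: bool
    relates_to_health_care_or_payment: bool
    identifies_individual: bool
    transmitted_or_maintained: bool
    falls_under_exclusion: bool


def is_phi(facts):
    """True only when the first four elements hold and no exclusion applies."""
    return (
        facts.held_by_covered_entity_or_associate
        and facts.relates_to_health_care_or_payment
        and facts.identifies_individual
        and facts.transmitted_or_maintained
        and not facts.falls_under_exclusion
    )


lab_result = RecordFacts(True, True, True, True, False)
employment_file = RecordFacts(True, True, True, True, True)
```

Here `is_phi(lab_result)` is true, while `is_phi(employment_file)` is false because an exclusion applies even though the other four elements are met.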

In plain terms, PHI encompasses the kind of information found in a medical record: diagnoses, treatment plans, lab results, prescriptions, and the billing records tied to those services. The combination of health data with identity is what makes PHI particularly sensitive. A disclosed medical diagnosis can affect a person's employment, insurance eligibility, and personal relationships in ways that go far beyond a simple data breach.

For organizations operating in the healthcare industry or pharmaceutical and life sciences, managing PHI is one of the most consequential compliance responsibilities they face. Mishandling it can result in significant HIPAA penalties, reputational damage, and, most importantly, real harm to patients.

Outside the U.S., comparable frameworks exist. Ontario's Personal Health Information Protection Act (PHIPA) uses the term "personal health information." The GDPR classifies health data as a "special category of personal data" under Article 9, requiring a higher standard of protection than ordinary personal data and permitting processing only under specific legal bases.

What Is PCI Data (Payment Card Industry Data)?

PCI data is a different kind of subcategory. Unlike PHI, which is defined under U.S. federal law, PCI data is governed by the PCI Data Security Standard (PCI DSS). The standard was developed not by a government body but by the PCI Security Standards Council, an independent organization established by the major card networks: Visa, Mastercard, American Express, Discover, and JCB.

PCI DSS protects what it calls "account data," which is divided into two categories. The first is cardholder data, which includes the Primary Account Number (PAN) that identifies the card issuer and account holder, the cardholder's name, the card expiration date, and the service code. The second is Sensitive Authentication Data (SAD), which includes card validation codes (CVV/CVC), full magnetic stripe or chip data, PINs, and PIN blocks.
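As an illustration of how a PAN is typically handled in code, the sketch below checks a candidate number with the Luhn checksum used by payment card numbers and masks it for display. The 13 to 19 digit length range and the choice to show only the last four digits are assumptions made for this example, not a statement of what PCI DSS permits in a given context; the function names are hypothetical.

```python
def luhn_valid(pan):
    """Check a candidate PAN with the Luhn checksum, ignoring spaces and hyphens."""
    digits = [int(ch) for ch in pan if ch.isdigit()]
    if not 13 <= len(digits) <= 19:  # assumed length range for this sketch
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def mask_pan(pan):
    """Replace all but the last four digits with asterisks."""
    digits = "".join(ch for ch in pan if ch.isdigit())
    return "*" * (len(digits) - 4) + digits[-4:]


test_number = "4111 1111 1111 1111"  # widely used test card number
is_valid = luhn_valid(test_number)
masked = mask_pan(test_number)
```

A passing Luhn check only means the digits are well formed; it says nothing about whether the account exists.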

The compliance obligations under PCI DSS apply to any organization that stores, processes, or transmits cardholder data. This includes merchants, payment processors, and any third-party service provider in the payment chain. Compliance is enforced contractually: organizations that accept card payments agree to comply with PCI DSS as a condition of being permitted to do so. Failure to comply can result in fines, increased transaction fees, and ultimately the loss of the ability to accept card payments.

For organizations in financial services and insurance, the obligation to protect PCI data is often layered on top of existing regulatory requirements, creating a complex web of overlapping compliance obligations. Contact centers that handle payment transactions over voice or digital channels face a particularly acute challenge, as PCI data can appear in call recordings, chat transcripts, and other unstructured formats that are difficult to secure without dedicated tooling.

The EU equivalent concept is found in the Payment Services Directive 2 (PSD2), which uses the terms "personalized security credentials" and "sensitive payment data" to describe information that must be protected in electronic payment transactions.

How Are PII, PCI, and PHI Related?

When you look at the formal definitions together, a clear hierarchy emerges. PII is the broadest category, capturing any information that can identify an individual. PHI and PCI are both subcategories of PII: health data identifies individuals, and so does payment card data when combined with other account information. All three can be used, alone or in combination, to distinguish or trace a person's identity.

The reason these subcategories exist as distinct concepts in U.S. law is largely historical and political. The U.S. has never enacted a single comprehensive federal data protection law covering all types of personal information in the way the GDPR does for Europe. Instead, regulation has developed sector by sector, responding to the most acute perceived harms. Health information was regulated through HIPAA because the sensitivity of medical data, and the political will to protect it, was strong enough to overcome the typical resistance to federal legislation in this area. Payment card data was regulated through the private sector, because major card networks had a direct financial incentive to reduce fraud and could impose standards contractually without requiring legislative action.

The situation in Europe reflects the inverse logic. The GDPR succeeded in establishing a general data protection framework precisely because the EU operates as a supranational body where member states could reach a consensus, even if that required years of negotiation and built-in flexibility for national variation. Health data is treated as a special category under GDPR rather than receiving its own separate law. On the payment side, Europe has struggled to establish a harmonized card payment regime, partly because of greater reliance on proprietary national schemes, though the European Central Bank has noted ongoing efforts toward a unified European card payment system.

The practical takeaway for compliance teams is this: the regulatory framework you operate under depends on where your organization is located, where your data subjects are located, and what type of data you handle. Most large organizations are subject to multiple overlapping frameworks simultaneously.

A Comparison of PII, PCI, and PHI at a Glance

The following table summarizes the key distinctions between the three data types.

| | Meaning | Origin | Examples | Terms in other jurisdictions |
|---|---|---|---|---|
| PII | Personally Identifiable Information | U.S. (federal); not defined in any act; most commonly used definition is from OMB Memorandum M-07-16 | Name, date of birth, mailing address, telephone number, Social Security number (SSN), email address, zip code, account numbers, certificate/license numbers, vehicle identifiers including license plates, uniform resource locators (URLs), static internet protocol addresses, biometric identifiers (e.g., fingerprints), photographic facial images, or any other unique identifying number or characteristic, and any information where it is reasonably foreseeable that the information will be linked with other information to identify the individual | Personal information (e.g., CCPA, PIPEDA); personal data (GDPR, proposed New York Privacy Act) |
| PCI | Payment Card Industry | PCI is sometimes used as shorthand for the information protected under the PCI Data Security Standard (PCI DSS) | Cardholder data: Primary Account Number (PAN), which identifies the issuer and the cardholder account; cardholder name; expiration date; service code. Sensitive Authentication Data (SAD): information used to authenticate cardholders and/or authorize payment card transactions, including card validation codes/values (CVV), full track data (from magnetic stripe or equivalent on a chip), PINs, and PIN blocks | "Personalized security credentials" and "sensitive payment data" (EU's PSD2) |
| PHI | Protected Health Information | U.S. HIPAA's Privacy Rule | Individually identifiable information relating to a person's health contained in medical records, such as medical diagnoses, treatment information, lab results, and billing information | Personal health information (PHIPA); special categories of personal data (GDPR) |

Why Identifying These Data Types in Your Organization Is a Compliance Prerequisite

Understanding what PII, PCI, and PHI mean in theory is only the first step. The harder, more operationally demanding challenge is knowing where these data types live within your organization, and ensuring they are handled appropriately.

This is where many organizations run into difficulty. PII, PHI, and PCI do not always appear in structured databases where they are easy to query and audit. They frequently appear in unstructured formats: clinical notes, support call transcripts, chat logs, email threads, PDF documents, and free-text fields. These formats are harder to scan, classify, and protect, and they are often overlooked in compliance programs that focus exclusively on structured data.
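A small Python sketch shows both why scanning free text is feasible for some data and where it falls short. It assumes a hypothetical `redact_card_numbers` helper that looks for 13 to 19 digit runs (optionally separated by spaces or hyphens), confirms each with a Luhn checksum, and replaces confirmed hits with a placeholder.

```python
import re

# Candidate card numbers: 13 to 19 digits with optional single spaces or hyphens.
CANDIDATE_PAN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def _luhn_ok(digits):
    """Luhn checksum over a string of digits."""
    total = 0
    for index, ch in enumerate(reversed(digits)):
        value = int(ch)
        if index % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def redact_card_numbers(text, placeholder="[CARD_NUMBER]"):
    """Replace digit runs that pass the Luhn check; leave everything else untouched."""
    def replace(match):
        digits = re.sub(r"\D", "", match.group())
        return placeholder if _luhn_ok(digits) else match.group()
    return CANDIDATE_PAN.sub(replace, text)


typed = redact_card_numbers("Customer read out 4111 1111 1111 1111 and confirmed.")
spoken = redact_card_numbers("My card is four one one one, one one one one...")
```

The typed number is redacted, but the spoken-style transcript passes through unchanged. Digits written as words, split across turns, or mixed with filler are common in call transcripts, and a pattern of this kind does not catch them.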

Gaining visibility into unstructured data is not optional. If you cannot identify where sensitive data exists across your organization, you cannot make informed decisions about what technical and organizational measures are required, whether under HIPAA, PCI DSS, the GDPR, or any other applicable framework.

Limina's data de-identification platform is purpose-built for exactly this challenge. Built by linguists and powered by the latest advances in machine learning, Limina identifies 50+ entities of PII, PHI, and PCI in unstructured data across 52+ languages. Because it is context-aware and understands the nuances of language and entity relationships within documents, it does not rely on simple pattern matching that misses indirect identifiers or novel data formats. The result is a level of accuracy that generic tools cannot match.

If your organization handles health data, financial data, or any other form of sensitive personal information and you are not confident in your visibility across unstructured data sources, that is a significant compliance gap. Get in touch with our team to understand how Limina can help close it.

What Does This Mean for Your Industry?

The stakes around PII, PCI, and PHI differ by sector, but no industry that handles personal data is exempt from these considerations.

Healthcare organizations and those in pharma and life sciences face some of the most demanding obligations, particularly around PHI. HIPAA's breach notification requirements, minimum necessary standards, and business associate obligations create a compliance infrastructure that must be maintained across every system and workflow that touches patient data, including, increasingly, AI systems trained on or operating with clinical data.

Financial services firms and insurers face a layered landscape of obligations covering PCI data, PII under state and federal law, and increasingly, AI governance requirements that reference privacy principles. Insurance organizations in particular handle large volumes of sensitive personal and health data in claim files, medical records, and underwriting documents, much of it in unstructured form.

Contact centers sit at an unusual intersection: they handle PCI data in real time during payment transactions, they may handle PHI if they serve healthcare clients, and they generate massive volumes of unstructured data in call recordings and transcripts. For contact center teams, the ability to automatically detect and redact sensitive data across voice and text is not a nice-to-have. It is a compliance requirement.

Regardless of your industry, the underlying need is the same: a clear understanding of what sensitive data exists in your systems, where it is, and how it is being protected. If your current approach to PII, PHI, and PCI detection leaves gaps in unstructured data, connect with the Limina team to explore a more comprehensive solution.
