What is PII?
Personally identifiable information (PII) is information linked to a specific individual that can be used to discover that person’s identity, such as their social security number, full name, email address, or phone number.
As people increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has increased. For example, companies collect personal data from customers to understand their markets, and consumers readily provide their phone numbers and home addresses to sign up for services and make online purchases.
Sharing PII can have its benefits, as it allows companies to tailor products and services to their customers’ wants and needs—for example, by offering more relevant search results in navigation apps. However, the growing stockpiles of PII accumulated by organizations are attracting the attention of cybercriminals.
Direct Versus Indirect Identifiers
Personally identifiable information comes in two types: direct identifiers and indirect identifiers. Direct identifiers are specific to an individual and include things like a passport number or driver’s license number. A single direct identifier is often enough to determine a person’s identity.
Indirect identifiers are not unique. They include more general personal information, such as race and birthplace. While a single indirect identifier cannot identify an individual, a combination of them can. For example, 87% of U.S. citizens could be identified based solely on their gender, zip code, and date of birth.
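The effect of combining indirect identifiers can be checked on a dataset by counting how many records share each combination of values. The following is a minimal Python sketch, assuming hypothetical records and field names (`gender`, `zip_code`, `birth_date`) and a hypothetical helper `unique_fraction`; a record whose combination of values appears only once is unique on those fields.

```python
from collections import Counter

# Hypothetical records; field names and values are illustrative assumptions.
RECORDS = [
    {"gender": "F", "zip_code": "02139", "birth_date": "1985-04-12"},
    {"gender": "F", "zip_code": "02139", "birth_date": "1985-04-12"},
    {"gender": "M", "zip_code": "02139", "birth_date": "1990-07-01"},
    {"gender": "F", "zip_code": "10001", "birth_date": "1978-11-30"},
]


def unique_fraction(records, fields):
    """Return the share of records whose values for `fields` occur only once."""
    if not records:
        return 0.0
    keys = [tuple(record[field] for field in fields) for record in records]
    counts = Counter(keys)
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(records)


# One indirect identifier on its own versus three combined.
single = unique_fraction(RECORDS, ["gender"])
combined = unique_fraction(RECORDS, ["gender", "zip_code", "birth_date"])
print(f"unique on gender alone: {single:.0%}")
print(f"unique on all three fields: {combined:.0%}")
```

In this toy dataset, adding fields increases the share of records that stand alone, which is the same mechanism behind the statistic above.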
Sensitive PII versus non-sensitive PII
Not all personal data is considered personally identifiable information. For example, data about a person’s streaming habits is not personally identifiable information. In fact, it would be difficult, if not impossible, to identify a person based solely on what they watched on Netflix. Personally identifiable information only refers to information that points to a specific person, such as the type of information you might provide to verify your identity when you contact your bank.
Some personally identifiable information is more sensitive than the rest. Sensitive PII directly identifies an individual and could cause significant harm if leaked or stolen. By contrast, the following kinds of PII are often treated as non-sensitive:
- Full name of a person
- Mother’s maiden name
- Phone number
- IP address
- Place of birth
- Date of birth
- Geographic information (zip code, city, state, country, etc.)
- Employment information
- Email address or mailing address
- Race or ethnicity
- Religion
Non-sensitive personal information is often publicly available. For example, phone numbers may be listed in a phone book, and addresses may be listed in a local government’s public records. Some data privacy regulations do not require the protection of non-sensitive personal information, but many companies still implement safeguards. This is because criminals could cause problems by gathering multiple pieces of non-sensitive personal information.
For example, a hacker could access a person’s bank account application using their phone number, email address, and mother’s maiden name. The email gives them a username. Phone number spoofing allows hackers to receive a verification code. The mother’s maiden name provides an answer to the security question.
When does sensitive information become PII?
Context also determines whether something is considered PII. For example, aggregated, anonymous geolocation data is often considered generic personal data because a user’s identity cannot be isolated.
However, individual records of anonymous geolocation data can become PII, as demonstrated in a recent lawsuit filed by the Federal Trade Commission (FTC).
The FTC alleges that data broker Kochava sold geolocation data that counted as PII because the company’s “custom data feeds allow buyers to identify and track specific mobile device users.” For example, a mobile device’s location at night likely matches a user’s home address and could be combined with property records to discover their identity.
Technological advances also make it easier to identify individuals using less information, potentially lowering the threshold for what is considered personally identifiable information in general.
For example, researchers at IBM® and the University of Maryland have developed an algorithm that identifies specific individuals by combining anonymous location data with publicly available information on social networking sites.
Data privacy laws and PII
According to McKinsey, 75% of countries have implemented data privacy laws governing the collection, retention, and use of personally identifiable information. Complying with these regulations can be difficult, as different jurisdictions may have different or even conflicting rules.
The rise of cloud computing and remote work also poses a challenge. In these environments, data may be collected in one location, stored in another, and processed in a third. Different regulations may apply to data at each stage, depending on geographic location.
Industry-Specific Privacy Regulations
Some industries also have their own data privacy regulations. In the United States, the Health Insurance Portability and Accountability Act (HIPAA) governs how healthcare organizations collect and protect patients’ medical records and personal information.
Similarly, the Payment Card Industry Data Security Standard (PCI DSS) is a global financial industry standard that governs how credit card companies, merchants, and payment processors handle sensitive cardholder information.
Research suggests that organizations struggle to navigate this varied landscape of laws and industry standards. According to ESG, 66% of companies that underwent data privacy audits in the past three years failed at least once, and 23% failed three or more times.
Failure to comply with relevant data privacy regulations can result in fines, reputational damage, business loss, and other consequences for organizations. For example, Amazon was fined $888 million for GDPR violations in 2021.
Protecting PII
Hackers steal personal information for many reasons: to commit identity theft, to blackmail victims, or to sell it on the black market, where they can get up to $1 per Social Security number and $2,000 for a passport number.
Hackers can also target personal information as part of a broader attack: they can hold it hostage using ransomware, or steal it to take control of executives’ email accounts and use those accounts in phishing and business email compromise (BEC) scams.
To protect PII, organizations can take steps such as the following:
- Identify all personally identifiable information in the organization’s systems.
- Minimize the collection and use of personally identifiable information and regularly dispose of information that is no longer needed.
- Categorize personally identifiable information based on sensitivity.
- Apply data security controls. Examples of controls include:
  - Encryption: Encrypting personally identifiable information in transit, at rest, and in use, for example through homomorphic encryption or confidential computing, can help keep it secure and compliant regardless of where it is stored or processed.
  - Identity and access management (IAM): Two-factor or multi-factor authentication can place more barriers between hackers and sensitive data. Similarly, enforcing the principle of least privilege through a zero-trust architecture and role-based access controls (RBAC) can limit the amount of personally identifiable information that hackers can access if they break into the network.
  - Training: Teach employees how to properly handle and dispose of personally identifiable information, and how to protect their own. This training covers areas such as anti-phishing, social engineering, and social media awareness.
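Two of the steps above, identifying PII and categorizing it by sensitivity, can be partly automated with pattern matching. The following is a minimal Python sketch, not a complete discovery tool: the regular expressions, the `SENSITIVITY` mapping, and the function names `find_pii` and `redact` are assumptions made for illustration, and the patterns cover only U.S.-style Social Security and phone number formats.

```python
import re

# Illustrative patterns only; real PII discovery needs broader, locale-aware rules.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}

# Assumed sensitivity tiers for this sketch.
SENSITIVITY = {
    "ssn": "sensitive",
    "email": "non-sensitive",
    "phone": "non-sensitive",
}


def find_pii(text):
    """Return (kind, sensitivity, matched_text) tuples for every match in `text`."""
    findings = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((kind, SENSITIVITY[kind], match.group()))
    return findings


def redact(text):
    """Replace every match with a placeholder naming the kind of PII."""
    for kind, pattern in PATTERNS.items():
        text = pattern.sub(f"[{kind.upper()}]", text)
    return text


sample = "Contact jane.doe@example.com or 555-867-5309; SSN 123-45-6789."
for kind, tier, value in find_pii(sample):
    print(kind, tier, value)
print(redact(sample))
```

Pattern matching of this kind misses PII that has no fixed format, such as names, so it is best treated as a first pass rather than a full inventory.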
Conclusion
Organizations should also write an incident response plan to follow in the event of a personal data leak or breach. NIST and other data privacy experts often recommend applying different controls to different data sets based on their sensitivity, because using strict controls for non-sensitive data can be cumbersome and costly.
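One way to express sensitivity-based controls is a policy table that maps each tier to the controls it requires. The sketch below is hypothetical: the tier names, the control names, and the `missing_controls` helper are assumptions for illustration and are not taken from NIST guidance.

```python
# Hypothetical policy: the controls each sensitivity tier requires.
CONTROLS_BY_TIER = {
    "sensitive": {"encrypt_at_rest", "encrypt_in_transit", "mfa", "rbac"},
    "non-sensitive": {"encrypt_in_transit", "rbac"},
}


def missing_controls(tier, applied):
    """Return the controls required for `tier` that are not in `applied`."""
    return CONTROLS_BY_TIER[tier] - set(applied)


# A data set tagged "sensitive" that so far has only RBAC and MFA in place.
print(sorted(missing_controls("sensitive", ["rbac", "mfa"])))
```

Keeping the lighter tier's requirements short is what avoids the cost of applying strict controls everywhere.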