General PII Regular Expressions
These regular expressions cover general PII such as email addresses and phone numbers.
France
France Phone Numbers
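This regular expression can be used to redact French phone numbers that include the country code (33) but do not have delimiters. The character classes also accept letters that appear intended to cover common OCR misreads of digits, such as O for 0 and l or I for 1.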
\b([0O]?[1lI][1lI])?[3E][3E][0O]?[\dOIlZEASB]{9}\b
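As a rough illustration of how the pattern above behaves, the following Python sketch applies it with the standard `re` module. The function name `redact_fr_phones` and the `[REDACTED]` placeholder are hypothetical and not part of any product API.

```python
import re

# Pattern copied from the section above; letters such as O, l, I and E are
# accepted where a digit may have been misread.
FR_PHONE = re.compile(r"\b([0O]?[1lI][1lI])?[3E][3E][0O]?[\dOIlZEASB]{9}\b")

def redact_fr_phones(text, placeholder="[REDACTED]"):
    """Replace every match of FR_PHONE with the placeholder (hypothetical helper)."""
    return FR_PHONE.sub(placeholder, text)

# A clean number and a variant with O and I standing in for 0 and 1.
print(redact_fr_phones("Mobile 33612345678, scanned as 33O6I2345678"))
# A delimited number is out of scope for this pattern and is left alone.
print(redact_fr_phones("Mobile +33 6 12 34 56 78"))
```

Numbers written with spaces, dots, or a leading plus sign fall outside this pattern, which matches the stated scope of numbers without delimiters.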
Germany
German Phone Numbers
This regular expression can be used to redact German phone numbers.
\b[\d\w]\d{2}[\d\w]{6}\d[\d\w]\b
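The pattern above matches any 11-character token with digits in a few fixed positions, so it can also hit identifiers that are not phone numbers. The Python sketch below lists candidate matches so they can be reviewed; `find_de_candidates` and the sample text are made up for illustration.

```python
import re

# Pattern copied from the section above. In Python, [\d\w] is equivalent to \w.
DE_PHONE = re.compile(r"\b[\d\w]\d{2}[\d\w]{6}\d[\d\w]\b")

def find_de_candidates(text):
    """Return every substring the pattern would redact (hypothetical helper)."""
    return [m.group(0) for m in DE_PHONE.finditer(text)]

sample = "Tel 49301234567, ref A12BCDEFG3H, short 0301234"
print(find_de_candidates(sample))
```

In this sample the reference code `A12BCDEFG3H` is returned alongside the phone number, which suggests reviewing a sample of matches before redacting a full document set.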
United Kingdom
UK Phone Numbers
This regular expression can be used to redact UK phone numbers that include the country code (44) but do not have delimiters.
\b([0O]?[1lI][1lI])?[4A][4A][\dOIlZEASB]{10,11}\b
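To see which forms the pattern above covers, a small Python sketch can list its matches. The helper name `find_uk_numbers` and the sample numbers are illustrative assumptions.

```python
import re

# Pattern copied from the section above.
UK_PHONE = re.compile(r"\b([0O]?[1lI][1lI])?[4A][4A][\dOIlZEASB]{10,11}\b")

def find_uk_numbers(text):
    """Return the full text of each match (hypothetical helper).

    finditer is used instead of findall because the pattern contains a
    capturing group, and findall would return only the group contents.
    """
    return [m.group(0) for m in UK_PHONE.finditer(text)]

text = "Mobile: 447911123456, office: 44 20 7946 0958"
print(find_uk_numbers(text))
```

Only the undelimited mobile number is returned here; the office number contains spaces and would need a different pattern.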
United States
US Phone Numbers
This regular expression can be used to redact US phone numbers. Before running it in a project, test it on a site such as regex101 against the phone numbers that appear in your document set to confirm that it matches them. The expression may be overly aggressive, so sampling the results is recommended.
\b((\+|\b)[1l][\-\. ])?\(?\b[\dOlZSB]{3,5}([\-\. ]|\) ?)[\dOlZSB]{3}[\-\. ][\dOlZSB]{4}\b
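The sampling step recommended above can also be sketched locally in Python. The helper `sample_matches` and the sample strings are hypothetical; substitute values drawn from your own document set.

```python
import re

# Pattern copied from the section above.
US_PHONE = re.compile(
    r"\b((\+|\b)[1l][\-\. ])?\(?\b[\dOlZSB]{3,5}([\-\. ]|\) ?)"
    r"[\dOlZSB]{3}[\-\. ][\dOlZSB]{4}\b"
)

def sample_matches(samples):
    """Map each sample string to the substrings the pattern would redact."""
    return {s: [m.group(0) for m in US_PHONE.finditer(s)] for s in samples}

samples = ["555-867-5309", "1-555-867-5309", "Invoice 2023-001-4567"]
for sample, hits in sample_matches(samples).items():
    print(f"{sample!r} -> {hits}")
```

The invoice number in the last sample has the same digit grouping as a phone number and is matched as well, which is the kind of over-matching that sampling is meant to surface.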
US Street Addresses
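This regular expression uses a list of US state abbreviations as a conditional hint to redact street addresses: it matches from a street number through a state abbreviation followed by a five-digit ZIP code. As with phone numbers, sample this regular expression for the desired results before running it against the full document set.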
\b\d{1,8}\b[\s\S]{10,100}?\b(AK|AL|AR|AZ|CA|CO|CT|DC|DE|FL|GA|HI|IA|ID|IL|IN|KS|KY|LA|MA|MD|ME|MI|MN|MO|MS|MT|NC|ND|NE|NH|NJ|NM|NV|NY|OH|OK|OR|PA|RI|SC|SD|TN|TX|UT|VA|VT|WA|WI|WV|WY)\b\s\d{5}\b
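A minimal Python sketch of the address pattern above follows. The state list is split into its own constant only for readability; `redact_addresses` and the placeholder are hypothetical names.

```python
import re

# State abbreviations copied from the pattern above.
STATES = (
    "AK|AL|AR|AZ|CA|CO|CT|DC|DE|FL|GA|HI|IA|ID|IL|IN|KS|KY|LA|MA|MD|ME|MI|MN|MO|"
    "MS|MT|NC|ND|NE|NH|NJ|NM|NV|NY|OH|OK|OR|PA|RI|SC|SD|TN|TX|UT|VA|VT|WA|WI|WV|WY"
)

# Street number, then 10 to 100 characters of anything (including newlines),
# then a state abbreviation, one whitespace character, and a 5-digit ZIP code.
US_ADDRESS = re.compile(
    r"\b\d{1,8}\b[\s\S]{10,100}?\b(" + STATES + r")\b\s\d{5}\b"
)

def redact_addresses(text, placeholder="[REDACTED]"):
    """Replace each matched address span with the placeholder."""
    return US_ADDRESS.sub(placeholder, text)

print(redact_addresses("Ship to 123 Main Street, Springfield, IL 62704 by Friday."))
print(redact_addresses("123 Main Street\nSpringfield, IL 62704"))
```

Because the span between the street number and the state can be any 10 to 100 characters, an unrelated number that precedes a state and ZIP code can start a match too, so sampled results are worth checking.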
Universal Regular Expressions
Email Addresses
This regular expression will match and redact full email addresses.
\b[a-z0-9._%\+\-—|]+@[a-z0-9.\-—|]+\.[a-z|]{2,6}\b
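The pattern above lists only lowercase letters, so the Python sketch below assumes it is meant to run case-insensitively and passes `re.IGNORECASE`; that flag is an assumption, not something stated in this document. The helper name and placeholder are hypothetical.

```python
import re

# Pattern copied from the section above. \u2014 is the long dash that appears
# literally in the original character classes.
EMAIL = re.compile(
    r"\b[a-z0-9._%\+\-\u2014|]+@[a-z0-9.\-\u2014|]+\.[a-z|]{2,6}\b",
    re.IGNORECASE,  # assumption: matching is case-insensitive
)

def redact_emails(text, placeholder="[REDACTED]"):
    """Replace each matched email address with the placeholder."""
    return EMAIL.sub(placeholder, text)

print(redact_emails("Contact Jane.Doe@Example.com for details."))
```

Without the case-insensitive flag, the mixed-case address in this sample would not be matched at all.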
Birth Dates
This regular expression uses contextual hints to locate and redact dates that are in proximity to words that typically denote birth dates. It's important to understand that while dates are regular patterns, birth dates are not.
If a date that is not a birth date appears within close proximity of one of the contextual hints (up to five words after it), it will be redacted. The contextual hint words in this regular expression are:
- birth
- birthdate
- birthday
- dob
- born
\b(birth|birthdate|birthday|dob|born)\W+(?:\w+\W+){0,5}?(?<REDACT>(\d{4}|\d{1,2})[\/\-]\d{1,2}[\/\-](\d{4}|\d{1,2}))\b
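The following Python sketch shows one way to redact only the `REDACT` group while keeping the hint word. Python's `re` module spells named groups as `(?P<REDACT>...)` rather than `(?<REDACT>...)`, so the pattern is adapted accordingly. Case-insensitive matching, the helper name, and the placeholder are assumptions made for this example.

```python
import re

# Pattern from the section above, with the named group rewritten in Python's
# (?P<name>...) syntax.
BIRTH_DATE = re.compile(
    r"\b(birth|birthdate|birthday|dob|born)\W+(?:\w+\W+){0,5}?"
    r"(?P<REDACT>(\d{4}|\d{1,2})[\/\-]\d{1,2}[\/\-](\d{4}|\d{1,2}))\b",
    re.IGNORECASE,  # assumption: hints such as "DOB" should match in any case
)

def redact_birth_dates(text, placeholder="[REDACTED]"):
    """Replace only the REDACT group of each match, keeping the hint words."""
    def _replace(match):
        start, end = match.span("REDACT")
        # Text before and after the date inside the match is preserved.
        return text[match.start():start] + placeholder + text[end:match.end()]
    return BIRTH_DATE.sub(_replace, text)

print(redact_birth_dates("Patient DOB: 04/12/1987, admitted 2021-03-05."))
print(redact_birth_dates("Born in Ohio; hired 01/02/2010"))
```

In the second sample a hire date is redacted because it falls within five words of "Born", which illustrates the proximity caveat described above.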
IPv4
This regular expression will match and redact general IPv4 addresses.
\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b
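The pattern above checks only the shape of the address (four groups of one to three digits), not whether each group is in the range 0 to 255. If stricter matching is wanted, one option is to post-filter candidates with Python's standard `ipaddress` module, as in this sketch. The helper name `find_ipv4` is hypothetical.

```python
import ipaddress
import re

# Pattern copied from the section above.
IPV4 = re.compile(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b")

def find_ipv4(text):
    """Return (all pattern matches, matches that are valid IPv4 addresses)."""
    candidates = IPV4.findall(text)  # no groups, so findall returns whole matches
    valid = []
    for candidate in candidates:
        try:
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue  # shaped like an address but not a valid one
        valid.append(candidate)
    return candidates, valid

print(find_ipv4("Login from 192.168.1.10, build 999.1.1.1"))
```

Whether to redact the unfiltered candidates or only the validated ones depends on how much over-redaction is acceptable for the document set.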
IPv6
This regular expression will match and redact general IPv6 addresses.
\b([\d\w]{4}|0)(\:([\d\w]{4}|0)){7}\b
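The pattern above expects all eight groups to be present, each written as four characters or a single 0. The Python sketch below, with a hypothetical helper `find_ipv6`, shows that a fully written address is matched while a compressed form using `::` is not.

```python
import re

# Pattern copied from the section above.
IPV6 = re.compile(r"\b([\d\w]{4}|0)(\:([\d\w]{4}|0)){7}\b")

def find_ipv6(text):
    """Return the full text of each match.

    finditer is used because the pattern has capturing groups, and findall
    would return tuples of group contents instead of whole matches.
    """
    return [m.group(0) for m in IPV6.finditer(text)]

text = "src 2001:0db8:85a3:0000:0000:8a2e:0370:7334 dst 2001:db8::1"
print(find_ipv6(text))
```

Because `[\d\w]` accepts any word character rather than only hexadecimal digits, tokens that merely share this colon-separated shape can match as well, so sampling applies here too.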