In Blackout 2.xx, the interface shows the results of a job in two categories: "redactions placed" and "redactions missing". On first read, "redactions missing" may make it seem like the redactions were not placed at all. What it actually means is that Blackout has flagged a redaction for review. Most of the time Blackout still places the requested redaction, but it will occasionally flag some redactions for review for various reasons.
This was streamlined in Blackout 3.0. We no longer use the term "redactions missing"; the category is now called "documents with warnings". This better signals to the client that a document needs to be reviewed.
Sometimes Blackout does miss redactions. Most commonly this happens because OCR output is inconsistent. It is very possible for Blackout to miss a redaction for a term that was found in the extracted text of a document.
The most common solution:
On a document with redactions, we can view how Blackout sees the document by clicking the "View HOCR" button in the upper right-hand corner of the Blackout Markup Navigator. The attached image shows where these buttons are located: "View Text" shows the raw OCR data, and "View HOCR" shows the same data with the HTML rendering.
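If the hOCR for a document can be saved out as a file, a short script can list the words the OCR engine was least sure of, which is often where a missed term is hiding. The sketch below is not part of Blackout. It assumes the hOCR follows the common convention of `ocrx_word` spans whose `title` attribute carries `bbox` and `x_wconf` values; the class name `HocrWordParser`, the function `low_confidence_words`, and the threshold of 60 are all hypothetical choices for illustration.

```python
import re
from html.parser import HTMLParser


class HocrWordParser(HTMLParser):
    """Collect text, bounding box, and confidence for each ocrx_word span."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._current = None  # the word span currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            title = attrs.get("title") or ""
            bbox = re.search(r"bbox (\d+) (\d+) (\d+) (\d+)", title)
            conf = re.search(r"x_wconf (\d+(?:\.\d+)?)", title)
            self._current = {
                "text": "",
                "bbox": tuple(int(v) for v in bbox.groups()) if bbox else None,
                "conf": float(conf.group(1)) if conf else None,
            }

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        # Assumes word spans do not contain nested spans.
        if tag == "span" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.words.append(self._current)
            self._current = None


def low_confidence_words(hocr, threshold=60.0):
    """Return the words whose OCR confidence is below the threshold."""
    parser = HocrWordParser()
    parser.feed(hocr)
    return [w for w in parser.words
            if w["conf"] is not None and w["conf"] < threshold]


# Made-up sample data, not output from a real document.
SAMPLE = """
<span class='ocr_line' title='bbox 10 10 300 40'>
  <span class='ocrx_word' title='bbox 10 10 80 40; x_wconf 96'>Account</span>
  <span class='ocrx_word' title='bbox 90 10 200 40; x_wconf 41'>4cc0unt</span>
</span>
"""

for word in low_confidence_words(SAMPLE):
    print(word["text"], word["bbox"], word["conf"])
```

The bounding box makes it possible to locate each suspicious word on the page image when reviewing the document by hand.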
Common issues we see are:
- Individual letters being changed to numbers (i's to 1's, A's to 4's, l's to |'s, etc.)
- Border lines or grid boxes on forms creating noise that tricks the OCR engine into reading the borders as letters.
- Text-dense documents having words run together, generally with the affected characters also being mis-OCRed.
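When checking by hand whether a term was garbled in this way, it can help to search the raw OCR text with a pattern that tolerates these substitutions. The following is a minimal sketch, not how Blackout itself matches terms. The `CONFUSABLES` table contains only the three swaps listed above and would need extending for real use; the function names are hypothetical.

```python
import re

# Hypothetical confusion table built from the examples listed above.
# Each key is the intended letter; the value lists what OCR may emit.
CONFUSABLES = {
    "i": "i1",
    "a": "a4",
    "l": "l|",
}


def ocr_tolerant_pattern(term):
    """Build a regex that matches the term despite common OCR swaps.

    Whitespace in the term becomes optional so that words the OCR
    engine ran together are still found.
    """
    parts = []
    for ch in term:
        if ch.isspace():
            parts.append(r"\s*")
        else:
            options = CONFUSABLES.get(ch.lower(), ch)
            parts.append("[" + "".join(re.escape(c) for c in options) + "]")
    return re.compile("".join(parts), re.IGNORECASE)


def find_ocr_variants(term, ocr_text):
    """Return every span of ocr_text that looks like the term."""
    return ocr_tolerant_pattern(term).findall(ocr_text)


# Made-up OCR text showing letter swaps and two words run together.
print(find_ocr_variants("Social Security", "SSN: Soc14|Secur1ty Number"))
```

A looser pattern finds more garbled hits but also more false positives, so any match still needs to be confirmed against the page image.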
Generally, re-imaging these documents at a higher DPI, or as a JPEG rather than a TIFF, will help the OCR engine better identify phrases. However, depending on the urgency, this may not be the most time-efficient method.