Foreign language support
Multi-language support
What to know
- Blackout supports over 100 language packs.
- These language packs can be mixed and run together in a single image project or excluded from use in a project altogether.
- The more accurate a language pack is, the slower it will run.
- Blackout and Tesseract identify languages by their three-letter (alpha-3) codes. To find which code corresponds to which language, refer to ISO 639-2.
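As a quick illustration of the alpha-3 convention, the sketch below maps a few language names to their three-letter codes. The table is a small hand-picked subset, and the helper name `language_code` is hypothetical; ISO 639-2 remains the authoritative list.

```python
# Illustrative subset only; consult ISO 639-2 for the full list.
# Where ISO 639-2 lists two codes for a language, Tesseract's trained data
# files use the terminology form (for example fra and deu).
ALPHA3_CODES = {
    "english": "eng",
    "spanish": "spa",
    "french": "fra",
    "german": "deu",
    "japanese": "jpn",
}


def language_code(name: str) -> str:
    """Return the alpha-3 code for a language name in the small table above."""
    key = name.strip().lower()
    if key not in ALPHA3_CODES:
        raise KeyError(f"No code listed for {name!r}; look it up in ISO 639-2")
    return ALPHA3_CODES[key]


print(language_code("Spanish"))
```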
Language packs
- The most accurate language packs are available for download from the tessdata GitHub page.
- A faster but less accurate version is also available for projects on a tight timeline.
Fast tessdata language packages
- This guide uses the more accurate language packs.
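For teams that keep a list of the packs they need, a minimal sketch like the one below can build the download links. The repository names (`tessdata`, `tessdata_fast`), the `main` branch, and the raw-file URL pattern are assumptions about how the files are hosted on GitHub and should be confirmed on the repository pages; `traineddata_url` is a hypothetical helper, not part of Blackout or Tesseract.

```python
# Assumed repository names and URL pattern; verify on GitHub before use.
REPOSITORIES = {
    "accurate": "tessdata",
    "fast": "tessdata_fast",
}


def traineddata_url(code: str, variant: str = "accurate") -> str:
    """Build the assumed download URL for a language pack."""
    if variant not in REPOSITORIES:
        raise ValueError(f"Unknown variant: {variant!r}")
    repo = REPOSITORIES[variant]
    return f"https://github.com/tesseract-ocr/{repo}/raw/main/{code}.traineddata"


print(traineddata_url("spa"))
print(traineddata_url("spa", variant="fast"))
```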
What else to consider
- Milyli does not provide support for the accuracy of a language pack.
- In some cases, you may have better results running a language pack in isolation, even if English exists on the documents.
- Always test a sample set of documents before running the full set.
Installing additional language packages
How to do it
- Navigate to the Tesseract Language page.
- Locate the language that will be used in the case by its three-letter code, using the ISO 639-2 reference above.
- Click on the link for the trained data.
- Click the Download button to download the trained data.
- Log into the Relativity environment as a system administrator.
- Navigate to Applications & Scripts | Resource Files.
- Click New Resource File.
- Click the … button next to Application.
- Search for and select Blackout.
- Click the OK button.
- Click the Choose File button for the Resource File field.
- Navigate to and select the trained data file that was downloaded.
- Click the Save button to add the resource to Blackout.
Note: Blackout will automatically load the file and it will be available for use in image projects. English is loaded by default, so there is no need to upload the trained data for it.
Enabling language support for image projects
What to know
- By default, Blackout will not be in multi-language mode.
- To enable multi-language mode, the MultiLanguageMode instance setting must exist and be set to True.
How to do it
- Log into the Relativity environment as a system administrator.
- Navigate to the Instance Settings tab.
- Search for MultiLanguageMode as the instance setting name to check whether it already exists.
- If it exists, change the value to True.
- If it does not exist, click the New Instance Setting button.
- Input the following values:
Field | Value |
---|---|
Name | MultiLanguageMode |
Value Type | True/False |
Value | True |
Initial Value | True |
Section | Milyli.Blackout |
Encrypt | No |
Description | Enable the ability to specify a language for image projects. |
- After creating or updating this instance setting, a language field will now be visible on the image project create and edit page.
Running non-English language projects
What to know
- By default, all image projects will run with the language code 'eng'.
- Running projects for other languages is straightforward.
- To determine the language code that should be used, refer to the name of the trained data file. The characters that appear before the period are the language code for the project; for example, spa.traineddata gives the code spa.
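The file-name rule above can be sketched in a few lines. The function name `language_code_from_filename` is hypothetical, and the check that the file ends in `.traineddata` is an added safeguard rather than something Blackout is known to enforce.

```python
def language_code_from_filename(filename: str) -> str:
    """Return the part of a trained data file name before the first period."""
    # Drop any directory portion, accepting both slash styles.
    name = filename.replace("\\", "/").rsplit("/", 1)[-1]
    code, separator, extension = name.partition(".")
    if not code or separator != "." or extension != "traineddata":
        raise ValueError(f"Not a trained data file name: {filename!r}")
    return code


print(language_code_from_filename("spa.traineddata"))
```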
Project example
- In this example, the Spanish language trained data will be used to OCR and review the documents in the saved search.
- For the best results, note the following:
- Multiple languages are supported.
- When documents may mix the target language with another language, such as English, both sets of trained data can be used at once. To do so, join the language codes with the plus (+) symbol, for example spa+eng.
- The languages will be prioritized from left to right.
- In this sample, Spanish matches will be prioritized first, and then English matches.
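A minimal sketch of building the combined value for the language field, assuming the codes are held in a list ordered from highest to lowest priority. The helper name `build_language_setting` is hypothetical, and dropping duplicate codes is an added convenience.

```python
def build_language_setting(codes: list[str]) -> str:
    """Join language codes with '+', keeping the given priority order."""
    cleaned = [code.strip() for code in codes if code.strip()]
    if not cleaned:
        raise ValueError("At least one language code is required")
    # dict.fromkeys removes repeats while preserving first-seen order.
    ordered = list(dict.fromkeys(cleaned))
    return "+".join(ordered)


# Spanish first, then English, as in the example project.
print(build_language_setting(["spa", "eng"]))
```

Because the order of the list is preserved, swapping the two entries would prioritize English matches over Spanish ones.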
Document language settings in Relativity
- Documents should be loaded into Relativity with the proper language settings.
- This is an important component of interacting with images for non-English languages and a critical requirement for languages that do not use the Latin alphabet.
- For example, if documents with Japanese text are being run through Blackout, they will need to be loaded from a processing engine that supports processing Japanese documents with the appropriate language settings.
- If using Relativity processing, this is as simple as selecting the Japanese language in either the processing set or processing profile.
- This ensures that the extracted text is encoded correctly and can be used for comparison when determining how much QC is needed.
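One rough way to spot incorrectly encoded extracted text before a full run is to measure how much of it falls in the expected script. The sketch below does this for Japanese using the Hiragana, Katakana, and CJK Unified Ideographs Unicode blocks. The function name `japanese_ratio` is hypothetical, and what counts as an acceptable ratio is left to the reviewer; it is a sanity check under these assumptions, not a feature of Blackout or Relativity.

```python
# Unicode blocks: Hiragana, Katakana, CJK Unified Ideographs.
JAPANESE_RANGES = ((0x3040, 0x309F), (0x30A0, 0x30FF), (0x4E00, 0x9FFF))


def japanese_ratio(text: str) -> float:
    """Return the share of non-whitespace characters in Japanese blocks."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    hits = sum(
        1
        for ch in chars
        if any(low <= ord(ch) <= high for low, high in JAPANESE_RANGES)
    )
    return hits / len(chars)


good = "こんにちは 世界"
# Simulate text that was decoded with the wrong encoding.
garbled = good.encode("utf-8").decode("latin-1")
print(japanese_ratio(good), japanese_ratio(garbled))
```

A ratio near zero on documents known to contain Japanese suggests the text was loaded with the wrong language or encoding settings.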