Sintelix offers industry-leading ways of extracting information from unstructured text and analyzing unstructured data. It can analyze huge volumes of unstructured information and combine multiple sources to identify links between people, places, crimes, vehicles and other entities. It is intended for intelligence analysts in police forces and law enforcement agencies, and it also has uses in the private sector for investigations into fraud, counterfeiting or money laundering.
Why do intelligence analysts need to search unstructured data?
As part of an investigation, the investigator, police officer or agent will enter any information that they have discovered into a computer system. They will enter data into a structured database, adding information into pre-determined, structured fields such as suspect name, suspect height, suspect hair color and so on. It’s just like filling out an online form.
This process is designed to capture all of the important information for a case in structured data fields which can then be visualized and analyzed using software such as i2 Analyst’s Notebook according to a pre-determined map.
In theory, all of the information that an analyst will need should be in structured data fields. In reality, however, there are many reasons why relevant information might not be entered there, for example:
- There is no appropriate field – due to the way that the main database has been structured, the investigator cannot find an appropriate field to enter the data. We have all been in the position where we have ended up adding text to an ‘other information’ field because there was no obvious place to put it in the main form. If an investigator has information about a crime which doesn’t fit easily into a predefined form then it could easily end up as unstructured text.
- Information is missed – it is entirely possible that an investigator might miss a piece of information due to human error, especially if they are entering a large amount of information into a system. The possibility for making mistakes is always there.
- Information simply doesn’t seem to be important – at the point of data entry, the investigator may not know which pieces of data an intelligence analyst will view as important. The investigator is unlikely to have the full picture of the case and in many investigations only time will reveal what is truly important and relevant information.
- Cognitive bias – there may be some unconscious bias in the investigator’s mind when they enter the information into the system. This is perfectly natural and unintentional, but humans will often unwittingly give more emphasis to the pieces of information which support a theory that they already hold (confirmation bias). There are many other forms of cognitive bias, all of which help the human mind to deal with a world that contains huge volumes of data and allow us to evaluate that information and act quickly on a daily basis. In a criminal investigation, however, that bias may well skew the outcome.
- Purpose of the data entry – if a report is being entered after a traffic stop, then the data that is deemed important may be different from what would be recorded if the same information were being entered for a different investigation. If a suspect appears in two different investigations, then the data that has been recorded for each of the cases may have a slightly different emphasis.
Using unstructured data
Not only do intelligence analysts need to be able to search unstructured text, they can also benefit from the impartiality that a software package can offer. All data is extracted without any human bias and from there the analyst can use their knowledge and expertise to interpret the data and work out what is truly important in a case.
For example, in a store robbery, witnesses were able to identify the suspect’s accent. If there is no structured data field for that piece of information, or if the information doesn’t seem particularly important, then it will be entered into the written narrative of the witness interview. A similar accent is reported at another robbery, and then another. The importance of that piece of data begins to become clear: it could be used to link a series of crimes or to narrow down a pool of suspects. If the intelligence analyst working on the case can easily search unstructured data, then all of the information about accent or place of birth that has been entered in any report becomes available to their investigation.
Add to this the wealth of information that can be taken from external data sources such as email, the internet and reports from other agencies and you begin to see the importance of unstructured text.
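To make the robbery example concrete, here is a minimal Python sketch of how accent mentions buried in free-text narratives could be used to link reports. It is an illustration only and does not represent how Sintelix works: the report IDs, the narratives, the regular expression and the function names are all invented for this example, and a simple pattern match is far cruder than real entity extraction.

```python
import re
from collections import defaultdict

# Hypothetical witness narratives keyed by report ID (invented for illustration).
REPORTS = {
    "R-101": "The suspect wore a dark hoodie and spoke with a Scottish accent.",
    "R-102": "Witness saw a blue sedan leave the car park at speed.",
    "R-103": "He demanded cash and had a strong Scottish accent, the cashier said.",
    "R-104": "The man spoke with an Irish accent and carried a backpack.",
}

# Assumption: the accent is written as a capitalized word directly before
# the word "accent", e.g. "Scottish accent". Real narratives vary far more.
ACCENT_PATTERN = re.compile(r"\b([A-Z][a-z]+) accent\b")


def link_reports_by_accent(reports):
    """Group report IDs by the accent mentioned in their free-text narrative."""
    links = defaultdict(list)
    for report_id, narrative in reports.items():
        for accent in ACCENT_PATTERN.findall(narrative):
            links[accent].append(report_id)
    return dict(links)


def repeated_accents(links):
    """Keep only accents that appear in more than one report."""
    return {accent: ids for accent, ids in links.items() if len(ids) > 1}


if __name__ == "__main__":
    links = link_reports_by_accent(REPORTS)
    for accent, ids in repeated_accents(links).items():
        print(f"{accent} accent mentioned in reports: {', '.join(ids)}")
```

None of these reports has a structured "accent" field, yet the two robberies that mention the same accent surface together once the narrative text itself is searched.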