Averbis Health Discovery: User Manual
Version 6.1.0, 06/08/2021
0. Changes in Health Discovery 6.0
With the Averbis Health Discovery GUI, you are no longer limited to the REST API in your production scenario: you can now also import documents into Health Discovery and retrieve the results easily via bulk export.
Health Discovery version 6 has much more to offer. You can look forward to:
- a seamless integration of the REST API into Python
- new enhancements and changes to the API format
- Apache UIMA v3 upgrade
- lots of new and improved annotators
For more information on what is new and what has been improved in Health Discovery 6.0, please follow this link: Benefits of HD6.
For a detailed overview of our REST API changes, see: What has changed in HD6?
1. Overview
Health Discovery is a text mining and machine learning platform for analyzing large amounts of patient data. With Health Discovery, medical documents can be analyzed and searched for diagnoses, symptoms, prescriptions, special findings, and other criteria. Heterogeneous patient data in both structured and unstructured forms can be harmonized and analyzed by text mining, and can be accessed and searched via a unified interface.
Health Discovery has a modular structure. The various functionalities are roughly divided into the following categories:
General: There are some general modules in which projects and users with corresponding rights and roles can be created.
Sources: There are several ways to import documents into Health Discovery. Documents can be imported from your own client or from any server, from files or databases.
Terminology: Health Discovery allows you to create your own terminology or import terminologies. These can be integrated into text mining, and terms of these terminologies can be found in texts.
Text Analysis: This category contains various modules for configuring text mining pipelines, starting text mining processes, and viewing text mining results. Different text mining pipelines can also be compared with each other.
Search: Health Discovery contains a semantic full-text search that can be configured and used in the various modules of this category.
Classification: Health Discovery contains a machine learning-based classification module. Users can sort documents manually or automatically into different categories. An intuitive interface enables the training and evaluation of machine learning models.
The user manual is intended to give you a quick introduction to Health Discovery with the "Getting started" section. Then the text mining components and pipelines that are included in Health Discovery are described in detail.
2. Getting started
2.1. Login to Health Discovery and create a project
Step 1: Enter the URL of Health Discovery in a web browser and login with your user name and password. If you don’t know the URL or your credentials, contact your system administrator.
Step 2: To create a new project go to the "Project Administration" section (1) and click the button "Create Project" (2).
Step 3: Enter a project name into the field "Name" (3) of the dialog "Create Project" and click "Save" (4).
After the project has been created successfully, you can open it by clicking on the newly added name in the list of the "Project Administration" section.
2.2. Import documents
Step 1: On page "Home", select "Project Administration" and click on a project name. You are redirected to the "Project Overview" page of this project.
Step 2: On the "Project Overview" page, choose module "Import Documents", click on "New Import", give your import batch a name, select the Importer Type "Text Importer" and the documents to be imported. You can import a single file or a zip container with multiple files.
Make sure that the zip container doesn’t contain (hidden) subfolders and that the files have the correct file extension.
Step 3: By clicking on "Import", the document import starts. You can click on the "Refresh" button to the right of your document import to see the progress.
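If you prepare the import zip programmatically, the following Python sketch builds a flat zip container that meets the requirements above (no subfolders, correct file extensions); the folder and archive names are illustrative:

```python
import zipfile
from pathlib import Path

# Build a flat zip of .txt files for the Text Importer: no (hidden)
# subfolders, and every file keeps its correct extension.
with zipfile.ZipFile("import_batch.zip", "w") as zf:
    for path in sorted(Path("documents").glob("*.txt")):
        zf.write(path, arcname=path.name)  # arcname drops the folder prefix
```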
You can reach the "Project Overview" page at any time via the breadcrumb navigation in the upper left by clicking on the project name (e.g. "default").
2.3. Run a text mining process
Health Discovery typically contains predefined pipelines that are already available when the application starts. You can therefore start text mining immediately after importing the first documents. Proceed as follows:
Step 1: On the "Project Overview" page, select "Pipeline Configuration" and start a text mining pipeline, e.g. "discharge".
Starting the pipeline may take a few minutes, as a lot of information is loaded into the main memory.
Step 2: Switch back to "Project Overview" and select "Processes".
Step 3: Click on "New Text Analysis"
Step 4: Give your text mining process a name, select the document source and the text mining pipeline, and click Ok.
Step 5: The text analysis starts now. By refreshing the browser you can monitor the progress of the text analysis.
2.4. View text mining results
As soon as a text mining process has the state "idle", you can view the results in the Annotation Editor by clicking on the process name.
The Annotation Editor shows the results of your text analysis process.
2.5. Configure your own pipeline
If you want to build your own pipeline from existing text mining components, proceed as follows:
Step 1: In "Project Overview", click on "Pipeline Configuration".
Step 2: Click on "Create Pipeline".
Step 3: Give your pipeline a name, optionally a description and click on "Ok".
Step 4: Click the pen icon ("Edit Pipeline") to the right of your pipeline.
Step 5: Select the desired components from the list on the right by clicking on the corresponding left arrow. For more information about the available components and which upstream components they require, see Available Text Mining Annotators & Web Service Specification.
2.6. Create your first terminology
If you want to create your own terminology, proceed as follows:
Step 1: In "Project Administration", select "Terminology Administration".
Step 2: Click on "Create Terminology".
Step 3: Assign a "Terminology ID", a "Label" and a "Version". Choose whether the terminology should have a hierarchy or not. Leave the "Concept type" set to "de.averbis.extraction.types.Concept" and "Encrypted export" disabled. Select the language(s) in which the terminology is to be created. Then click on "Ok".
Step 4: Switch to the "Terminology Editor" by going to the "Project Overview" page and clicking on "Terminology Editor".
Step 5: Click on the "plus" to the right of your terminology to create the first concept.
Step 6: Enter a "Concept ID", a "Preferred Term" and optionally a "Comment" and click "Ok".
Step 7: By clicking on the button "Add Terms" you can add more synonyms to the concept. If you want to add more than one synonym, click on "add another term". Once all synonyms have been inserted, click on "Ok".
Step 8: By clicking on the "plus" to the right of your newly created concept you can create further sub-concepts.
2.7. Download your terminology
Step 1: Go to "Terminology Administration" module.
Step 2: Choose your terminology and click the icon "Preparing OBO download".
The preparation time depends on the size of your terminology. Once the download is ready, a notification appears under the bell symbol in the upper menu bar.
Step 3: Refresh the page using the refresh button. Now the button for downloading the terminology is activated.
Step 4: Click on the download icon and save the OBO file.
2.8. Integrate own terminologies into a text mining pipeline
You can import your own terminologies to Health Discovery. Optionally, a mapping mode for each synonym can be imported, too. To import terminologies, you must convert them to the OBO file format. The minimal structure of your OBO terminology looks like the example below.
synonymtypedef: DEFAULT_MODE "Default Mapping Mode"
synonymtypedef: EXACT_MODE "Exact Mapping Mode"
synonymtypedef: IGNORE_MODE "Ignore Mapping Mode"

[Term]
id: 1
name: First Concept
synonym: "First Concept" DEFAULT_MODE []
synonym: "First Synonym" IGNORE_MODE []
synonym: "Second Synonym" EXACT_MODE []

[Term]
id: 2
name: First Child
is_a: 1 ! First Concept
To import terms with mapping modes, the OBO terminology begins with the synonym type definitions ("synonymtypedef"), as shown in the first three lines of the example above. The "synonymtypedef" lines are optional and only need to be included when mapping modes are used. Each concept begins with the flag "[Term]", followed by an "id" and a preferred name with the flag "name". After that you can add as many synonyms as you like with the flag "synonym", optionally followed by the desired mapping mode. Note: if you would like to define a mapping mode for your concept name, you have to add the term as a synonym, as shown in the example for "First Concept". Furthermore, if your terminology contains a hierarchy, you can use "is_a" to refer to other concepts of your terminology.
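If your source terminology lives in another system, a small script can generate the OBO format described above. The following Python sketch (concept IDs, names and the output file name are illustrative) writes the example terminology:

```python
# Minimal sketch that writes a two-concept OBO file like the example above.
concepts = [
    {"id": "1", "name": "First Concept", "is_a": None,
     "synonyms": [("First Concept", "DEFAULT_MODE"),
                  ("First Synonym", "IGNORE_MODE"),
                  ("Second Synonym", "EXACT_MODE")]},
    {"id": "2", "name": "First Child", "is_a": ("1", "First Concept"),
     "synonyms": []},
]

with open("my_terminology.obo", "w", encoding="utf-8") as obo:
    # synonymtypedef lines are only required when mapping modes are used
    obo.write('synonymtypedef: DEFAULT_MODE "Default Mapping Mode"\n')
    obo.write('synonymtypedef: EXACT_MODE "Exact Mapping Mode"\n')
    obo.write('synonymtypedef: IGNORE_MODE "Ignore Mapping Mode"\n')
    for concept in concepts:
        obo.write(f"\n[Term]\nid: {concept['id']}\nname: {concept['name']}\n")
        for term, mode in concept["synonyms"]:
            obo.write(f'synonym: "{term}" {mode} []\n')
        if concept["is_a"]:
            parent_id, parent_name = concept["is_a"]
            obo.write(f"is_a: {parent_id} ! {parent_name}\n")
```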
2.8.1. Import a terminology
To import a terminology like the one shown above, proceed as follows:
Step 1: In "Project Overview", click on "Terminology Administration".
Step 2: Click on "Create New Terminology". Fill in the dialog as described in Create your first terminology.
Step 3: Once you have created a terminology, click the up arrow icon to the right of the terminology.
Step 4: In the "Import Terminology" dialog, select "OBO Importer" as import format. Then select the terminology you want to import from the file system. Click on "Import".
Step 5: By clicking on the "Refresh" button to the right of the terminology you can check the progress of the import. When the terminology has been fully imported, the state changes to "Terminology imported".
Step 6: To browse your terminology, switch to the "Terminology Editor" by going to the "Project Overview" page and clicking on "Terminology Editor".
After successful terminology import, terms, hierarchies and mapping modes can be checked in the Terminology Editor.
2.9. Use the Web Service
All text mining pipelines configured and started in Health Discovery can also be accessed via web service. To do this, proceed as follows:
Step 1: Add the suffix "/rest/swagger-ui.html" to the URL of Health Discovery (e.g. https://<YOURURL>/health-discovery/rest/swagger-ui.html)
Step 2: In the green upper menu bar, select the spec "REST API v1"
Step 3: Click on "Text Analysis" and then on "/rest/v1/textanalysis/projects/{projectName}/pipelines/{pipelineName}/analyseText".
Step 4: Click on the "Try it out" button in the upper right.
Step 5: Add your API token as a string to "api-token" (1) (for details on generating an API token, see REST API Overview), "discharge" to "pipelineName" (2) (or the name of another started pipeline) and "default" to "projectName" (3).
Step 6: Add any text in the field "text".
Step 7: The field "language" can be left blank for the "discharge" pipeline, as the pipeline automatically recognizes the language.
Step 8: Click on the blue button "Execute" on the bottom left (4).
Step 9: You receive the response in the response body section.
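Outside the Swagger UI, the same endpoint can be called from a script. The following Python sketch mirrors the steps above; it assumes (please verify against your Swagger UI) that the text is sent as the plain-text request body, the language as a query parameter, and the token in the "api-token" header:

```python
import requests

BASE_URL = "https://<YOURURL>/health-discovery/rest/v1"
API_TOKEN = "<YOUR-API-TOKEN>"  # generated as described in the REST API Overview

def analyse_text(project, pipeline, text, language=None):
    """Send plain text to a started pipeline and return the parsed JSON response."""
    url = f"{BASE_URL}/textanalysis/projects/{project}/pipelines/{pipeline}/analyseText"
    params = {"language": language} if language else {}
    response = requests.post(
        url,
        params=params,
        data=text.encode("utf-8"),
        headers={"api-token": API_TOKEN,
                 "Content-Type": "text/plain; charset=utf-8"},
    )
    response.raise_for_status()
    return response.json()

# The "discharge" pipeline detects the language automatically,
# so "language" can be left out (see Step 7).
print(analyse_text("default", "discharge", "suspected history of appendicitis"))
```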
3. Available Text Mining Annotators & Web Service Specification
Health Discovery contains a number of pipelines and text mining components. These can be configured in the "Pipeline Configuration" module. The individual components are described below. In addition to a short description of each component, the section specifies which types the component requires as input and which types it generates. A web service example of an annotation of the corresponding type is also given.
3.1. BiologicallyDerivedProducts
3.1.1. Description
A biologically derived product is a material substance originating from a biological entity that is intended to be transplanted or infused into another biological entity. Examples of biologically derived products include hematopoietic stem cells from bone marrow, peripheral blood, or cord blood extraction. This annotator extracts information about the type of the transplanted biological product, the number of transplanted cells and the date in the context of allogeneic transplantations.
Currently, the annotation is limited to the extraction of the biological product of CD34-positive stem cells.
3.1.2. Input
3.1.3. Output
Annotation Type: de.averbis.types.health.BiologicallyDerivedProduct
Attribute | Description | Type |
---|---|---|
quantity | The volume of the product which was transplanted. | Double |
time | Temporal information (date or date interval) about the transplantation. Please see types: Date, DateInterval | Date / DateInterval |
matchedTerm | Matching synonym of the biologically derived product concept. | String |
dictCanon | Preferred term of the biologically derived product concept. | String |
conceptID | The ID of the concept. | String |
source | The name of the terminology source. | String |
uniqueID | Unique identifier of the concept of the format 'terminologyId:conceptID'. | String |
negatedBy | Specifies the negation word, if one exists. | String |
3.1.4. Terminology Binding
Name | Languages | Version | Identifier | Comment |
---|---|---|---|---|
Averbis Lab Terminology | EN, DE | 2.0 | Averbis-Lab-Terminology_2.0 | Laboratory and vital signs parameters, ID based on LOINC codes (LOINC parts) composed by Averbis. |
3.1.5. Web Service Example
Text Example: "On 11/11/2008 transfusion of 4.5x 106 CD34-positive cells/kg"
{ "begin": 29, "end": 42, "type": "de.averbis.types.health.BiologicallyDerivedProduct", "coveredText": "4.5x 106 CD34", "id": 1839, "negatedBy": null, "quantity": 4500000, "matchedTerm": "CD34+", "dictCanon": "CD34+", "conceptID": "78002-3", "source": "Averbis-Lab-Terminology_2.0", "time": { "begin": 3, "end": 13, "type": "de.averbis.types.health.Date", "coveredText": "11/11/2008", "id": 1840, "kind": "DATE", "value": "2008-11-11" }, "uniqueID": "Averbis-Lab-Terminology_2.0:78002-3 }
3.2. Chimerism
3.2.1. Description
This component annotates information about chimerism. In the field of transplantation medicine, a chimerism analysis is performed after stem cell or bone marrow transplantation to determine whether the recipient’s hematopoietic system is only derived from the donor or not. The chimerism is called "complete" if more than 95% of the tested hematopoietic cells originate from the donor, otherwise the chimerism is called "mixed".
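The classification rule described above can be summarized in a short sketch (the function name is illustrative):

```python
# More than 95 % donor cells means "complete" chimerism, otherwise "mixed".
def chimerism_kind(donor_cell_percentage):
    return "COMPLETE" if donor_cell_percentage > 95 else "MIXED"

print(chimerism_kind(85.2))  # "MIXED", matching the example below
```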
3.2.2. Input
Above this annotator, the following annotators must be included in the pipeline:
3.2.3. Output
Annotation Type: de.averbis.types.health.Chimerism
Attribute | Description | Type |
---|---|---|
kind | The kind of the actual chimerism. Possible values (default is underlined): null | COMPLETE | MIXED | String |
value | Numeric value of the chimerism. | Double |
date | Date of the chimerism analysis. Format: YYYY-MM-DD: Year-Month-Day with leading zeros (e.g. 2009-02-17) | String |
3.2.4. Web Service Example
Text Example: "Chimärismusanalyse vom 17.11.2008: Nachweis von 85,2 % Donorzellen."
{ "begin": 48, "end": 66, "type": "de.averbis.types.health.Chimerism", "coveredText": "85,2 % Donorzellen", "id": 1469, "date": null, "kind": "MIXED", "value": 85.2 }
3.3. Clinical Section Keyword
3.3.1. Description
A Clinical Section Keyword is a keyword from which a possible Clinical Section is derived, e.g. "Patient History" is a keyword for the Anamnesis Section. Not every Clinical Section Keyword leads to a Clinical Section.
3.3.2. Input
Above this annotator, the following annotators must be included in the pipeline:
3.3.3. Output
Annotation Type: de.averbis.types.health.ClinicalSectionKeyword
Attribute | Description | Type |
---|---|---|
dictCanon | Preferred term of the section concept. | String |
uniqueID | Unique identifier of the section concept of the format 'terminologyId:conceptID'. | String |
conceptID | The ID of the concept. | String |
source | The name of the terminology source. | String |
matchedTerm | Matching synonym of the section concept. | String |
negatedBy | Specifies the negation word, if one exists. | String |
3.3.4. Terminology Binding
Name | Languages | Version | Identifier | Comment |
---|---|---|---|---|
clinical-Sections | EN, DE | 1.0 | clinical_sections_de, clinical_sections_en | Types of clinical sections, ID predominantly based on LOINC codes composed and enriched with synonyms by Averbis. |
3.3.5. Web Service Example
Text Example: "Medication Citation|Active|CM| TraMADol HCl - 50 MG Oral Tablet;TAKE 1 TABLET 3 TIMES DAILY.; RPT~Tylenol Arthritis Ext Relief 650 MG TBCR;TAKE 1 TABLET 3-4 TIMES DAILY.; RPT~CeleBREX 200 MG Oral Capsule;TAKE 1 CAPSULE DAILY.; RPT~Folbic TABS;; RPT~Folic Acid 1 MG Oral Tablet;TAKE 1 TABLET DAILY.; RPT~PredniSONE 10 MG Oral Tablet;TAKE 1 TABLET AS NEEDED.; RPT~Cholestyramine 4 GM Oral Packet;MIX THE CONTENTS OF 1 POWDER PACKET WITH 2 TO 6 OZ OF NONCARBONATED BEVERAGE AND DRINK 3 TIMES DAILY.; RPT~Methotrexate 2.5 MG Oral Tablet;TAKE 1 TABLET WEEKLY.; RPT~Citracal Plus Oral Tablet;TAKE 2 TABLET DAILY; RPT~Multi Vitamin Daily TABS;TAKE 1 TABLET DAILY.; RPT~Miscellaneous Medication;Schiff "Move Free". 400 MG taken once daily; RPT"
{ "begin": 0, "end": 10, "type": "de.averbis.types.health.ClinicalSectionKeyword", "coveredText": "Medication", "id": 16279, "negatedBy": null, "matchedTerm": "Medication", "dictCanon": "Medication", "conceptID": "29549-3", "source": "clinical_sections_en", "uniqueID": "clinical_sections_en:29549-3" }, { "begin": 676, "end": 686, "type": "de.averbis.types.health.ClinicalSectionKeyword", "coveredText": "Medication", "id": 16280, "negatedBy": null, "matchedTerm": "Medication", "dictCanon": "Medication", "conceptID": "29549-3", "source": "clinical_sections_en", "uniqueID": "clinical_sections_en:29549-3" }
3.4. Clinical Section
3.4.1. Description
This component detects sections in medical documents. These sections can refer to diagnoses, medications, therapies, etc.
3.4.2. Input
Above this annotator, the following annotators must be included in the pipeline:
3.4.3. Output
Annotation Type: de.averbis.types.health.ClinicalSection
Attribute | Description | Type |
---|---|---|
keyword | The Clinical Section Keyword from which the Clinical Section is derived. | ClinicalSectionKeyword |
label | The label of the section, e.g. "LaboratorySection", "MedicationSection", "AnamnesisSection" | String |
3.4.4. Terminology Binding
Name | Languages | Version | Identifier | Comment |
---|---|---|---|---|
clinical-Sections | EN, DE | 1.0 | clinical_sections_de, clinical_sections_en | Types of clinical sections, ID predominantly based on LOINC codes composed and enriched with synonyms by Averbis. |
3.4.5. Web Service Example
Text Example: "Medication Citation|Active|CM| TraMADol HCl - 50 MG Oral Tablet;TAKE 1 TABLET 3 TIMES DAILY.; RPT~Tylenol Arthritis Ext Relief 650 MG TBCR;TAKE 1 TABLET 3-4 TIMES DAILY.; RPT~CeleBREX 200 MG Oral Capsule;TAKE 1 CAPSULE DAILY.; RPT~Folbic TABS;; RPT~Folic Acid 1 MG Oral Tablet;TAKE 1 TABLET DAILY.; RPT~PredniSONE 10 MG Oral Tablet;TAKE 1 TABLET AS NEEDED.; RPT~Cholestyramine 4 GM Oral Packet;MIX THE CONTENTS OF 1 POWDER PACKET WITH 2 TO 6 OZ OF NONCARBONATED BEVERAGE AND DRINK 3 TIMES DAILY.; RPT~Methotrexate 2.5 MG Oral Tablet;TAKE 1 TABLET WEEKLY.; RPT~Citracal Plus Oral Tablet;TAKE 2 TABLET DAILY; RPT~Multi Vitamin Daily TABS;TAKE 1 TABLET DAILY.; RPT~Miscellaneous Medication;Schiff "Move Free". 400 MG taken once daily; RPT"
{ "begin": 0, "end": 734, "type": "de.averbis.types.health.ClinicalSection", "coveredText": "Medication Citation|Active|CM\nTraMADol HCl - 50 MG Oral Tablet;TAKE 1 TABLET 3 TIMES DAILY.; RPT~Tylenol Arthritis Ext Relief 650 MG TBCR;TAKE 1 TABLET 3-4 TIMES DAILY.; RPT~CeleBREX 200 MG Oral Capsule;TAKE 1 CAPSULE DAILY.; RPT~Folbic TABS;; RPT~Folic Acid 1 MG Oral Tablet;TAKE 1 TABLET DAILY.; RPT~PredniSONE 10 MG Oral Tablet;TAKE 1 TABLET AS NEEDED.; RPT~Cholestyramine 4 GM Oral Packet;MIX THE CONTENTS OF 1 POWDER PACKET WITH 2 TO 6 OZ OF NONCARBONATED BEVERAGE AND DRINK 3 TIMES DAILY.; RPT~Methotrexate 2.5 MG Oral Tablet;TAKE 1 TABLET WEEKLY.; RPT~Citracal Plus Oral Tablet;TAKE 2 TABLET DAILY; RPT~Multi Vitamin Daily TABS;TAKE 1 TABLET DAILY.; RPT~Miscellaneous Medication;Schiff \"Move Free\". 400 MG taken once daily; RPT", "id": 16310, "label": "Medication", "keyword": { "begin": 0, "end": 10, "type": "de.averbis.types.health.ClinicalSectionKeyword", "coveredText": "Medication", "id": 16311, "negatedBy": null, "matchedTerm": "Medication", "dictCanon": "Medication", "conceptID": "29549-3", "source": "clinical_sections_en", "uniqueID": "clinical_sections_en:29549-3" }
3.5. Diagnoses
3.5.1. Description
This component detects a condition, problem, diagnosis, or other event, situation, issue, or clinical concept that has risen to a level of concern. Optionally, the additional annotation DiagnosisCandidate can be activated and visualized as well; it specifically detects diagnosis candidates to optimize DRG coding.
3.5.2. Input
Above this annotator, the following annotators must be included in the pipeline:
To get the full functionality, the following annotators should also be included below this annotator in the given order:
3.5.3. Output
Annotation Type: de.averbis.types.health.Diagnosis
Attribute | Description | Type |
---|---|---|
dictCanon | Preferred term of the condition. | String |
matchedTerm | The matching synonym of the diagnosis. | String |
uniqueID | Unique identifier of a concept of the format 'terminologyId:conceptID'. | String |
conceptID | The ID of the concept. | String |
source | The name of the terminology source. | String |
approach | Information about the text mining approach used to generate the annotation. Possible values: DictionaryLookup | SimilarityMatching | DocumentClassification | DerivedByLabValue | String |
confidence | For approaches using machine learning (e.g. "DocumentClassification"), the confidence that the respective annotation has been generated correctly is calculated. Possible value range: 0-1. Note: Annotations generated with non-machine learning approaches such as terminology mappings (approach = "DictionaryLookup") are reflected with a confidence value of 0. | Double |
onsetDate | The onset date of the diagnosis, if given in the text. Format: YYYY-MM-DD: Year-Month-Day with leading zeros (e.g. 2020-02-17). Please note: The onsetDate is only annotated if the pear component "Disease Onset Date" is integrated in the text analysis pipeline used. The preconfigured pipelines do not contain this component, thus the value of the onset feature is represented as null. | String |
negatedBy | Specifies the negation word, if one exists. | String |
verificationStatus | Verification status of the actual diagnosis. Possible values (default is underlined): null | NEGATED | ASSURED | SUSPECTED | DIFFERENTIAL | String |
clinicalStatus | Clinical status of the actual diagnosis. Possible values (default is underlined): null | ACTIVE | RESOLVED | String |
kind | The kind of the diagnosis. Possible values (default is underlined): null | main | secondary | String |
side | The laterality of the diagnosis. Possible values (default is underlined): null | RIGHT | LEFT | BOTH | String |
laterality | The laterality of the diagnosis. Possible values (default is underlined): null | RIGHT | LEFT | BOTH. WARNING: This feature is deprecated and will be removed in V6 of Health Discovery. It will be replaced by the equivalent attribute 'side'. | String |
belongsTo | Indicates whether the diagnosis belongs to a donor or recipient (e.g. in case of transplantations) or to a family member. Possible values (default is underlined): null | FAMILY | OTHER | String |
Annotation Type (optional): de.averbis.types.health.DiagnosisCandidate
Attribute | Description | Type |
---|---|---|
dictCanon | Preferred term of the condition. | String |
conceptID | The ID of the concept. | String |
approach | Information about the text mining approach used to generate the annotation. Possible values: DictionaryLookup | SimilarityMatching | DocumentClassification | DerivedByLabValue | String |
confidence | For approaches using machine learning (e.g. "DocumentClassification"), the confidence that the respective annotation has been generated correctly is calculated. Possible value range: 0-1. Note: Annotations generated with non-machine learning approaches such as terminology mappings (approach = "DictionaryLookup") are reflected with a confidence value of 0. | Double |
verificationStatus | Verification status of the actual diagnosis. Possible values (default is underlined): null | NEGATED | ASSURED | SUSPECTED | DIFFERENTIAL | String |
clinicalStatus | Clinical status of the actual diagnosis. Possible values (default is underlined): null | ACTIVE | RESOLVED | String |
belongsTo | Indicates whether the diagnosis belongs to a donor or recipient (e.g. in case of transplantations) or to a family member. Possible values (default is underlined): null | FAMILY | OTHER | String |
3.5.4. Terminology Binding
Country | Name | Version | Identifier | Comment |
---|---|---|---|---|
United States | ICD-10-CM | 2021 | ICD10CM_2021 | International Classification of Diseases, 10th Edition, Clinical Modification, 2021, enriched with synonyms from SNOMED CT and by Averbis. |
Germany | ICD-10-GM | 2021 | ICD10GM_2021 | International Classification of Diseases, 10th Edition, German Modification, 2021, enriched with synonyms by Averbis. |
3.5.5. Web Service Example
Text Example for Diagnosis: "suspected history of appendicitis"
{ "begin": 10, "end": 33, "type": "de.averbis.types.health.Diagnosis", "coveredText": "history of appendicitis", "id": 788, "negatedBy": null, "side": null, "matchedTerm": "History of appendicitis", "verificationStatus": "SUSPECTED", "kind": null, "confidence": 0, "onsetDate": null, "source": "ICD10CM_2021", "clinicalStatus": "RESOLVED", "approach": "DictionaryLookup", "laterality": null, "dictCanon": "Personal history of other diseases of the digestive system", "conceptID": "Z87.19", "belongsTo": null, "uniqueID": "ICD10CM_2021:Z87.19" }
Text Example for DiagnosisCandidate: "suspected history of appendicitis"
{ "begin": 10, "end": 33, "type": "de.averbis.types.health.DiagnosisCandidate", "coveredText": "history of appendicitis", "id": 788, "verificationStatus": "SUSPECTED", "confidence": 0, "clinicalStatus": "RESOLVED", "approach": "DictionaryLookup", "dictCanon": "Personal history of other diseases of the digestive system", "conceptID": "Z87.19", "belongsTo": null, }
3.6. Diagnosis Status
3.6.1. Description
The annotator recognizes the status of diagnoses. Examples of such statuses are "suspected" or "history of".
3.6.2. Input
Above this annotator, the following annotator must be included in the pipeline:
3.6.3. Output
This annotator sets the features belongsTo, verificationStatus and clinicalStatus in annotations of type Diagnosis, and changes conceptID and uniqueID if the diagnosis does not belong to the patient but e.g. to a family member.
3.6.4. Web Service Example
Text Example 1 (ClinicalStatus): "history of appendicitis"
{ "begin": 0, "end": 23, "type": "de.averbis.types.health.Diagnosis", "coveredText": "history of appendicitis", "id": 750, "negatedBy": null, "side": null, "matchedTerm": "History of appendicitis", "verificationStatus": null, "kind": null, "confidence": 0, "onsetDate": null, "source": "ICD10CM_2021", "clinicalStatus": "RESOLVED", "approach": "DictionaryLookup", "laterality": null, "dictCanon": "Personal history of other diseases of the digestive system", "conceptID": "Z87.19", "belongsTo": null, "uniqueID": "ICD10CM_2021:Z87.19" }
Text Example 2 (FamilyDiagnosis): "father has diabetes mellitus"
{ "begin": 11, "end": 28, "type": "de.averbis.types.health.Diagnosis", "coveredText": "diabetes mellitus", "id": 820, "negatedBy": null, "side": null, "matchedTerm": "Diabetes mellitus", "verificationStatus": null, "kind": null, "confidence": 0, "onsetDate": null, "source": "ICD10CM_2021", "clinicalStatus": null, "approach": "DictionaryLookup", "laterality": null, "dictCanon": "Family history of diabetes mellitus", "conceptID": "Z83.3", "belongsTo": "FAMILY", "uniqueID": "ICD10CM_2021:Z83.3" }
3.7. Disambiguation
3.7.1. Description
In case of ambiguous annotations this component decides which annotations should be valid in the given context, e.g. within a list of laboratory values the parameter 'Calcium' represents a laboratory parameter and not an ingredient.
3.7.2. Input
This component requires annotations of at least one of the following types:
3.7.3. Output
Only the annotation which is evaluated as valid is maintained; the other(s) are discarded.
3.7.4. Web Service Example
There is no special web service return for Disambiguation.
3.8. Enumerations
3.8.1. Description
This component detects enumerations. The enumerations are recognized based on atomic text units (e.g. chunks) and conjunctions (e.g. the word "and").
3.8.2. Input
Above this annotator, the following annotators must be included in the pipeline:
- LanguageDetection or LanguageSetter
- any components which include concept annotators, such as LabValues, Diagnoses, Medications, Morphology or Topography
3.8.3. Output
This component sets the following internal type that is not visible in the annotation editor:
Annotation Type*: de.averbis.types.Enumeration
3.8.4. Web Service Example
*The enumeration itself is not returned in the web service. However, the following example shows that both diagnoses are assigned the status "SUSPECTED".
Text Example: "suspicion of bronchitis or asthma bronchiale"
{ "begin": 13, "end": 23, "type": "de.averbis.types.health.Diagnosis", "coveredText": "bronchitis", "id": 1137, "negatedBy": null, "side": null, "matchedTerm": "Bronchitis", "verificationStatus": "SUSPECTED", "kind": null, "confidence": 0, "onsetDate": null, "source": "ICD10CM_2021", "clinicalStatus": null, "approach": "DictionaryLookup", "laterality": null, "dictCanon": "Bronchitis, not specified as acute or chronic", "conceptID": "J40", "belongsTo": null, "uniqueID": "ICD10CM_2021:J40" }, { "begin": 27, "end": 44, "type": "de.averbis.types.health.Diagnosis", "coveredText": "asthma bronchiale", "id": 1138, "negatedBy": null, "side": null, "matchedTerm": "Bronchial asthma", "verificationStatus": "SUSPECTED", "kind": null, "confidence": 0, "onsetDate": null, "source": "ICD10CM_2021", "clinicalStatus": null, "approach": "DictionaryLookup", "laterality": null, "dictCanon": "Unspecified asthma, uncomplicated", "conceptID": "J45.909", "belongsTo": null, "uniqueID": "ICD10CM_2021:J45.909" }
3.9. GenericTerminologyAnnotator
3.9.1. Description
The Generic Terminology Annotator recognizes terms from terminologies created in Health Discovery’s Terminology Editor module.
3.9.2. Input
Above this annotator, the following annotator must be included in the pipeline:
3.9.3. Output
The component creates annotations of type:
Annotation Type: de.averbis.extraction.types.Concept
Attribute | Description | Type |
---|---|---|
dictCanon | Preferred term of the concept. | String |
uniqueID | Unique identifier of a concept of the format 'terminologyId:conceptID'. | String |
conceptID | The concept ID. | String |
source | The name of the terminology source. | String |
matchedTerm | The matching synonym from the terminology source. | String |
negatedBy | Specifies the negation word, if one exists. | String |
The exact type depends on the terminology files used and the concept types specified in them.
3.9.4. Configuration
The GenericTerminologyAnnotator has various parameters that control how texts are matched against terms from the terminologies defined and maintained in the terminology modules of Health Discovery. The various parameters are listed in the table below.
Name | Description | Type | MultiValued | Mandatory |
---|---|---|---|---|
 | Names of the source terminologies. | String | | |
useStemLookup | Apply lookup based on stems. | Boolean | | |
useSegmentLookup | Apply lookup based on segments. | Boolean | | |
3.9.4.1. Configuration Example 1: useStemLookup and useSegmentLookup deactivated
Let us first consider the case where the parameters useSegmentLookup and useStemLookup are disabled. Even in this case a mapping still takes place, namely a basic, simple mapping. All terms from the terminology are then mapped as follows:
- Mapping Modus: Simple
- Case Sensitivity: Upper and lower case, some punctuation and the occurrence of stop words (e.g. 'of', 'the', 'a', ',') are ignored.
- Word Order: The word order in text and terminology is not important for a match.
Example: the term "Appendix Inflammation" is mapped to the text snippet "inflammation of the appendix".
3.9.4.2. Configuration Example 2: useStemLookup activated
Now we activate the mode "useStemLookup". Stemming is now applied during the mapping, which reduces inflected (or sometimes derived) words to their word stem, base or root form:
- Mapping Modus: Stemming
- Case Sensitivity: Upper and lower case, some punctuation and the occurrence of stop words (e.g. 'of', 'the', 'a', ',') are ignored.
- Word Order: The word order in text and terminology is not important for a match.
Example: the term "Inflamed Appendix" is mapped to the text snippet "inflammation of the appendix".
3.9.4.3. Configuration Example 3: useSegmentLookup activated
The segment lookup mode uses a dictionary-based approach to decompose compound words into their word components. The term "decompounding" is often used for this purpose. It can be helpful in so-called agglutinating languages, which combine many words into new compound words. Conversely, however, there is a risk that only parts of words in texts are mapped to a term, resulting in false positive hits. Therefore, the segment mode should only be used in exceptional cases.
- Mapping Modus: Segmenting (Decompounding)
- Case Sensitivity: Upper and lower case, some punctuation and the occurrence of stop words (e.g. 'of', 'the', 'a', ',') are ignored.
- Word Order: The word order in text and terminology is not important for a match.
3.9.4.4. Activating the "Exact mode" in terminology administration
The exact mode is, like the simple mode, automatically activated in each pipeline, so you will not find a parameter for it in the GenericTerminologyAnnotator. However, the user has direct influence on the terms to be mapped in exact mode. This mode is only applied to a specific part of the terminology, namely to exactly those terms for which the user has actively set the mapping mode to "EXACT" in the terminology editor. The "EXACT" mapping mode ensures that the corresponding term is only found in exactly the same spelling as the term, i.e. with the same uppercase and lowercase letters, in the same word order and without any preprocessing in the form of e.g. stemming:
- Mapping Modus: Exact
- Case Sensitivity: Upper and lower case is considered, stop words are preserved.
- Word Order: The word order in text and terminology must be the same.
3.9.5. Web Service Example
Text Example: "appendicitis"
{ "begin": 0, "end": 12, "type": "de.averbis.types.health.Concept", "coveredText": "Appendizitis", "id": 303, "matchedTerm": "Appendizitis", "dictCanon": "Appendizitis", "conceptID": "2", "source": "test_1.0", "uniqueID": "test_1.0:2" }
3.10. Gleason Score
3.10.1. Description
This component recognizes Gleason score annotations.
3.10.2. Input
Above this annotator, the following annotators must be included in the pipeline:
3.10.3. Output
Annotation Type: de.averbis.types.health.GleasonScore
Attribute | Description | Type |
---|---|---|
score | The combined score. | String |
primaryGrade | The primary grade (not always available). | String |
secondaryGrade | The secondary grade (not always available). | String |
3.10.4. Web Service Example
Text Example: "Gleason Pattern 3(60%) + 4(40%) = 7"
{ "begin": 0, "end": 35, "type": "de.averbis.types.health.GleasonScore", "coveredText": "Gleason Pattern 3(60%) + 4(40%) = 7", "id": 1855, "score": "7", "primaryGrade": "3", "secondaryGrade": "4" }
3.11. GvHD
3.11.1. Description
This component recognizes information about the occurrence of a GvHD (Graft-versus-Host-Disease).
3.11.2. Input
Above this annotator, the following annotators must be included in the pipeline:
3.11.3. Output
Annotation Type: de.averbis.types.health.GvHD
Attribute | Description | Type |
---|---|---|
dictCanon | Preferred term of the concept. | String |
matchedTerm | The matching synonym of the GvHD concept in the terminology. | String |
uniqueID | Unique identifier of the concept of the format 'terminologyId:conceptID'. | String |
conceptID | The ID of the concept. | String |
source | The name of the terminology source. | String |
confidence | For approaches using machine learning (e.g. "DocumentClassification"), the confidence that the respective annotation has been generated correctly is calculated. Possible value range: 0-1. Note: Annotations generated with non-machine learning approaches such as terminology mappings (approach = "DictionaryLookup") are reflected with a confidence value of 0. | Double |
negatedBy | Specifies the negation word, if one exists. | String |
approach | Information about the text mining approach used to generate the annotation. Possible values: DictionaryLookup | SimilarityMatching | DocumentClassification | DerivedByLabValue | String |
onsetDate | The onset date of the diagnosis, if given in the text. Format: YYYY-MM-DD: Year-Month-Day with leading zeros (e.g. 2020-02-17). Please note: The onsetDate is only annotated if the pear component "Disease Onset Date" is integrated in the text analysis pipeline used. The preconfigured pipelines do not contain this component, thus the value of the onset feature is represented as null. | String |
verificationStatus | Verification status of the GvHD diagnosis. Possible values (default is underlined): null | NEGATED | ASSURED | SUSPECTED | DIFFERENTIAL | String |
clinicalStatus | Clinical status of the GvHD diagnosis. Possible values (default is underlined): null | ACTIVE | RESOLVED | String |
kind | The kind of the diagnosis. Possible values (default is underlined): null | main | secondary | String |
side | The laterality of the diagnosis. Possible values (default is underlined): null | RIGHT | LEFT | BOTH | String |
laterality | The laterality of the diagnosis. Possible values (default is underlined): null | RIGHT | LEFT | BOTH. WARNING: This feature is deprecated and will be removed in V6 of Health Discovery. It will be replaced by the equivalent attribute 'side'. | String |
belongsTo | Indicates whether the diagnosis belongs to a donor or recipient (e.g. in case of transplantations) or to a family member. Possible values (default is underlined): null | DONOR | FAMILY | RECIPIENT | String |
continuanceStatus | GvHD status. Possible values (default is underlined): null | ACUTE | CHRONIC | String |
grade | Grade of the GvHD diagnosis. Possible values (default is underlined): null | I | II | III | IV | String |
stage | Stage of the GvHD diagnosis. Possible values (default is underlined): null | 1 | 2 | 3 | 4 | LIMITED | EXTENDED | String |
organ | Organ diagnosed with GvHD. Possible values (default is underlined): null | SKIN | LIVER | INTESTINAL | EYE | LUNG | CONNECTIVE TISSUE | MUCOSA | VAGINAL | String |
date | The date of the diagnosis. Format: YYYY-MM-DD: Year-Month-Day with leading zeros (e.g. 2020-02-17) | String |
GvHD is a subtype of Diagnosis, i.e. it inherits all of its features.
3.11.4. Web Service Example
Text Example: "Akute Transplantat-gegen-Wirt Erkrankung Stadium 3 der Haut, Schweregrad III"
{ "begin": 0, "end": 76, "type": "de.averbis.types.health.GvHD", "coveredText": "Akute Transplantat-gegen-Wirt Erkrankung Stadium 3 der Haut, Schweregrad III", "id": 1346, "date": null, "organ": "SKIN", "negatedBy": null, "side": null, "matchedTerm": "Akute Transplantat-gegen-Wirt Erkrankung Stadium 3 der Haut", "verificationStatus": null, "kind": null, "confidence": 0, "onsetDate": null, "source": "ICD10GM_2021", "clinicalStatus": null, "approach": "DictionaryLookup", "laterality": null, "stage": "3", "dictCanon": "Stadium 3 der akuten Haut-Graft-versus-Host-Krankheit", "grade": "III", "continuanceStatus": "ACUTE", "conceptID": "L99.13*", "belongsTo": null, "uniqueID": "ICD10GM_2021:L99.13*" }
3.12. Health Measurements
3.12.1. Description
This component detects measurements in medical texts.
3.12.2. Input
Above this annotator, the following annotator must be included in the pipeline:
When generating a measurement annotation, a NumericValue and a unit are combined. The LaboratoryParameter annotation allows the generation of a measurement even when a unit is missing, e.g. "Hb 11".
3.12.3. Output
Annotation Type: de.averbis.types.health.Measurement
Attribute | Description | Type |
---|---|---|
unit | The unit of the measurement. | String |
normalizedUnit | Normalized string value of the unit. | String |
normalizedValue | Normalized value of the measurement. This value is the result of the transformation of the numeric value according to the transformation of the unit to its standard unit. | Double |
value | The numeric value of the measurement. | Double |
dimension | The dimension of the unit, e.g. [M] standing for mass. | String |
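As a worked example of the normalizedValue attribute, the following sketch converts a value given in mg/dL to its standard unit kg/m³ (conversion factors follow from the SI prefix definitions; the function name is illustrative):

```python
MG_IN_KG = 1e-6   # 1 mg = 1e-6 kg
DL_IN_M3 = 1e-4   # 1 dL = 1e-4 m³

def normalize_mg_per_dl(value):
    """Transform a numeric value in mg/dL to the standard unit kg/m³."""
    return value * (MG_IN_KG / DL_IN_M3)

# 9.6 mg/dL -> 0.096 kg/m³, matching the Lab Values example further below
print(round(normalize_mg_per_dl(9.6), 6))
```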
3.12.4. Web Service Example
Health Measurements are only returned in the context of LabValues and Medications.
3.13. Health Preprocessing
3.13.1. Description
This pipeline block is responsible for preprocessing the input documents and preparing the minimal set of required annotations which serve as input for the subsequent components. Among others, this pipeline block recognizes and annotates words, sentences, abbreviations, temporal expressions and numerical values. Additionally, it filters out stopwords (i.e., commonly used words which carry no important significance) and improves the sentence segmentation altered by abbreviations.
For the optimal functionality of the subsequent components, it is recommended to run the Health Preprocessing beforehand.
3.13.2. Input
Above this annotator, one of the following annotators must be included in the pipeline:
3.13.3. Output
This component generates annotations which will be processed by the subsequent components, e.g. words, sentences, abbreviations, temporal expressions and numerical values.
3.13.4. Web Service Example
The annotations generated by the preprocessing pipeline block are not returned in the web service.
3.14. HealthPostProcessing
3.14.1. Description
This pipeline block contains annotators that can be used for postprocessing annotations of previous pipeline components. The first element provided in postprocessing is the "Blacklist Removal Annotator", which is described in more detail below.
3.14.2. Input
This pipeline block assumes that the annotators whose output is changed by a postprocessing component are included earlier in the pipeline.
3.14.3. Output
The output of this pipeline block depends on the components which are used as part of the postprocessing.
3.14.4. Web Service Example
The annotations generated by the postprocessing pipeline block are not returned in the web service.
3.14.5. Blacklist Removal Annotator
3.14.5.1. Description
This annotator is a component of the HealthPostProcessing pipeline block. It can be used to remove annotations using a blacklist. The blacklist consists of several blacklist terms that can be set via parameters in the pipeline configuration on the Health Discovery User Interface.
By default, only the following annotation types are removed by the annotator:
- CodingCandidate
- Diagnosis
- Drug
- Ingredient
- LaboratoryParameter
- LaboratoryValue
- Medications
3.14.5.2. Input
This component does not require any specific annotators. It relies on the assumption that the affected annotators (see above) are defined in the typesystem.
3.14.5.3. How to use the Blacklist Removal Annotator
Step 1: Go to "Pipeline Configuration"
Step 2: Stop the pipeline in which you would like to use the Blacklist Annotator and click the button "Edit Pipeline".
Step 3: From the list of available annotators in the right panel, select the component "HealthPostProcessing" and add it at the last position of your pipeline.
Step 4: Click on the HealthPostProcessing component in your pipeline to display the contained components.
Step 5: Click on "BlacklistAnnotationRemover" and fill in the text passages you would like to remove from the text analysis output.
Step 6: Save and restart the pipeline.
Please note: 1) choose the terms you enter carefully, because they change the output of the pipeline for all documents that are analyzed with this pipeline, and 2) the parameter "ignoreCase" can be used to define whether the terms should be treated case-sensitively or not. If you want some terms to be handled differently than others, add a second HealthPostProcessing block to your pipeline.
3.14.5.4. Output
This component only removes annotations; no new annotations are added.
3.14.5.5. Web Service Example
This component removes annotations, thus there is no return in the web service.
3.15. HLA
3.15.1. Description
This component annotates information about HLA (human leukocyte antigen).
3.15.2. Input
Above this annotator, the following annotators must be included in the pipeline:
3.15.3. Output
Annotation Type: de.averbis.types.health.HLA
Attribute | Description | Type |
---|---|---|
dictCanon | Preferred term of the HLA concept. | String |
matchedTerm | The matching synonym of the HLA concept in the terminology. | String |
uniqueID | Unique identifier of the concept of the format 'terminologyId:conceptID'. | String |
conceptID | The ID of the concept. | String |
source | The name of the terminology source. | String |
negatedBy | Specifies the negation word, if one exists. | String |
date | Date of observation. (Format: YYYY-MM-DD) | String |
samplingDate | Date of sampling. (Format: YYYY-MM-DD: Year-Month-Day with leading zeros (e.g. 2009-02-17)) | String |
receiptDate | Date of receipt of the sample. (Format: YYYY-MM-DD) | String |
belongsTo | Indicates whether the HLA finding belongs to a donor or recipient (e.g. in case of transplantations). Possible values (default is underlined): null | DONOR | RECIPIENT | String |
male | Paternal HLA manifestation. | HLAValue |
female | Maternal HLA manifestation. | HLAValue |
Annotation Type: de.averbis.types.health.HLAValue
Attribute | Description | Type |
---|---|---|
alleleGroup | Allele group of the actual HLA. | String |
protein | Specific protein of the actual HLA. | String |
synonymousDNA | Synonymous DNA substitution within the coding region. | String |
noncodingRegionVariant | Differences in the non-coding region. | String |
expressionNote | Suffix to code changes in expression. | String |
|
3.15.4. Web Service Example
Example of HLA in a medical record (table view)
HLA-A | |
---|---|
Patient | 0101, 6801 |
Donor | 0101, 6802 |
Example of HLA table converted to text:
HLA-A
Patient
0101,6801
Donor
0101,6802
Output when the text is sent to the web service:
{ "begin": 0, "end": 5, "type": "de.averbis.types.health.HLA", "coveredText": "HLA-A", "id": 1272, "date": null, "negatedBy": null, "matchedTerm": "HLA-A", "dictCanon": "HLA-A", "receiptDate": null, "conceptID": "LP18319-1", "source": "Averbis-Lab-Terminology_2.0", "female": { "begin": 19, "end": 23, "type": "de.averbis.types.health.HLAValue", "coveredText": "6801", "id": 1274, "alleleGroup": "68", "noncodingRegionVariant": null, "protein": "01", "synonymousDNA": null, "expressionNote": null }, "samplingDate": null, "belongsTo": "RECIPIENT", "uniqueID": "Averbis-Lab-Terminology_2.0:LP18319-1", "male": { "begin": 14, "end": 18, "type": "de.averbis.types.health.HLAValue", "coveredText": "0101", "id": 1273, "alleleGroup": "01", "noncodingRegionVariant": null, "protein": "01", "synonymousDNA": null, "expressionNote": null } }, { "begin": 0, "end": 5, "type": "de.averbis.types.health.HLA", "coveredText": "HLA-A", "id": 1275, "date": null, "negatedBy": null, "matchedTerm": "HLA-A", "dictCanon": "HLA-A", "receiptDate": null, "conceptID": "LP18319-1", "source": "Averbis-Lab-Terminology_2.0", "female": { "begin": 35, "end": 39, "type": "de.averbis.types.health.HLAValue", "coveredText": "6802", "id": 1277, "alleleGroup": "68", "noncodingRegionVariant": null, "protein": "02", "synonymousDNA": null, "expressionNote": null }, "samplingDate": null, "belongsTo": "DONOR", "uniqueID": "Averbis-Lab-Terminology_2.0:LP18319-1", "male": { "begin": 30, "end": 34, "type": "de.averbis.types.health.HLAValue", "coveredText": "0101", "id": 1276, "alleleGroup": "01", "noncodingRegionVariant": null, "protein": "01", "synonymousDNA": null, "expressionNote": null }
3.16. Irradiation
3.16.1. Description
This component recognizes information about a previous irradiation therapy.
3.16.2. Input
Above this annotator, the following annotators must be included in the pipeline:
3.16.3. Output
Annotation Type: de.averbis.types.health.Irradiation
Attribute | Description | Type |
---|---|---|
dictCanon | Preferred term of the Irradiation concept. | String |
matchedTerm | Matching synonym of the Irradiation concept. | String |
uniqueID | Unique identifier of the Irradiation concept of the format 'terminologyId:conceptID'. | String |
conceptID | The concept ID. | String |
source | The name of the terminology source. | String |
negatedBy | Specifies the negation word, if one exists. | String |
irradiationDose | The irradiation dose. | IrradiationDose |
dateInterval | Temporal information (date interval) about the irradiation therapy. | DateInterval |
Annotation Type: de.averbis.types.health.IrradiationDose
Attribute | Description | Type |
---|---|---|
kind | The irradiation dose kind. Possible values (default is underlined): null | FRACTIONAL | String |
dose | The dose. | Measurement |
3.16.4. Web Service Example
Text Example (Irradiation): "Fraktionierte Ganzkörperbestrahlung (TBI) über opponierende Felder mit einer Gesamtdosis von 12 Gy vom 18.11. bis 20.11.2008"
{ "begin": 0, "end": 41, "type": "de.averbis.types.health.Irradiation", "coveredText": "Fraktionierte Ganzkörperbestrahlung (TBI)", "id": 1854, "negatedBy": null, "matchedTerm": "Ganzkörperbestrahlung", "dictCanon": "Bestrahlung", "irradiationDose": { "begin": 93, "end": 98, "type": "de.averbis.types.health.IrradiationDose", "coveredText": "12 Gy", "id": 1855, "dose": { "begin": 93, "end": 98, "type": "de.averbis.types.health.Measurement", "coveredText": "12 Gy", "id": 1856, "unit": "Gy", "normalizedUnit": "m²/s²", "normalizedValue": 12, "value": 12, "dimension": "[L]²/[T]²" }, "kind": "FRACTIONAL" }, "conceptID": "10037794", "source": "Averbis-Therapy_1.0", "uniqueID": "Averbis-Therapy_1.0:10037794", "dateInterval": { "begin": 99, "end": 124, "type": "de.averbis.types.health.DateInterval", "coveredText": "vom 18.11. bis 20.11.2008", "id": 1857, "endDate": "2008-11-20", "kind": "DATEINTERVAL", "value": "[2008-11-18, 2008-11-20]", "startDate": "2008-11-18" }
3.17. Lab Values
3.17.1. Description
This component detects laboratory values and vital signs, such as blood pressure levels, ECOG (Eastern Cooperative Oncology Group) and NYHA (New York Heart Association) performance status, left ventricular ejection fraction and many more.
The annotation of measurements is already integrated in this pipeline block. If measurements are needed by other components (e.g. Medications), those components should be executed after this block. For more details on measurements, see Health Measurements.
3.17.2. Input
Above this annotator, the following annotators must be included in the pipeline:
3.17.3. Output
Annotation Type: de.averbis.types.health.LaboratoryValue
Attribute | Description | Type |
---|---|---|
parameter | Parameter of the actual laboratory value. | LaboratoryParameter |
fact | Measurement of the actual laboratory value. | Measurement |
factAssessment | An optional relative assessment of the fact. | |
lowerLimit | Lower reference value of the actual laboratory value. | Measurement |
upperLimit | Upper reference value of the actual laboratory value. | Measurement |
interpretation | Interpretation of the fact depending on reference values, or interpretation named in the text (also possible without a fact). Possible values (default is underlined): null | normal | abnormal | high | low | String |
qualitativeValue | Qualitative value of the actual laboratory value. | QualitativeValue |
belongsTo | Indicates whether the laboratory value belongs to a donor or recipient (e.g. in case of transplantations) or to a family member. Possible values (default is underlined): null | DONOR | FAMILY | RECIPIENT | String |
Annotation Type: de.averbis.types.health.LaboratoryParameter
Attribute | Description | Type |
---|---|---|
dictCanon | Preferred term of the LaboratoryParameter concept. | String |
matchedTerm | Matching synonym of the LaboratoryParameter concept. | String |
uniqueID | Unique identifier of the LaboratoryParameter concept of the format 'terminologyId:conceptID'. | String |
conceptID | The concept ID. | String |
source | The name of the terminology source. | String |
negatedBy | Specifies the negation word, if one exists. | String |
Annotation Type: de.averbis.types.health.QualitativeValue
Attribute | Description | Type |
---|---|---|
value | Qualitative statement on a laboratory value. Possible values (default is underlined): null | 1- | - - | 2- | - - - | 3- | 1+ | ++ | 2+ | +++ | 3+ | APPROPRIATE | BORDERLINE | EVIDENCE | NEGATIVE | NO_EVIDENCE | NOT_QUANTIFIABLE | PARTIAL | POSITIVE | SPECKLED | STAINING | UNKOWN | String |
modifier | Describes the characteristic of a qualitative value. Possible values (default is underlined): null | ABNORMAL | ALTERNATING | BORDERLINE | CENTROMERE | CIRCULAR | CONTINUOUS | CYTOPLASMATIC | DEMONSTRABLE | HOMOGENEOUS | MODERATE | NOT | NOT_QUANTIFIABLE | NUCLEOLAR | PERINUCLEOLAR | QUANTIFIABLE | QUALITATIVE | REPEATED | STRONG | WEAK | String |
NEW: This type now has all standard features: 'begin', 'end', 'type', 'coveredText' and 'id'.
Annotation Type: de.averbis.types.health.BloodPressure
Attribute | Description | Type |
---|---|---|
systolic | Measurement of the systolic blood pressure. | Measurement |
diastolic | Measurement of the diastolic blood pressure. | Measurement |
interpretation | Interpretation of systolic and diastolic values depending on named interpretations in the text. Possible values (default is underlined): null | normal | abnormal | high | low | String |
Annotation Type: de.averbis.types.health.ECOG
Attribute | Description | Type |
---|---|---|
stage | Stage of the ECOG (Eastern Cooperative Oncology Group) Performance Status (numeric scale). | String |
Annotation Type: de.averbis.types.health.NYHA
Attribute | Description | Type |
---|---|---|
stage | Stage of the NYHA (New York Heart Association) classification (numeric scale). | String |
Annotation Type: de.averbis.types.health.Organism
Attribute | Description | Type |
---|---|---|
matchedTerm | Matching synonym of the organism concept found in the text. | String |
dictCanon | Preferred term of the organism concept. | String |
kind | The kind of the organism, e.g. 'Bacterium', 'Virus' or 'Fungus'. | String |
conceptID | The ID of the concept. | String |
source | The name of the terminology source. | String |
uniqueID | Unique identifier of the concept of the format 'terminologyId:conceptID'. | String |
negatedBy | Specifies the negation word, if one exists. | String |
3.17.4. Terminology Binding
Name | Languages | Version | Identifier | Comment |
---|---|---|---|---|
Averbis Lab Terminology | EN, DE | 2.1 | Averbis-Lab-Terminology_2.1 | Laboratory and vital signs parameters, ID based on LOINC codes (LOINC parts) composed by Averbis. |
SNOMED-CT Bacteria | EN, DE | 2020 | SNOMED-CT-Bacteria_2020 | Terminology of bacteria, ID based on SNOMED-CT codes composed and enriched by Averbis. |
SNOMED-CT Fungus | EN, DE | 2020 | SNOMED-CT-Fungus_2020 | Terminology of fungi, ID based on SNOMED-CT codes composed and enriched by Averbis. |
SNOMED-CT Virus | EN, DE | 2020 | SNOMED-CT-Virus_2020 | Terminology of viruses, ID based on SNOMED-CT codes composed and enriched by Averbis. |
3.17.5. Web Service Example
Example 1 (LabValue with interpretation): "Uric acid 9.6 mg/dl (3.5-7.0)"
{ "begin": 0, "end": 29, "type": "de.averbis.types.health.LaboratoryValue", "coveredText": "Uric acid 9.6 mg/dl (3.5-7.0)", "id": 2247, "factAssessment": null, "fact": { "begin": 10, "end": 19, "type": "de.averbis.types.health.Measurement", "coveredText": "9.6 mg/dl", "id": 2248, "unit": "mg/dL", "normalizedUnit": "kg/m³", "normalizedValue": 0.096, "value": 9.6, "dimension": "[M]/[L]³" }, "interpretation": "high", "parameter": { "begin": 0, "end": 9, "type": "de.averbis.types.health.LaboratoryParameter", "coveredText": "Uric acid", "id": 2246, "negatedBy": null, "matchedTerm": "Uric Acid", "dictCanon": "Urate", "conceptID": "LP15935-7", "source": "Averbis-Lab-Terminology_2.1", "uniqueID": "Averbis-Lab-Terminology_2.1:LP15935-7" }, "upperLimit": { "begin": 25, "end": 28, "type": "de.averbis.types.health.Measurement", "coveredText": "7.0", "id": 2250, "unit": "mg/dL", "normalizedUnit": "kg/m³", "normalizedValue": 0.07, "value": 7, "dimension": "[M]/[L]³" }, "qualitativeValue": null, "lowerLimit": { "begin": 21, "end": 24, "type": "de.averbis.types.health.Measurement", "coveredText": "3.5", "id": 2249, "unit": "mg/dL", "normalizedUnit": "kg/m³", "normalizedValue": 0.035, "value": 3.5, "dimension": "[M]/[L]³" }, "belongsTo": null }
Text Example 2 (QualitativeValue): "CMV antibody strong positive"
{ "begin": 0, "end": 28, "type": "de.averbis.types.health.LaboratoryValue", "coveredText": "CMV antibody strong positive", "id": 838, "factAssessment": null, "fact": null, "interpretation": null, "parameter": { "begin": 0, "end": 12, "type": "de.averbis.types.health.LaboratoryParameter", "coveredText": "CMV antibody", "id": 837, "negatedBy": null, "matchedTerm": "CMV antibody", "dictCanon": "Cytomegalovirus Ab", "conceptID": "LP37878-3", "source": "Averbis-Lab-Terminology_2.1", "uniqueID": "Averbis-Lab-Terminology_2.1:LP37878-3" }, "upperLimit": null, "qualitativeValue": { "begin": 13, "end": 28, "type": "de.averbis.types.health.QualitativeValue", "coveredText": "strong positive", "id": 839, "modifier": "STRONG", "value": "POSITIVE" }, "lowerLimit": null, "belongsTo": null }
Text Example 3 (BloodPressure): "BP 129/61 mmHg"
{ "begin": 0, "end": 14, "type": "de.averbis.types.health.BloodPressure", "coveredText": "BP 129/61 mmHg", "id": 1072, "systolic": { "begin": 3, "end": 6, "type": "de.averbis.types.health.Measurement", "coveredText": "129", "id": 1073, "unit": "mmHg", "normalizedUnit": "kg/(m·s²)", "normalizedValue": 17198.538, "value": 129, "dimension": "[M]/([L]·[T]²)" }, "diastolic": { "begin": 7, "end": 14, "type": "de.averbis.types.health.Measurement", "coveredText": "61 mmHg", "id": 1074, "unit": "mmHg", "normalizedUnit": "kg/(m·s²)", "normalizedValue": 8132.642, "value": 61, "dimension": "[M]/([L]·[T]²)" }, "interpretation": null }
Text Example 4 (ECOG Performance Status): "Patient's performance status is ECOG 2."
{ "begin": 32, "end": 38, "type": "de.averbis.types.health.ECOG", "coveredText": "ECOG 2", "id": 1008, "stage": "2" }
Text Example 5 (NYHA Classification): "NYHA Class II"
{ "begin": 0, "end": 13, "type": "de.averbis.types.health.NYHA", "coveredText": "NYHA Class II", "id": 675, "stage": "2" }
Text Example 6 (Organism, LabValue): "Klebsiella pneumoniae positiv"
{ "begin": 0, "end": 29, "type": "de.averbis.types.health.LaboratoryValue", "coveredText": "Klebsiella pneumoniae positiv", "id": 832, "factAssessment": null, "fact": null, "interpretation": null, "parameter": { "begin": 0, "end": 21, "type": "de.averbis.types.health.LaboratoryParameter", "coveredText": "Klebsiella pneumoniae", "id": 833, "negatedBy": null, "matchedTerm": "Klebsiella pneumoniae", "dictCanon": "Klebsiella pneumoniae", "conceptID": "56415008", "source": "SNOMED-CT-Bacteria_2020", "uniqueID": "SNOMED-CT-Bacteria_2020:56415008" }, "upperLimit": null, "qualitativeValue": { "begin": 22, "end": 29, "type": "de.averbis.types.health.QualitativeValue", "coveredText": "positiv", "id": 834, "modifier": null, "value": "POSITIVE" }, "lowerLimit": null, "belongsTo": null } { "begin": 0, "end": 21, "type": "de.averbis.types.health.Organism", "coveredText": "Klebsiella pneumoniae", "id": 835, "negatedBy": null, "matchedTerm": "Klebsiella pneumoniae", "dictCanon": "Klebsiella pneumoniae", "kind": "Bacterium", "conceptID": "56415008", "source": "SNOMED-CT-Bacteria_2020", "uniqueID": "SNOMED-CT-Bacteria_2020:56415008" }
3.18. Language Detection
3.18.1. Description
This component recognizes and sets the text language. It currently supports German and English. In contrast to the LanguageSetter, this component decides individually for each document which language it is and sets the language accordingly.
If no language can be detected, the language is set to 'German'.
3.18.2. Input
The component does not expect any annotations.
3.18.3. Output
The component sets the parameter 'documentLanguage' in the type de.averbis.types.health.DocumentAnnotation.
3.18.4. Web Service Example
Text Example: "this is a sample text."
{ "begin": 0, "end": 22, "type": "de.averbis.types.health.DocumentAnnotation", "coveredText": "this is a sample text.", "id": 678, "language": "en", "version": null }
3.19. LanguageSetter
3.19.1. Description
A language setter sets the text language in a document. It should only be used if the language is the same for all documents that are sent to this pipeline.
3.19.2. Input
The component does not expect any annotations.
3.19.3. Output
The component sets the parameter documentLanguage.
3.19.4. Configuration
Name | Description | Type | MultiValued | Mandatory |
---|---|---|---|---|
| The document language to set if not already set in the CAS. | String | | |
| If true, an existing document language will be overwritten. | Boolean | | |
3.19.5. Web Service Example
The language is currently not returned in the web service.
3.20. Laterality
3.20.1. Description
This component annotates the laterality or body site of different annotation types, e.g. Diagnosis, Procedure and Ophthalmology.
3.20.2. Input
Above this annotator, the following annotators must be included in the pipeline:
This annotator must be included above the annotators whose feature 'side' it sets.
3.20.3. Output
This annotator sets the feature 'side' in the above-mentioned annotation types.
3.20.4. Web Service Example
As a standalone component, it does not return anything in the web service.
3.21. Medications
3.21.1. Description
This component detects medications, which are a combination of the active ingredient or preparation, a strength, a dose frequency, the dose form, the route of administration and date intervals or a single date.
3.21.2. Input
Above this annotator, the following annotators must be included in the pipeline:
For the annotation of measurements either the LabValues block or the HealthMeasurements block should be executed beforehand.
3.21.3. Output
Annotation Type: de.averbis.types.health.Medication
Attribute | Description | Type |
---|---|---|
drugs | Drug or multi drug of the actual medication. Multi-Value Field. | Drug |
doseFrequency | Dose frequency of the actual medication. Possible forms are a general DoseFrequency or more specific subtypes such as DayTimeDoseFrequency and TimeMeasurementDoseFrequency (see Web Service Examples). | DoseFrequency |
doseForm | Dose form of the actual medication. | DoseForm |
date | Temporal information (date or date interval) about the actual medication. Format: YYYY-MM-DD: Year-Month-Day with leading zeros (e.g. 2020-02-17) | |
administrations | The routes of administration of this medication, presented as String(s). Please see Web Service Example 2 for more details. Multi-Value Field. | String |
rateQuantity | Amount of medication per unit of time, e.g., 2 doses. | |
status | Status of the medication. Possible values (default is underlined): null | ADMISSION | ALLERGY | INPATIENT | DISCHARGE | NEGATED | CONSIDERED | INTENDED | FAMILY | CONDITIONING_TREATMENT | String |
termTypes | Additional information on clinical drug, e.g. semantic clinical drug (RxNorm TermType). Multi-Value Field | TTY |
Annotation Type: de.averbis.types.health.Date
Attribute | Description | Type |
---|---|---|
kind | Kind of the date information, here: "DATE". | String |
value | Value of the date. Format: YYYY-MM-DD: Year-Month-Day with leading zeros (e.g. 2020-02-17) | String |
Annotation Type: de.averbis.types.health.DateInterval
Attribute | Description | Type |
---|---|---|
kind | Kind of the date information, here: "DATEINTERVAL". | String |
value | Value of the date interval, e.g. "[2018-01-01, 2018-01-30]" (see Web Service Example 1). | String |
startDate | First date of the date interval. | String |
endDate | Second date of the date interval. | String |
Annotation Type: de.averbis.types.health.Drug
Attribute | Description | Type |
---|---|---|
ingredient | Ingredient of the drug. | Ingredient |
strength | Strength of the drug. | Strength |
Drugs with more than one ingredient (multi drugs) are also detected and consist of multiple Drug-annotations.
Annotation Type: de.averbis.types.health.Ingredient
Attribute | Description | Type |
---|---|---|
dictCanon | Preferred term of the Ingredient. | String |
matchedTerm | Matching synonym of the Ingredient. | String |
uniqueID | Unique identifier of the Ingredient of the format 'terminologyId:conceptID'. | String |
conceptID | The concept id. | String |
source | The name of the terminology source. | String |
negatedBy | Specifies the negation word, if one exists. | String |
Annotation Type: de.averbis.types.health.Strength
Attribute | Description | Type |
---|---|---|
dictCanon | Preferred term of the Strength. | String |
matchedTerm | Matching synonym of the Strength. | String |
uniqueID | Unique identifier of the Strength of the format 'terminologyId:conceptID'. | String |
conceptID | The concept id. | String |
source | The name of the terminology source. | String |
negatedBy | Specifies the negation word, if one exists. | String |
unit | The unit of the measurement. | String |
normalizedUnit | Normalized string value of the unit. | String |
dimension | The dimension of the unit, e.g. [M] standing for mass in the example below. | String |
value | The numeric value of the measurement. | String |
normalizedValue | Normalized value of the measurement. This value is the result of the transformation of the numeric value according to the transformation of the unit to its standard unit. | String |
Annotation Type: de.averbis.types.health.DoseForm
Attribute | Description | Type |
---|---|---|
dictCanon | Preferred term of the dose form. | String |
matchedTerm | Matching synonym of the dose form. | String |
uniqueID | Unique identifier of the dose form concept of the format 'terminologyId:conceptID'. | String |
conceptID | The concept id. | String |
source | The name of the terminology source. | String |
negatedBy | Specifies the negation word, if one exists. | String |
Annotation Type: de.averbis.types.health.DoseFrequency
Attribute | Description | Type |
---|---|---|
dictCanon | Preferred term of the dose frequency (optional). | String |
matchedTerm | The matched term of the dose frequency concept (optional). | String |
uniqueID | Unique identifier of the dose frequency of the format 'terminologyId:conceptID' (optional). | String |
conceptID | The concept id (optional). | String |
source | The name of the terminology source (optional). | String |
negatedBy | Specifies the negation word, if one exists. | String |
interval | The taking interval of a medication, e.g. day, week, month etc. | String |
totalCount | Total count of taken drug units per interval. | Integer |
totalDose | Total dose of taken drug per interval. | Measurement |
morning, midday, evening, atNight | Only available for DayTimeDoseFrequency: represent the count of drug units to be taken at the different daytimes. | |
| Only available for WeekTimeDoseFrequency: represent the count of drug units to be taken on the different weekdays. | |
Annotation Type: de.averbis.types.health.TTY
Attribute | Description | Type |
---|---|---|
| Term type code for the medication. | String |
| The kind of the TTY, e.g. "IN" for ingredient, "SCDC" for ingredient and drug. | String |
| Term type description for the medication. | String |
NEW: Standard Feature
This type now has all standard features: 'begin', 'end', 'type', 'coveredText' and 'id'.
3.21.4. Terminology Binding
Country | Name | Version | Identifier | Comment |
---|---|---|---|---|
United States | RxNorm Ingredients | 2020 | RxNorm-Ingredients_2020_08 | Subset of RxNorm, a US-specific terminology in medicine that contains all medications available on the US market in 2020, enriched with synonyms by Averbis. This subset contains only the ingredients. |
United States | RxNorm Strength | 2019 | RxNormStrength_2019071 | Subset of RxNorm, a US-specific terminology in medicine that contains all medications available on the US market in 2020, enriched with synonyms by Averbis. This subset contains only the strengths. |
United States | Averbis-Dose-Frequency | 1.0 | Averbis-Dose-Frequency_1.0 | Terminology of dose frequencies, ID based on SNOMED-CT codes composed and enriched by Averbis. |
United States / Germany | Averbis Dose Form | 1.0 | Averbis-Dose-Form_1.0 | Terminology of dose forms, composed and enriched by Averbis. Based on SNOMED-CT, RxNorm and Abdamed. |
Germany | Abdamed-Averbis | 2017 | Abdamed-Averbis_2017 | Database of pharmaceutical and medication terminology in Germany, 2017, enriched with synonyms by Averbis. |
3.21.5. Web Service Example
Text Example1: "Medication on discharge: Aspirin 100 mg 1-0-1 TAB from 01/01 to 01/30/2018"
{ "begin": 25, "end": 49, "type": "de.averbis.types.health.Medication", "coveredText": "Aspirin 100 mg 1-0-1 TAB", "id": 2421, "date": { "begin": 50, "end": 74, "type": "de.averbis.types.health.DateInterval", "coveredText": "from 01/01 to 01/30/2018", "id": 2430, "endDate": "2018-01-30", "kind": "DATEINTERVAL", "value": "[2018-01-01, 2018-01-30]", "startDate": "2018-01-01" }, "administrations": [], "drugs": [ { "begin": 25, "end": 39, "type": "de.averbis.types.health.Drug", "coveredText": "Aspirin 100 mg", "id": 2422, "ingredient": { "begin": 25, "end": 32, "type": "de.averbis.types.health.Ingredient", "coveredText": "Aspirin", "id": 2423, "negatedBy": null, "matchedTerm": "Aspirin", "dictCanon": "Aspirin", "conceptID": "1191", "source": "RxNorm_2020_08", "uniqueID": "RxNorm_2020_08:1191" }, "strength": { "begin": 33, "end": 39, "type": "de.averbis.types.health.Strength", "coveredText": "100 mg", "id": 2424, "negatedBy": null, "unit": "mg", "matchedTerm": "100 MG", "dictCanon": "100 MG", "conceptID": "STR4", "normalizedUnit": "kg", "source": "RxNormStrength_2019071", "normalizedValue": 0.0001, "value": 100, "dimension": "[M]", "uniqueID": "RxNormStrength_2019071:STR4" } } ], "termTypes": null, "doseForm": { "begin": 46, "end": 49, "type": "de.averbis.types.health.DoseForm", "coveredText": "TAB", "id": 2427, "negatedBy": null, "matchedTerm": "Tabs", "dictCanon": "Oral tablet (qualifier value)", "conceptID": "SCT421026006", "source": "AverbisDoseForm_1.0", "uniqueID": "AverbisDoseForm_1.0:SCT421026006" }, "rateQuantity": "NaN", "doseFrequency": { "begin": 40, "end": 45, "type": "de.averbis.types.health.DayTimeDoseFrequency", "coveredText": "1-0-1", "id": 2428, "negatedBy": null, "midday": 0, "matchedTerm": null, "source": null, "totalCount": 2, "atNight": "NaN", "morning": 1, "totalDose": { "begin": 40, "end": 45, "type": "de.averbis.types.health.Measurement", "coveredText": "1-0-1", "id": 2429, "unit": "mg", "normalizedUnit": null, "normalizedValue": "NaN", "value": 200, "dimension": "[M]" }, "dictCanon": null, "conceptID": null, "interval": "daytime", "evening": 1, "uniqueID": null }, "status": "DISCHARGE" }, { "begin": 0, "end": 74, "type": "de.averbis.types.health.ClinicalSection", "coveredText": "Medication on discharge: Aspirin 100 mg 1-0-1 TAB from 01/01 to 01/30/2018", "id": 2418, "label": "DischargeMedication", "keyword": { "begin": 0, "end": 23, "type": "de.averbis.types.health.ClinicalSectionKeyword", "coveredText": "Medication on discharge", "id": 2419, "negatedBy": null, "matchedTerm": "Medication on discharge", "dictCanon": "Medication on discharge", "conceptID": "10183-2", "source": "clinical_sections_en", "uniqueID": "clinical_sections_en:10183-2" } }
Text Example 2: "Lisinopril 5 MG tablet Take 5 mg by mouth daily."
{ "begin": 0, "end": 47, "type": "de.averbis.types.health.Medication", "coveredText": "Lisinopril 5 MG tablet Take 5 mg by mouth daily", "id": 1812, "date": null, "administrations": [ "by mouth" ], "drugs": [ { "begin": 0, "end": 15, "type": "de.averbis.types.health.Drug", "coveredText": "Lisinopril 5 MG", "id": 1813, "ingredient": { "begin": 0, "end": 10, "type": "de.averbis.types.health.Ingredient", "coveredText": "Lisinopril", "id": 1814, "negatedBy": null, "matchedTerm": "Lisinopril", "dictCanon": "Lisinopril", "conceptID": "29046", "source": "RxNorm_2020_08", "uniqueID": "RxNorm_2020_08:29046" }, "strength": { "begin": 11, "end": 15, "type": "de.averbis.types.health.Strength", "coveredText": "5 MG", "id": 1815, "negatedBy": null, "unit": "mg", "matchedTerm": "5 MG", "dictCanon": "5 MG", "conceptID": "STR133", "normalizedUnit": "kg", "source": "RxNormStrength_2019071", "normalizedValue": 0.000005, "value": 5, "dimension": "[M]", "uniqueID": "RxNormStrength_2019071:STR133" } } ], "termTypes": null, "doseForm": { "begin": 16, "end": 22, "type": "de.averbis.types.health.DoseForm", "coveredText": "tablet", "id": 1818, "negatedBy": null, "matchedTerm": "Tablets", "dictCanon": "Tablet dose form (qualifier value)", "conceptID": "SCT385055001", "source": "AverbisDoseForm_1.0", "uniqueID": "AverbisDoseForm_1.0:SCT385055001" }, "rateQuantity": "NaN", "doseFrequency": { "begin": 42, "end": 47, "type": "de.averbis.types.health.TimeMeasurementDoseFrequency", "coveredText": "daily", "id": 1819, "negatedBy": null, "totalDose": { "begin": 42, "end": 47, "type": "de.averbis.types.health.Measurement", "coveredText": "daily", "id": 1820, "unit": "mg", "normalizedUnit": null, "normalizedValue": "NaN", "value": 5, "dimension": "[M]" }, "matchedTerm": "Daily", "dictCanon": "Daily (qualifier value)", "conceptID": "69620002", "interval": "1/day", "source": "DoseFrequency_1.0", "totalCount": 1, "uniqueID": "DoseFrequency_1.0:69620002" }, "status": null }
3.22. Medication Status
3.22.1. Description
The annotator recognizes the status of medications. Possible status values include, for example, "INTENDED" or "FAMILY".
3.22.2. Input
Above this annotator, the following annotator must be included in the pipeline:
3.22.3. Output
This annotator sets the feature 'status' in annotations of type Medication.
3.22.4. Web Service Example
Text Example: "A very good alternative, if the tumor is ER positive, is treatment with Tamoxifen."
{ "begin": 72, "end": 81, "type": "de.averbis.types.health.Medication", "coveredText": "Tamoxifen", "id": 1552, "date": null, "administrations": [], "drugs": [ { "begin": 72, "end": 81, "type": "de.averbis.types.health.Drug", "coveredText": "Tamoxifen", "id": 1553, "ingredient": { "begin": 72, "end": 81, "type": "de.averbis.types.health.Ingredient", "coveredText": "Tamoxifen", "id": 1554, "negatedBy": null, "matchedTerm": "Tamoxifen", "dictCanon": "Tamoxifen", "conceptID": "10324", "source": "RxNorm_2020_08", "uniqueID": "RxNorm_2020_08:10324" }, "strength": null } ], "termTypes": null, "doseForm": null, "rateQuantity": "NaN", "doseFrequency": null, "status": "CONSIDERED" }
3.23. Morphology
3.23.1. Description
This component detects morphologies. It is mainly used in pathology reports.
3.23.2. Input
Above this annotator, the following annotators must be included in the pipeline:
3.23.3. Output
The component creates annotations of type:
Annotation Type: de.averbis.types.health.Morphology
Attribute | Description | Type |
---|---|---|
dictCanon | Preferred term of the Morphology. | String |
matchedTerm | Matching synonym of the Morphology. | String |
uniqueID | Unique identifier of the Morphology of the format 'terminologyId:conceptID'. | String |
conceptID | The concept id. | String |
source | The name of the terminology source. | String |
negatedBy | Specifies the negation word, if one exists. | String |
confidence | The confidence feature denotes the probability of the annotation (Diagnosis/Morphology/Topography concept) to be valid, i.e. the higher the confidence, the closer to a valid annotation. | Double |
3.23.4. Terminology Binding
Country | Name | Version | Identifier | Comment |
---|---|---|---|---|
United States | ICD-O | 3.1 | ICD-O_3.1 | International Classification of Diseases for Oncology WHO edition, enriched with synonyms by Averbis. |
Germany | ICD-O-DE | 3.1 | ICD-O-DE_3.1 | International Classification of Diseases for Oncology German Edition, enriched with synonyms by Averbis. |
3.23.5. Web Service Example
Text Example: "Adenocarcinoma of the rectum"
{ "begin": 0, "end": 14, "type": "de.averbis.types.health.Morphology", "coveredText": "Adenocarcinoma", "id": 974, "negatedBy": null, "matchedTerm": "Adenocarcinoma", "dictCanon": "Adenocarcinoma, NOS", "confidence": 0, "conceptID": "8140/3", "source": "ICD-O-Morphology-EN_3.1", "uniqueID": "ICD-O-Morphology-EN_3.1:8140/3" }
3.24. Negations
3.24.1. Description
This component detects negated expressions. The negations are detected and assigned to concept annotations that are affected by these expressions. The negation detection component is optimized for medical texts.
3.24.2. Input
Above this annotator, the following annotators must be included in the pipeline:
- LanguageDetection or LanguageSetter
- all components which include concept annotators, such as LabValues, Diagnoses, Medications, Morphology or Topography.
3.24.3. Output
This component sets the following internal type that is not visible in the annotation editor:
Annotation Type: de.averbis.types.health.MedicalNegation
If a concept is successfully negated, the feature 'negatedBy' will be set to the corresponding negation term. If the DiagnosisStatus annotator is included after it, the 'verificationStatus' feature is additionally set to NEGATED.
3.24.4. Web Service Example
Text Example: "No Crohn’s disease"
{ "begin": 3, "end": 18, "type": "de.averbis.types.health.Diagnosis", "coveredText": "Crohn’s disease", "id": 832, "negatedBy": "No", "side": null, "matchedTerm": "Crohn's disease", "verificationStatus": "NEGATED", "kind": null, "confidence": 0, "onsetDate": null, "source": "ICD10CM_2021", "clinicalStatus": null, "approach": "DictionaryLookup", "laterality": null, "dictCanon": "Crohn's disease, unspecified, without complications", "conceptID": "K50.90", "belongsTo": null, "uniqueID": "ICD10CM_2021:K50.90" }
3.25. Ophthalmology
3.25.1. Description
This component detects indicators for the left and the right eye, the intraocular pressure, mentions of visual acuity and concepts concerning the field of ophthalmology.
3.25.2. Input
Above this annotator, the following annotators must be included in the pipeline:
3.25.3. Output
Annotation Type: de.averbis.types.health.Ophthalmology
Attribute | Description | Type |
---|---|---|
dictCanon | Preferred term of Ophthalmology. | String |
matchedTerm | Matching synonym of Ophthalmology. | String |
uniqueID | Unique identifier of Ophthalmology of the format 'terminologyId:conceptID'. | String |
conceptID | The concept id. | String |
source | The name of the terminology source. | String |
negatedBy | Specifies the negation word, if one exists. | String |
Annotation Type: de.averbis.types.health.Tensio
Attribute | Description | Type |
---|---|---|
leftEye | Tensio measurement of the left eye. | Measurement |
rightEye | Tensio measurement of the right eye. | Measurement |
Annotation Type: de.averbis.types.health.RelevantVisualAcuity
Best or actual visual acuity, selected from multiple VisualAcuity or VisualAcuityValue annotations.
Annotation Type: de.averbis.types.health.VisualAcuity
Attribute | Description | Type |
---|---|---|
leftEye | Left eye’s visual acuity. | VisualAcuityValue |
rightEye | Right eye’s visual acuity. | VisualAcuityValue |
Annotation Type: de.averbis.types.health.VisualAcuityValue
Attribute | Description | Type |
---|---|---|
fact | Normalized value of visual acuity. | String |
meter | Visual acuity measured with blackboard. | Boolean |
correction | Normalized value of correction during measuring visual acuity. | String |
refraction | The measured refraction. | Refraction |
pinHole | Visual acuity measured with pin hole. | Boolean |
additionalInformation | Kind of comment, e.g. "AR_NOT_POSSIBLE", "DOES_NOT_IMPROVE". | VisualAcuityAdditionalInformation |
Annotation Type: de.averbis.types.health.Refraction
Attribute | Description | Type |
---|---|---|
sphere | The spheric value of the actual refraction. | Double |
cylinder | The cylinder value of the actual refraction. | Double |
axis | The axis value of the actual refraction. | Double |
Annotation Type: de.averbis.types.health.VisualAcuityAdditionalInformation
Attribute | Description | Type |
---|---|---|
normalized | Normalization of additional information on visual acuity, e.g. "AR_NOT_POSSIBLE", "DOES_NOT_IMPROVE". | String |
3.25.4. Web Service Example
Text Example1 (Tensio): "Tensio RA 13 mmHg LA 14 mmHg"
{ "begin": 7, "end": 28, "type": "de.averbis.types.health.Tensio", "coveredText": "RA 13 mmHg LA 14 mmHg", "id": 1580, "rightEye": { "begin": 10, "end": 17, "type": "de.averbis.types.health.Measurement", "coveredText": "13 mmHg", "id": 1582, "unit": "mmHg", "normalizedUnit": "kg/(m·s²)", "normalizedValue": 1733.1860000000001, "value": 13, "dimension": "[M]/([L]·[T]²)" }, "leftEye": { "begin": 21, "end": 28, "type": "de.averbis.types.health.Measurement", "coveredText": "14 mmHg", "id": 1581, "unit": "mmHg", "normalizedUnit": "kg/(m·s²)", "normalizedValue": 1866.508, "value": 14, "dimension": "[M]/([L]·[T]²)" }
Text Example 2 (Visual Acuity): "Visus RA 0,16 (AR +1,0 -3,25 84) LA sc 1/35 (AR nicht möglich)"
{ "begin": 0, "end": 62, "type": "de.averbis.types.health.VisualAcuity", "coveredText": "Visus RA 0,16 (AR +1,0 -3,25 84) LA sc 1/35 (AR nicht möglich)", "id": 3043, "rightEye": { "begin": 9, "end": 32, "type": "de.averbis.types.health.VisualAcuityValue", "coveredText": "0,16 (AR +1,0 -3,25 84)", "id": 3046, "additionalInformation": null, "pinHole": false, "fact": "0.16", "refraction": { "begin": 14, "end": 32, "type": "de.averbis.types.health.Refraction", "coveredText": "(AR +1,0 -3,25 84)", "id": 3047, "sphere": 1, "cylinder": -3.25, "axis": 84 }, "meter": false, "correction": "AR" }, "leftEye": { "begin": 36, "end": 62, "type": "de.averbis.types.health.VisualAcuityValue", "coveredText": "sc 1/35 (AR nicht möglich)", "id": 3044, "additionalInformation": { "begin": 44, "end": 62, "type": "de.averbis.types.health.VisualAcuityAdditionalInformation", "coveredText": "(AR nicht möglich)", "id": 3045, "normalized": "AR_NOT_POSSIBLE" }, "pinHole": false, "fact": "1/35", "refraction": null, "meter": true, "correction": "SC" } }
Text Example 3 (Relevant Visual Acuity): "Visus RA 0,16 (AR +1,0 -3,25 84) LA sc 1/35 (AR nicht möglich)"
{ "begin": 0, "end": 62, "type": "de.averbis.types.health.RelevantVisualAcuity", "coveredText": "Visus RA 0,16 (AR +1,0 -3,25 84) LA sc 1/35 (AR nicht möglich)", "id": 3048, "rightEye": { "begin": 9, "end": 32, "type": "de.averbis.types.health.VisualAcuityValue", "coveredText": "0,16 (AR +1,0 -3,25 84)", "id": 3046, "additionalInformation": null, "pinHole": false, "fact": "0.16", "refraction": { "begin": 14, "end": 32, "type": "de.averbis.types.health.Refraction", "coveredText": "(AR +1,0 -3,25 84)", "id": 3047, "sphere": 1, "cylinder": -3.25, "axis": 84 }, "meter": false, "correction": "AR" }, "leftEye": { "begin": 36, "end": 62, "type": "de.averbis.types.health.VisualAcuityValue", "coveredText": "sc 1/35 (AR nicht möglich)", "id": 3044, "additionalInformation": { "begin": 44, "end": 62, "type": "de.averbis.types.health.VisualAcuityAdditionalInformation", "coveredText": "(AR nicht möglich)", "id": 3045, "normalized": "AR_NOT_POSSIBLE" }, "pinHole": false, "fact": "1/35", "refraction": null, "meter": true, "correction": "SC" } }
Text Example 4 (Ophthalmology): "Kataraktoperation"
{ "begin": 0, "end": 17, "type": "de.averbis.types.health.Ophthalmology", "coveredText": "Kataraktoperation", "id": 624, "negatedBy": null, "side": null, "matchedTerm": "Kataraktoperation", "dictCanon": "Katarakt-Operation", "conceptID": "110473004", "source": "Ophthalmologie_1.0", "uniqueID": "Ophthalmologie_1.0:110473004" },
3.26. Organizations
3.26.1. Description
This component detects types of organizations and provides correspondence information, e.g. whether the organization is the sender of a clinical note. Please note: at present, this annotator is used exclusively to identify whether the sender of a record is a German hospital and to assign a hospital type (e.g. university hospital, general hospital, ...) to this sender.
3.26.2. Input
Above this annotator, the following annotators must be included in the pipeline:
3.26.3. Output
Annotation Type: de.averbis.types.health.Organization
Attribute | Description | Type |
---|---|---|
organizationType | Type of the organization. Possible values: general hospital | university hospital | specialist clinic | rehabilitation clinic | physician's admitting hospital | String |
correspondence | Contains "sender" if the organization is detected as the sending party of the record.* | String |
*Please note: at present, organizations are only extracted if they are detected as "sender".
3.26.4. Terminology Binding
Country | Name | Version | Comment |
---|---|---|---|
DE | Hospital | 1.0 | List of German hospital names enriched with synonyms by Averbis. |
3.26.5. Web Service Example
Text Example (organizations): "Zentrum für Psychiatrie Emmendingen"
{ "begin": 0, "end": 35, "type": "de.averbis.types.health.Organization", "coveredText": "Zentrum für Psychiatrie Emmendingen", "id": 765, "organizationType": "specialist clinic", "correspondence": "sender" },
3.26.6. Departments
3.26.6.1. Description
This component is part of the Organizations Annotator and annotates medical departments in clinical notes, e.g. Paediatrics, Neurology, Orthodontics...
3.26.6.2. Input
See annotator "Organizations".
3.26.6.3. Output
Annotation Type: de.averbis.types.health.Department
Attribute | Description | Type |
---|---|---|
dictCanon | Preferred term of the department (concept) as defined in the terminology. | String |
matchedTerm | The term that matched to a department concept in the terminology. | String |
conceptID | The ID of the matched department concept in the terminology. | String |
departmentType | Additional information about the department type, currently limited to the information whether the department is the sending department of a clinical note. Note: there may be more than one sending department, e.g. "Division of Hematology and Oncology". Possible values (default is underlined): null | sender | String |
source | The name of the terminology source. | String |
uniqueID | Unique identifier of the department concept of the format 'terminologyId:conceptID'. | String |
negatedBy | Specifies the negation word, if one exists. | String |
3.26.6.4. Terminology Binding
Country | Languages | Version | Identifier | Comment |
---|---|---|---|---|
United States, Germany | EN, DE | 1.0 | Averbis-SpecialistDepartment_1.0 | Terminology of department names, composed by Averbis enriched with terms from SNOMED-CT. |
3.26.6.5. Web Service Examples
Example Text (departments): Service: "Division of Hematology and Oncology"
{ "begin": 0, "end": 22, "type": "de.averbis.types.health.Department", "coveredText": "Division of Hematology", "id": 923, "negatedBy": null, "matchedTerm": "Hematology", "dictCanon": "Haematology", "conceptID": "394916005", "source": "SpecialistDepartment_1.0", "departmentType": "sender", "uniqueID": "SpecialistDepartment_1.0:394916005" }, { "begin": 27, "end": 35, "type": "de.averbis.types.health.Department", "coveredText": "Oncology", "id": 924, "negatedBy": null, "matchedTerm": "Oncology", "dictCanon": "Medical oncology", "conceptID": "394593009", "source": "SpecialistDepartment_1.0", "departmentType": "sender", "uniqueID": "SpecialistDepartment_1.0:394593009" }
3.27. Pathology Documentation
3.27.1. Description
The component aggregates relevant information from pathology reports (e.g. diagnosis, topography, morphology, TNM, grading and others) into one annotation type. The PathologyDocumentation annotator determines a single valid value for each pathology feature per pathology report, based on various rule-based and machine-learning methods. This information can be used, for example, to encode reports in cancer registries.
3.27.2. Input
Above this annotator, the following annotators must be included in the pipeline:
- Laterality
- PatientInformation
- HealthMeasurements
- TNM
- Topography
- TumorStage
- Receptors
- Specimen
- DiagnosisStatus
3.27.3. Output
Annotation Type: de.averbis.types.health.PathologyDocumentation
Attribute | Description | Type |
---|---|---|
diagnosis | The code of the relevant diagnosis based on ICD-10, e.g. "C50.9" for malignant neoplasm of breast of unspecified site. | Diagnosis |
topography | Topography code based on ICD-O, e.g. "C50.9". | Topography |
morphology | Morphology code based on ICD-O, e.g. "8520/3". | Morphology |
tumor | T-Value according to TNM classification. | String |
node | N-Value according to TNM classification. | String |
metastasis | M-Value according to TNM classification. | String |
location | Further specification of the metastasis: Pulmonary (PUL), Bone marrow (MAR), Osseous (OSS), Pleura (PLE), Hepatic (HEP), Peritoneum (PER), Brain (BRA), Adrenals (ADR), Lymph nodes (LYM), Skin (SKI), Others (OTH) | String |
grading | The grading of the tumor. Possible values (default is underlined): null | U | 1 | 2 | 3 | 4 | String |
resultDate | Date of result. | String |
rClass | Residual classification (R-Classification) of the tumor. Possible values (default is underlined): null | Rx | R0 | R1 | R2 | String |
side | The laterality of the pathology item, e.g. R, L, B, U. | String |
lymphnodesTested | The amount of tested lymph nodes. | Integer |
lymphnodesAffected | The amount of affected lymph nodes. | Integer |
sentinelLymphnodesTested | The amount of tested sentinel lymph nodes. | Integer |
sentinelLymphnodesAffected | The amount of affected sentinel lymph nodes. | Integer |
category | Classification of the pathology report by organ entity, e.g. "MAMMA", "COLON", "PROSTATE". | category |
thickness | The thickness of the tumor. | thickness |
lymphaticInvasion | Invasion of cancer cells in the lymphatic system. Possible values (default is underlined): null, L0, L1. | String |
vascularInvasion | Invasion of cancer cells in the blood vascular system. Possible values (default is underlined): null, V0, V1, V2. | String |
perineuralInvasion | Perineural invasion of cancer cells. Possible values (default is underlined): null, Pn0, Pn1. | String |
pathologyScores | Grouping of pathological scores. | PathologyScore |
For the annotation types Diagnoses, Morphology and Topography, the following features are referenced in the PathologyDocumentation type.
Attribute | Description | Type |
---|---|---|
dictCanon | The preferred term of the Diagnosis/Morphology/Topography concept as defined in the terminology. | String |
matchedTerm | Matching synonym of the concept. | String |
conceptID | The conceptID. | String |
uniqueID | Unique identifier of a concept of the format 'terminologyId:conceptId'. | String |
confidence | The confidence feature denotes the probability of the annotation (Diagnosis/Morphology/Topography concept) to be valid, i.e. the higher the confidence, the closer to a valid annotation. | Double |
source | The name of the terminology source. | String |
negatedBy | Specifies the negation word, if one exists. | String |
For the tumor thickness the following features are referenced in the PathologyDocumentation type.
Attribute | Description | Type |
---|---|---|
value | The value of the tumor thickness. | Double |
unit | The unit of the measurement. By default, the tumor thickness is presented in millimeters (mm). | String |
NEW: Standard Feature
This type now has all standard features: 'begin', 'end', 'type', 'coveredText' and 'id'.
The category has the following features referenced in the PathologyDocumentation type:
Attribute | Description | Type |
---|---|---|
label | The label of the category, e.g. MAMMA, PROSTATA, COLON, MELANOMA, BASALIOM. | String |
confidence | The confidence feature denotes the probability of the label to be valid. | Double |
NEW: Standard Feature
This type now has all standard features: 'begin', 'end', 'type', 'coveredText' and 'id'.
The pathologyScores grouping has the following features referenced in the PathologyDocumentation type.
Attribute | Description | Type |
---|---|---|
gleasonScore | The Gleason grading system is used to determine the prognosis of men with prostate cancer using sample results from a prostate biopsy. | String |
3.27.4. Web Service Example
Text Example (german only): "Klinische Angaben: Mammakarzinom links, cT1a bei 0,5 cm Größe. Tumorklassifikation: TNM (7.Aufl.): pT1b, pN1 (1/1 sn), L0, V0, Pn0 Grading: G2, R-Klassifikation (lokal): R0, ICD-O: 8500/3, ICD-10: C50.9."
{ "begin": 0, "end": 203, "type": "de.averbis.types.health.PathologyDocumentation", "coveredText": "Klinische Angaben: Mammakarzinom links, cT1a bei 0,5 cm Größe. Tumorklassifikation: TNM (7.Aufl.): pT1b, pN1 (1/1 sn), L0, V0, Pn0 Grading: G2, R-Klassifikation (lokal): R0, ICD-O: 8500/3, ICD-10: C50.9.", "id": 6775, "side": "L", "grading": "2", "morphology": { "begin": 181, "end": 187, "type": "de.averbis.types.health.Morphology", "coveredText": "8500/3", "id": 6781, "negatedBy": null, "matchedTerm": "8500/3", "dictCanon": "Invasives duktales Karzinom", "confidence": 0.999591052532196, "conceptID": "8500/3", "source": "ICD-O-DE_3.1", "uniqueID": "ICD-O-DE_3.1:8500/3" }, "thickness": null, "topography": { "begin": 197, "end": 202, "type": "de.averbis.types.health.Topography", "coveredText": "C50.9", "id": 6779, "negatedBy": null, "matchedTerm": "C50.9", "dictCanon": "Brust", "confidence": 0.9736150503158569, "conceptID": "C50.9", "source": "ICD-O-DE_3.1", "uniqueID": "ICD-O-DE_3.1:C50.9" }, "diagnosis": { "begin": 0, "end": 203, "type": "de.averbis.types.health.Diagnosis", "coveredText": "Klinische Angaben: Mammakarzinom links, cT1a bei 0,5 cm Größe. Tumorklassifikation: TNM (7.Aufl.): pT1b, pN1 (1/1 sn), L0, V0, Pn0 Grading: G2, R-Klassifikation (lokal): R0, ICD-O: 8500/3, ICD-10: C50.9.", "id": 6778, "negatedBy": null, "side": null, "matchedTerm": null, "verificationStatus": null, "kind": null, "confidence": 0.9736150503158569, "onsetDate": null, "source": "ICD10GM_2021", "clinicalStatus": null, "approach": null, "laterality": null, "dictCanon": "Bösartige Neubildung: Brustdrüse, nicht näher bezeichnet", "conceptID": "C50.9", "belongsTo": null, "uniqueID": "ICD10GM_2021:C50.9" }, "lymphnodesTested": 1, "resultDate": null, "rClass": "R0", "sentinelLymphnodesTested": 1, "node": "pN1", "sentinelLymphnodesAffected": 1, "lymphnodesAffected": 1, "metastasis": "pMu", "tumor": "pT1b", "lymphaticInvasion": "L0", "vascularInvasion": "V0", "location": null, "category": { "begin": 0, "end": 203, "type": "de.averbis.types.health.TumorCategory", "coveredText": "Klinische Angaben: Mammakarzinom links, cT1a bei 0,5 cm Größe. Tumorklassifikation: TNM (7.Aufl.): pT1b, pN1 (1/1 sn), L0, V0, Pn0 Grading: G2, R-Klassifikation (lokal): R0, ICD-O: 8500/3, ICD-10: C50.9.", "id": 6776, "confidence": 0.9995457743445415, "label": "MAMMA" }, "perineuralInvasion": "Pn0" }
3.28. Patient Information
3.28.1. Description
This component detects different information about the patient, such as admission and discharge dates, the gender of the patient, and whether the patient is deceased. In addition, if a list of patient names was imported as a terminology into Averbis Health Discovery, these patient names can be extracted, too.
3.28.2. Input
Above this annotator, the following annotators must be included in the pipeline:
The HealthPreprocessing pipeline block provides the prerequisite annotation types to ensure the proper functionality of this annotator.
3.28.3. Annotation of patient names
Patient names can only be extracted from clinical notes if they exist as an entry in a terminology called "patientnames". Therefore, the following preparations are necessary to annotate patient names.
Step 1: Create a terminology in the "Terminology Administration" with the Terminology-ID "patientnames", Concept-Type "de.averbis.textanalysis.types.health.PatientNameConcept" and language "Miscellaneous". Label and Version can be set freely. See Create your own Terminology for more details on how to create a terminology.
Step 2: Import your list of patient names into the terminology using OBO-Format or enter the patient names manually into the terminology using the "Terminology Editor". In order to distinguish between first names and last names, the terms must follow the following syntax: Firstname[semicolon]Lastname, e.g. John;Doe.
Your OBO-file with patient names may look like:
[Term]
id: 1
name: Sue;Miller

[Term]
id: 2
name: John;Doe
...
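Such an OBO file can also be generated from an existing list of names, as in the minimal sketch below (the output file name and the patient names are made-up examples):

```python
# Minimal sketch: write a "patientnames" OBO file in the
# Firstname;Lastname syntax described in Step 2.
patients = [("Sue", "Miller"), ("John", "Doe")]

with open("patientnames.obo", "w", encoding="utf-8") as f:
    for i, (first, last) in enumerate(patients, start=1):
        f.write(f"[Term]\nid: {i}\nname: {first};{last}\n\n")
```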
Step 3: View the results of your import/editing in the "Terminology Editor" to make sure everything worked out smoothly. The imported terminology/OBO-file should contain the patients' first and last name as preferred term. Synonyms do not need to be added.
Step 4: Switch to the "Terminology Administration" and submit the terminology to the text analytics module.
Step 5: Reuse an existing pipeline where "Patient Information" is included or create a pipeline and include the following annotators:
Step 6: (Re)Start the pipeline. After completing steps 1 through 5, the pipeline is now ready to annotate the imported patient names.
3.28.4. Output
Annotation Type: de.averbis.types.health.PatientInformation
Attribute | Description | Type |
---|---|---|
name | Matching preferred term in the terminology "patientnames". | PatientName |
gender | Gender of the patient. Possible values (default is underlined): null, female, male | String |
deathdate | Deathdate of the patient. | Date |
deceased | Information as to whether the patient is deceased. Possible values (default is underlined): false, true | Boolean |
Annotation Type: de.averbis.types.health.PatientName
Attribute | Description | Type |
---|---|---|
firstName | The first part (before the semicolon) of the matching preferred term in the terminology "patientnames". | String |
lastName | The last part (after the semicolon) of the matching preferred term in the terminology "patientnames". | String |
Attribute | Description | Type |
---|---|---|
Date | Deathdate of the patient. Format: YYYY-MM-DD: Year-Month-Day with leading zeros (e.g. 2020-02-17) | String |
Annotation Type: de.averbis.types.health.Hospitalisation
Attribute | Description | Type |
---|---|---|
admissionDate | Date of admission to hospital. Format: YYYY-MM-DD: Year-Month-Day with leading zeros (e.g. 2020-02-17) | String |
dischargeDate | Date of discharge from hospital. Format: YYYY-MM-DD: Year-Month-Day with leading zeros (e.g. 2020-02-17) | String |
3.28.5. Terminology Binding for patientnames
Country | Name | Version | Identifier | Comment |
---|---|---|---|---|
All | <define your name> | <define your version> | patientnames | To annotate the patient's name, a terminology with ID "patientnames" has to be created and filled with an individual list of patient names, which should be annotated. See chapter "Annotation of patient names" for more details. |
3.28.6. Web Service Example
Example text (Hospitalisation, PatientInformation): "We're reporting on the patient John Doe. He stayed in our hospital from 1/01/2018 until 2/01/2018."
{ "begin": 72, "end": 97, "type": "de.averbis.types.health.Hospitalisation", "coveredText": "1/01/2018 until 2/01/2018", "id": 1648, "admissionDate": "2018-01-01", "dischargeDate": "2018-02-01" }, { "begin": 0, "end": 98, "type": "de.averbis.types.health.PatientInformation", "coveredText": "We're reporting on the patient John Doe. He stayed in our hospital from 1/01/2018 until 2/01/2018.", "id": 1650, "firstName": null, "lastName": null, "deceased": false, "gender": "male", "deathDate": null }
Example text (death date): "The patient died on 2/01/2018 in the course of a multiorgan failure."
{ "begin": 0, "end": 68, "type": "de.averbis.types.health.PatientInformation", "coveredText": "The patient died on 2/01/2018 in the course of a multiorgan failure.", "id": 1285, "firstName": null, "lastName": null, "deceased": true, "gender": null, "deathDate": "2018-02-01" }
3.29. PHI
3.29.1. Description
Protected health information (PHI), also referred to as personal health information, generally refers to demographic information, medical histories, test and laboratory results, mental health conditions, insurance information, and other data that a healthcare professional collects to identify an individual and determine appropriate care.
This component identifies protected health information like names, dates, locations, IDs, contact information, professions and others.
3.29.2. Input
Above this annotator, the following annotators must be included in the pipeline:
3.29.3. Output
Annotation Type: de.averbis.types.health.Age
Age mentioned in the document.
Annotation Type: de.averbis.types.health.Date
All dates in the document.
Annotation Type: de.averbis.types.health.Name
Attribute | Description | Type |
---|---|---|
kind | Possible values: PATIENT | DOCTOR | OTHER | String |
Annotation Type: de.averbis.types.health.Location
Attribute | Description | Type |
---|---|---|
kind | Possible values: STREET | CITY | ZIP | COUNTRY (country or nationality) | STATE | HOSPITAL | ORGANIZATION (other organizations besides the hospital, e.g. the employer of the patient) | OTHER | String |
Annotation Type: de.averbis.types.health.Id
Attribute | Description | Type |
---|---|---|
kind | Possible values: PATIENTID | OTHER | String |
Annotation Type: de.averbis.types.health.Contact
Attribute | Description | Type |
---|---|---|
kind | Possible values: PHONE | FAX | URL | EMAIL | String |
Annotation Type: de.averbis.types.health.Profession
The profession of the patient or his relatives.
Annotation Type: de.averbis.types.health.PHIOther
Other protected health information.
Annotation Type: de.averbis.types.health.DeidentifiedDocument
This type returns the deidentified text. By default, it replaces the recognized PHI concepts with "X".
Attribute | Description | Type |
---|---|---|
kind | Defines the deidentification method. E.g. "crossout" replaces the PHI concepts with "X", while "tag" replaces the PHI information with a tag such as <DATE/>, <CITY/> etc. Possible values (default is underlined): crossout | tag Please note: the kind can be set via parameter within the PHIDeidentifier compound of the PHI Annotator. | String |
deidentifiedText | The deidentified text. | String |
3.29.4. Configuration
The PHI module consists of two compounds which can be configured to influence the output.
3.29.4.1. Configuration of PHIAnnotator
The PHI Annotator provides several configuration parameters. Please keep the default values of these parameters and do not make any changes.
3.29.4.2. Configuration of PHIDeidentifier
The PHI Deidentifier compound is provided with a parameter called "deidentificationMethod".
The default value for the deidentification method is "crossout". The second method is named "tag", by which identified PHIs are replaced by a tag name.
3.29.5. Web Service Example
Text Example for Deidentification: "Mr. Jim Jack was born on 25/10/1990 in Boston."
Method: crossout
{ "begin": 0, "end": 46, "type": "de.averbis.types.health.DeidentifiedDocument", "coveredText": "Mr. Jim Jack was born on 25/10/1990 in Boston.", "id": 5133, "kind": "crossout", "deidentifiedText": "Mr. XXX XXXX was born on XXXXXXXXXX in XXXXXX." }
Method: tag
{ "begin": 0, "end": 46, "type": "de.averbis.types.health.DeidentifiedDocument", "coveredText": "Mr. Jim Jack was born on 25/10/1990 in Boston.", "id": 4933, "kind": "tag", "deidentifiedText": "Mr. <NAME/> was born on <DATE/> in <LOCATION/>." }
Text Example for further PHI-Types:
"Universitätsklinik Denzlingen, Abteilung Innere Medizin, Elzstraße 165, 54679 Denzlingen, Telefon: +49 6789 - 1234 00, Telefax: +49 6789 - 1234 01, IBAN DE89 4568 2145 5698 4565 12, http://www.uniklinik-denzlingen.de
An:
Dr. med. Markus Bernoulli
Philipp-Furtwängler-Straße 89
32568 Waldkirch
betrifft Patienten:
Hr. Fieseler, Benjamin
geboren am 5.10.53
Libellenallee 34
65432 Reute
Denzlingen, den 28. Januar 2016
Sehr geehrter Kollege,
Wir bedanken uns für die Überweisung von Herrn Benjamin Fieseler (PatientenID: 123456789) zur Entnahme der Schilddrüse und berichten Ihnen nachfolgend über dessen Aufenthalt in unserem Hause."
{ "begin": 164, "end": 182, "type": "de.averbis.types.health.Contact", "coveredText": "+49 6789 - 1234 00", "id": 61978, "kind": "PHONE" }, { "begin": 211, "end": 245, "type": "de.averbis.types.health.Contact", "coveredText": "http://www.uniklinik-denzlingen.de", "id": 62076, "kind": "URL" }, { "begin": 192, "end": 210, "type": "de.averbis.types.health.Contact", "coveredText": "+49 6789 - 1234 01", "id": 62032, "kind": "FAX" }, { "begin": 382, "end": 389, "type": "de.averbis.types.health.Date", "coveredText": "5.10.53", "id": 43922 }, { "begin": 437, "end": 452, "type": "de.averbis.types.health.Date", "coveredText": "28. Januar 2016", "id": 43936 }, { "begin": 558, "end": 567, "type": ""de.averbis.types.health.ID", "coveredText": "123456789", "id": 44649 "kind": "PATIENTID" }, { "begin": 0, "end": 29, "type": "de.averbis.types.health.Location", "coveredText": "Universitätsklinik Denzlingen", "id": 67425, "kind": "HOSPITAL" }, { "begin": 123, "end": 136, "type": "de.averbis.types.health.Location", "coveredText": "Elzstraße 165", "id": 69052, "kind": "STREET" }, { "begin": 138, "end": 143, "type": "de.averbis.types.health.Location", "coveredText": "54679", "id": 70568, "kind": "ZIP" }, { "begin": 144, "end": 154, "type": "de.averbis.types.health.Location", "coveredText": "Denzlingen", "id": 70772, "kind": "CITY" }, { "begin": 280, "end": 309, "type": "de.averbis.types.health.Location", "coveredText": "Philipp-Furtwängler-Straße 89", "id": 69194, "kind": "STREET" }, { "begin": 310, "end": 315, "type": "de.averbis.types.health.Location", "coveredText": "32568", "id": 70598, "kind": "ZIP" }, { "begin": 316, "end": 325, "type": "de.averbis.types.health.Location", "coveredText": "Waldkirch", "id": 70790, "kind": "CITY" }, { "begin": 390, "end": 406, "type": "de.averbis.types.health.Location", "coveredText": "Libellenallee 34", "id": 69280, "kind": "STREET" }, { "begin": 407, "end": 412, "type": "de.averbis.types.health.Location", "coveredText": "65432", "id": 70612, "kind": "ZIP" }, { "begin": 413, "end": 418, "type": "de.averbis.types.health.Location", "coveredText": "Reute", "id": 70804, "kind": "CITY" }, { "begin": 421, "end": 431, "type": "de.averbis.types.health.Location", "coveredText": "Denzlingen", "id": 70818, "kind": "CITY" }, { "begin": 263, "end": 279, "type": "de.averbis.types.health.Name", "coveredText": "Markus Bernoulli", "id": 65723, "kind": "DOCTOR" }, { "begin": 352, "end": 370, "type": "de.averbis.types.health.Name", "coveredText": "Fieseler, Benjamin", "id": 65785, "kind": "PATIENT" }
3.30. Physical Therapies
3.30.1. Description
The component annotates physical therapies (e.g. cryotherapy, occupational therapy) from clinical notes.
3.30.2. Input
Above this annotator, the following annotators must be included in the pipeline:
3.30.3. Output
Annotation Type: de.averbis.types.health.PhysicalTherapy
Attribute | Description | Type |
---|---|---|
dictCanon | Preferred term of the physical therapy. | String |
matchedTerm | The matching synonym of the physical therapy. | String |
uniqueID | Unique identifier of a concept of the format 'terminologyId:conceptID'. | String |
conceptID | The concept id of the physical therapy. | String |
source | The identifier of the terminology. | String |
negatedBy | Specifies the negation word, if one exists. | String |
status | Describes the status of the physical therapy. Possible values (default is underlined): null | PLANNED | CANCELED | COMPLETED | NEGATED | String |
3.30.4. Terminology Binding
Languages | Name | Version | Identifier | Comment |
---|---|---|---|---|
EN, DE | Averbis-Therapy | 1.0 | Averbis-Therapy_1.0 | Averbis' own multilingual terminology for physical and related therapies. |
3.30.5. Web Service Example
Text Example: "Ergotherapy terminated at patient's request."
{ "begin": 0, "end": 11, "type": "de.averbis.types.health.PhysicalTherapy", "coveredText": "Ergotherapy", "id": 949, "negatedBy": null, "matchedTerm": "ergotherapy", "dictCanon": "ergotherapy", "conceptID": "PT000003", "source": "AverbisTherapies_1.0", "uniqueID": "AverbisTherapies_1.0:PT000003", "status": "CANCELED" },
3.31. Procedures
3.31.1. Description
The component annotates surgical procedures from clinical notes. Optionally, the additional annotation type ProcedureCandidate can be activated and visualized as well; it specifically detects procedure candidates to optimize DRG coding.
This component is currently only available in English.
3.31.2. Input
Above this annotator, the following annotators must be included in the pipeline:
To get the full functionality, the following annotators should also be included below this annotator in the given order:
3.31.3. Output
Annotation Type: de.averbis.types.health.Procedure
Attribute | Description | Type |
---|---|---|
dictCanon | Preferred term of the procedure. | String |
matchedTerm | The matching synonym of the procedure. | String |
uniqueID | Unique identifier of a concept of the format 'terminologyId:conceptID'. | String |
conceptID | The concept id of the procedure. | String |
source | The identifier of the terminology. | String |
negatedBy | Specifies the negation word, if one exists. | String |
status | Describes the status of the procedure. Possible values (default is underlined): null | PLANNED | CANCELED | COMPLETED | NEGATED | String |
side | The laterality of the procedure. Possible values (default is underlined): null | RIGHT | LEFT | BOTH | String |
date | The date of the procedure. Format: YYYY-MM-DD: Year-Month-Day with leading zeros (e.g. 2020-02-17) | Date |
Annotation Type (optional): de.averbis.types.health.ProcedureCandidate
Attribute | Description | Type |
---|---|---|
dictCanon | Preferred term of the procedure. | String |
conceptID | The concept id of the procedure. | String |
approach | Information about the text mining approach used to generate the annotation. Possible values: DictionaryLookup | SimilarityMatching | DocumentClassification | DerivedByLabValue | String |
confidence | For approaches using machine learning (e.g. "DocumentClassification"), the confidence that the respective annotation has been correctly generated. Possible value range: 0-1. Note: Annotations generated with non-machine learning approaches such as terminology mappings (approach = "DictionaryLookup") are reflected with a confidence value of 0. | Double |
3.31.4. Terminology Binding
Country | Name | Version | Identifier | Comment |
---|---|---|---|---|
United States | SNOMED-CT-US | 2018-09-01 | SNOMED-CT-US_2018-09-01 | The SNOMED CT United States (US) Edition, subtree of concept "387713003 Surgical procedure (procedure)" |
3.31.5. Web Service Example
Text example for Procedure: "Hysterectomy was not performed due to hypertension of the patient."
{ "begin": 0, "end": 12, "type": "de.averbis.types.health.Procedure", "coveredText": "Hysterectomy", "id": 1052, "date": null, "negatedBy": null, "side": null, "matchedTerm": "Hysterectomy", "dictCanon": "Hysterectomy (procedure)", "conceptID": "236886002", "source": "SNOMED-CT-Procedures_20180901", "uniqueID": "SNOMED-CT-Procedures_20180901:236886002", "status": "CANCELED" }
Text example for ProcedureCandidate: "Hysterectomy was not performed due to hypertension of the patient."
{ "begin": 0, "end": 12, "type": "de.averbis.types.health.ProcedureCandidate", "coveredText": "Hysterectomy", "id": 1052, "dictCanon": "Hysterectomy (procedureCandidate)", "conceptID": "236886002", "approach": "DocumentClassification", "confidence": 0, }
3.32. Receptors
3.32.1. Description
This component detects different receptors important in oncology. Currently the annotation is restricted to the receptors HER2 (human epidermal growth factor receptor 2), Progesterone and Estrogen. Additionally, an interpretation and the percentage will be extracted, if available.
3.32.2. Input
Before this annotator, the following annotator must be included in the pipeline:
3.32.3. Output
Annotation Type: de.averbis.types.health.HER2
Attribute | Description | Type |
---|---|---|
status | The status of the HER2 expression. Possible values (default is underlined): -, +, ++, +++ | String |
percentage | The percentage represented as a measurement. | Measurement |
Annotation Type: de.averbis.types.health.EstrogenReceptor
Attribute | Description | Type |
---|---|---|
status | The status of the Estrogen expression. Possible values (default is underlined): -, +, ++, +++ | String |
percentage | The percentage represented as a measurement. | Measurement |
Annotation Type: de.averbis.types.health.ProgesteroneReceptor
Attribute | Description | Type |
---|---|---|
status | The status of the Progesterone expression. Possible values (default is underlined): -, +, ++, +++ | String |
percentage | The percentage represented as a measurement. | Measurement |
3.32.4. Web Service Example
Example Text (Her2): "Her2 positive (45%)"
{ "begin": 0, "end": 18, "type": "de.averbis.types.health.HER2", "coveredText": "Her2 positive (45%)", "id": 1119, "percentage": { "begin": 15, "end": 18, "type": "de.averbis.types.health.Measurement", "coveredText": "45%", "id": 1120, "unit": "%", "normalizedUnit": "", "normalizedValue": 0.45, "value": 45, "dimension": "" }, "status": "++" }
Example Text (Progesterone): "PROGESTERONE RECEPTOR: POSITIVE (10%)"
{ "begin": 0, "end": 36, "type": "de.averbis.types.health.ProgesteroneReceptor", "coveredText": "PROGESTERONE RECEPTOR: POSITIVE (10%", "id": 1222, "percentage": { "begin": 33, "end": 36, "type": "de.averbis.types.health.Measurement", "coveredText": "10%", "id": 1223, "unit": "%", "normalizedUnit": "", "normalizedValue": 0.1, "value": 10, "dimension": "" }, "status": "++" }
3.33. RutaEngine
3.33.1. Description
The RutaEngine is a generic annotator which interprets and executes a rule-based scripting language for Apache UIMA, called UIMA Ruta. Due to its generic nature, the annotator is able to create and modify all available types of annotations.
Detailed documentation on the use of Ruta can be found in the official Apache UIMA Ruta manual.
3.33.2. Input
This general RutaEngine annotator does not expect any annotations.
If special characters are to be annotated - before this annotator, the following annotator must be included in the pipeline:
In addition, the typesystem "de.averbis.textanalysis.typesystems.AverbisTypeSystem" should be imported as well.
3.33.3. Output
The Entity type is described here as an exemplary and recommended placeholder for possible types of annotations that are created by this annotator. Entity is a generic type whose semantics are specified by its features 'label' and 'value'.
Annotation Type: de.averbis.extraction.types.Entity
Attribute | Description | Type |
---|---|---|
value | This feature provides the text of the annotated mention. | String |
label | The type of the entity, e.g. PERSON, LOCATION etc. | String |
3.33.4. Configuration
Name | Description | Type | MultiValued | Mandatory |
---|---|---|---|---|
rules | A String parameter representing the rule that should be applied by the analysis engine. If set, it replaces the content of the file specified by the mainScript parameter. | String | | |
3.33.5. Web Service Example
Example Ruta Script:
"pack years|py|pack year" -> Keyword; (n:NUM k:Keyword){-> CREATE(Entity, "label" = k.ct, "value" = n.ct)};
Example Text (Ruta Script applied): "40 pack years"
{ "begin": 0, "end": 12, "type": "de.averbis.types.health.Entity", "coveredText": "40 pack years", "id": 626, "label": "pack years", "value": "40" }
3.34. Specimen
3.34.1. Description
During surgical procedures or biopsies, tissue or fluid samples are often taken. In pathology reports, these samples or so-called specimens are often listed and described individually. This annotator extracts specimen information from pathology reports. The specimen annotator is a so-called complex annotator and contains several individual annotators such as Morphology, Topography and Diagnoses.
3.34.2. Input
Before this annotator, the following annotator must be included in the pipeline:
3.34.3. Output
Annotation Type: de.averbis.types.health.Specimen
Attribute | Description | Type |
---|---|---|
identifier | Identifier of the specimen. Possible values: SINGLE (if only one specimen is identified) or the number of the specimen (e.g. 1, 2, 3, ...) | String |
morphology | The morphology of the specimen, according to ICD-O (see Morphology for more details). | Morphology |
topography | The topography of the specimen, according to ICD-O (see Topography for more details). | Topography |
diagnosis | Diagnosis information of the specimen (see Diagnoses for more details). | Diagnosis |
laterality | Laterality of the specimen. Possible values (default is underlined): null | RIGHT | LEFT | BOTH | String |
descriptions | Description text, if available. | StringArray |
3.34.4. Web Service Example
Example Text: "SKIN, LEFT FOREARM (BIOPSY): BASAL CELL CARCINOMA EXTENDING TO THE EXCISION MARGINS"
{ "begin": 0, "end": 83, "type": "de.averbis.types.health.Specimen", "coveredText": "SKIN, LEFT FOREARM (BIOPSY): BASAL CELL CARCINOMA EXTENDING TO THE EXCISION MARGINS", "id": 1444, "identifier": "SINGLE", "morphology": { "begin": 29, "end": 49, "type": "de.averbis.types.health.Morphology", "coveredText": "BASAL CELL CARCINOMA", "id": 1443, "negatedBy": null, "matchedTerm": "Basal cell carcinoma", "dictCanon": "Basal cell carcinoma, NOS", "confidence": 0, "conceptID": "8090/3", "source": "ICD-O-Morphology-EN_3.1", "uniqueID": "ICD-O-Morphology-EN_3.1:8090/3" }, "topography": { "begin": 0, "end": 18, "type": "de.averbis.types.health.Topography", "coveredText": "SKIN, LEFT FOREARM", "id": 1446, "negatedBy": null, "matchedTerm": "Skin left forearm", "dictCanon": "Skin of upper limb and shoulder", "confidence": 0, "conceptID": "C44.6", "source": "ICD-O-Topography-EN_3.1", "uniqueID": "ICD-O-Topography-EN_3.1:C44.6" }, "diagnosis": { "begin": 0, "end": 83, "type": "de.averbis.types.health.Diagnosis", "coveredText": "SKIN, LEFT FOREARM (BIOPSY): BASAL CELL CARCINOMA EXTENDING TO THE EXCISION MARGINS", "id": 1445, "negatedBy": null, "side": null, "matchedTerm": null, "verificationStatus": null, "kind": null, "confidence": 0, "onsetDate": null, "source": "ICD10CM_2021", "clinicalStatus": null, "approach": null, "laterality": null, "dictCanon": "Basal cell carcinoma of skin of left upper limb, including shoulder", "conceptID": "C44.619", "belongsTo": null, "uniqueID": "ICD10CM_2021:C44.619" }, "descriptions": null, "laterality": "LEFT" }
3.35. TNM
3.35.1. Description
This component detects and annotates abbreviated notations and free-text remarks of the TNM classification. It is able to identify the tumor (T), node (N) and metastasis (M) categories as well as the grading of the tumor.
3.35.2. Input
Above this annotator, the following annotators must be included in the pipeline:
3.35.3. Output
The following TNM types exist:
Attribute | Description | Type |
---|---|---|
TNMTumor | The tumor annotations. | TNMTumor |
TNMNode | The node annotations. | TNMNode |
TNMMetastasis | The metastasis annotations. | TNMMetastasis |
TNMGrading | The grading annotations. | TNMGrading |
LymphaticInvasion | The annotation of lymphatic invasion. | TNMAdditional |
VascularInvasion | The annotation of vascular invasion. | TNMAdditional |
PerineuralInvasion | The annotation of perineural invasion. | TNMAdditional |
The TNM types "TNMTumor
", "TNMNode
", "TNMMetastasis
" and "TNMGrading
" have the following attribute:
Attribute | Description | Type |
---|---|---|
value | The value of the TNM classification or grading. | String |
The TNM type "TNMAdditional
" has the following attributes:
Attribute | Description | Type |
---|---|---|
label | The label of the additional TNM classification. Possible values are: LymphaticInvasion, VascularInvasion, PerineuralInvasion | String |
value | The value of L, V or Pn. Possible values are (default is underlined): | String |
3.35.4. Web Service Example
Text Example: "pTis, N0, Mx, G3, L1, V0, Pn0"
{ "begin": 18, "end": 20, "type": "de.averbis.types.health.TNMAdditional", "coveredText": "L1", "id": 3995, "label": "LymphaticInvasion", "value": "L1" }, { "begin": 22, "end": 24, "type": "de.averbis.types.health.TNMAdditional", "coveredText": "V0", "id": 4003, "label": "VascularInvasion", "value": "V0" }, { "begin": 26, "end": 29, "type": "de.averbis.types.health.TNMAdditional", "coveredText": "Pn0", "id": 4011, "label": "PerineuralInvasion", "value": "Pn0" }, { "begin": 14, "end": 16, "type": "de.averbis.types.health.TNMGrading", "coveredText": "G3", "id": 1903, "value": "G3" }, { "begin": 10, "end": 12, "type": "de.averbis.types.health.TNMMetastasis", "coveredText": "Mx", "id": 1904, "value": "Mx" }, { "begin": 6, "end": 8, "type": "de.averbis.types.health.TNMNode", "coveredText": "N0", "id": 1905, "value": "N0" }, { "begin": 0, "end": 4, "type": "de.averbis.types.health.TNMTumor", "coveredText": "pTis", "id": 1906, "value": "Tis" }
3.36. TumorStage
3.36.1. Description
The annotator extracts stage information concerning tumors like 'end-stage' or 'stage II-B'.
3.36.2. Input
3.36.3. Output
Annotation Type: de.averbis.types.health.TumorStage
Attribute | Description | Type |
---|---|---|
stage | Numeric value of the tumor stage. Possible values: NULL, 1, 2, 3, 4 | String |
modifier | Modifier of the tumor stage, e.g. a, b, c | String |
3.36.4. Web Service Example
Example Text: "Mamma Ca stage IIa"
{ "begin": 9, "end": 18, "type": "de.averbis.types.health.TumorStage", "coveredText": "stage IIa", "id": 931, "stage": "2", "modifier": "a" }
3.37. Topography
3.37.1. Description
This component detects topography mentions, i.e. anatomical sites. It is mainly used in pathology reports.
3.37.2. Input
Above this annotator, the following annotators must be included in the pipeline:
3.37.3. Output
Annotation Type: de.averbis.types.health.Topography
Attribute | Description | Type |
---|---|---|
dictCanon | Preferred term of the Topography. | String |
matchedTerm | Matching synonym of the Topography. | String |
uniqueID | Unique identifier of the Topography in the format 'terminologyId:conceptID'. | String |
conceptID | The concept ID. | String |
source | The name of the terminology source. | String |
negatedBy | Specifies the negation word, if one exists. | String |
confidence | The confidence feature denotes the probability of the annotation (Diagnosis/Morphology/Topography concept) being valid, i.e. the higher the confidence, the closer it is to a valid annotation. | Double |
3.37.4. Terminology Binding
Country | Name | Version | Identifier | Comment |
---|---|---|---|---|
United States | ICD-O | 3.1 | ICD-O_3.1 | International Classification of Diseases for Oncology WHO edition, enriched with synonyms by Averbis. |
Germany | ICD-O-DE | 3.1 | ICD-O-DE_3.1 | International Classification of Diseases for Oncology German Edition, enriched with synonyms by Averbis. |
3.37.5. Web Service Example
Text Example: "Adenocarcinoma of the Rectum"
{ "begin": 22, "end": 28, "type": "de.averbis.types.health.Topography", "coveredText": "Rectum", "id": 981, "negatedBy": null, "matchedTerm": "Rectum", "dictCanon": "Rectum, NOS", "confidence": 0, "conceptID": "C20.9", "source": "ICD-O-Topography-EN_3.1", "uniqueID": "ICD-O-Topography-EN_3.1:C20.9" }
3.38. WordlistAnnotator
3.38.1. Description
The WordlistAnnotator allows users to directly embed simple wordlists into pipelines. It identifies words from the wordlist in texts and creates an annotation of type Entity. Optionally, a 'label' and a 'value' can be specified in columns 2 and 3 of the wordlist to fill the corresponding attributes of type Entity (see example below).
3.38.2. Input
Above this annotator, the following annotator must be included in the pipeline:
3.38.3. Configuration
Name | Description | Type | MultiValued | Mandatory |
---|---|---|---|---|
delimiter | The separator of different terms in the wordlist, separating the searched term from its features. | String | false | true |
ignoreCase | Option to ignore the case of the terms in the wordlist. Possible values (default is underlined): ACTIVE | INACTIVE | boolean | false | true |
 | Option to filter matches that are part of a longer match. Example: 'diabetes mellitus' but not 'diabetes'. Possible values (default is underlined): ACTIVE | INACTIVE | boolean | false | true |
 | The wordlist (dictionary) content. The first line contains the complete package name of type Entity. If columns 2 and 3 are filled, line 1 also has to contain the attribute names 'label' and 'value'. The remaining lines contain the words of the wordlist (column 1) and optionally 'label' and 'value' values (columns 2 and 3); see the example below. | String | false | false |
Example Wordlist:
de.averbis.extraction.types.Entity;label;value
Lip;Organ;C00
Tongue;Organ;C01
3.38.4. Output
The annotator creates an annotation of type Entity.
Exemplary Annotation Type: de.averbis.extraction.types.Entity
Attribute | Description | Type |
---|---|---|
label | Represents the string in the feature "label" of the matched term in the wordlist. | String |
value | Represents the string in the feature "value" of the matched term in the wordlist. | String |
3.38.5. WebService Example
Text Example: "The lip"
{ "begin": 4, "end": 7, "type": "de.averbis.types.health.Entity", "coveredText": "lip", "id": 306, "label": "Organ", "value": "C00", }
3.39. AnnotationMapper
3.39.1. Description
The AnnotationMapper is an extension of WordlistAnnotator and allows the user to perform simple annotation mapping based on the predefined configuration parameter “mappingList”. The searched term and its features should be separated by a “delimiter”. Annotations of the type given by the parameter “sourceType” are investigated. If the value of their feature “sourceFeature” is present in the “mappingList”, then a new annotation of the type “targetType” is created and its features specified in “targetFeatures” are filled with values defined in the remaining columns (in the same order).
3.39.2. Input
The component expects the annotations of the types specified by the user in the configuration parameters “sourceType” and optionally “sourceFeature”.
Above this annotator, the following annotator must be included in the pipeline:
3.39.3. Configuration
Name | Description | Type | MultiValued | Mandatory |
---|---|---|---|---|
delimiter | The separator of different terms in the mapping list, separating the searched term from its features. | String | false | true |
mappingList | The dictionary content specifying the mapping. The first column defines the source feature value. The remaining columns specify the target feature values. | String | false | false |
sourceFeature | The feature name for the source annotations. Annotations with the feature values given in mappingList will be mapped. | String | false | true |
sourceType | The type name of the source annotations. | String | false | true |
targetFeature | The feature names for the target annotations. These features will be filled for the newly created annotations according to the mappingList. | String | true | true |
targetType | The type name of the target annotations. | String | false | true |
ignoreCase | Option to ignore the case of terms in the dictionary. | Boolean | false | true |
ignorePattern | A regular expression for text occurrences that should be ignored by the dictionary lookup. | String | false | true |
3.39.4. Output
The component creates concepts of the type which has been set in the configuration parameter “targetType”.
Exemplary Annotation Type: de.averbis.extraction.types.Concept
Attribute | Description | Type |
---|---|---|
conceptID | Represents the value from the corresponding column of the mappingList that is written to the target feature (here conceptID). | String |
3.39.5. WebService Example
Text Example: "Headache"
{ "begin": 4, "end": 39, "type": "de.averbis.extraction.types.Concept", "coveredText": "Headache", "id": 817, "conceptID": "M30.1" }
4. Available Text Mining Pipelines
The respective components are described in detail in Annotators.
4.1. deid Pipeline
4.1.1. Description
This experimental pipeline identifies protected health information (PHI) like names, dates, locations, IDs, contact information, professions and others. The resulting annotations can be used for deidentification procedures.
4.1.2. Components
The following components are part of the deid pipeline:
4.2. Discharge Pipeline
4.2.1. Description
This pipeline extracts the basic medical information from physician letters. Since these letters mainly originate when patients are discharged from the hospital or transferred to another doctor, they are called discharge letters. After some preprocessing, this pipeline annotates information concerning diagnoses, laboratory values and medications. The resulting annotations undergo postprocessing that takes enumerations, negations, disambiguation and possible status information into account.
4.2.2. Components
The following components are part of the discharge pipeline:
- Laterality
- PatientInformation
- Organizations
- PhysicalTherapies
- LabValues
- Diagnoses
- Procedures
- Medications
- Enumerations
- Negations
- DiagnosisStatus
- Procedure
- Disambiguation
- MedicationStatus
- HealthPostprocessing
4.3. Ophthalmology Pipeline
4.3.1. Description
This pipeline extracts information concerning diagnoses, laboratory values, medications, negations, visual acuity, tensio and further information in the field of ophthalmology.
4.3.2. Components
The following components are part of the ophthalmology pipeline:
- Laterality
- PatientInformation
- LabValues
- Diagnoses
- Medications
- Ophthalmology
- Enumerations
- Negations
- DiagnosisStatus
- Disambiguation
- MedicationStatus
- HealthPostprocessing
4.4. Pathology Pipeline
4.4.1. Description
This pipeline extracts information from pathology reports. After some preprocessing, this pipeline annotates information concerning diagnoses, morphology, topography and TNM classification.
4.4.2. Components
The following components are part of the pathology pipeline:
- LanguageDetection
- Laterality
- PatientInformation
- TNM
- LabValues
- Diagnoses
- Topography
- Morphology
- GleasonScore
- Enumerations
- Negations
- TumorStage
- Receptors
- Specimen
- DiagnosisStatus
- Disambiguation
- PathologyDocumentationClassification
- HealthPostprocessing
4.5. Transplantation Pipeline
4.5.1. Description
This pipeline extracts information concerning diagnoses, laboratory values, medications, graft-versus-host disease, conditioning regimens and negations from physician letters written after a transplantation.
4.5.2. Components
The following components are part of the transplantation pipeline:
- Laterality
- PatientInformation
- LabValues
- Diagnoses
- Enumerations
- Negations
- DiagnosisStatus
- Disambiguation
- MedicationStatus
- HLA
- HealthPostprocessing
5. GUI Overview
5.1. Welcome Screen
Users with administration rights can create new users and projects. When these users are logged in, they can see the "Project administration" and "User administration" areas.
5.2. Project administration
In the project administration area, you first see a list with all projects that are currently available in the system.
Name: name of the project. The name also functions as a link to the corresponding project. The link goes to the project’s overview page.
Description: description of the project.
Operations | Edit project: this allows you to modify the name and the description of the project.
Operations | Delete project: this allows you to delete a project.
Below the table is a button that you can use to create a new project.
5.3. User administration
In the user administration area, you first see a list with all local user accounts that are currently available in the system. This list can be filtered using the text box on the top left.
Username: the user’s login name.
Lastname: the user’s last name.
Firstname: the user’s first name.
Email: the user’s email address.
Blocked: if a user is temporarily blocked, a padlock icon is displayed here.
Administrator: if the user is an administrator, a checkmark is displayed here.
Local Account: indicates if this user is a local user.
Operations | Rights: using this button you can see an overview of the rights that the user currently has. Rights cannot be edited here. Editing rights is done using the corresponding button in each project.
Operations | Edit: in the Edit dialog, you can edit the user profile data (firstname, lastname, email address). You can also use this dialog to block a user.
Operations | Change password: this allows you to enter a new user password.
Operations | Delete user: this allows you to delete the user.
Below the table is a button that you can use to create a new user.
5.3.1. Add and/or edit users
Use the 'Create new user' or 'Edit user' button to open a dialog and edit the user’s metadata.
In addition to editing the profile metadata, you can also assign an initial password when creating the user (to edit the password of an already existing user, please use the corresponding 'Change password' button in the user administration overview table).
You can also use this dialog to block the user.
5.3.2. Change password
Using the Change password buttons you can open a dialog which allows you to enter a new password.
5.4. LDAP Configuration
Health Discovery supports LDAP to authenticate users and groups against directory services, e.g. Microsoft Active Directory. The LDAP configuration must be done from a local user account with admin privileges. It includes the following configuration parameters:
- Display Name: An identifier that is displayed on the login page.
- LDAP URL: The URL of the LDAP server.
- Search Base: Defines the starting point for the search in your directory tree.
- Manager Account Name: The distinguished name (DN) of the manager account. This account needs at least read access to the LDAP groups you want to integrate into Health Discovery.
- Manager Account Password: The password of the manager account.
- User Attribute: The unique identifier for users that will be used on the login page.
- User Filter: The filter that selects the users from LDAP. The Test Query button can be used to validate the filter.
- Admin Filter: The filter that selects the user with administrator privileges from LDAP. The Test Query button can be used to validate the filter.
- Enabled: Activates or deactivates an LDAP configuration.
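For orientation, a filled-in configuration could look like this; all values are placeholders for a hypothetical Active Directory domain, not product defaults:
- Display Name: Example AD
- LDAP URL: ldap://ldap.example.com:389
- Search Base: dc=example,dc=com
- Manager Account Name: cn=ldap-manager,cn=Users,dc=example,dc=com
- User Attribute: sAMAccountName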
If you're using LDAP over SSL (LDAPS), please make sure to add your certificate to the Java trust store. This can be done using the following command:
keytool -import -alias yourCertificate.pem -file yourCertificate.pem -keystore $JAVA_HOME/lib/security/cacerts -storepass changeit
The users in LDAP are required to have an attribute named distinguishedName. Without this attribute, LDAP users will not be able to log on to Health Discovery.
Health Discovery supports multiple LDAP configurations for multiple domains.
One configuration can be selected as default, which means that it will be pre-selected on the login page:
5.5. General guidelines
When a user without global administration rights opens the application, his/her home page contains an overview of the projects assigned to this user (My projects). The project names act as links to the corresponding projects. On the project overview page, the user can find all the functions for which he/she has the relevant project rights.
After selecting a project, a page is displayed with a list of all the modules in the project. This list is also available on other pages with the project navigation menu in the upper right area.
5.6. Language and web interface localization
The web interface is currently available in German and English. The language is recognized automatically from the browser or the system settings of your operating system and the content of the user interface is displayed in the corresponding language.
5.7. Outer navigation bars
The top and left side outer navigation bars can be hidden when required. This saves space when the navigation tools are not required. To show/hide the navigation bars, click the small menu icon on the upper right edge of the application.
5.8. Keyboard Shortcuts
To simplify working with the application, some functions are implemented with keyboard shortcuts. Press Shift + ? to display a summary of the defined shortcuts.
5.9. Flash messages
To provide information about the progress and outcome of processes, or to display general information, standard flash messages are shown throughout the application. The background color of a flash message depends on the message category: information messages are blue, success messages green, and error messages red. Flash messages disappear automatically after a few seconds. Flash messages that display errors, however, remain visible until they are closed manually by the user.
5.10. Documentation
Complete user documentation is available that describes the functionality of each component. This documentation can be accessed directly from the help menu in the navigation bar on the left side of the web interface.
5.11. Embedded help
In addition to the complete online help, you can find information in several places directly embedded in the interface. You can access this wherever you see a blue question mark on a white background. Move the mouse cursor over the question mark.
6. Connector Management & Document Import
6.1. Managing Standard Connectors
Connectors are used to import documents into the system. A connector monitors a specific resource (like a file system or a database), automatically imports new documents and updates changes so that imported documents are kept in sync with the document source. Connectors can also be scheduled to certain times of day, for example to import and update documents only at night and reduce system load during office hours.
Connectors can be created and administered on the connector management page. The figure below shows the connector management with the list of all connectors that have been created within the current project:
Connector: The name of the connector.
Type: The connector type. For example file connector or database connector.
Active: Indicates whether the connector is active. Only active connectors import and update documents.
Schedules: Displays the periods of time in which the connector is active. 0-24 means that the connector is active 24 hours a day.
Statistics: The statistics show the following values:
- Documents whose URLs have been reported by the connector.
- Documents that have already been requested by the connector and whose contents have been received.
- Documents that have already been enriched with metadata.
- Documents that have already been saved.
Actions | Start connector : Starts the connector.
Actions | Stop connector : Stops the connector.
Actions | Reset connector : If you reset a connector, all documents from this connector are re-imported.
Actions | Edit connector : Opens the edit connector dialog. All parameters except the connector name can be edited.
Actions | Edit mapping : Opens the edit mapping dialog where connector metadata fields like title and content can be mapped to document fields.
Actions | Schedule connector : Opens the schedule dialog.
Actions | Delete documents of connector : Deletes all documents that have been imported by the connector.
Actions | Delete connector : Deletes the connector. All documents that have been imported by the connector will be deleted as well.
In order to create a new connector, the connector type has to be selected first. After clicking the Create connector button the connector can be configured in the create new connector dialog. Please refer to the connector specific documentation for further details.
6.1.1. File System Connector
A file system connector imports documents from file system resources. It monitors one or multiple directories (including sub-directories) and imports documents from files in these directories. The following file types are supported:
- .txt
- .doc/.docx
- .ppt/.pptx
- .xls/.xlsx
- .html
There are currently two implementations: FileConnectorType and AverbisFileConnectorType. The AverbisFileConnectorType remembers the current position when stopped, so that it does not start from the beginning when restarted.
A file system connector can be configured using the following parameters:
Name: Name of the connector. This name can be chosen freely and serves, e.g., as the label within the connector overview. It must not contain spaces, special characters or underscores.
Start paths: In each line, you can specify a file system path that is taken into account by the connector. The connector traverses these directories recursively, i.e. all subdirectories are considered.
Exclude pattern: Here you can specify patterns to exclude certain files or file types (blacklist).
Include pattern (optional): Here you can specify patterns to include only certain files or file types (whitelist).
The file system connector can only read local file systems. For Docker deployments, the directories from which data is to be imported have to be mounted into the gcm container. This can be done by adding an additional volume to the gcm service in the docker-compose.yml file. In the example below, the /externalData directory on the Docker host is mounted to /data in the gcm container.
gcm:
  image: registry.averbis.com/gcm/gcm:X.X.X
  ...
  volumes:
    - gcmVol-hd:/opt/resources/connector-manager/home
    - gcmvoljdbchd:/opt/apache-karaf/lib/ext
    - /externalData:/data
Please make sure to restart the Docker containers with docker-compose up -d to apply the changes.
6.1.2. Database Connector
With a database connector, structured data can be imported via a database connection. The database connector supports JDBC-compliant databases and can crawl database tables using SQL queries. Each row of the SQL query result is treated as a separate document. The database connector keeps track of changes that are made to the database tables and synchronizes these changes automatically into Health Discovery.
In order to use the database connector, the database's JDBC driver has to be provided to the Tomcat server instance that is running Health Discovery. Please ask your system administrator to put the JDBC driver library into Tomcat's lib directory.
The database connector can be configured using the following parameters:
Name: Name of the connector. This name can be chosen freely and serves, e.g., as the label within the connector overview. It must not contain spaces, special characters or underscores.
JDBC Driver Classname: Fully qualified class name of the database JDBC driver, e.g. com.mysql.jdbc.Driver
JDBC Connection URL: JDBC connection URL to the database, e.g. jdbc:mysql://localhost:3306/documentDB
Username: Database username.
Password: Database password.
Traversal SQL Query: SQL select query, e.g. SELECT id, title, content FROM documents
Primary Key Fields: Name of the column that represents the primary key and identifies a table row, e.g. id
The database connector's default field mapping concatenates all queried columns (like id, title and content) and maps them into the document field named content. The field mapping can be configured in the connector field mapping dialog (see section Editing field mappings for further details). The figure below shows a custom field mapping that maps the database columns to document fields: the id column is mapped to the document_name field, while title and content are mapped to document fields of the same name.
6.1.3. Editing field mappings
Connectors read different sources to extract structured data from them. The extracted data is then written to fields of a Solr core. Field mappings define which information from the original documents is written to which fields of the Solr index.
Specific default mappings can be specified for each index and connector throughout the system. These are automatically taken into account when a new connector is created.
When editing the field mappings, select a connector field on the left. On the right, select the core field in which you want the connector to write this data. All core fields that have been activated in the Solr schema configuration and are writable are available here. In addition to editing the default mappings, you can also specify further mappings or remove existing ones.
You can also specify a sequence for the mappings. This order is relevant when mapping multiple connector fields to one core field. If the core field can contain more than one value, the values land in the field in the order specified here. If the core field can only contain one value, the value of the mapping that is lowest in the sequence is used.
After you have edited a field mapping, you must reset the connector so that the changes to the mapping are taken into account.
There are currently three different mapping types:
Copy Mapping: The standard type: the connector field is mapped 1:1 to the specified document field.
Constant Mapping: Instead of a connector field, a constant value can be mapped to a document field.
Split Mapping: The value of a connector field is divided into several values by a separator character to be entered. This can be used to convert comma-separated lists into multi-valued document fields. For example, with ',' as separator, the connector value 'red,green,blue' results in three values of a multi-valued document field.
6.2. Document Import
In addition to defining connectors that can monitor and search different document sources, it is also possible to import pre-structured data into a search engine index. Unlike connectors, this data is imported once, i.e. no subsequent synchronization takes place.
6.2.1. Manage document imports
Any number of document sets can be imported in the application and deleted if necessary. For each set of imported documents, known as import batches, you see a row in the overview table. In addition to the name of the import batch, you can also see how many documents are part of the batch. The status indicates whether the import is still running, whether it was successful, or whether it has failed.
Below the overview table you will find the form elements to import a new document set. To do this, enter a name and click the Browse button. A window opens in which the local file system is displayed.
You can import single files as well as ZIP archives with several files. Make sure that there are no (hidden) subdirectories in such a ZIP file and that the files have the correct file extensions.
These import formats are currently available:
Text Importer
Text importers can be used to import any plain text files. The complete content of the file is imported into a single field. The file name is available later as metadata.
CAS Importer
Allows the import of serialized UIMA CAS (currently as XMI). This allows, for example, documents to be imported as gold standards.
Please note that the type system of this CAS has to be compatible with the type system of Health Discovery.
Solr XML Importer
A simple XML format that allows the import of pre-structured data. During the import, the fields defined in the XML are written to search index fields of the same name. Please make sure that the field names in the XML file correspond to the field names of the search index associated with your project.
A special feature is that images can be imported with the documents and displayed together with them. To upload images, you have to pack the XML document(s) together with the images into a ZIP archive. Each document can contain as many image_reference fields as you like. Relative paths to the images are expected. Images can be stored in any subfolder within the ZIP archive. Supported image formats are .gif, .png, .jpg and .tif.
...
<field name="image_reference">images/image.png</field>
<field name="image_reference">./images/pics/picture.png</field>
...
An example of the supported import format is shown below
<?xml version='1.0' encoding='UTF-8'?>
<!--Averbis Solr Import file generated from: medline15n0771.xml.gz-->
<update>
  <add>
    <doc>
      <field name="id">24552733</field>
      <field name="title">Treatment of sulfate-rich and low pH wastewater by sulfate reducing bacteria with iron shavings in a laboratory.</field>
      <field name="content">Sulfate-rich wastewater is an indirect threat to the environment especially at low pH. Sulfate reducing bacteria (SRB) could use sulfate as the terminal electron acceptor for the degradation of organic compounds and hydrogen transferring SO(4)(2-) to H2S. However their acute sensitivity to acidity leads to a greatest limitation of SRB applied in such wastewater treatment. With the addition of iron shavings SRB could adapt to such an acidic environment, and 57.97, 55.05 and 14.35% of SO(4)(2-) was reduced at pH 5, pH 4 and pH 3, respectively. Nevertheless it would be inhibited in too acidic an environment. The behavior of SRB after inoculation in acidic synthetic wastewater with and without iron shavings is presented, and some glutinous substances were generated in the experiments at pH 4 with SRB culture and iron shavings.</field>
      <field name="tag">Hydrogen-Ion Concentration; Iron; Oxidation-Reduction; Sulfur-Reducing Bacteria; Waste Water; Water Purification</field>
      <field name="author">Liu X, Gong W, Liu L</field>
      <field name="descriptor">Evaluation Studies; Journal Article; Research Support, Non-U.S. Gov't</field>
    </doc>
    <doc>
      <field name="id">24552734</field>
      <field name="title">Environmental isotopic and hydrochemical characteristics of groundwater from the Sandspruit Catchment, Berg River Basin, South Africa.</field>
      <field name="content">The Sandspruit catchment (a tributary of the Berg River) represents a drainage system, whereby saline groundwater with total dissolved solids (TDS) up to 10,870 mg/l, and electrical conductivity (EC) up to 2,140 mS/m has been documented. The catchment belongs to the winter rainfall region with precipitation seldom exceeding 400 mm/yr, as such, groundwater recharge occurs predominantly from May to August. Recharge estimation using the catchment water-balance method, chloride mass balance method, and qualified guesses produced recharge rates between 8 and 70 mm/yr. To understand the origin, occurrence and dynamics of the saline groundwater, a coupled analysis of major ion hydrochemistry and environmental isotopes (d(18)O, d(2)H and (3)H) data supported by conventional hydrogeological information has been undertaken. These spatial and multi-temporal hydrochemical and environmental isotope data provided insight into the origin, mechanisms and spatial evolution of the groundwater salinity. These data also illustrate that the saline groundwater within the catchment can be attributed to the combined effects of evaporation, salt dissolution, and groundwater mixing. The salinity of the groundwater tends to vary seasonally and evolves in the direction of groundwater flow. The stable isotope signatures further indicate two possible mechanisms of recharge; namely, (1) a slow diffuse type modern recharge through a relatively low permeability material as explained by heavy isotope signal and (2) a relatively quick recharge prior to evaporation from a distant high altitude source as explained by the relatively depleted isotopic signal and sub-modern to old tritium values.</field>
      <field name="tag">Groundwater; Isotopes; Rivers; Salinity; South Africa; Water Movements</field>
      <field name="author">Naicker S, Demlie M</field>
      <field name="descriptor">Journal Article; Research Support, Non-U.S. Gov't</field>
    </doc>
  </add>
</update>
7. Text Analysis
7.1. Pipeline Configuration
The text analysis annotators and pipelines used in Health Discovery can be graphically administered and monitored in a centralized way. This is done in the Pipeline Configuration module.
The overview page lists all the text analysis pipelines available in the project. The following information and operations are provided in the table.
"Pipeline Name": name of the pipeline.
"Status": Status of the pipeline: STOPPED, STARTING or STARTED. As soon as the pipeline started, it reserves system resources. Only after it started, it accepts analysis requests.
- "Instances": Number of pipeline instances, by default set to 1. Increasing the value to n means the pipelines can process n requests from the web interface in parallel. Note: memory requirements increase as well.
"Throughput": here, two indicators for the pipeline throughput are given: the total number of processed texts, and the average number of processed texts per second. The statistics are reinitialized each time the pipeline stops/starts.
"Operations | Initialize pipeline" : this is used to initialize a pipeline. As soon as it has been initialized, it can process texts.
"Operations | Stop pipeline" : to save system resources, pipelines can also be stopped.
"Operations | Edit pipeline" : this is used to configure a pipeline, for example to add other components to it, to remove them or to modify their configuration parameters. Pipelines can only be edited when they are stopped.
"Operations | Update pipeline" : this is used to update the statistics (throughput) and status of the pipeline.
"Operations | Delete pipeline" : this allows pipelines to be permanently deleted, if they are no longer needed.
To create new pipelines, use the 'Create pipeline' button below the overview table.
To copy an existing pipeline, click on the "Clone pipeline" button in the right button bar of the pipeline.
7.2. Pipeline details
With the pencil icon in the taskbar of the overview table, you can access the details page of the pipeline. At the top left, all annotators are displayed in the order in which they are used in the pipeline.
To the right of each annotator name (green panel), you can see the annotator-specific throughput data, indicating the total number of processed texts and the average number of texts per second. By clicking on an annotator, you can display its configuration parameters.
7.3. Editing a pipeline
There are some general rules for pipeline configuration:
- Health Discovery comes with a set of preconfigured pipelines (see section Available Text Mining Pipelines). Unlike self-created pipelines, these preconfigured pipelines cannot be edited.
- As long as a pipeline is running, it cannot be edited.
Pipelines can be edited in the details page.
Left panel in green: your pipeline configuration
The arrow buttons and the x-button on the right can be used to move annotators to another position within the pipeline or to remove them. Individual configuration parameters of the annotators are now also editable.
Right panel in blue: available annotators
A list of available annotators, which can also be added to the pipeline by clicking on the horizontal arrow button.
7.4. Managing / Adding new text analysis annotators
The application allows you to add new text analysis annotators at runtime. There is no need to reinstall or redeploy the application. For that, so-called UIMA™ PEAR components (Processing Engine ARchive) are used. PEAR is a packaging format which allows text analysis components to be shipped alongside all needed resources in a single artifact.
You will find a list of all available PEAR components in the Pipeline Configuration where you configure your text analysis pipeline. Adding new annotators is done within the "Textanalysis: Annotators" module.
7.5. Text Analysis Processes
Any number of text analysis results can be generated and stored for all known document sources in Health Discovery. Text analysis results can be created either automatically through pipelines or manually. This way, you can obtain different semantic views of the same document, which enables you to evaluate several views side by side.
The table contains the following columns:
"Type": indicates whether this is a manual or automatic text analysis.
"Name": name of the process. For example Demo - anatomy
"Status": Status of the process. It is either RUNNING or IDLE.
"Document source": the document source to which the task refers. In parentheses after the name is the number of processed fields. For example if two fields, contents and title, are processed in a corpus of 3000 documents, then at the end of the task, 6000 will be indicated here.
"Pipeline": in the case of an automatic text analysis, the pipeline that was used for the text analysis is indicated here.
Buttons:
- Download: download the whole result as a set of UIMA XMI files (see the sketch below for one way to read these files).
- Refresh: refresh the current state of the process (e.g. to verify that processing has finished).
- Delete: delete the whole process and all its results.
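The downloaded files are standard UIMA XMI serializations and can be read with any UIMA-compatible tooling. As a minimal sketch in Python, assuming the open-source dkpro-cassis library and a type system XML exported alongside the XMI files (the file names are placeholders):
from cassis import load_cas_from_xmi, load_typesystem

# Load the exported type system and one downloaded analysis result.
with open("TypeSystem.xml", "rb") as f:
    typesystem = load_typesystem(f)
with open("document1.xmi", "rb") as f:
    cas = load_cas_from_xmi(f, typesystem=typesystem)

# Print all diagnosis annotations contained in the result.
for diagnosis in cas.select("de.averbis.types.health.Diagnosis"):
    print(diagnosis.get_covered_text(), diagnosis.dictCanon)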
By clicking on the process name, e.g. "Demo_Process", you can jump to the analysis results displayed in the Annotation Editor module.
When you create a new text analysis process, you can select whether it is a manual or an automatic text analysis.
If you choose automatic text analysis, you are requested to give your text mining process a name and to specify the document source and the pipeline to be used.
7.6. Annotation Editor: Viewing and Editing Annotations
To be able to make a judgment about text analysis components, it is frequently essential to have the results displayed graphically. You may also want to correct text analysis results manually or annotate documents completely manually, for example to create gold standards, which are then used to evaluate text analysis components. For all these purposes, the Annotation Editor can be used.
7.6.1. Viewing annotations inside a document source
The Annotation Editor can be used to display text analysis results graphically. Using the annotation editor, all documents from a document source can be easily viewed, section by section, and all annotations can be graphically highlighted.
In Annotation Editor, you first select a document source (1). You then select the text analysis process that you wish to view (2). If document names have been given to the documents in the source, the name of the first document in the source is displayed (3).
Once you have selected the source and the text analysis, the first document in the corpus is displayed. The document is displayed section by section. Above the text, there is a selector for each available annotation that enables the content of the annotation to be graphically highlighted (4).
In the main window (5), you can see the corresponding section of the document with the currently activated highlights. Below the main window, there are buttons for navigating through the individual sections of a document (6). Above it there are similar buttons, which you can use to navigate between the individual documents in a source (7).
A table with a list of all the currently highlighted annotations can be displayed on the right of the main window.
Clicking on a position in the text shows the annotations found at this position in the details view.
The overview table is also used to view the individual attributes of the annotation. By expanding the annotation in the table, you can obtain a list of all the annotation’s attributes.
7.6.2. Configuring section sizes
As described above, the documents are displayed section by section. By default, 5 sentences are displayed on each page. This setting can be configured in the interface by clicking on the wheel at the right top.
In principle, you can combine character-based sectioning with annotation-based sectioning. While character-based sectioning is the standard, annotation-based sectioning has the advantage that you do not miss annotations that cross section boundaries. When combining both sectioning types, the sections are always shown with a slight overlap: the end of section n is displayed again at the beginning of section n+1 to avoid the section being taken out of context. Furthermore, when sectioning by characters, the sectioning automatically ensures that section splits are not made in the middle of a word.
Any change to the section size in the graphical configuration is applied immediately after closing the window. Using the reset button, you can restore the configured default values.
7.6.3. Manually editing, adding and deleting annotations
The annotation editor can also be used to add annotations manually or to edit them. Using the button on the right (1), you can switch to edit mode.
In edit mode, a combo box appears above the main window where you can select the annotation type you want to annotate (2). After you select the type, you can create annotations of this type in the text. To create annotations of this type, simply highlight an area of text in the main window using the mouse. A quick way of adding an annotation is to simply click a word. An annotation of the corresponding type is then created for the whole word.
Edit mode also allows you to edit and delete existing annotations (3). To do this, click the cross mark in the overview table of annotations on the right.
After you have made changes to the document, these can be saved or discarded by clicking the buttons (4).
7.6.4. Displayed and editable annotation types, attributes and colours
Currently, the user cannot configure which annotation types and attributes are visible in the annotation editor, which colors are assigned to these annotation types, and which attributes are editable. This is currently preset by Averbis.
7.7. Text Analysis Evaluation
The results of various text analysis tasks can be evaluated against each other, e.g., to compare a text mining process against gold standards.
To do this, you may first choose the document’s source (1) which serves as the basis of the evaluation. Then, you choose the reference view (2) in the left part of the window, and, on the right side (3), you choose the text analysis process that you wish to evaluate.
If you choose a source and two text analysis processes, you can evaluate the results visually, one against the other, in a split view with two separate annotation editors. The representation of the sections in the right window is thereby coupled to the sections in the left window. Matching annotations are indicated by a green background, non-matching annotations are marked red.
7.7.1. "Matches" and "Partial Matches"
When evaluating, it is possible to distinguish between exact and partial matches. Annotations are marked as an exact match if their type, characterizing attributes and position in the text are identical.
To obtain an extra level between a hit and a no-hit, it is also possible to define a partial match. Annotations that are not exactly identical, but still meet these criteria, are marked accordingly both in the graphical and table presentation. In the graphical presentation they are indicated with a yellow background.
7.7.2. Configuring the match criteria
The definition of what should be considered as a match, partial match and mismatch can be configured by the user in the interface.
The general rule is that two annotations are considered as a match when they are of the same type and are found at exactly the same place in the document. For each annotation type you can then define which annotation attributes also have to match. If we use a concept, this could be the concept’s unique ID. This means that two concepts would be identified as a match only if this attribute was identical in both annotations.
It is also possible to configure for each annotation type, when two annotations of this type should be considered as a partial match. Here you can choose between four different options:
"No partial matches": only exact matches are allowed.
"Annotations must overlap": a partial match is given whenever the annotations overlap.
"Allow fixed offset": at the beginning and end of the annotations, a configurable offset is allowed.
"Are within the same annotation of a specific type": a partial match is found whenever the annotations are within the same larger annotation. For example, if they are inside the same sentence.
7.7.3. Corpus evaluation
Using the Evaluate metrics button, a window can be opened, displaying the precision, recall, F1 score and standard deviation for either a single document or the whole corpus. The numbers are split by annotation type.
In the Settings panel, you can configure which types are to be taken into account in the corpus evaluation.
7.8. Annotation Overview
For the quality assessment and improvement of text analysis pipelines, an aggregated overview of the assigned annotations is often helpful. For this purpose, the Annotation overview is used. You can create any number of these overviews. To do this, you first select a source and an existing text analysis process. Next, you select the annotation type to be analyzed.
After pressing the green button, the aggregation is calculated. Depending on the scope of the selected source, this may take some time. All overviews are listed in the table. As soon as an overview has been calculated, the results can be displayed via the list symbol.
7.8.1. Aggregation and Context
If you select an overview from the table using the list symbol, you will see an aggregated list of the annotations found for the corresponding type. By default, the list is sorted in descending order by frequency. By clicking on an annotation in the table, you can display some example text in which the annotations occur. In addition to the analysis, the overview is also suitable for directly improving the results. In this way, false positives as well as false negatives can be identified and corrected.
Currently, the attributes that appear in the list for each annotation are preconfigured by Averbis. This setting cannot yet be changed graphically via the GUI.
8. Terminologies/Lexical resources
In this module, you can manage the lexical resources, which are used within the text analysis components.
8.1. Terminology Administration
This module lists all available terminologies within the current project. You can:
- create and delete terminologies
- edit the parameters of a terminology (e.g. ID, label, version)
- import new terminologies in OBO format
- submit terminologies for use in text mining pipelines
- download terminologies in OBO format
8.1.1. Create a new terminology
When creating a new terminology, you can specify the following parameters:
Terminology-ID
A unique identifier. E.g. MeSH_2017.
Label
A label. E.g. MeSH.
Version
A version number. E.g. 2017.
Concept type
The concept type when being used within text analysis. E.g. de.averbis.extraction.types.Concept.
Hierarchical
When unchecking this box, the terminology will not contain any hierarchical relations (flat list).
Encrypted export
ConceptAnnotator dictionaries can be exported in encrypted form to prevent sensitive data from being stored on disk.
This parameter only affects Concept Dictionary XML exports. Other exports remain unencrypted.
Besides, you can specify which languages are available within that terminology.
8.1.1.1. Available languages
Your terminology can contain terms for all languages selected here. There is no need to provide terms in all languages for every concept, so there may be concepts which only have terms in a subset of those languages. Since in some situations we need to compute one cross-lingual preferred term, we need to decide which language to use if there are no terms in specific languages. For that, you can specify a language priority by moving the languages up/down in this list. If you have English at the top, followed by German, we try to display the English preferred term; if no English preferred term is available, the German one is displayed.
There is one special language, called Diverse. Terms in that language are mapped in every language. You can mark language-independent terms with that language (e.g. Roman numerals).
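The fallback described above amounts to a first-hit lookup along the priority list. A minimal sketch in Python; treating Diverse as the final fallback entry ("div" here) is an assumption for illustration:
def cross_lingual_preferred_term(preferred_terms, priority=("en", "de")):
    # preferred_terms maps a language code to that language's preferred term,
    # e.g. {"de": "Herz"}. Terms in the Diverse pseudo-language ("div")
    # match for every language.
    for lang in (*priority, "div"):
        if lang in preferred_terms:
            return preferred_terms[lang]
    return None

print(cross_lingual_preferred_term({"en": "Heart", "de": "Herz"}))  # -> Heart
print(cross_lingual_preferred_term({"de": "Herz"}))                 # -> Herz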
8.1.2. Edit a terminology's metadata
You can edit the metadata that you specified when creating the terminology via the edit button.
8.1.3. Delete a terminology
The delete button allows you to delete a terminology when no import or export is running.
8.1.4. Import content into a terminology
You can import content from OBO files (versions 1.2 and 1.4) into an existing terminology. For multilingual terminologies, version 1.4 needs to be used. Optionally, a mapping mode for each synonym can be imported, too.
The source file may be zipped to support large files.
The minimal structure of your OBO terminology looks like this:
Example of an OBO terminology
synonymtypedef: DEFAULT_MODE "Default Mapping Mode" //OPTIONAL - only if using mapping modes
synonymtypedef: EXACT_MODE "Exact Mapping Mode" //OPTIONAL - only if using mapping modes
synonymtypedef: IGNORE_MODE "Ignore Mapping Mode" //OPTIONAL - only if using mapping modes
[Term]
id: 1
name: First Concept
synonym: "First Concept" DEFAULT_MODE []
synonym: "First Synonym" IGNORE_MODE []
synonym: "Second Synonym" EXACT_MODE []
[Term]
id: 2
name: First Child
is_a: 1 ! First Concept
To import terms with mapping modes, the OBO terminology begins with the synonym type definitions, as shown in the first three lines of the OBO terminology in the example above.
Each concept begins with the flag "[Term]", followed by an "id" and a preferred name with the flag "name". After that, you can add as many synonyms as you like with the flag "synonym", optionally followed by the desired mapping mode. Note: if you would like to define a mapping mode for your concept name (flag "name"), you have to add the term as a synonym, as shown in the example for "First Concept".
Furthermore, if your terminology contains a hierarchy, you can use "is_a" to refer to other concepts of your terminology.
To import a terminology like the one shown above, proceed as follows:
In "Project Overview", click on "Terminology Administration".
Click on "Create New Terminology". Fill in the dialog as described in Add Terminology.
Once you have created a terminology, click the up arrow icon to the right of the terminology.
In the "Import Terminology" dialog, select the terminology you want to import from the file system. Click on "Import".
By clicking on the "Refresh" button to the right of the terminology you can check the progress of the import. When the terminology has been fully imported, the status changes to "Completed".
To browse your terminology, switch to the "Terminology Editor" by going to the "Project Overview" page and clicking on "Terminology Editor".
After an import has started, the current status is shown in the overview.
Besides, you can see some details of the latest import (including error messages).
After successful terminology import, terms, hierarchies and mapping modes can be checked in the Terminology Editor.
8.1.5. Submit terminologies for use in text mining pipelines
To use a terminology within the text analysis, it must be handed over to the text analysis module via the "submit terminology to text analytics" button.
After submitting the terminology to the text analysis module, you need to stop and restart the pipelines which use this terminology.
8.1.6. Download terminology
To download a terminology in OBO format, open the Terminology Administration and perform the following steps:
Step 1: Click the button "Preparing for OBO download".
The preparation time depends on the size of your terminology. Once the download is ready, a notification appears in the bell symbol in the upper menu bar.
Step 2: After the preparation step is completed, refresh the terminology. This will enable the button "Download OBO file".
Step 3: Click the button "Download OBO file". Depending on your local browser settings, the download will start automatically or the download prompt will open.
8.2. Terminology Editor
The Terminology Editor allows you to edit the content of terminologies.
8.2.1. Free text search and autosuggest
The centered search bar at the top of the Terminology Editor is meant for a free text search across multiple terminologies. You can include or exclude terminologies from the search by checking them in the drop-down menu next to the search bar. While entering a search term, the system suggests possible matches via autosuggest, grouped by terminology.
In a free text search, you can use the asterisk symbol (*) for truncation (e.g. Appendi*). The results of a free text search are listed in the upper right section, grouped by their terminologies.
The settings menu at the top right allows you to customize some search and autosuggest settings. You can specify whether concept IDs are included in the search, and define the number of hits that shall be displayed.
8.2.2. Displaying concepts hierarchically
The tree view in the Terminology Editor shows a concept's position in the terminology hierarchy. Just click on a concept in the list of search results.
You can configure whether the Concept ID shall be shown in the tree as well, and whether the tree view shall show the siblings of a concept along its hierarchy.
8.2.3. Terms
In the lower right corner of the window, you see the concept's details. The first tab shows the concept's synonyms. You can edit, add or delete synonyms here as well.
8.2.4. Mapping Mode
Every term has a so-called Mapping Mode. Mapping Modes are an efficient way of increasing the accuracy of terminology-based annotations. They allow you to ignore certain synonyms which are irrelevant or lead to false positive hits (IGNORE). Synonyms can also be restricted to EXACT matches, which is especially useful for acronyms and abbreviations (AIDS != aid).
Currently, there are three Mapping Modes:
DEFAULT
Term is preprocessed the same way the pipeline is configured.
EXACT
Term is only mapped when the string matches the text exactly, without any modification by preprocessing (including case).
IGNORE
Term will be ignored. It won’t be used within the text analysis.
8.2.5. Relations
The second tab shows all relations known for that concept. You can use this view to add or delete relations, too. Currently, only hierarchical relations are supported. When adding a new relation, you get an autosuggest to find the correct concept that you want to relate.
8.2.6. Mapping Mode and comment
In the third tab, you can add a comment to a concept. Besides, you can set a concept-wide Mapping Mode. Terms which do not have a specific Mapping Mode inherit it from this concept-wide Mapping Mode.
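The inheritance rule can be stated in one line: a term's effective Mapping Mode is its own mode if set, otherwise the concept-wide mode. A minimal sketch in Python; falling back to DEFAULT when neither is set is an assumption for illustration:
def effective_mapping_mode(term_mode=None, concept_mode=None):
    # A term's own Mapping Mode wins; otherwise the concept-wide mode applies.
    return term_mode or concept_mode or "DEFAULT"

print(effective_mapping_mode(None, "EXACT"))      # -> EXACT
print(effective_mapping_mode("IGNORE", "EXACT"))  # -> IGNORE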
9. Document Search
9.1. Solr Core Administration
The application uses Solr to create a search index and to make documents searchable. Choose "Solr Core Administration" on the project overview to create the basic settings. As soon as the Solr Admin module is used, the application has a default Solr Core. This core is displayed in the administration panel.
9.1.1. Indexing pipeline
Documents that are imported or crawled go through a text analysis pipeline in order to add metadata to the search index.
The corresponding pipeline is selected here - a separate indexing pipeline can be used for each project.
If you choose an indexing pipeline, all documents that are imported or crawled in the future will be processed. If you want to use a different pipeline for processing search queries, you can set it in the Solr Core Management section.
You can also switch the indexing pipeline within a project. To avoid a heterogeneous set of metadata, all documents are re-processed.
9.1.2. Query Pipeline
Here you can select which of the available pipelines should be used for analyzing the search query. By default, the same pipeline is used here as selected for indexing the documents.
9.1.3. Solr Core Overview
A so-called "Solr Core" is available for each project, the administration of which can be accessed via the "Solr Core Management" button on the project page.
"Core Name": The name of the Solr instance (generated automatically)
"Path to solrconfig.xml": This is the path to the configuration file of this Solr instance. Expert settings can be made in this configuration file. After editing this file, the Solr instance must be restarted in order for the changed settings to take effect.
"Path to schema.xml": The index fields are configured in this configuration file. This file should only be edited manually in exceptional cases and by experts.
"Indexed documents": Number of documents currently in the index.
"Pending documents": Number of documents that are currently in the processing queue of the Solr instance.
After pending documents have been processed by Solr, a commit must take place before these documents are actually available in the index. Since a commit is quite resource-intensive, the number of commits is kept low: by default, a commit only takes place every 15 minutes. The processed documents therefore appear under the indexed documents with a delay.
"Operations": At the level of the Solr core, there are three operations available:
"Refresh" : You can update the displayed key figures by clicking on this icon.
"Commit" : This command executes a commit on the Solr core, including documents in the index that are not visible beforehand. By default, this happens every 30 minutes in the background.
"Delete all documents from the index" : With a click on this icon, all documents are deleted from the index.
9.1.4. Configuration of the search index schema
The configuration of the schema of the current search index can be reached via the module "Solr schema configuration".
9.1.4.1. Overview of all schema fields
Each Solr core has a schema that defines which information is stored in which kinds of fields. The Solr schema configuration lists all available fields in alphabetical order. The following information and operations are available for each field in the index:
"Field name": Name of the field as defined in the Solr schema. This name is often chosen in such a way that it is unpleasant for people to read. If a field is a system field, that is, a field whose values must not be overwritten by the user, a small lock symbol () is displayed to the right of the field name.
"Type": The type specifies the contents of this field. In addition to an abstract description (e. g. string) the complete class name of the field is specified in parentheses.
"Active": This button controls whether the field contains information to be displayed or used elsewhere in the application. These fields are then available, for example, to be displayed in the search result, to form facets or to be used via query builder for the formulation of complex, field-based search restrictions. Fields that are not activated can still be used by the system, but they are not available for manual configuration to the users. If a field is activated, the line is highlighted in green.
"Label": The field name itself is often not suitable for displaying because it is not legible, and it is not localized. Therefore, you can define meaningful display names for all fields in different languages. These names are used wherever the user accesses or displays field contents. If no corresponding display name is defined for the user’s language, the illegible field name is displayed.
9.1.4.2. Dynamic fields
In the overview, dynamically generated Solr fields are also displayed as soon as they have been created (that is, as soon as they have been filled with values once). As soon as the field has data, it remains permanently in the overview, even if all documents containing values in this field have been deleted in the meantime.
9.2. Manage and use search interface
The functionality and appearance of the search interface can be influenced by configuration.
9.2.1. Configuring the display of search results
Starting from the overview page of a project, the display of search results can be configured by using the "Field Layout Configuration" module. You can specify which fields/contents of the indexed documents are to be displayed in the interface. This applies to both the fields on the results overview page and the fields on the detail page of the documents (accessible by clicking on the title information of the result). Fields that are only displayed on the overview page of the search results are highlighted in green. In addition to selecting the fields, you can also configure whether the field title should be displayed, as well. If this option is activated, the display name created in the Solr schema management for the language of the respective user is displayed.
In addition, the length of content of a particular field can be specified, as well as some style settings.
9.2.2. Configure Facets
So-called facets provide the user with additional filter options. They are displayed on the left side of the search page. The configuration of facets can be accessed via the module "Facet Configuration" on the project overview page.
On the configuration page, you can select and configure the facet fields displayed in the user interface. When selecting a facet, you can configure whether the entries within a facet are AND- or OR-linked. In the case of AND facets, only documents that combine all the terms selected in this facet are displayed. OR facets, on the other hand, offer the option of finding documents that contain only individual terms (e. g. documents of "Category 1" OR "Category 2").
In addition, you can configure how many entries are to be displayed within each facet. The order of the facets can be determined with the arrows. The order in the search interface corresponds to the order in the administration panel. The display name of a facet is selected according to the labels assigned in the Solr schema configuration (see above).
9.2.3. Configuring auto-completion
Settings for automatic completion of search terms can be made via the "Autosuggest" module that you access on the project overview page. There are various methods by which the system can suggest meaningful completions of the user's search. Currently, four methods are available to choose from, and they can be freely combined as needed.
The proposals are grouped by their mode in the search interface. The order of the groups corresponds to the order in which the modes are listed here (if more than one mode is used). Use the arrow keys to change the order.
In addition to the number of proposals per group, you can also specify a description for each group, which is displayed in the search interface above the respective proposal block.
Changes will take effect immediately after saving for all users of the search.
If one of the two concept-based methods is used, an additional field appears where you select which Solr field is to be used for the lookup. All fields that are recognized as concept-based fields are available for selection.
The methods are characterized as follows:
"Prefixed Facet Mode"
The proposals for completing the search query come from the documents in the search index. No external sources are therefore used for the proposals.
The suggestions are intended to complete the term currently entered; no additional term is proposed (no multiple-word suggestions).
The current search restrictions (e. g. via facets) are taken into account in the proposals. Therefore, only those terms are suggested for which there are also hits in the body, taking into account all active search restrictions.
The proposals are not based on the order of the terms in the documents. If you enter a search query that consists of several partial words, the proposed word does not have to occur directly after the other terms of the search query in the documents.
"Shingled Prefixed Facet Mode"
The proposals for completing the search query come from the documents in the search index. No external sources are therefore used for the proposals.
Unlike the simple prefixed facet mode, suggestions can consist of several words. In addition to completing the term currently entered, terms that frequently occur directly next to or close to this term in the documents are also suggested. Entering Appen in this mode could therefore lead to suggestions such as treating appendicitis.
The current search restrictions (e. g. via facets) are taken into account in the proposals. Therefore, only those terms are suggested for which there are also hits in the body, taking into account all active search restrictions.
If the query consists of several words, the suggestions for the word order are based on the last of these words. All terms before this last word are still used as filters. The entry Hospital Appendi could therefore also lead to the suggestion Hospital Treat Appendicitis, even if Treat Appendicitis does not occur in the immediate vicinity of Hospital in the text.
Concept Mode with guaranteed hits (concepts_hit)
The suggestions for completing the search query are taken from synonyms of the stored terminology.
Proposals show the wording of the synonym and the title of the terminology as well as the preferred name of the concept in the user’s language.
If you select a proposal (synonym), a search with the associated concept is executed.
Documents that contain the exact synonym text (as opposed to documents that can only be found via another synonym) are given a higher weighting and are displayed higher in the results list.
Only proposals that guarantee at least one hit are displayed.
Concept Mode without guaranteed hits (concepts_all)
This mode differs from the conventional concept mode in that proposals are also displayed that do not lead to a hit. All terms from the stored terminology are displayed.
Activating the concept modes is not yet fully supported via the GUI. Please contact support.
9.2.4. Search restrictions
Switch to the "Search" module of the project to get to the search page of the application. All search terms entered remain comprehensible for the user at any time. You can easily see which search terms have led to the currently presented result set. The current search restrictions are listed next to each other on the left side of the search bar. They are highlighted in the same color as the corresponding highlighting in the text. If the restriction by a term originates from a facet, the name of the facet is listed before the search term (see screenshot below).
If the list of search restrictions is too long to be displayed in the search bar, they are displayed in a collapsible pop-up menu on the left in the search bar. The small cross symbol next to each search restriction removes this restriction and updates the search results accordingly. With the cross button to the right of the search bar you can also remove all current search restrictions at once.
9.2.5. Faceted search
Facets represent one of the core functionalities of the search. With the help of the facets, the search results can be quickly limited to relevant results. In the admin panel you can configure for which categories facets should be displayed.
Within the facets, the most frequent terms from the respective category appear, which are contained in the indexed documents. The number after the faceted entries indicates how many documents are contained in the index (or current search result set) that match the corresponding term.
The faceted entries can be clicked on, whereupon the search result will be limited accordingly. Different terms can be combined here - even across facets. This allows a high degree of flexibility in restricting the search results.
9.2.6. AND-linked facets
By default, all selected facet entries are AND-linked. This means that only documents matching all selected criteria are listed. The currently selected filters are highlighted in orange. The restriction can be removed by clicking on the faceted entry again.
9.2.7. OR-linked facets
This filter yields result sets in which at least one of the selected criteria appears, i.e. documents in which only one or only a few of the selected terms appear are also found. In the case of these OR-linked facets, a checkbox is displayed in front of each entry.
9.2.8. Querybuilder / Expert Search
With the query builder, a convenient mechanism is available in the system to create complex search queries. It allows for combining different criteria into a query using any fields from the index.
The Querybuilder can be opened using the magic wand icon in the search bar.
The input mask allows you to add search restrictions on all activated schema fields. Depending on the type of the selected schema field, different comparison operators are available. Text fields allow the operators contains and contains not. Any text can be entered as a restricting value. The asterisk * is used as a wildcard.
Date fields provide the comparison operators >= and <=. Numerical fields provide the comparison operators =, <>, >= and <=. By combining two date or number fields, the search can also be restricted to periods or ranges.
Concept-based fields allow the operators contains and contains not, like text fields.
Any number of conditions can be added. These are linked with each other using the boolean operators AND and/or OR. The criteria can also be grouped together to create any logical combinations. In addition to the graphical display, you can also find the logical expression that results from the current compilation of search restrictions in the upper area of the query builder. Once the complex search query has been created, it can be activated using the Apply button. The search results are calculated accordingly. In addition, the magic wand icon in the search bar turns orange to indicate that a complex search restriction is active. The search query can be reloaded by clicking on this button and can be edited until the result matches your expectations.
The query created using the Querybuilder applies in addition to any other search restrictions, such as free text searches or facet restrictions.
9.2.9. Document details and original document
The title field of a document serves as a link to a detail page containing additional information about the document (see "Solr Schema Configuration" module on the project overview page).
In addition to the detailed view, you can also download the underlying original documents (e.g. PDF, office document etc.) if they are available. You can recognize this by a small icon on the right of the document title. The symbol differs depending on the document category. Clicking on the file icon starts the download of the original document.
9.3. Export search results
Documents in the system can be exported - both individual documents and complete search result sets.
9.3.1. Selection of documents to be exported
If the user has the necessary permissions to export documents, checkboxes are provided on the search results page to mark individual documents. There is also a checkbox to mark all currently displayed documents. In addition, the button "Export search results" is displayed above the search results, where the selected documents can be exported.
Another option is to export all documents that meet the current search restrictions. In this case, all checkboxes have to be deselected.
9.3.2. Selection of the exporter and the fields to be exported
After selecting the documents to be exported, a dialog box appears in which the exporter type can be selected. Currently, there is one exporter that exports selected fields of the documents to an Excel document.
After selecting the fields to be included in the export and confirming with the "Export" button, the export starts. Once the export is complete, the result is offered for download.
10. Document Classification
10.1. Manage classification
10.1.1. Administration of the label system
The target categories for the automatic classification of documents are called the label system, which can be edited and maintained in the module "Label System". In a new project, the label system is initially empty.
Clicking on "Create new label" at the bottom left adds a new label. The pen symbol on the right-hand side is used to rename the label. The plus symbol to its right adds a new label as a child of the current label. It is therefore used to create hierarchically organized label systems. Clicking on the red cross symbol deletes labels (only labels that have no children can be deleted).
In a hierarchical labeling system, the hierarchical arrangement can also be edited via drag & drop.
10.1.2. Administration of different classification sets
The starting point for the automatic classification of documents are so-called classification sets.
10.1.2.1. Create a new classification set
Any number of classification sets can be created for each project. This means that you can classify the same document source with different classification parameters.
There is only one label system per project. The same label system is used for each classification set. Please make sure that the label system has been created before you create a classification set.
To be able to view the results of the classification in the interface, you should select an indexing pipeline in Solr Core Management before you create classification sets.
When creating a new classification set, the following settings can be adjusted:
Name: Name under which this classification set is referenced.
Document fields: From all document fields known to the system, you can select those that are used for training the classifier (so-called features).
High confidence threshold: The system distinguishes between documents with high and low confidence for automatically classified documents. This parameter can be used to define the value above which the confidence is interpreted as "high".
Classifier: In principle, different implementations can be used for classification. At present, the implementation offered is a support vector machine (SVM).
Single/multi-label: This parameter determines how many categories can be assigned to a single document. With Single, only one label is assigned. With Multi, a document can be categorized in several classes.
Classification method: The classification method determines how the machine selects from several candidates. Depending on whether it is a single-label or multi-label scenario, different options and configuration parameters are available:
Single-Label
Best Labels: With Single-Label Classification there is only one classification method: the Best Labels method chooses the class with the highest confidence.
Threshold: The threshold value can be used to determine that only classes that have a certain minimum confidence are taken into account. This allows for filtering out assignments of which the machine is very unsure.
Multi-Label: For Multi-Label Classification several methods are available (for a deeper theoretical background, see Matthew R. Boutell: Learning multi-label scene classification):
All Labels: This method simply selects the available instance labels in decreasing confidence order.
T-criterion: Using the T-criterion, instances first get filtered by a minimum confidence threshold of 0.5. If the confidences are too low, i.e. no labels are assigned, another filter step is used. The second step checks if the entropy of the confidences is lower than the minimum entropy threshold, i.e. confidences are distributed unevenly. If this is the case, the labels are assigned based on a lower minimum confidence threshold.
Entropy: 1.0 (default minimum entropy)
Threshold value: 0.1 (default minimum confidence)
C-criterion: This method ensures the selection of the best prediction values depending on the configuration parameters (i.e. Percentage and Threshold values). It first selects the label with the highest confidence (larger than the threshold value) and continues to assign labels whose confidence is at least at 75% of the highest confidence value (see the sketch after this list).
Percentage value: 0.75
Threshold value: 0.1 (minimal default confidence).
Top n labels: This method selects those categories that have the highest confidence.
n: the number of classes to be assigned
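To make the multi-label selection logic more concrete, here is a minimal Python sketch of the C-criterion as described above. It is an illustration only, not the product implementation, and uses the default values Percentage = 0.75 and Threshold = 0.1:
def c_criterion(confidences, percentage=0.75, threshold=0.1):
    # confidences: dict mapping each label to its predicted confidence
    ranked = sorted(confidences.items(), key=lambda kv: kv[1], reverse=True)
    # nothing is assigned unless the best label exceeds the minimum confidence
    if not ranked or ranked[0][1] <= threshold:
        return []
    best_confidence = ranked[0][1]
    # keep all labels within `percentage` of the best confidence (and above the threshold)
    return [label for label, conf in ranked
            if conf >= percentage * best_confidence and conf > threshold]

# Example: with the defaults, label_a (0.9) and label_b (0.7) are assigned, label_c (0.2) is not
print(c_criterion({"label_a": 0.9, "label_b": 0.7, "label_c": 0.2}))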
The classification configuration can be changed on the classification administration page by clicking on the edit button.
After changing the parameters of an existing classification set, re-training and re-classification are necessary for all changes to take effect.
Before documents can be automatically classified, the machine requires appropriate training material. This refers to a small set of intellectually classified documents used by the machine to train a model.
Training data can be created in two ways. Either by manually assigning classes via the graphical user interface (please see "Browse classifications" below) or by importing a CSV file that contains appropriate assignments.
10.1.2.2. Import of training material
The corresponding button opens a dialog for importing a CSV file with training material. The CSV file must contain the name of the document in the first column (referred to as document_name in the system). The subsequent columns contain the label assignments (one column for each label in a multi-label scenario). The columns must be separated by semicolons. The values of the columns can be enclosed in double quotation marks if required (mandatory if the values contain semicolons).
Example :
trainset.csv
doc1;label_1;label_2
doc2;label_1;
doc3;label_1;label_3
...
The document name, which is used to identify the document in the list, must contain the value that is entered in the field document_name in the application.
If a training file contains several labels per document, but the selected training set is a single-label classification, only the first label is used.
If the document names or labels contain semicolons, the values must be enclosed in double quotation marks to avoid incorrectly interpreting the semicolon as a field separator.
Only values that are part of the label system in the application (or project) are allowed as labels (all others are ignored).
When you import training material, any labels that may already be assigned to the documents in the list are deleted.
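If you generate the training CSV programmatically, Python's csv module handles the semicolon separator and the quoting rules automatically. A minimal sketch (file name and labels are just examples):
import csv

# Hypothetical label assignments: document_name followed by its labels
rows = [
    ["doc1", "label_1", "label_2"],
    ["doc2", "label_1"],
    ["doc3", "label_1", "label_3"],
]

with open("trainset.csv", "w", newline="", encoding="utf-8") as f:
    # QUOTE_MINIMAL only wraps values in double quotes when they contain
    # the separator, matching the rule for semicolons described above
    writer = csv.writer(f, delimiter=";", quoting=csv.QUOTE_MINIMAL)
    writer.writerows(rows)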
10.1.2.3. Train a model
As soon as the system has access to training material, either through an imported training list or manually assigned labels, a model can be trained using the corresponding button. Use the refresh function to update the information on "State" and "Model": the training has finished when "State" is IDLE and "Model" is READY.
10.1.2.4. Quality of the current model
After each training session, an evaluation is carried out to assess the current quality of the model. For this purpose, the machine uses the document set with intellectually confirmed labels. This set is divided into a training set (90%) and a test set (10%). The test set is classified by the machine on the basis of a model trained on the training set. The results of the automatic classification are then compared with the intellectually assigned labels. To smooth the results, the machine repeats this 10 times for different splits into test and training sets. The results of the tests can be viewed in the form of a diagram using the corresponding button. The diagrams show the following metrics per label, which are derived from the number of correct assignments (true positives - TP), false assignments (false positives - FP), missing assignments (false negatives - FN), and correct non-assignments (true negatives - TN):
Accuracy: The ratio of all correct assignments (and correct non-assignments) to the total sum of all observations: (TP + TN) / (TP + FP + FN + TN)
Precision: The ratio of correct assignments to all assignments: TP / (TP + FP)
If it is particularly important to avoid misallocations, this value is of particular relevance.
Recall: The ratio of correct assignments to the sum of all existing correct assignments: TP / (TP + FN)
If you accept some misallocations in order to increase the number of hits, this value is of particular relevance.
F1-Score: A weighted average between Precision (P) and Recall (R): 2 x (P x R) / (P + R)
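As a brief worked example with hypothetical numbers: if a label receives TP = 8, FP = 2, FN = 4 and TN = 86 over 100 observations, then Accuracy = (8 + 86) / 100 = 0.94, Precision = 8 / 10 = 0.8, Recall = 8 / 12 ≈ 0.67 and F1-Score = 2 x (0.8 x 0.67) / (0.8 + 0.67) ≈ 0.73.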
10.1.2.5. Automatic classification of all unclassified documents
As soon as an initial model has been created, all previously unclassified documents can be automatically classified on the basis of this model via the corresponding button on the classification configuration page.
Once the classification is complete, the results can be viewed in the graphical user interface. The assigned classes are displayed above each document (see "Browse classifications" below).
10.1.2.6. Status information
The overview table displays information about the current status of the classification set:
IDLE: No process is currently running.
TRAINING: A training is in progress. During this time, no other processes can be started on this classification set.
CLASSIFYING: Documents are currently being classified. During this time, no other processes can be started on this classification set.
ABORTING: A process (training or classification) is being aborted. During this time, no processes can be started on this classification set.
The resulting model of a classification set comes with additional information:
NONE: No model has been trained yet.
READY: A valid model exists and a classification process can be started.
OUTDATED: Since the last training, manual classifications have been added or automatic classifications have been confirmed or rejected. The model should be re-trained in order to make changes take effect.
INVALID: Changes were made to the label system or a manually assigned label was deleted, which invalidates the current model. The model has to be re-trained.
10.2. Index, evaluate and manually classify documents
For all classification sets, you can use a graphical user interface to navigate through the documents, review results, confirm or delete automatically assigned classes, and assign classes manually. You can access this browser view by clicking on "Classification" on the project overview page.
10.2.1. Structure of the interface
The interface is similar to the search interface, both in terms of its structure and functionality. The classification page has three predefined facets on the left side of the screen that can be used to filter documents according to the assigned class (Label), the assigned confidences (Confidence) or the assignment status on the document level (Status).
This makes it very easy to display, for example, only those documents that have been automatically classified (Status = Autoclassified) and that have labels with low confidence (Confidence = low). By making corrections/confirmations to the resulting documents, the classification model can be improved: the system learns exactly where it is currently most uncertain (so-called Active Learning).
To the right of the search input field, the classification set on which you want to work can be chosen. If you have created several classification sets, you can quickly switch between them.
10.2.2. Confirm or reject automatically assigned labels
The labels that have been assigned to each document are depicted below the title information of each document. Manually assigned labels are displayed in blue, automatically assigned labels in red (low confidence) or green (high confidence).
Automatically assigned labels have a button to confirm and to delete the label. By confirming an automatically assigned label, it changes its color and will be considered for the next training session to improve the model.
As soon as you confirm, delete or add labels, the model is considered OUTDATED. This means that since the last training session, new data has been collected to improve the model and re-training is necessary.
10.2.3. Execute actions on several selected documents
Similar to the conventional search interface, there are several document-centered actions for classification. In general, actions either refer to
exactly one document,
a selection of documents,
all documents of the project, or
all documents corresponding to the current search restrictions.
For any of these actions, there is a small button with a distinctive icon under the document title. Use this button to apply the action exactly to the corresponding document.
The same icons are displayed on larger buttons below the search bar ("Label document(s)", "Classify document(s)", "Export classifications"). Clicking on these buttons applies the action to all documents that are marked with the checkbox to the left of their title. All documents on the current search result page are selected by clicking the uppermost checkbox on the page.
If no particular documents are selected at all, the action is applied to all documents that correspond to the current search restrictions. Since the result set can be very large, a window opens for approving the current selection before the corresponding process starts in the background.
10.2.4. Manually label documents
In addition to confirming or rejecting automatically assigned labels, categories can be assigned manually. The button attached to each document serves this purpose. The button opens a window in which you can select the desired label(s). You can also manually label several documents at the same time by using the checkboxes to the left of the document titles in conjunction with the uppermost button.
When manually assigning labels, a window opens with labeling information:
"Not selected": This label has not been assigned to any of the selected documents.
"Partially selected": This label has already been assigned for some (not all) selected documents (gray stripes).
"Completely selected": All selected documents already have this label (grey).
When assigning a label manually, automatically assigned labels of the same type are automatically overwritten, if present.
As an example, if you select 100 documents to assign label A and 10 of them already have an automatically assigned label A, the status for these 10 documents will be switched to "Approved". An automatically assigned label B would not be replaced by this procedure (except in a single-label classification scenario where only one label is allowed).
10.2.5. Classify documents automatically
The same selection mechanism as for manual labeling also applies to automatic classification (single documents, a selection of documents or the current search result set). The button "Classify document(s)" automatically classifies documents that are not manually categorized.
As a result, automatically assigned category labels are displayed in red (automatic label with low confidence) or green (automatic label with high confidence). The corresponding facet filters on the left (Label, Confidence and Status) will change when refreshing the page.
If documents are automatically classified, all previously unconfirmed automatically assigned classes of these documents from previous runs are deleted.
10.2.6. Export labels
The assignment of (confirmed or manual) labels can be exported from the interface to a CSV file (button "Export classifications"). The format has the same structure as the input format that is allowed for importing training material.
10.2.7. Training and classifying directly from the search page
With the corresponding button at the top right of the page, a new model can be trained based on all previously manually classified or confirmed documents. Similarly, a button at the top right is used to classify all unclassified documents based on the current model.
11. Application Interface: REST API
11.1. Overview
The REST API provides third-party applications with access to the application's functionality. The API is HTTP-based, so it can be used with any programming language or tool that has an HTTP library, such as curl.
11.1.1. Base URL
All API endpoints are relative to the base URL. For example, assuming Health Discovery is available at http://localhost:8080/health-discovery, the REST API base URL for all endpoints is:
http://localhost:8080/health-discovery/rest/
11.1.2. API Versions
The REST API has multiple versions. You can specify the version in the request URL after the REST API base URL. For example, here's a call to API version 1 indicated by the v1 URL path:
curl -X GET 'http://localhost:8080/health-discovery/rest/v1/buildInfo'
11.1.3. Response
Typically, requests to the REST API are answered with a JSON response. The response object essentially consists of a payload property, which contains the actual user data, and an errorMessages property, which contains any error messages. Successful API requests are answered with an HTTP status code 200.
{ "payload": {}, "errorMessages": [ "string" ] }
11.1.4. API Tokens
The REST API (starting with API version 1) uses API tokens to protect resources against unauthorized use. Users can create personalized API tokens and use them for authentication on API calls.
To authorize an API call, the user underlying the API token is used. The API token must be transferred in the api-token header. The following example shows a REST API request using an API token:
curl -X GET --header 'api-token: 235907816cd27cc1411633bea37fc5c7af38030f6ce22888d0d49872b8b74ad6' 'http://localhost:8080/health-discovery/rest/v1/buildInfo'
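Since the API is plain HTTP, the same request can be issued from Python. The following sketch uses the requests library and the token and URL from the example above; it also shows how to evaluate the payload/errorMessages response envelope described in the Response section:
import requests

BASE_URL = "http://localhost:8080/health-discovery/rest/v1"
API_TOKEN = "235907816cd27cc1411633bea37fc5c7af38030f6ce22888d0d49872b8b74ad6"

# The API token is passed in the api-token header of every request
response = requests.get(f"{BASE_URL}/buildInfo", headers={"api-token": API_TOKEN})
response.raise_for_status()  # raises on HTTP error codes such as 401 or 500

body = response.json()
if body["errorMessages"]:
    raise RuntimeError("; ".join(body["errorMessages"]))
print(body["payload"])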
11.1.5. Error Handling
Errors are indicated by standard HTTP error codes. The following error codes are used by the REST API:
Code | Description |
---|---|
400 | Bad request. Please check the error message for further details. |
401 | Unauthorized request. Please supply a valid API token in the api-token header. |
403 | Forbidden. The user does not have the required privileges to access the resource. |
404 | Resource could not be found. |
405 | Request method not supported. |
500 | Internal server error. |
Additional information may be provided by the JSON response that contains more details about the error. In this case, this additional information will be contained in the errorMessages property.
{ "payload": null, "errorMessages": [ "Pipeline \"MyPipeline\" has not been initialized" ] }
11.1.6. Browser Interface
Health Discovery comes with a built-in browser interface for the REST API based on Swagger UI. It allows you to get an overview of the API and to submit sample requests directly from the browser. The browser interface is available at:
http://localhost:8080/health-discovery/swagger-ui.html
11.2. Text Analysis
11.2.1. Create Pipeline
This function creates a new text analysis pipeline using a given pipeline configuration.
POST /v1/textanalysis/projects/{projectName}/pipelines
11.2.1.1. Request Parameters
Name | Parameter Type | Data Type | Description |
---|---|---|---|
api-token | header | string | The API token for your user. |
projectName | path | string | The name of the project. |
pipelineConfigurationDto | body | string | A JSON object that encapsulates the pipeline configuration. |
11.2.1.1.1. Example pipelineConfigurationDto
{ "schemaVersion": "1.2", "name": "MyPipeline", "description": "A very simple pipeline", "analysisEnginePoolSize": 1, "casPoolSize": 2, "fixedFlow": [ { "refs": "LanguageSetter" }, { "refs": "SentenceAndTokenAnnotator" } ], "collectionReader": null, "components": [ { "analysisEngines": [ { "name": "SentenceAndTokenAnnotator", "template": "de.averbis.textanalysis.components.jtokannotator.JTokAnnotator", "resourceRefs": [], "parameters": [ { "name": "genre", "value": "patent" }, { "name": "addParagraphs", "value": "false" } ] }, { "name": "LanguageSetter", "template": "de.averbis.textanalysis.components.languagesetter.LanguageSetter", "resourceRefs": [], "parameters": [ { "name": "language", "value": "en" }, { "name": "overwriteExisting", "value": "false" } ] } ], "aggregatedAnalysisEngines": [], "resources": [] } ] }
curl -X POST "http://localhost:8080/health-discovery/rest/v1/textanalysis/projects/MyProject/pipelines" -H "accept: */*" -H "api-token: 6bd9edf94e9ea854933fa29f89008f1b9b66b44cbfd058f1f381490dcd4304a5" -H "Content-Type: application/json" -d "{ \"schemaVersion\": \"1.2\", \"name\": \"MyPipeline\", \"description\": \"A very simple pipeline\", \"analysisEnginePoolSize\": 1, \"casPoolSize\": 2, \"fixedFlow\": [ { \"refs\": \"LanguageSetter\" }, { \"refs\": \"SentenceAndTokenAnnotator\" } ], \"collectionReader\": null, \"components\": [ { \"analysisEngines\": [ { \"name\": \"SentenceAndTokenAnnotator\", \"template\": \"de.averbis.textanalysis.components.jtokannotator.JTokAnnotator\", \"resourceRefs\": [], \"parameters\": [ { \"name\": \"genre\", \"value\": \"patent\" }, { \"name\": \"addParagraphs\", \"value\": \"false\" } ] }, { \"name\": \"LanguageSetter\", \"template\": \"de.averbis.textanalysis.components.languagesetter.LanguageSetter\", \"resourceRefs\": [], \"parameters\": [ { \"name\": \"language\", \"value\": \"en\" }, { \"name\": \"overwriteExisting\", \"value\": \"false\" } ] } ], \"aggregatedAnalysisEngines\": [], \"resources\": [] } ] }"
curl -X POST "http://localhost:8080/information-discovery/rest/v1/textanalysis/projects/MyProject/pipelines" -H "accept: */*" -H "api-token: 6bd9edf94e9ea854933fa29f89008f1b9b66b44cbfd058f1f381490dcd4304a5" -H "Content-Type: application/json" -d "{ \"schemaVersion\": \"1.2\", \"name\": \"MyPipeline\", \"description\": \"A very simple pipeline\", \"analysisEnginePoolSize\": 1, \"casPoolSize\": 2, \"fixedFlow\": [ { \"refs\": \"LanguageSetter\" }, { \"refs\": \"SentenceAndTokenAnnotator\" } ], \"collectionReader\": null, \"components\": [ { \"analysisEngines\": [ { \"name\": \"SentenceAndTokenAnnotator\", \"template\": \"de.averbis.textanalysis.components.jtokannotator.JTokAnnotator\", \"resourceRefs\": [], \"parameters\": [ { \"name\": \"genre\", \"value\": \"patent\" }, { \"name\": \"addParagraphs\", \"value\": \"false\" } ] }, { \"name\": \"LanguageSetter\", \"template\": \"de.averbis.textanalysis.components.languagesetter.LanguageSetter\", \"resourceRefs\": [], \"parameters\": [ { \"name\": \"language\", \"value\": \"en\" }, { \"name\": \"overwriteExisting\", \"value\": \"false\" } ] } ], \"aggregatedAnalysisEngines\": [], \"resources\": [] } ] }"
curl -X POST "http://localhost:8080/patent-monitor/rest/v1/textanalysis/projects/MyProject/pipelines" -H "accept: */*" -H "api-token: 6bd9edf94e9ea854933fa29f89008f1b9b66b44cbfd058f1f381490dcd4304a5" -H "Content-Type: application/json" -d "{ \"schemaVersion\": \"1.2\", \"name\": \"MyPipeline\", \"description\": \"A very simple pipeline\", \"analysisEnginePoolSize\": 1, \"casPoolSize\": 2, \"fixedFlow\": [ { \"refs\": \"LanguageSetter\" }, { \"refs\": \"SentenceAndTokenAnnotator\" } ], \"collectionReader\": null, \"components\": [ { \"analysisEngines\": [ { \"name\": \"SentenceAndTokenAnnotator\", \"template\": \"de.averbis.textanalysis.components.jtokannotator.JTokAnnotator\", \"resourceRefs\": [], \"parameters\": [ { \"name\": \"genre\", \"value\": \"patent\" }, { \"name\": \"addParagraphs\", \"value\": \"false\" } ] }, { \"name\": \"LanguageSetter\", \"template\": \"de.averbis.textanalysis.components.languagesetter.LanguageSetter\", \"resourceRefs\": [], \"parameters\": [ { \"name\": \"language\", \"value\": \"en\" }, { \"name\": \"overwriteExisting\", \"value\": \"false\" } ] } ], \"aggregatedAnalysisEngines\": [], \"resources\": [] } ] }"
11.2.1.2. Response
{ "payload": "MyPipeline", "errorMessages": [] }
11.2.2. Get Pipeline
The get pipeline function provides access to pipeline details and status information.
GET /v1/textanalysis/projects/{projectName}/pipelines/{pipelineName}
11.2.2.1. Request Parameters
Name | Parameter Type | Data Type | Description |
---|---|---|---|
api-token | header | string | The API token for your user. |
projectName | path | string | The name of the project. |
pipelineName | path | string | The name of the text analysis pipeline. |
curl -X GET --header 'Content-Type: application/json' --header 'api-token: 6bd9edf94e9ea854933fa29f89008f1b9b66b44cbfd058f1f381490dcd4304a5' 'http://localhost:8080/health-discovery/rest/v1/textanalysis/projects/MyProject/pipelines/MyPipeline'
11.2.2.2. Response
{ "payload": { "id": 63506, "name": "MyPipeline", "description": "A simple pipeline", "pipelineState": "STARTED", "pipelineStateMessage": null, "preconfigured": false, "scaleOuted": false }, "errorMessages": [] }
11.2.3. Get Pipeline Configuration
This function retrieves the detailed pipeline configuration. It can be used to clone pipeline configurations to other projects or instances.
GET /v1/textanalysis/projects/{projectName}/pipelines/{pipelineName}/configuration
11.2.3.1. Request Parameters
Name | Parameter Type | Data Type | Description |
---|---|---|---|
api-token | header | string | The API token for your user. |
projectName | path | string | The name of the project. |
pipelineName | path | string | The name of the text analysis pipeline. |
curl -X GET "http://localhost:8080/health-discovery/rest/v1/textanalysis/projects/MyProject/pipelines/MyPipeline/configuration" -H "accept: */*" -H "api-token: 6bd9edf94e9ea854933fa29f89008f1b9b66b44cbfd058f1f381490dcd4304a5"
curl -X GET "http://localhost:8080/information-discovery/rest/v1/textanalysis/projects/MyProject/pipelines/MyPipeline/configuration" -H "accept: */*" -H "api-token: 6bd9edf94e9ea854933fa29f89008f1b9b66b44cbfd058f1f381490dcd4304a5"
curl -X GET "http://localhost:8080/patent-monitor/rest/v1/textanalysis/projects/MyProject/pipelines/MyPipeline/configuration" -H "accept: */*" -H "api-token: 6bd9edf94e9ea854933fa29f89008f1b9b66b44cbfd058f1f381490dcd4304a5"
11.2.3.2. Response
{ "payload": { "schemaVersion": "1.2", "name": "MyPipeline", "description": "A very simple pipeline", "analysisEnginePoolSize": 1, "casPoolSize": 2, "fixedFlow": [ { "refs": "LanguageSetter" }, { "refs": "SentenceAndTokenAnnotator" } ], "collectionReader": null, "components": [ { "analysisEngines": [ { "name": "SentenceAndTokenAnnotator", "template": "de.averbis.textanalysis.components.jtokannotator.JTokAnnotator", "resourceRefs": [], "parameters": [ { "name": "genre", "value": "patent" }, { "name": "addParagraphs", "value": "false" } ] }, { "name": "LanguageSetter", "template": "de.averbis.textanalysis.components.languagesetter.LanguageSetter", "resourceRefs": [], "parameters": [ { "name": "overwriteExisting", "value": "false" }, { "name": "language", "value": "en" } ] } ], "aggregatedAnalysisEngines": [], "resources": [] } ] }, "errorMessages": [] }
11.2.4. Change Pipeline Configuration
This function can be used to change the configuration of a pipeline.
PUT /v1/textanalysis/projects/{projectName}/pipelines/{pipelineName}/configuration
11.2.4.1. Request Parameters
Name | Parameter Type | Data Type | Description |
---|---|---|---|
api-token | header | string | The API token for your user. |
projectName | path | string | The name of the project. |
pipelineName | path | string | The name of the text analysis pipeline. |
pipelineConfigurationDto | body | string | A JSON object that encapsulates the pipeline configuration. |
11.2.4.1.1. Example pipelineConfigurationDto
{ "schemaVersion": "1.2", "name": "MyPipeline", "description": "A very simple pipeline", "analysisEnginePoolSize": 2, "casPoolSize": 4, "fixedFlow": [ { "refs": "LanguageSetter" }, { "refs": "SentenceAndTokenAnnotator" }, { "refs": "SnowballStemAnnotator" } ], "collectionReader": null, "components": [ { "analysisEngines": [ { "name": "SnowballStemAnnotator", "template": "de.averbis.textanalysis.components.snowballstemannotator.SnowballStemAnnotator", "resourceRefs": [], "parameters": [] }, { "name": "SentenceAndTokenAnnotator", "template": "de.averbis.textanalysis.components.jtokannotator.JTokAnnotator", "resourceRefs": [], "parameters": [ { "name": "genre", "value": "patent" }, { "name": "addParagraphs", "value": "false" } ] }, { "name": "LanguageSetter", "template": "de.averbis.textanalysis.components.languagesetter.LanguageSetter", "resourceRefs": [], "parameters": [ { "name": "overwriteExisting", "value": "false" }, { "name": "language", "value": "en" } ] } ], "aggregatedAnalysisEngines": [], "resources": [] } ] }
curl -X PUT "http://localhost:8080/health-discovery/rest/v1/textanalysis/projects/MyProject/pipelines/MyPipeline/configuration" -H "accept: */*" -H "api-token: 6bd9edf94e9ea854933fa29f89008f1b9b66b44cbfd058f1f381490dcd4304a5" -H "Content-Type: application/json" -d "{ \"schemaVersion\": \"1.2\", \"name\": \"MyPipeline\", \"description\": \"A very simple pipeline\", \"analysisEnginePoolSize\": 2, \"casPoolSize\": 4, \"fixedFlow\": [ { \"refs\": \"LanguageSetter\" }, { \"refs\": \"SentenceAndTokenAnnotator\" }, { \"refs\": \"SnowballStemAnnotator\" } ], \"collectionReader\": null, \"components\": [ { \"analysisEngines\": [ { \"name\": \"SnowballStemAnnotator\", \"template\": \"de.averbis.textanalysis.components.snowballstemannotator.SnowballStemAnnotator\", \"resourceRefs\": [], \"parameters\": [] }, { \"name\": \"SentenceAndTokenAnnotator\", \"template\": \"de.averbis.textanalysis.components.jtokannotator.JTokAnnotator\", \"resourceRefs\": [], \"parameters\": [ { \"name\": \"genre\", \"value\": \"patent\" }, { \"name\": \"addParagraphs\", \"value\": \"false\" } ] }, { \"name\": \"LanguageSetter\", \"template\": \"de.averbis.textanalysis.components.languagesetter.LanguageSetter\", \"resourceRefs\": [], \"parameters\": [ { \"name\": \"overwriteExisting\", \"value\": \"false\" }, { \"name\": \"language\", \"value\": \"en\" } ] } ], \"aggregatedAnalysisEngines\": [], \"resources\": [] } ] }"
curl -X PUT "http://localhost:8080/information-discovery/rest/v1/textanalysis/projects/MyProject/pipelines/MyPipeline/configuration" -H "accept: */*" -H "api-token: 6bd9edf94e9ea854933fa29f89008f1b9b66b44cbfd058f1f381490dcd4304a5" -H "Content-Type: application/json" -d "{ \"schemaVersion\": \"1.2\", \"name\": \"MyPipeline\", \"description\": \"A very simple pipeline\", \"analysisEnginePoolSize\": 2, \"casPoolSize\": 4, \"fixedFlow\": [ { \"refs\": \"LanguageSetter\" }, { \"refs\": \"SentenceAndTokenAnnotator\" }, { \"refs\": \"SnowballStemAnnotator\" } ], \"collectionReader\": null, \"components\": [ { \"analysisEngines\": [ { \"name\": \"SnowballStemAnnotator\", \"template\": \"de.averbis.textanalysis.components.snowballstemannotator.SnowballStemAnnotator\", \"resourceRefs\": [], \"parameters\": [] }, { \"name\": \"SentenceAndTokenAnnotator\", \"template\": \"de.averbis.textanalysis.components.jtokannotator.JTokAnnotator\", \"resourceRefs\": [], \"parameters\": [ { \"name\": \"genre\", \"value\": \"patent\" }, { \"name\": \"addParagraphs\", \"value\": \"false\" } ] }, { \"name\": \"LanguageSetter\", \"template\": \"de.averbis.textanalysis.components.languagesetter.LanguageSetter\", \"resourceRefs\": [], \"parameters\": [ { \"name\": \"overwriteExisting\", \"value\": \"false\" }, { \"name\": \"language\", \"value\": \"en\" } ] } ], \"aggregatedAnalysisEngines\": [], \"resources\": [] } ] }"
curl -X PUT "http://localhost:8080/patent-monitor/rest/v1/textanalysis/projects/MyProject/pipelines/MyPipeline/configuration" -H "accept: */*" -H "api-token: 6bd9edf94e9ea854933fa29f89008f1b9b66b44cbfd058f1f381490dcd4304a5" -H "Content-Type: application/json" -d "{ \"schemaVersion\": \"1.2\", \"name\": \"MyPipeline\", \"description\": \"A very simple pipeline\", \"analysisEnginePoolSize\": 2, \"casPoolSize\": 4, \"fixedFlow\": [ { \"refs\": \"LanguageSetter\" }, { \"refs\": \"SentenceAndTokenAnnotator\" }, { \"refs\": \"SnowballStemAnnotator\" } ], \"collectionReader\": null, \"components\": [ { \"analysisEngines\": [ { \"name\": \"SnowballStemAnnotator\", \"template\": \"de.averbis.textanalysis.components.snowballstemannotator.SnowballStemAnnotator\", \"resourceRefs\": [], \"parameters\": [] }, { \"name\": \"SentenceAndTokenAnnotator\", \"template\": \"de.averbis.textanalysis.components.jtokannotator.JTokAnnotator\", \"resourceRefs\": [], \"parameters\": [ { \"name\": \"genre\", \"value\": \"patent\" }, { \"name\": \"addParagraphs\", \"value\": \"false\" } ] }, { \"name\": \"LanguageSetter\", \"template\": \"de.averbis.textanalysis.components.languagesetter.LanguageSetter\", \"resourceRefs\": [], \"parameters\": [ { \"name\": \"overwriteExisting\", \"value\": \"false\" }, { \"name\": \"language\", \"value\": \"en\" } ] } ], \"aggregatedAnalysisEngines\": [], \"resources\": [] } ] }"
11.2.4.2. Response
{ "payload": null, "errorMessages": [] }
11.2.5. Start Pipeline
The asynchronous start pipeline function allows you to trigger a pipeline start. Starting a pipeline may take some time. The pipeline status can be queried using the get pipeline function.
PUT /v1/textanalysis/projects/{projectName}/pipelines/{pipelineName}/start
11.2.5.1. Request Parameters
Name | Parameter Type | Data Type | Description |
---|---|---|---|
api-token | header | string | The API token for your user. |
projectName | path | string | The name of the project. |
pipelineName | path | string | The name of the text analysis pipeline. |
curl -X PUT --header 'Content-Type: application/json' --header 'api-token: 6bd9edf94e9ea854933fa29f89008f1b9b66b44cbfd058f1f381490dcd4304a5' 'http://localhost:8080/health-discovery/rest/v1/textanalysis/projects/MyProject/pipelines/MyPipeline/start'
11.2.5.2. Response
{ "payload": null, "errorMessages": [] }
11.2.6. Stop Pipeline
The asynchronous stop pipeline function allows you to trigger a pipeline shutdown. Stopping a pipeline may take some time. The pipeline status can be queried using the get pipeline function.
PUT /v1/textanalysis/projects/{projectName}/pipelines/{pipelineName}/stop
11.2.6.1. Request Parameters
Name | Parameter Type | Data Type | Description |
---|---|---|---|
api-token | header | string | The API token for your user. |
projectName | path | string | The name of the project. |
pipelineName | path | string | The name of the text analysis pipeline. |
curl -X PUT --header 'Content-Type: application/json' --header 'api-token: 6bd9edf94e9ea854933fa29f89008f1b9b66b44cbfd058f1f381490dcd4304a5' 'http://localhost:8080/health-discovery/rest/v1/textanalysis/projects/MyProject/pipelines/MyPipeline/stop'
11.2.6.2. Response
{ "payload": null, "errorMessages": [] }
11.2.7. Analyse Text
The analyse text function allows you to analyse plain text with a pipeline.
POST /v1/textanalysis/projects/{projectName}/pipelines/{pipelineName}/analyseText
11.2.7.1. Request Parameters
Name | Parameter Type | Data Type | Description |
---|---|---|---|
api-token | header | string | The API token for your user. |
projectName | path | string | The name of the project. |
pipelineName | path | string | The name of the text analysis pipeline. |
text | body | string | The text that will be analyzed. |
language | query | string | Optional parameter to specify the language of the text. Can be omitted if the pipeline has built-in language detection. |
annotationTypes | query | string | Optional parameter to specify what kind of annotations (like sentences, concepts, diagnoses) should be analyzed. Takes a comma separated list of annotation type class names as specified in the type system. Wildcards are supported. |
curl -X POST --header 'Content-Type: text/plain' --header 'api-token: beb71a3a6b9dc0a7c535b0e38b3d86166b179e5c9a4dc49b4355fe179f34d519' -d 'Some sample text' 'http://localhost:8080/health-discovery/rest/v1/textanalysis/projects/MyProject/pipelines/MyPipeline/analyseText?language=en&annotationTypes=de.averbis.types.Sentence%2C*.Token'
11.2.7.2. Response
{ "payload": [ { "begin": 0, "end": 16, "type": "de.averbis.types.Sentence", "coveredText": "Some sample text", "id": 13 }, { "begin": 0, "end": 4, "type": "de.averbis.types.Token", "coveredText": "Some", "id": 19 }, { "begin": 5, "end": 11, "type": "de.averbis.types.Token", "coveredText": "sample", "id": 39 }, { "begin": 12, "end": 16, "type": "de.averbis.types.Token", "coveredText": "text", "id": 59 }, { "begin": 0, "end": 4, "type": "de.averbis.extraction.types.Token", "coveredText": "Some", "id": 19, "tokenClass": "FIRST_UPPER_CASE", "componentId": "JTokAnnotator", "normalized": "some", "confidence": 0.0, "lemma": null, "ignoreByConceptMapper": false, "isStopword": false, "segments": null, "concepts": null, "entities": null, "posTag": null, "isAbbreviation": false, "isInvariant": false, "diacriticsFreeVersions": null, "stem": { "begin": 0, "end": 4, "type": "de.averbis.extraction.types.Stem", "coveredText": "Some", "id": 79, "componentId": "SnowballStemAnnotator", "confidence": 0.0, "value": "Some" }, "abbreviations": null }, { "begin": 5, "end": 11, "type": "de.averbis.extraction.types.Token", "coveredText": "sample", "id": 39, "tokenClass": "ALL_LOWER_CASE", "componentId": "JTokAnnotator", "normalized": "sample", "confidence": 0.0, "lemma": null, "ignoreByConceptMapper": false, "isStopword": false, "segments": null, "concepts": null, "entities": null, "posTag": null, "isAbbreviation": false, "isInvariant": false, "diacriticsFreeVersions": null, "stem": { "begin": 5, "end": 11, "type": "de.averbis.extraction.types.Stem", "coveredText": "sample", "id": 86, "componentId": "SnowballStemAnnotator", "confidence": 0.0, "value": "sampl" }, "abbreviations": null }, { "begin": 12, "end": 16, "type": "de.averbis.extraction.types.Token", "coveredText": "text", "id": 59, "tokenClass": "ALL_LOWER_CASE", "componentId": "JTokAnnotator", "normalized": "text", "confidence": 0.0, "lemma": null, "ignoreByConceptMapper": false, "isStopword": false, "segments": null, "concepts": null, "entities": null, "posTag": null, "isAbbreviation": false, "isInvariant": false, "diacriticsFreeVersions": null, "stem": { "begin": 12, "end": 16, "type": "de.averbis.extraction.types.Stem", "coveredText": "text", "id": 93, "componentId": "SnowballStemAnnotator", "confidence": 0.0, "value": "text" }, "abbreviations": null } ], "errorMessages": [] }
11.2.8. Analyse HTML
The analyse HTML function allows you to analyse HTML-structured text with a pipeline.
POST /v1/textanalysis/projects/{projectName}/pipelines/{pipelineName}/analyseHtml
11.2.8.1. Request Parameters
Name | Parameter Type | Data Type | Description |
---|---|---|---|
api-token | header | string | The API token for your user. |
projectName | path | string | The name of the project. |
pipelineName | path | string | The name of the text analysis pipeline. |
text | body | string | The HTML-structured text that will be analyzed. |
language | query | string | Optional parameter to specify the language of the text. Can be omitted if the pipeline has built-in language detection. |
annotationTypes | query | string | Optional parameter to specify what kind of annotations (like sentences, concepts, diagnoses) should be analyzed. Takes a comma-separated list of annotation type class names as specified in the type system. Wildcards are supported. |
curl -X POST --header 'Content-Type: text/plain' --header 'api-token: beb71a3a6b9dc0a7c535b0e38b3d86166b179e5c9a4dc49b4355fe179f34d519' -d '<html><body>Some sample text</body></html>' 'http://localhost:8080/health-discovery/rest/v1/textanalysis/projects/MyProject/pipelines/MyPipeline/analyseHtml?language=en&annotationTypes=de.averbis.types.Sentence%2C*.Token'
11.2.8.2. Response
{ "payload": [ { "begin": 0, "end": 16, "type": "de.averbis.types.Sentence", "coveredText": "Some sample text", "id": 73 }, { "begin": 0, "end": 4, "type": "de.averbis.types.Token", "coveredText": "Some", "id": 79 }, { "begin": 5, "end": 11, "type": "de.averbis.types.Token", "coveredText": "sample", "id": 99 }, { "begin": 12, "end": 16, "type": "de.averbis.types.Token", "coveredText": "text", "id": 119 }, { "begin": 0, "end": 4, "type": "de.averbis.extraction.types.Token", "coveredText": "Some", "id": 79, "tokenClass": "FIRST_UPPER_CASE", "componentId": "JTokAnnotator", "normalized": "some", "confidence": 0.0, "lemma": null, "ignoreByConceptMapper": false, "isStopword": false, "segments": null, "concepts": null, "entities": null, "posTag": null, "isAbbreviation": false, "isInvariant": false, "diacriticsFreeVersions": null, "stem": { "begin": 0, "end": 4, "type": "de.averbis.extraction.types.Stem", "coveredText": "Some", "id": 139, "componentId": "SnowballStemAnnotator", "confidence": 0.0, "value": "Some" }, "abbreviations": null }, { "begin": 5, "end": 11, "type": "de.averbis.extraction.types.Token", "coveredText": "sample", "id": 99, "tokenClass": "ALL_LOWER_CASE", "componentId": "JTokAnnotator", "normalized": "sample", "confidence": 0.0, "lemma": null, "ignoreByConceptMapper": false, "isStopword": false, "segments": null, "concepts": null, "entities": null, "posTag": null, "isAbbreviation": false, "isInvariant": false, "diacriticsFreeVersions": null, "stem": { "begin": 5, "end": 11, "type": "de.averbis.extraction.types.Stem", "coveredText": "sample", "id": 146, "componentId": "SnowballStemAnnotator", "confidence": 0.0, "value": "sampl" }, "abbreviations": null }, { "begin": 12, "end": 16, "type": "de.averbis.extraction.types.Token", "coveredText": "text", "id": 119, "tokenClass": "ALL_LOWER_CASE", "componentId": "JTokAnnotator", "normalized": "text", "confidence": 0.0, "lemma": null, "ignoreByConceptMapper": false, "isStopword": false, "segments": null, "concepts": null, "entities": null, "posTag": null, "isAbbreviation": false, "isInvariant": false, "diacriticsFreeVersions": null, "stem": { "begin": 12, "end": 16, "type": "de.averbis.extraction.types.Stem", "coveredText": "text", "id": 153, "componentId": "SnowballStemAnnotator", "confidence": 0.0, "value": "text" }, "abbreviations": null } ], "errorMessages": [] }
11.2.9. Result Format (XML)
The response of the web service is returned in XML format and contains the text analysis results for the input data set. For more information about the data format, see chapter Available Text Mining Annotators & Web Service Specification.
11.3. Document Classification
11.3.1. Classify Document
The classify document function allows you to classify documents automatically.
POST /classification/projects/{projectName}/classificationSets/{classificationSetName}/classifyDocument
11.3.1.1. Request Parameters
Name | Parameter Type | Data Type | Description |
---|---|---|---|
projectName | path | string | The name of the project. |
classificationSetName | path | string | The name of the classification configuration. |
type | query | string | The document format type. Currently, only Solr XML Importer is supported. |
requestBody | body | string | The document content. |
Accept | header | string | Specifies the response format. Supported values are application/json and application/xml. |
curl -X POST --header 'Content-Type: text/plain' --header 'Accept: application/json' -d '<update><add><doc><field name="document_name">24552733</field><field name="title">Machine learning for automatic text classification</field><field name="content">Machine learning is a subset of artificial intelligence in the field of computer science.</field></doc></add></update>' 'http://localhost:8080/health-discovery/rest/classification/projects/MyProject/classificationSets/MyClassificationConfiguration/classifyDocument?type=Solr%20XML%20Importer'
11.3.1.2. Response (JSON)
{ "classifications": [ { "documentIdentifier": "24552733", "success": true, "labels": [ { "confidence": 0.537, "name": "Irrelevant" } ] } ] }
11.3.1.3. Response (XML)
<?xml version="1.0" encoding="UTF-8" standalone="yes"?> <response> <classifications> <classification documentIdentifier="24552733" success="true"> <labels> <label confidence="0.537">Irrelevant</label> </labels> </classification> </classifications> </response>
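A minimal Python sketch of the same call (placeholder base URL; note that this endpoint takes the document as Solr XML in the request body and, unlike the v1 endpoints above, is not addressed via /rest/v1):

import requests

BASE_URL = "http://localhost:8080/health-discovery/rest"  # placeholder, adjust to your installation

solr_xml = (
    '<update><add><doc>'
    '<field name="document_name">24552733</field>'
    '<field name="title">Machine learning for automatic text classification</field>'
    '<field name="content">Machine learning is a subset of artificial intelligence '
    'in the field of computer science.</field>'
    '</doc></add></update>'
)

response = requests.post(
    f"{BASE_URL}/classification/projects/MyProject/classificationSets/"
    "MyClassificationConfiguration/classifyDocument",
    headers={"Content-Type": "text/plain", "Accept": "application/json"},
    params={"type": "Solr XML Importer"},
    data=solr_xml,
)
response.raise_for_status()
for classification in response.json()["classifications"]:
    for label in classification["labels"]:
        print(classification["documentIdentifier"], label["name"], label["confidence"])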
11.4. Document Search
11.4.1. Select
The select function is used to search for documents. It supports the Apache Solr query syntax.
GET /v1/search/projects/{projectName}/select
11.4.1.1. Request Parameters
Name | Parameter Type | Data Type | Description |
---|---|---|---|
api-token | header | string | The API token for your user. |
debugQuery | query | boolean | Request additional debugging information in the response. |
facet | query | boolean | If set to true, enables faceting. |
facet.field | query | string | Identifies a field to be treated as a facet. |
fl | query | string | Limits the information included in a query response to a specified list of fields. |
fq | query | string | Applies a filter query to the search results. |
projectName | path | string | The name of the project. |
q | query | string | Defines a query using standard query syntax. |
rows | query | integer | Controls how many rows of responses are displayed at a time. |
sort | query | string | Sorts the response to a query in either ascending or descending order based on the response’s score or another specified characteristic. |
start | query | integer | Specifies an offset (by default, 0) into the responses at which Solr should begin displaying content. |
curl -X GET "http://localhost:8080/information-discovery/rest/v1/search/projects/NewProject/select?fl=title&q=*&rows=3" -H "accept: */*" -H "api-token: 1746847c8798c4eada1008eab95efc56b9acaef1ee1505ed6a0deb6ec0a90914"
curl -X GET "http://localhost:8080/health-discovery/rest/v1/search/projects/NewProject/select?fl=title&q=*&rows=3" -H "accept: */*" -H "api-token: 1746847c8798c4eada1008eab95efc56b9acaef1ee1505ed6a0deb6ec0a90914"
curl -X GET "http://localhost:8080/patent-monitor/rest/v1/search/projects/NewProject/select?fl=title&q=*&rows=3" -H "accept: */*" -H "api-token: 1746847c8798c4eada1008eab95efc56b9acaef1ee1505ed6a0deb6ec0a90914"
11.4.1.2. Response
{ "payload": { "solrResponse": { "responseHeader": { "status": 0, "QTime": 3 }, "response": { "numFound": 3000, "start": 0, "docs": [ { "title": "Impact of the reconstruction method on delayed gastric emptying after pylorus-preserving pancreaticoduodenectomy: a prospective randomized study." }, { "title": "Biomonitoring of cadmium, chromium, nickel and arsenic in general population living near mining and active industrial areas in Southern Tunisia." }, { "title": "Vegetation response to hydrologic and geomorphic factors in an arid region of the Baja California Peninsula." } ] }, "highlighting": { "Medline45b17f83-6241-4225-a49e-eba3deb9822d": {}, "Medlinec167e5b9-7495-4b0d-8436-0d9c33fed3c2": {}, "Medlined8553a84-4917-4d83-b3cf-f058ca4bad82": {} } }, "conceptMapping": {}, "entityMapping": {} }, "errorMessages": [] }
11.4.1.3. Examples
Select the IDs of all documents that contain an ICD-10 R53 diagnosis:
curl -X GET "https://localhost:8080/health-discovery/rest/v1/search/projects/NewProject/select?fl=id&q=R53" -H "accept: */*" -H "api-token: 1746847c8798c4eada1008eab95efc56b9acaef1ee1505ed6a0deb6ec0a90914"
11.5. User Management
11.5.1. Generate API Token
This function generates an API token for a given user.
POST /v1/users/{userName}/apitoken
11.5.1.1. Request Parameters
Name | Parameter Type | Data Type | Description |
---|---|---|---|
userName | path | string | The name of the user |
webServiceLoginDto | body | string | A JSON object that encapsulates the user password and an optional LDAP / ActiveDirectory user source name |
11.5.1.1.1. Example webServiceLoginDto with LDAP user
{ "password": "mySecretPassword", "userSourceName": "CompanyLDAP" }
11.5.1.1.2. Example webServiceLoginDto with local user
{ "password": "mySecretPassword", "userSourceName": "" }
curl -X POST "http://localhost:8080/information-discovery/rest/v1/users/admin/apitoken" -H "accept: */*" -H "Content-Type: application/json" -d "{ \"password\": \"admin\", \"userSourceName\": \"\"}"
curl -X POST "http://localhost:8080/health-discovery/rest/v1/users/admin/apitoken" -H "accept: */*" -H "Content-Type: application/json" -d "{ \"password\": \"admin\", \"userSourceName\": \"\"}"
curl -X POST "http://localhost:8080/patent-monitor/rest/v1/users/admin/apitoken" -H "accept: */*" -H "Content-Type: application/json" -d "{ \"password\": \"admin\", \"userSourceName\": \"\"}"
11.5.1.2. Response
{ "payload": "21234f9c4c4b8e1dd740168e2c5d84db8e9eaa2c3a7cbe61c0ff982aa0743040", "errorMessages": [] }
11.5.2. Regenerate API Token
This function replaces an existing API token with a new one.
PUT /v1/users/{userName}/apitoken
11.5.2.1. Request Parameters
Name | Parameter Type | Data Type | Description |
---|---|---|---|
userName | path | string | The name of the user |
webServiceLoginDto | body | string | A JSON object that encapsulates the user password and an optional LDAP / ActiveDirectory user source name |
11.5.2.1.1. Example webServiceLoginDto with LDAP user
{ "password": "mySecretPassword", "userSourceName": "CompanyLDAP" }
11.5.2.1.2. Example webServiceLoginDto with local user
{ "password": "mySecretPassword", "userSourceName": "" }
curl -X PUT "http://localhost:8080/information-discovery/rest/v1/users/admin/apitoken" -H "accept: */*" -H "Content-Type: application/json" -d "{ \"password\": \"admin\", \"userSourceName\": \"\"}"
curl -X PUT "http://localhost:8080/health-discovery/rest/v1/users/admin/apitoken" -H "accept: */*" -H "Content-Type: application/json" -d "{ \"password\": \"admin\", \"userSourceName\": \"\"}"
curl -X PUT "http://localhost:8080/patent-monitor/rest/v1/users/admin/apitoken" -H "accept: */*" -H "Content-Type: application/json" -d "{ \"password\": \"admin\", \"userSourceName\": \"\"}"
11.5.2.2. Response
{ "payload": "41ecdf9be70ae524c2431f54140674fcf719213f50ca34d1ebbdf5b2437cfe59", "errorMessages": [] }
11.5.3. Invalidate API Token
This function revokes the API token of a given user.
DELETE /v1/users/{userName}/apitoken
11.5.3.1. Request Parameters
Name | Parameter Type | Data Type | Description |
---|---|---|---|
userName | path | string | The name of the user |
webServiceLoginDto | body | string | A JSON object that encapsulates the user password and an optional LDAP / ActiveDirectory user source name |
11.5.3.1.1. Example webServiceLoginDto with LDAP user
{ "password": "mySecretPassword", "userSourceName": "CompanyLDAP" }
11.5.3.1.2. Example webServiceLoginDto with local user
{ "password": "mySecretPassword", "userSourceName": "" }
curl -X DELETE "http://localhost:8080/health-discovery/rest/v1/users/admin/apitoken" -H "accept: */*" -H "Content-Type: application/json" -d "{ \"password\": \"admin\", \"userSourceName\": \"\"}"
11.5.3.2. Response
{ "payload": null, "errorMessages": [] }
11.5.4. Get API Token Status
This function returns the status of a user's API token.
GET /v1/users/{userName}/apitoken/status
11.5.4.1. Request Parameters
Name | Parameter Type | Data Type | Description |
---|---|---|---|
userName | path | string | The name of the user |
webServiceLoginDto | body | string | A JSON object that encapsulates the user password and an optional LDAP / ActiveDirectory user source name |
11.5.4.1.1. Example webServiceLoginDto with LDAP user
{ "password": "mySecretPassword", "userSourceName": "CompanyLDAP" }
11.5.4.1.2. Example webServiceLoginDto with local user
{ "password": "mySecretPassword", "userSourceName": "" }
curl -X POST "http://localhost:8080/information-discovery/rest/v1/users/admin/apitoken/status" -H "accept: */*" -H "Content-Type: application/json" -d "{ \"password\": \"admin\", \"userSourceName\": \"\"}"
curl -X POST "http://localhost:8080/health-discovery/rest/v1/users/admin/apitoken/status" -H "accept: */*" -H "Content-Type: application/json" -d "{ \"password\": \"admin\", \"userSourceName\": \"\"}"
curl -X POST "http://localhost:8080/patent-monitor/rest/v1/users/admin/apitoken/status" -H "accept: */*" -H "Content-Type: application/json" -d "{ \"password\": \"admin\", \"userSourceName\": \"\"}"
11.5.4.2. Response
{ "payload": "EMPTY", "errorMessages": [] }
11.5.5. Change Password
This function is used to change a user's password.
PUT /v1/users/{userName}/changeMyPassword
11.5.5.1. Request Parameters
Name | Parameter Type | Data Type | Description |
---|---|---|---|
userName | path | string | The name of the user |
changeMyPasswordDto | body | string | A JSON object that encapsulates the user's old password and the new password. |
11.5.5.1.1. Example changeMyPasswordDto
{ "oldPassword": "admin", "newPassword": "myN3wP4ssw0rd" }
curl -X PUT "http://localhost:8080/information-discovery/rest/v1/users/admin/changeMyPassword" -H "accept: */*" -H "Content-Type: application/json" -d "{ \"oldPassword\": \"admin\", \"newPassword\": \"myN3wP4ssw0rd\"}"
curl -X PUT "http://localhost:8080/health-discovery/rest/v1/users/admin/changeMyPassword" -H "accept: */*" -H "Content-Type: application/json" -d "{ \"oldPassword\": \"admin\", \"newPassword\": \"myN3wP4ssw0rd\"}"
curl -X PUT "http://localhost:8080/patent-monitor/rest/v1/users/admin/changeMyPassword" -H "accept: */*" -H "Content-Type: application/json" -d "{ \"oldPassword\": \"admin\", \"newPassword\": \"myN3wP4ssw0rd\"}"
11.5.5.2. Response
{ "payload": null, "errorMessages": [] }
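A minimal Python sketch of the password change (placeholder credentials):

import requests

BASE_URL = "http://localhost:8080/health-discovery/rest"  # placeholder, adjust to your installation

response = requests.put(
    f"{BASE_URL}/v1/users/admin/changeMyPassword",
    json={"oldPassword": "admin", "newPassword": "myN3wP4ssw0rd"},
)
response.raise_for_status()  # payload is null on success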
11.6. Project Management
11.6.1. Create Project
This function is used to create new projects.
POST /v1/projects
11.6.1.1. Request Parameters
Name | Parameter Type | Data Type | Description |
---|---|---|---|
api-token | header | string | The API token for your user. |
description | query | string | The project description. |
name | query | string | The project name. |
curl -X POST "http://localhost:8080/information-discovery/rest/v1/projects?description=Some%20meaningful%20project%20description&name=NewProject" -H "accept: application/json;charset=UTF-8" -H "api-token: 1746847c8798c4eada1008eab95efc56b9acaef1ee1505ed6a0deb6ec0a90914"
curl -X POST "http://localhost:8080/health-discovery/rest/v1/projects?description=Some%20meaningful%20project%20description&name=NewProject" -H "accept: application/json;charset=UTF-8" -H "api-token: 1746847c8798c4eada1008eab95efc56b9acaef1ee1505ed6a0deb6ec0a90914"
curl -X POST "http://localhost:8080/patent-monitor/rest/v1/projects?description=Some%20meaningful%20project%20description&name=NewProject" -H "accept: application/json;charset=UTF-8" -H "api-token: 1746847c8798c4eada1008eab95efc56b9acaef1ee1505ed6a0deb6ec0a90914"
11.6.1.2. Response
{ "payload": { "id": 1009, "name": "NewProject", "description": "Some meaningful project description" }, "errorMessages": [] }
11.7. Terminology Management
11.7.1. Get All Terminologies
This function is used to get all terminologies in a project.
GET /v1/terminology/projects/{projectName}/terminologies
11.7.1.1. Request Parameters
Name | Parameter Type | Data Type | Description |
---|---|---|---|
api-token | header | string | The API token for your user. |
projectName | path | string | The name of the project. |
curl -X GET "http://localhost:8080/health-discovery/rest/v1/terminology/projects/MyProject/terminologies" -H "accept: */*" -H "api-token: 7116d2bb104c635d379ccca286f2cc9b5ddb4664829922148f7e882c004a6c0b"
curl -X GET "http://localhost:8080/information-discovery/rest/v1/terminology/projects/MyProject/terminologies" -H "accept: */*" -H "api-token: 7116d2bb104c635d379ccca286f2cc9b5ddb4664829922148f7e882c004a6c0b"
curl -X GET "http://localhost:8080/patent-monitor/rest/v1/terminology/projects/MyProject/terminologies" -H "accept: */*" -H "api-token: 7116d2bb104c635d379ccca286f2cc9b5ddb4664829922148f7e882c004a6c0b"
11.7.1.2. Response
{ "payload": [ { "terminologyName": "MyTerminology", "label": "My Terminology", "version": "1.0", "allowedLanguageCodes": [ "de", "en" ], "hierarchical": true, "conceptType": "de.averbis.extraction.types.Concept" } ], "errorMessages": [] }
11.7.2. Create Terminology
This function is used to create a new terminology.
POST /v1/terminology/projects/{projectName}/terminologies
11.7.2.1. Request Parameters
Name | Parameter Type | Data Type | Description |
---|---|---|---|
api-token | header | string | The API token for your user. |
projectName | path | string | The name of the project. |
webserviceTerminologyDto | body | string | Terminology properties in JSON |
11.7.2.2. Example webserviceTerminologyDto
{ "conceptType": "de.averbis.extraction.types.Concept", "hierarchical": true, "terminologyName": "MyTerminology", "label": "My Terminology", "version": "1.0", "allowedLanguageCodes": [ "en", "de" ] }
curl -X POST "http://localhost:8080/health-discovery/rest/v1/terminology/projects/MyProject/terminologies" -H "accept: */*" -H "api-token: 7116d2bb104c635d379ccca286f2cc9b5ddb4664829922148f7e882c004a6c0b" -H "Content-Type: application/json" -d "{ \"conceptType\": \"de.averbis.extraction.types.Concept\", \"hierarchical\": true, \"terminologyName\": \"MyTerminology\", \"label\": \"My Terminology\", \"version\": \"1.0\", \"allowedLanguageCodes\": [ \"en\",\t\"de\" ]}"
curl -X POST "http://localhost:8080/information-discovery/rest/v1/terminology/projects/MyProject/terminologies" -H "accept: */*" -H "api-token: 7116d2bb104c635d379ccca286f2cc9b5ddb4664829922148f7e882c004a6c0b" -H "Content-Type: application/json" -d "{ \"conceptType\": \"de.averbis.extraction.types.Concept\", \"hierarchical\": true, \"terminologyName\": \"MyTerminology\", \"label\": \"My Terminology\", \"version\": \"1.0\", \"allowedLanguageCodes\": [ \"en\",\t\"de\" ]}"
curl X POST "http://localhost:8080/patent-monitor/rest/v1/terminology/projects/MyProject/terminologies" -H "accept: */*" -H "api-token: 7116d2bb104c635d379ccca286f2cc9b5ddb4664829922148f7e882c004a6c0b" -H "Content-Type: application/json" -d "{ \"conceptType\": \"de.averbis.extraction.types.Concept\", \"hierarchical\": true, \"terminologyName\": \"MyTerminology\", \"label\": \"My Terminology\", \"version\": \"1.0\", \"allowedLanguageCodes\": [ \"en\",\t\"de\" ]}"
11.7.2.3. Response
{ "payload": { "terminologyName": "MyTerminology", "label": "My Terminology", "version": "1.0", "allowedLanguageCodes": [ "en", "de" ], "hierarchical": true, "conceptType": "de.averbis.extraction.types.Concept" }, "errorMessages": [] }
11.7.3. Import Terminology
This function is used to import content into an existing terminology. Existing terminology content will be replaced.
POST /v1/terminology/projects/{projectName}/terminologies/{terminologyName}/terminologyImports
11.7.3.1. Request Parameters
Name | Parameter Type | Data Type | Description |
---|---|---|---|
api-token | header | string | The API token for your user. |
projectName | path | string | The name of the project. |
terminologyName | path | string | The name of the terminology |
requestBody | body | string | The terminology content. |
terminologyImportImporterName | query | string | The importer name. Currently only OBO Importer is supported. |
curl -X POST "http://localhost:8080/health-discovery/rest/v1/terminology/projects/MyProject/terminologies/MyTerminology/terminologyImports?terminologyImportImporterName=OBO%20Importer" -H "accept: application/json;charset=UTF-8" -H "api-token: 7116d2bb104c635d379ccca286f2cc9b5ddb4664829922148f7e882c004a6c0b" -H "Content-Type: application/octet-stream" -d "[Term]id: Aname: Vehiclessynonym: \"Vehicles\" EXACT PREF [][Term]id: Bname: Autosynonym: \"Auto\" EXACT PREF []synonym: \"Automobile\" EXACT []synonym: \"Car\" EXACT PREF []is_a: A ! Vehicles[Term]id: Cname: Boatsynonym: \"Boat\" EXACT PREF []is_a: A ! Vehicles[Term]id: Dname: Aircraftsynonym: \"Aircraft\" EXACT PREF []is_a: A ! Vehicles"
curl -X POST "http://localhost:8080/information-discovery/rest/v1/terminology/projects/MyProject/terminologies/MyTerminology/terminologyImports?terminologyImportImporterName=OBO%20Importer" -H "accept: application/json;charset=UTF-8" -H "api-token: 7116d2bb104c635d379ccca286f2cc9b5ddb4664829922148f7e882c004a6c0b" -H "Content-Type: application/octet-stream" -d "[Term]id: Aname: Vehiclessynonym: \"Vehicles\" EXACT PREF [][Term]id: Bname: Autosynonym: \"Auto\" EXACT PREF []synonym: \"Automobile\" EXACT []synonym: \"Car\" EXACT PREF []is_a: A ! Vehicles[Term]id: Cname: Boatsynonym: \"Boat\" EXACT PREF []is_a: A ! Vehicles[Term]id: Dname: Aircraftsynonym: \"Aircraft\" EXACT PREF []is_a: A ! Vehicles"
curl -X POST "http://localhost:8080/patent-monitor/rest/v1/terminology/projects/MyProject/terminologies/MyTerminology/terminologyImports?terminologyImportImporterName=OBO%20Importer" -H "accept: application/json;charset=UTF-8" -H "api-token: 7116d2bb104c635d379ccca286f2cc9b5ddb4664829922148f7e882c004a6c0b" -H "Content-Type: application/octet-stream" -d "[Term]id: Aname: Vehiclessynonym: \"Vehicles\" EXACT PREF [][Term]id: Bname: Autosynonym: \"Auto\" EXACT PREF []synonym: \"Automobile\" EXACT []synonym: \"Car\" EXACT PREF []is_a: A ! Vehicles[Term]id: Cname: Boatsynonym: \"Boat\" EXACT PREF []is_a: A ! Vehicles[Term]id: Dname: Aircraftsynonym: \"Aircraft\" EXACT PREF []is_a: A ! Vehicles"
11.7.3.2. Response
{ "payload": null, "errorMessages": [] }
The terminology import is executed asynchronously and may take some time depending on the size of the terminology.
11.7.4. Retrieve Terminology Import Information
The status and progress of a terminology import can be retrieved with the following function:
GET /v1/terminology/projects/{projectName}/terminologies/{terminologyName}/terminologyImports
11.7.4.1. Request Parameters
Name | Parameter Type | Data Type | Description |
---|---|---|---|
api-token | header | string | The API token for your user. |
projectName | path | string | The name of the project. |
terminologyName | path | string | The name of the terminology |
curl -X GET "http://localhost:8080/health-discovery/rest/v1/terminology/projects/MyProject/terminologies/MyTerminology/terminologyImports" -H "accept: */*" -H "api-token: 7116d2bb104c635d379ccca286f2cc9b5ddb4664829922148f7e882c004a6c0b"
curl -X GET "http://localhost:8080/information-discovery/rest/v1/terminology/projects/MyProject/terminologies/MyTerminology/terminologyImports" -H "accept: */*" -H "api-token: 7116d2bb104c635d379ccca286f2cc9b5ddb4664829922148f7e882c004a6c0b"
curl -X GET "http://localhost:8080/patent-monitor/rest/v1/terminology/projects/MyProject/terminologies/MyTerminology/terminologyImports" -H "accept: */*" -H "api-token: 7116d2bb104c635d379ccca286f2cc9b5ddb4664829922148f7e882c004a6c0b"
11.7.4.2. Response
{ "payload": { "id": 601, "terminologyId": 600, "state": "COMPLETED", "totalNumberOfConcepts": 4, "numberOfProcessedConcepts": 4, "numberOfSkippedConcepts": 0, "numberOfProcessedConceptsWithRelations": 4, "startDate": 1584718172887, "endDate": 1584718173218, "messageDtos": [] }, "errorMessages": [] }
11.7.5. Export Terminology
This function is used to export a terminology to be used in a text analysis pipeline.
POST /v1/terminology/projects/{projectName}/terminologies/{terminologyName}/terminologyExports
11.7.5.1. Request Parameters
Name | Parameter Type | Data Type | Description |
---|---|---|---|
api-token | header | string | The API token for your user. |
projectName | path | string | The name of the project. |
terminologyName | path | string | The name of the terminology |
terminologyExporterName | query | string | The exporter name. Currently only Concept Dictionary XML Exporter is supported. |
curl -X POST "http://localhost:8080/health-discovery/rest/v1/terminology/projects/MyProject/terminologies/MyTerminology/terminologyExports?terminologyExporterName=Concept%20Dictionary%20XML%20Exporter" -H "accept: */*" -H "api-token: 7116d2bb104c635d379ccca286f2cc9b5ddb4664829922148f7e882c004a6c0b"
curl -X POST "http://localhost:8080/information-discovery/rest/v1/terminology/projects/MyProject/terminologies/MyTerminology/terminologyExports?terminologyExporterName=Concept%20Dictionary%20XML%20Exporter" -H "accept: */*" -H "api-token: 7116d2bb104c635d379ccca286f2cc9b5ddb4664829922148f7e882c004a6c0b"
curl -X POST "http://localhost:8080/patent-monitor/rest/v1/terminology/projects/MyProject/terminologies/MyTerminology/terminologyExports?terminologyExporterName=Concept%20Dictionary%20XML%20Exporter" -H "accept: */*" -H "api-token: 7116d2bb104c635d379ccca286f2cc9b5ddb4664829922148f7e882c004a6c0b"
11.7.5.2. Response
{ "payload": null, "errorMessages": [] }
The terminology export is executed asynchronously and may take some time depending on the size of the terminology.
11.7.6. Retrieve Terminology Export Information
The status and progress of a terminology export can be retrieved with the following function:
GET /v1/terminology/projects/{projectName}/terminologies/{terminologyName}/terminologyExports
11.7.6.1. Request Parameters
Name | Parameter Type | Data Type | Description |
---|---|---|---|
api-token | header | string | The API token for your user. |
projectName | path | string | The name of the project. |
terminologyName | path | string | The name of the terminology |
curl -X GET "http://localhost:8080/health-discovery/rest/v1/terminology/projects/MyProject/terminologies/MyTerminology/terminologyExports" -H "accept: */*" -H "api-token: 7116d2bb104c635d379ccca286f2cc9b5ddb4664829922148f7e882c004a6c0b"
curl -X GET "http://localhost:8080/information-discovery/rest/v1/terminology/projects/MyProject/terminologies/MyTerminology/terminologyExports" -H "accept: */*" -H "api-token: 7116d2bb104c635d379ccca286f2cc9b5ddb4664829922148f7e882c004a6c0b"
curl -X GET "http://localhost:8080/patent-monitor/rest/v1/terminology/projects/MyProject/terminologies/MyTerminology/terminologyExports" -H "accept: */*" -H "api-token: 7116d2bb104c635d379ccca286f2cc9b5ddb4664829922148f7e882c004a6c0b"
11.7.6.2. Response
{ "payload": { "id": 620, "terminologyId": 600, "state": "COMPLETED", "totalNumberOfConcepts": 4, "numberOfProcessedConcepts": 4, "startDate": 1584719372703, "endDate": 1584719372921, "messageDtos": [], "exporterName": "Concept Dictionary XML Exporter", "stateMessage": "Submitted terminology to text analysis ( 4 / 4 )", "oboDownloadAvailable": false }, "errorMessages": [] }
11.7.7. Delete Terminology
This function is used to delete a terminology.
DELETE /v1/terminology/projects/{projectName}/terminologies/{terminologyName}
11.7.7.1. Request Parameters
Name | Parameter Type | Data Type | Description |
---|---|---|---|
api-token | header | string | The API token for your user. |
projectName | path | string | The name of the project. |
terminologyName | path | string | The name of the terminology |
curl -X DELETE "http://localhost:8080/health-discovery/rest/v1/terminology/projects/MyProject/terminologies/MyTerminology" -H "accept: */*" -H "api-token: 7116d2bb104c635d379ccca286f2cc9b5ddb4664829922148f7e882c004a6c0b"
11.7.7.2. Response
{ "payload": null, "errorMessages": [] }