Averbis Health Discovery: User Manual

Version 6.1.0, 06/08/2021

0. Changes in Health Discovery 6.0

With the Averbis Health Discovery GUI, you are no longer limited to the REST API in your production scenario. You can now also import documents into Health Discovery and retrieve the results via bulk export.

Health Discovery version 6 has much more to offer. You can look forward to:

  • a seamless integration of the REST API into Python
  • new enhancements and changes to the API format
  • Apache UIMA v3 upgrade
  • lots of new and improved annotators


For more information on what is new and what has been improved in Health Discovery 6.0, follow this link: Benefits of HD6.

For a detailed overview of the REST API changes, see: What has changed in HD6?

1. Overview

Health Discovery is a text mining and machine learning platform for analyzing large amounts of patient data. With Health Discovery, medical documents can be analyzed and searched for diagnoses, symptoms, prescriptions, special findings, and other criteria. Heterogeneous patient data in both structured and unstructured forms can be harmonized and analyzed by text mining, and can be accessed and searched via a unified interface.

Health Discovery has a modular structure. The various functionalities are roughly divided into the following categories:

  • General: There are some general modules in which projects and users with corresponding rights and roles can be created.

  • Sources: There are several ways to import documents into Health Discovery. Documents can be imported from your own client or from any server, either from files or from a database.

  • Terminology: Health Discovery allows you to create your own terminology or import terminologies. These can be integrated into text mining, and terms of these terminologies can be found in texts.

  • Text Analysis: This category contains various modules for configuring text mining pipelines, starting text mining processes, and viewing text mining results. Different text mining pipelines can also be compared with each other.

  • Search: Health Discovery contains a semantic full-text search that can be configured and used in the various modules of this category.

  • Classification: Health Discovery contains a machine learning-based classification module. Users can sort documents manually or automatically into different categories. An intuitive interface enables the training and evaluation of machine learning models.

The user manual is intended to give you a quick introduction to Health Discovery with the "Getting started" section. Then the text mining components and pipelines that are included in Health Discovery are described in detail.

2. Getting started

2.1. Login to Health Discovery and create a project

Step 1: Enter the URL of Health Discovery in a web browser and log in with your user name and password. If you don’t know the URL or your credentials, contact your system administrator.

Step 2: To create a new project, go to the "Project Administration" section (1) and click the button "Create Project" (2).


Figure ?: Go to "Project Administration"


Figure ?: Creation of a new project

Step 3: Enter a project name into the field "Name" (3) of the dialog "Create Project" and click "Save" (4).


Figure ?: Saving a new project

After the project has been created successfully, you can open it by clicking on the newly added name in the list of the "Project Administration" section.

2.2. Import documents

Step 1: On page "Home", select "Project Administration" and click on a project name. You are redirected to the "Project Overview" page of this project.


Figure ?: Project overview of Health Discovery.

Step 2: On the "Project Overview" page, choose module "Import Documents", click on "New Import", give your import batch a name, select the Importer Type "Text Importer" and the documents to be imported. You can import a single file or a zip container with multiple files.

Make sure that the zip container doesn’t contain (hidden) subfolders and that the files have the correct file extension.
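The zip requirements above can also be met by building the archive programmatically. The following is a minimal Python sketch and not part of Health Discovery: the function name build_flat_zip, the ".txt" default extension and the rule for skipping hidden entries are assumptions made for illustration. It stores every matching file at the top level of the archive, so that no subfolders end up in the zip container.

```python
import zipfile
from pathlib import Path


def build_flat_zip(source_dir, zip_path, extension=".txt"):
    """Write all files with the given extension into a zip without subfolders.

    Hypothetical helper for illustration; Health Discovery does not ship it.
    Returns the list of file names that were added to the archive.
    """
    added = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(Path(source_dir).rglob("*" + extension)):
            if not path.is_file():
                continue
            # Skip hidden files and files inside hidden folders.
            parts = path.relative_to(source_dir).parts
            if any(p.startswith(".") or p == "__MACOSX" for p in parts):
                continue
            # A flat archive cannot hold two files with the same name.
            if path.name in added:
                raise ValueError("duplicate file name: " + path.name)
            # arcname=path.name drops the folder part of the path.
            archive.write(path, arcname=path.name)
            added.append(path.name)
    return added
```

Calling build_flat_zip("my_documents", "import.zip") would produce an archive that can be selected in the "New Import" dialog; the folder and file names here are placeholders.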

Step 3: By clicking on "Import", the document import starts. You can click on the "Refresh" button to the right of your document import to see the progress.


Figure ?: Import Documents in Health Discovery.

You can reach the "Project Overview" page at any time via the breadcrumb navigation in the upper left by clicking on "default".

2.3. Run a text mining process

Health Discovery typically contains predefined pipelines that are already available when the application starts. Therefore, you can start text mining processing immediately after importing the first documents. This goes as follows:

Step 1: On the "Project Overview" page, select "Pipeline Configuration" and start a text mining pipeline, e.g. "discharge".

Starting the pipeline may take a few minutes, as a lot of information is loaded into the main memory.

Step 2: Switch back to "Project Overview" and select "Processes".

Step 3: Click on "New Text Analysis"

Step 4: Give your text mining process a name, select the document source and the text mining pipeline, and click Ok.

Step 5: The text analysis starts now. By clicking on the browser refresh you can monitor the progress of the text analysis.


Figure ?: Start a text mining process in Health Discovery.

2.4. View text mining results

As soon as a text mining process has the state "idle", you can see the results in the Annotation Editor by clicking on the process name.


Figure ?: Jumping to text analytic results

The Annotation Editor shows the results of your text analytic process.


Figure ?: View the text mining results in the Annotation Editor.

2.5. Configure your own pipeline

If you want to build your own pipeline from existing text mining components, proceed as follows:

Step 1: In "Project Overview", click on "Pipeline Configuration".

Step 2: Click on "Create Pipeline".

Step 3: Give your pipeline a name, optionally a description and click on "Ok".

Step 4: Click the pen icon ("Edit Pipeline") to the right of your pipeline.

Step 5: Select the desired components from the components on the right by clicking on the corresponding left arrow. For more information about the available components and which upstream components they require, see Available Text Mining Annotators & Web Service Specification.


Figure ?: Configure your own pipeline by moving the components from right to left.

2.6. Create your first terminology

If you want to create your own terminology, proceed as follows:

Step 1: In "Project Administration", select "Terminology Administration".

Step 2: Click on "Create Terminology".

Step 3: Assign a "Terminology ID", a "Label", a "Version". Choose whether the terminology should have a hierarchy or not. Leave the "Concept type" on "de.averbis.extraction.types.Concept" and the "Encrypted export" on disabled. Select the language(s) in which the terminology is to be created. Then click on "Ok".


Figure ?: Create your first terminology.

Step 4: Switch to the "Terminology Editor" by going to the "Project Overview" page and clicking on "Terminology Editor".

Step 5: Click on the "plus" to the right of your terminology to create the first concept.

Step 6: Enter a "Concept ID", a "Preferred Term" and optionally a "Comment" and click "Ok".


Figure ?: Create the first concept.

Step 7: By clicking on the button "Add Terms" you can add more synonyms to the concept. If you want to add more than one synonym, click on "add another term". Once all synonyms have been inserted, click on "Ok".

Step 8: By clicking on the "plus" to the right of your newly created concept you can create further sub-concepts.

2.7. Download your terminology

Step 1: Go to the "Terminology Administration" module.

Step 2: Choose your terminology and click the icon "Preparing OBO download".


Figure ?: Prepare your terminology for download

The preparation time depends on the size of your terminology. Once the download is ready, a notification appears in the bell symbol in the upper menu bar.


Figure ?: Notification when download preparation is finished

Step 3: Refresh the page using the refresh-button. Now the button for downloading the terminology is activated.


Figure ?: Refresh the page

Step 4: Click on the download icon and save the OBO file.


Figure ?: Download the OBO file


Figure ?: Save the OBO file

2.8. Integrate own terminologies into a text mining pipeline

You can import your own terminologies to Health Discovery. Optionally, a mapping mode for each synonym can be imported, too. To import terminologies, you must convert them to the OBO file format. The minimal structure of your OBO terminology looks like the example below.


Figure ?: Example of OBO-file structure
synonymtypedef: DEFAULT_MODE "Default Mapping Mode" 
synonymtypedef: EXACT_MODE "Exact Mapping Mode"      
synonymtypedef: IGNORE_MODE "Ignore Mapping Mode"   

[Term]
id: 1
name: First Concept
synonym: "First Concept" DEFAULT_MODE []
synonym: "First Synonym" IGNORE_MODE [] 
synonym: "Second Synonym" EXACT_MODE [] 
 
[Term]
id: 2
name: First Child
is_a: 1 ! First Concept

To import terms with mapping modes, the OBO terminology must begin with the synonym type definitions ("synonymtypedef"), as shown in the first three lines of the example above. The "synonymtypedef" lines are optional and are only needed when mapping modes are used. Each concept begins with the flag "[Term]", followed by an "id" and a preferred name with the flag "name". After that, you can add as many synonyms as you like with the flag "synonym", each optionally followed by the desired mapping mode. Note: if you would like to define a mapping mode for your concept name, you have to add the name as a synonym as well, as shown in the example for "First Concept". Furthermore, if your terminology contains a hierarchy, you can use "is_a" to refer to other concepts of your terminology.
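If your terminology already exists in another form (for example a spreadsheet or a database table), the OBO structure described above can be generated with a few lines of code. The following Python sketch is an illustration only: the function to_obo and its input format (a list of dictionaries with the keys "id", "name", "synonyms" and "is_a") are assumptions and not part of Health Discovery. It writes the three synonym type definitions followed by one "[Term]" block per concept.

```python
def to_obo(concepts):
    """Render concepts as OBO text in the minimal structure described above.

    Hypothetical input format: each concept is a dict with "id" and "name",
    optionally "synonyms" (a list of (term, mapping_mode) pairs, where the
    mode may be None) and optionally "is_a" (the id of the parent concept).
    """
    names = {c["id"]: c["name"] for c in concepts}
    lines = [
        'synonymtypedef: DEFAULT_MODE "Default Mapping Mode"',
        'synonymtypedef: EXACT_MODE "Exact Mapping Mode"',
        'synonymtypedef: IGNORE_MODE "Ignore Mapping Mode"',
    ]
    for concept in concepts:
        lines += ["", "[Term]", "id: %s" % concept["id"], "name: %s" % concept["name"]]
        for term, mode in concept.get("synonyms", []):
            # Double quotes inside a term are escaped with a backslash.
            escaped = term.replace('"', '\\"')
            if mode:
                lines.append('synonym: "%s" %s []' % (escaped, mode))
            else:
                lines.append('synonym: "%s" []' % escaped)
        parent = concept.get("is_a")
        if parent is not None:
            # The text after "!" is a comment repeating the parent's name.
            lines.append("is_a: %s ! %s" % (parent, names[parent]))
    return "\n".join(lines) + "\n"


example = [
    {"id": "1", "name": "First Concept", "synonyms": [
        ("First Concept", "DEFAULT_MODE"),
        ("First Synonym", "IGNORE_MODE"),
        ("Second Synonym", "EXACT_MODE"),
    ]},
    {"id": "2", "name": "First Child", "is_a": "1"},
]
print(to_obo(example))
```

The example input reproduces the two concepts from the OBO example above. The resulting text can be saved with the file extension ".obo" and imported as described in the next section.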

2.8.1. Import a terminology

To import a terminology like the one shown above, proceed as follows:

Step 1: In "Project Overview", click on "Terminology Administration".

Step 2: Click on "Create New Terminology". Fill in the dialog as described in Create your first terminology.

Step 3: Once you have created a terminology, click the up arrow icon to the right of the terminology.

Step 4: In the "Import Terminology" dialog, select "OBO Importer" as import format. Then select the terminology you want to import from the file system. Click on "Import".


Figure ?: Import your own terminologies into Health Discovery.


Step 5: By clicking on the "Refresh" button to the right of the terminology you can check the progress of the import. When the terminology has been fully imported, the state changes to "Terminology imported".

Step 6: To browse your terminology, switch to the "Terminology Editor" by going to the "Project Overview" page and clicking on "Terminology Editor".

After successful terminology import, terms, hierarchies and mapping modes can be checked in the Terminology Editor.


Figure ?: Terminology Editor showing imported terminology

2.9. Use the Web Service

All text mining pipelines configured and started in Health Discovery can also be accessed via web service. To do this, proceed as follows:

Step 1: Add the suffix "/rest/swagger-ui.html" to the URL of Health Discovery (e.g. https://<YOURURL>/health-discovery/rest/swagger-ui.html).

Step 2: In the green upper menu bar, select the spec "REST API v1"

Step 3: Click on "Text Analysis" and then on "/rest/v1/textanalysis/projects/{projectName}/pipelines/{pipelineName}/analyseText".


Figure ?: Use Swagger UI to access our RESTful Web Service.

Step 4: Click on the button on the upper right "Try it out"

Step 5: Enter your API token as a string in "api-token" (1) (for details on generating an API token, see REST API Overview), "discharge" in "pipelineName" (2) (or the name of another started pipeline) and "default" in "projectName" (3).

Step 6: Add any text in the field "text".

Step 7: The field "language" can be left blank for the "discharge" pipeline, as the pipeline automatically recognizes the language.

Step 8: Click on the blue button "Execute" on the bottom left (4).

Step 9: You receive the response in the response body section.


Figure ?: Use Swagger UI to access our RESTful Web Service.
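The same call can be prepared outside of Swagger UI. The following Python sketch only builds the request and does not send it. The endpoint path and the parameter name "api-token" are taken from the steps above. The HTTP method POST, passing the token as a request header, sending the text as a plain-text UTF-8 body and passing the language as a query parameter are assumptions that should be verified in Swagger UI for your installation. The function name build_analyse_text_request is hypothetical.

```python
import urllib.parse
import urllib.request


def build_analyse_text_request(base_url, project, pipeline, text, api_token, language=None):
    """Build (but do not send) a request for the analyseText endpoint.

    base_url is the URL of Health Discovery, e.g. the placeholder
    "https://<YOURURL>/health-discovery". Method, body format and the
    "language" query parameter are assumptions, see the text above.
    """
    path = "/rest/v1/textanalysis/projects/{}/pipelines/{}/analyseText".format(
        urllib.parse.quote(project, safe=""),
        urllib.parse.quote(pipeline, safe=""),
    )
    url = base_url.rstrip("/") + path
    if language:
        url += "?" + urllib.parse.urlencode({"language": language})
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={
            "api-token": api_token,
            "Content-Type": "text/plain; charset=utf-8",
        },
        method="POST",
    )
```

A request built this way could then be sent with urllib.request.urlopen(request) and the response body parsed with json.loads. As in the Swagger UI steps, the pipeline (e.g. "discharge") must have been started before the call.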

3. Available Text Mining Annotators & Web Service Specification

Health Discovery contains a number of pipelines and text mining components. These can be configured in the "Pipeline Configuration" module. The individual components are described below. In addition to a short description of the component, each section specifies which types the component requires as input and which types it generates. A web service example of an annotation of the corresponding type is also given.

3.1. BiologicallyDerivedProducts

3.1.1. Description

A biologically derived product is a material substance originating from a biological entity intended to be transplanted or infused into another biological entity. Examples for a biologically derived product include hematopoietic stem cells such as bone marrow, peripheral blood, or cord blood extraction. This annotator extracts the information about the type of the transplanted biological product, the amount of transplanted cells and the date in the context of allogeneic transplantations.

Currently, the annotation is limited to the extraction of the biological product of CD34-positive stem cells.

3.1.2. Input

3.1.3. Output

Annotation Type: de.averbis.types.health.BiologicallyDerivedProduct


Table ?: Features BiologicallyDerivedProduct
Attribute | Description | Type
quantity | The volume of the product which was transplanted. | Double
time | Temporal information (date or date interval) about the transplantation. Please see types: Date, DateInterval. | Date or DateInterval
matchedTerm | Matching synonym of the biologically derived product concept. | String
dictCanon | Preferred term of the biologically derived product concept. | String
conceptID | The ID of the concept. | String
source | The name of the terminology source. | String
uniqueID | Unique identifier of the concept of the format 'terminologyId:conceptID'. | String
negatedBy | Specifies the negation word, if one exists. | String

3.1.4. Terminology Binding

Name | Languages | Version | Identifier | Comment
Averbis Lab Terminology | EN, DE | 2.0 | Averbis-Lab-Terminology_2.0 | Laboratory and vital signs parameters, ID based on LOINC codes (LOINC parts) composed by Averbis.

3.1.5. Web Service Example

Text Example: "On 11/11/2008 transfusion of 4.5x 106 CD34-positive cells/kg"

{
      "begin": 29,
      "end": 42,
      "type": "de.averbis.types.health.BiologicallyDerivedProduct",
      "coveredText": "4.5x 106 CD34",
      "id": 1839,
      "negatedBy": null,
      "quantity": 4500000,
      "matchedTerm": "CD34+",
      "dictCanon": "CD34+",
      "conceptID": "78002-3",
      "source": "Averbis-Lab-Terminology_2.0",
      "time": {
        "begin": 3,
        "end": 13,
        "type": "de.averbis.types.health.Date",
        "coveredText": "11/11/2008",
        "id": 1840,
        "kind": "DATE",
        "value": "2008-11-11"
      },
      "uniqueID": "Averbis-Lab-Terminology_2.0:78002-3"
}
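The "begin" and "end" values in the web service examples can be read as zero-based character offsets into the analyzed text, with "end" being exclusive. This reading is an assumption that is consistent with the example above. The following Python sketch (the function check_offsets is hypothetical) verifies that "coveredText" matches the corresponding slice of the input text, including nested annotations such as "time".

```python
def check_offsets(text, annotation):
    """Return True if text[begin:end] equals coveredText for the annotation
    and for every nested annotation (dict values that carry begin/end)."""
    begin, end = annotation["begin"], annotation["end"]
    ok = text[begin:end] == annotation["coveredText"]
    for value in annotation.values():
        if isinstance(value, dict) and "begin" in value and "end" in value:
            ok = ok and check_offsets(text, value)
    return ok


# Text and offsets copied from the web service example above.
text = "On 11/11/2008 transfusion of 4.5x 106 CD34-positive cells/kg"
annotation = {
    "begin": 29,
    "end": 42,
    "coveredText": "4.5x 106 CD34",
    "time": {"begin": 3, "end": 13, "coveredText": "11/11/2008"},
}
print(check_offsets(text, annotation))
```

Note that Python counts code points, so for texts containing characters outside the Basic Multilingual Plane the offsets may need to be converted before slicing.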


3.2. Chimerism

3.2.1. Description

This component annotates information about chimerism. In the field of transplantation medicine, a chimerism analysis is performed after stem cell or bone marrow transplantation to determine whether the recipient’s hematopoietic system is only derived from the donor or not. The chimerism is called "complete" if more than 95% of the tested hematopoietic cells originate from the donor, otherwise the chimerism is called "mixed".

3.2.2. Input

Above this annotator, the following annotators must be included in the pipeline:

3.2.3. Output

Annotation Type: de.averbis.types.health.Chimerism


Table ?: Chimerism Features
Attribute | Description | Type
kind | The kind of the actual chimerism. Possible values (default is underlined): null, COMPLETE, MIXED | String
value | Numeric value of chimerism. | Double
date | Date of chimerism analysis. Format: YYYY-MM-DD: Year-Month-Day with leading zeros (e.g. 2009-02-17) | String

3.2.4. Web Service Example

Text Example: "Chimärismusanalyse vom 17.11.2008: Nachweis von 85,2 % Donorzellen." (English: "Chimerism analysis of 17.11.2008: detection of 85.2 % donor cells.")

    {
      "begin": 48,
      "end": 66,
      "type": "de.averbis.types.health.Chimerism",
      "coveredText": "85,2 % Donorzellen",
      "id": 1469,
      "date": null,
      "kind": "MIXED",
      "value": 85.2
    }
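The rule from the description (more than 95% donor cells means "complete", otherwise "mixed") can be written down directly. The following Python sketch is an illustration and not the annotator's implementation; it assumes that the "value" feature holds the percentage of donor cells, as in the example above. The function name chimerism_kind is hypothetical.

```python
def chimerism_kind(donor_percent):
    """Apply the rule from the description: more than 95% donor cells is
    "COMPLETE", anything else is "MIXED"."""
    return "COMPLETE" if donor_percent > 95.0 else "MIXED"


# Values copied from the web service example above.
annotation = {"coveredText": "85,2 % Donorzellen", "kind": "MIXED", "value": 85.2}

# Check that the returned kind agrees with the rule.
print(chimerism_kind(annotation["value"]) == annotation["kind"])
```

Such a check can be useful in downstream code that wants to validate or recompute the "kind" feature from the numeric value.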

3.3. Clinical Section Keyword 

3.3.1. Description

A Clinical Section Keyword is a keyword from which a Clinical Section may be derived, e.g. "Patient History" is a keyword for the Anamnesis Section. Not every Clinical Section Keyword leads to a Clinical Section.

3.3.2. Input

Above this annotator, the following annotators must be included in the pipeline:

3.3.3. Output

Annotation Type: de.averbis.types.health.ClinicalSectionKeyword


Table ?: Clinical Section Features
Attribute | Description | Type
dictCanon | Preferred term of the section concept. | String
uniqueID | Unique identifier of the section concept of the format 'terminologyId:conceptID'. | String
conceptID | The ID of the concept. | String
source | The name of the terminology source. | String
matchedTerm | Matching synonym of the section concept. | String
negatedBy | Specifies the negation word, if one exists. | String

3.3.4. Terminology Binding


Table ?: Terminology Bindings
Name | Languages | Version | Identifier | Comment
clinical-Sections | EN, DE | 1.0 | clinical_sections_de, clinical_sections_en | Types of clinical sections, ID predominantly based on LOINC codes composed and enriched with synonyms by Averbis.

3.3.5. Web Service Example

Text Example: "Medication Citation|Active|CM| TraMADol HCl - 50 MG Oral Tablet;TAKE 1 TABLET 3 TIMES DAILY.; RPT~Tylenol Arthritis Ext Relief 650 MG TBCR;TAKE 1 TABLET 3-4 TIMES DAILY.; RPT~CeleBREX 200 MG Oral Capsule;TAKE 1 CAPSULE DAILY.; RPT~Folbic TABS;; RPT~Folic Acid 1 MG Oral Tablet;TAKE 1 TABLET DAILY.; RPT~PredniSONE 10 MG Oral Tablet;TAKE 1 TABLET AS NEEDED.; RPT~Cholestyramine 4 GM Oral Packet;MIX THE CONTENTS OF 1 POWDER PACKET WITH 2 TO 6 OZ OF NONCARBONATED BEVERAGE AND DRINK 3 TIMES DAILY.; RPT~Methotrexate 2.5 MG Oral Tablet;TAKE 1 TABLET WEEKLY.; RPT~Citracal Plus Oral Tablet;TAKE 2 TABLET DAILY; RPT~Multi Vitamin Daily TABS;TAKE 1 TABLET DAILY.; RPT~Miscellaneous Medication;Schiff "Move Free". 400 MG taken once daily; RPT"


Table ?: Clinical Section Keyword Features
    {
      "begin": 0,
      "end": 10,
      "type": "de.averbis.types.health.ClinicalSectionKeyword",
      "coveredText": "Medication",
      "id": 16279,
      "negatedBy": null,
      "matchedTerm": "Medication",
      "dictCanon": "Medication",
      "conceptID": "29549-3",
      "source": "clinical_sections_en",
      "uniqueID": "clinical_sections_en:29549-3"
    },
    {
      "begin": 676,
      "end": 686,
      "type": "de.averbis.types.health.ClinicalSectionKeyword",
      "coveredText": "Medication",
      "id": 16280,
      "negatedBy": null,
      "matchedTerm": "Medication",
      "dictCanon": "Medication",
      "conceptID": "29549-3",
      "source": "clinical_sections_en",
      "uniqueID": "clinical_sections_en:29549-3"
    }


3.4. Clinical Section 

3.4.1. Description

This component detects sections in medical documents. These sections can refer to diagnoses, medications, therapies, etc.

3.4.2. Input

Above this annotator, the following annotators must be included in the pipeline:

3.4.3. Output

Annotation Type: de.averbis.types.health.ClinicalSection


Table ?: Clinical Section Features
Attribute | Description | Type
keyword | The Clinical Section Keyword from which the Clinical Section is derived. | ClinicalSectionKeyword
label | The label of the section, e.g. "LaboratorySection", "MedicationSection", "AnamnesisSection". | String

3.4.4. Terminology Binding


Table ?: Terminology Bindings
Name | Languages | Version | Identifier | Comment
clinical-Sections | EN, DE | 1.0 | clinical_sections_de, clinical_sections_en | Types of clinical sections, ID predominantly based on LOINC codes composed and enriched with synonyms by Averbis.

3.4.5. Web Service Example

Text Example: "Medication Citation|Active|CM| TraMADol HCl - 50 MG Oral Tablet;TAKE 1 TABLET 3 TIMES DAILY.; RPT~Tylenol Arthritis Ext Relief 650 MG TBCR;TAKE 1 TABLET 3-4 TIMES DAILY.; RPT~CeleBREX 200 MG Oral Capsule;TAKE 1 CAPSULE DAILY.; RPT~Folbic TABS;; RPT~Folic Acid 1 MG Oral Tablet;TAKE 1 TABLET DAILY.; RPT~PredniSONE 10 MG Oral Tablet;TAKE 1 TABLET AS NEEDED.; RPT~Cholestyramine 4 GM Oral Packet;MIX THE CONTENTS OF 1 POWDER PACKET WITH 2 TO 6 OZ OF NONCARBONATED BEVERAGE AND DRINK 3 TIMES DAILY.; RPT~Methotrexate 2.5 MG Oral Tablet;TAKE 1 TABLET WEEKLY.; RPT~Citracal Plus Oral Tablet;TAKE 2 TABLET DAILY; RPT~Multi Vitamin Daily TABS;TAKE 1 TABLET DAILY.; RPT~Miscellaneous Medication;Schiff "Move Free". 400 MG taken once daily; RPT"

 {
      "begin": 0,
      "end": 734,
      "type": "de.averbis.types.health.ClinicalSection",
      "coveredText": "Medication Citation|Active|CM\nTraMADol HCl - 50 MG Oral Tablet;TAKE 1 TABLET 3 TIMES DAILY.; RPT~Tylenol Arthritis Ext Relief 650 MG TBCR;TAKE 1 TABLET 3-4 TIMES DAILY.; RPT~CeleBREX 200 MG Oral Capsule;TAKE 1 CAPSULE DAILY.; RPT~Folbic TABS;; RPT~Folic Acid 1 MG Oral Tablet;TAKE 1 TABLET DAILY.; RPT~PredniSONE 10 MG Oral Tablet;TAKE 1 TABLET AS NEEDED.; RPT~Cholestyramine 4 GM Oral Packet;MIX THE CONTENTS OF 1 POWDER PACKET WITH 2 TO 6 OZ OF NONCARBONATED BEVERAGE AND DRINK 3 TIMES DAILY.; RPT~Methotrexate 2.5 MG Oral Tablet;TAKE 1 TABLET WEEKLY.; RPT~Citracal Plus Oral Tablet;TAKE 2 TABLET DAILY; RPT~Multi Vitamin Daily TABS;TAKE 1 TABLET DAILY.; RPT~Miscellaneous Medication;Schiff \"Move Free\". 400 MG taken once daily; RPT",
      "id": 16310,
      "label": "Medication",
      "keyword": {
        "begin": 0,
        "end": 10,
        "type": "de.averbis.types.health.ClinicalSectionKeyword",
        "coveredText": "Medication",
        "id": 16311,
        "negatedBy": null,
        "matchedTerm": "Medication",
        "dictCanon": "Medication",
        "conceptID": "29549-3",
        "source": "clinical_sections_en",
        "uniqueID": "clinical_sections_en:29549-3"
      }
 }


3.5. Diagnoses

3.5.1. Description

This component detects a condition, problem, diagnosis, or other event, situation, issue, or clinical concept that has risen to a level of concern. Optionally, annotations of type DiagnosisCandidate can be generated and visualized in addition to the Diagnosis annotations. This optional component specifically detects diagnosis candidates in order to optimize DRG coding.

3.5.2. Input

Above this annotator, the following annotators must be included in the pipeline:

To get the full functionality, the following annotators should also be included below this annotator in the given order:

3.5.3. Output

Annotation Type: de.averbis.types.health.Diagnosis


Table ?: Features
Attribute | Description | Type
dictCanon | Preferred term of the condition. | String
matchedTerm | The matching synonym of the Diagnosis. | String
uniqueID | Unique identifier of a concept of the format 'terminologyId:conceptID'. | String
conceptID | The ID of the concept. | String
source | The name of the terminology source. | String
approach | Information about the text mining approach used to generate the annotation. Possible values: DictionaryLookup, SimilarityMatching, DocumentClassification, DerivedByLabValue | String
confidence | For approaches using machine learning (e.g. "DocumentClassification"), the confidence that the respective annotation has been correctly generated. Possible value range: 0-1. Note: Annotations generated with non-machine learning approaches such as terminology mappings (approach = "DictionaryLookup") are reflected with a confidence value of 0. | Double
onsetDate | The onset date of the diagnosis, if given in the text. Format: YYYY-MM-DD: Year-Month-Day with leading zeros (e.g. 2020-02-17). Please note: The onsetDate is only annotated if the PEAR component "Disease Onset Date" is integrated in the text analysis pipeline used. The preconfigured pipelines do not contain this component, thus the value of the onsetDate feature is null. | String
negatedBy | Specifies the negation word, if one exists. | String
verificationStatus | Verification status of the actual diagnosis. Possible values (default is underlined): null, NEGATED, ASSURED, SUSPECTED, DIFFERENTIAL | String
clinicalStatus | Clinical status of the actual diagnosis. Possible values (default is underlined): null, ACTIVE, RESOLVED | String
kind | The kind of the diagnosis. Possible values (default is underlined): null, main, secondary | String
side | The laterality of the diagnosis. Possible values (default is underlined): null, RIGHT, LEFT, BOTH | String
laterality | The laterality of the diagnosis. Possible values (default is underlined): null, RIGHT, LEFT, BOTH. WARNING: This feature is deprecated and will be removed in V6 of Health Discovery. It will be replaced by the equivalent attribute 'side'. | String
belongsTo | Indicates whether the diagnosis belongs to a donor or recipient (e.g. in case of transplantations) or to a family member. Possible values (default is underlined): null, FAMILY, OTHER | String


Annotation Type (optional): de.averbis.types.health.DiagnosisCandidate


Table ?: Features
Attribute | Description | Type
dictCanon | Preferred term of the condition. | String
conceptID | The ID of the concept. | String
approach | Information about the text mining approach used to generate the annotation. Possible values: DictionaryLookup, SimilarityMatching, DocumentClassification, DerivedByLabValue | String
confidence | For approaches using machine learning (e.g. "DocumentClassification"), the confidence that the respective annotation has been correctly generated. Possible value range: 0-1. Note: Annotations generated with non-machine learning approaches such as terminology mappings (approach = "DictionaryLookup") are reflected with a confidence value of 0. | Double
verificationStatus | Verification status of the actual diagnosis. Possible values (default is underlined): null, NEGATED, ASSURED, SUSPECTED, DIFFERENTIAL | String
clinicalStatus | Clinical status of the actual diagnosis. Possible values (default is underlined): null, ACTIVE, RESOLVED | String
belongsTo | Indicates whether the diagnosis belongs to a donor or recipient (e.g. in case of transplantations) or to a family member. Possible values (default is underlined): null, FAMILY, OTHER | String

3.5.4. Terminology Binding


Table ?: Terminology Bindings
Country | Name | Version | Identifier | Comment
United States | ICD-10-CM | 2021 | ICD10CM_2021 | International Classification of Diseases, 10th Revision, Clinical Modification, 2021, enriched with synonyms from SNOMED CT and by Averbis.
Germany | ICD-10-GM | 2021 | ICD10GM_2021 | International Classification of Diseases, 10th Revision, German Modification, 2021, enriched with synonyms by Averbis.

3.5.5. Web Service Example

Text Example for Diagnosis: "suspected history of appendicitis"

    {
      "begin": 10,
      "end": 33,
      "type": "de.averbis.types.health.Diagnosis",
      "coveredText": "history of appendicitis",
      "id": 788,
      "negatedBy": null,
      "side": null,
      "matchedTerm": "History of appendicitis",
      "verificationStatus": "SUSPECTED",
      "kind": null,
      "confidence": 0,
      "onsetDate": null,
      "source": "ICD10CM_2021",
      "clinicalStatus": "RESOLVED",
      "approach": "DictionaryLookup",
      "laterality": null,
      "dictCanon": "Personal history of other diseases of the digestive system",
      "conceptID": "Z87.19",
      "belongsTo": null,
      "uniqueID": "ICD10CM_2021:Z87.19"
    }


Text Example for DiagnosisCandidate: "suspected history of appendicitis"

    {
      "begin": 10,
      "end": 33,
      "type": "de.averbis.types.health.DiagnosisCandidate",
      "coveredText": "history of appendicitis",
      "id": 788,
      "verificationStatus": "SUSPECTED",
      "confidence": 0,
      "clinicalStatus": "RESOLVED",
      "approach": "DictionaryLookup",
      "dictCanon": "Personal history of other diseases of the digestive system",
      "conceptID": "Z87.19",
      "belongsTo": null
    }
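When processing the web service response, the features "negatedBy", "verificationStatus", "approach" and "confidence" are typically evaluated together. The following Python sketch shows one possible filter and is not a recommendation by Averbis. It assumes that the annotations are available as a list of dictionaries, and it applies the confidence threshold only to the approach "DocumentClassification", because terminology-based annotations carry a confidence of 0 as described above. The function name reportable_diagnoses and the threshold of 0.5 are hypothetical.

```python
DIAGNOSIS_TYPE = "de.averbis.types.health.Diagnosis"
EXCLUDED_STATUS = ("NEGATED", "SUSPECTED", "DIFFERENTIAL")


def reportable_diagnoses(annotations, min_confidence=0.5):
    """Keep Diagnosis annotations that are not negated, not merely suspected
    or differential, and (for machine learning results) confident enough."""
    kept = []
    for annotation in annotations:
        if annotation.get("type") != DIAGNOSIS_TYPE:
            continue
        if annotation.get("negatedBy") is not None:
            continue
        if annotation.get("verificationStatus") in EXCLUDED_STATUS:
            continue
        # DictionaryLookup results always have confidence 0, so the
        # threshold is applied to DocumentClassification results only.
        if (annotation.get("approach") == "DocumentClassification"
                and annotation.get("confidence", 0) < min_confidence):
            continue
        kept.append(annotation)
    return kept
```

With the Diagnosis example above ("suspected history of appendicitis"), the annotation would be dropped because its verificationStatus is "SUSPECTED".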


3.6. Diagnosis Status

3.6.1. Description

The annotator recognizes the status of diagnoses, for example "suspected" or "history of".

3.6.2. Input

Above this annotator, the following annotator must be included in the pipeline:

3.6.3. Output

This annotator sets the features belongsTo, verificationStatus and clinicalStatus in annotations of type Diagnosis and changes conceptID and uniqueID if the diagnosis does not belong to the patient but e.g. to a family member.

3.6.4. Web Service Example

Text Example 1 (ClinicalStatus): "history of appendicitis"

    {
      "begin": 0,
      "end": 23,
      "type": "de.averbis.types.health.Diagnosis",
      "coveredText": "history of appendicitis",
      "id": 750,
      "negatedBy": null,
      "side": null,
      "matchedTerm": "History of appendicitis",
      "verificationStatus": null,
      "kind": null,
      "confidence": 0,
      "onsetDate": null,
      "source": "ICD10CM_2021",
      "clinicalStatus": "RESOLVED",
      "approach": "DictionaryLookup",
      "laterality": null,
      "dictCanon": "Personal history of other diseases of the digestive system",
      "conceptID": "Z87.19",
      "belongsTo": null,
      "uniqueID": "ICD10CM_2021:Z87.19"
    }


Text Example 2 (FamilyDiagnosis): "father has diabetes mellitus"

    {
      "begin": 11,
      "end": 28,
      "type": "de.averbis.types.health.Diagnosis",
      "coveredText": "diabetes mellitus",
      "id": 820,
      "negatedBy": null,
      "side": null,
      "matchedTerm": "Diabetes mellitus",
      "verificationStatus": null,
      "kind": null,
      "confidence": 0,
      "onsetDate": null,
      "source": "ICD10CM_2021",
      "clinicalStatus": null,
      "approach": "DictionaryLookup",
      "laterality": null,
      "dictCanon": "Family history of diabetes mellitus",
      "conceptID": "Z83.3",
      "belongsTo": "FAMILY",
      "uniqueID": "ICD10CM_2021:Z83.3"
    }
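Because diagnoses of family members are returned as ordinary Diagnosis annotations, downstream code usually has to separate them from the patient's own diagnoses. The following Python sketch groups the "uniqueID" values by the "belongsTo" feature. It assumes that a "belongsTo" value of null means that the diagnosis belongs to the patient; the group label "PATIENT" and the function name group_by_owner are hypothetical and do not appear in the web service response.

```python
from collections import defaultdict


def group_by_owner(annotations):
    """Group the uniqueIDs of Diagnosis annotations by their belongsTo value.

    Annotations without a belongsTo value are collected under the
    hypothetical label "PATIENT".
    """
    groups = defaultdict(list)
    for annotation in annotations:
        if annotation.get("type") != "de.averbis.types.health.Diagnosis":
            continue
        owner = annotation.get("belongsTo") or "PATIENT"
        groups[owner].append(annotation["uniqueID"])
    return dict(groups)


# Reduced versions of the two web service examples above.
examples = [
    {"type": "de.averbis.types.health.Diagnosis", "belongsTo": None,
     "uniqueID": "ICD10CM_2021:Z87.19"},
    {"type": "de.averbis.types.health.Diagnosis", "belongsTo": "FAMILY",
     "uniqueID": "ICD10CM_2021:Z83.3"},
]
print(group_by_owner(examples))
```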


3.7. Disambiguation

3.7.1. Description

In case of ambiguous annotations this component decides which annotations should be valid in the given context, e.g. within a list of laboratory values the parameter 'Calcium' represents a laboratory parameter and not an ingredient.

3.7.2. Input

This component requires annotations of at least one of the following types:

3.7.3. Output

Only the annotation that is evaluated as valid is kept; the other(s) are discarded.

3.7.4. Web Service Example

There is no special web service return for Disambiguation.


3.8. Enumerations

3.8.1. Description

This component detects enumerations. The enumerations are recognized based on atomic text units (e.g. chunks) and conjunctions (e.g. the word "and").

3.8.2. Input

Above this annotator, the following annotators must be included in the pipeline:

3.8.3. Output

This component sets the following internal type that is not visible in the annotation editor:

Annotation Type*: de.averbis.types.Enumeration

3.8.4. Web Service Example

*The enumeration itself is not returned in the web service. However, the following example shows that both diagnoses are assigned the status "SUSPECTED".

Text Example: "suspicion of bronchitis or asthma bronchiale"

    {
      "begin": 13,
      "end": 23,
      "type": "de.averbis.types.health.Diagnosis",
      "coveredText": "bronchitis",
      "id": 1137,
      "negatedBy": null,
      "side": null,
      "matchedTerm": "Bronchitis",
      "verificationStatus": "SUSPECTED",
      "kind": null,
      "confidence": 0,
      "onsetDate": null,
      "source": "ICD10CM_2021",
      "clinicalStatus": null,
      "approach": "DictionaryLookup",
      "laterality": null,
      "dictCanon": "Bronchitis, not specified as acute or chronic",
      "conceptID": "J40",
      "belongsTo": null,
      "uniqueID": "ICD10CM_2021:J40"
    },
    {
      "begin": 27,
      "end": 44,
      "type": "de.averbis.types.health.Diagnosis",
      "coveredText": "asthma bronchiale",
      "id": 1138,
      "negatedBy": null,
      "side": null,
      "matchedTerm": "Bronchial asthma",
      "verificationStatus": "SUSPECTED",
      "kind": null,
      "confidence": 0,
      "onsetDate": null,
      "source": "ICD10CM_2021",
      "clinicalStatus": null,
      "approach": "DictionaryLookup",
      "laterality": null,
      "dictCanon": "Unspecified asthma, uncomplicated",
      "conceptID": "J45.909",
      "belongsTo": null,
      "uniqueID": "ICD10CM_2021:J45.909"
    }
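A client can use the propagated status to select annotations. The following is a minimal sketch, assuming the annotations of a response have already been parsed into a list of Python dictionaries; the helper name `diagnoses_with_status` is hypothetical and not part of Health Discovery.

```python
# The list mirrors the two annotations shown above, reduced to the fields used here.
annotations = [
    {"type": "de.averbis.types.health.Diagnosis", "coveredText": "bronchitis",
     "verificationStatus": "SUSPECTED", "conceptID": "J40"},
    {"type": "de.averbis.types.health.Diagnosis", "coveredText": "asthma bronchiale",
     "verificationStatus": "SUSPECTED", "conceptID": "J45.909"},
]

def diagnoses_with_status(annotations, status):
    """Hypothetical helper: return Diagnosis annotations with the given verificationStatus."""
    return [
        a for a in annotations
        if a.get("type") == "de.averbis.types.health.Diagnosis"
        and a.get("verificationStatus") == status
    ]

suspected = diagnoses_with_status(annotations, "SUSPECTED")
print([a["conceptID"] for a in suspected])
```

Both diagnoses of the enumeration are selected because each carries the status "SUSPECTED".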


3.9. GenericTerminologyAnnotator

3.9.1. Description

The generic Terminology Annotator recognizes terms from terminologies created in Health Discovery’s Terminology Editor module.

3.9.2. Input

Above this annotator, the following annotator must be included in the pipeline:

3.9.3. Output

The component creates annotations of type:

Annotation Type: de.averbis.extraction.types.Concept


Table ?: Features
Attribute | Description | Type

dictCanon

Preferred term of the concept.

String

uniqueID

Unique identifier of a concept of the format 'terminologyId:conceptID'.

String

conceptID

The concept id.

String

source

The name of the terminology source.

String

matchedTerm

The matching synonym of the terminology source.

String

negatedBy

Specifies the negation word, if one exists.

String

The exact type depends on the terminology files used and the concept types specified in them.

3.9.4. Configuration

The GenericTerminologyAnnotator has several parameters that control how texts are mapped to terms from the terminologies defined and maintained in the terminology modules of Health Discovery. The parameters are listed in the table below.


Table ?: Configuration
Name | Description | Type | MultiValued | Mandatory

terminologyNames

Names of the source terminologies.

String

true

false

useStemLookup

Apply lookup based on stems.

Boolean

false

true

useSegmentLookup

Apply lookup based on segments.

Boolean

false

true

3.9.4.1. Configuration Example 1: useStemLookup and useSegmentLookup deactivated

Let us first consider the case where the parameters useSegmentLookup and useStemLookup are both disabled. A basic, simple mapping still takes place, in which all terms from the terminology are mapped as follows:

  • Mapping Mode: Simple
  • Case Sensitivity: Upper and lower case, some punctuation and the occurrence of stop words (e.g. 'of', 'the', 'a', ',') are ignored.
  • Word Order: The word order in text and terminology is not important for a match.

Example: the term "Appendix Inflammation" is mapped to the text snippet "inflammation of the appendix".
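The simple mapping can be pictured with the following minimal sketch. It is an illustration only, not the implementation used by Health Discovery: the stop-word list and the function names `simple_key` and `simple_match` are assumptions made for this example.

```python
import re

# Assumed stop-word list, for illustration only.
STOP_WORDS = {"of", "the", "a"}

def simple_key(text):
    """Lower-case the text, drop punctuation and stop words, and ignore word order."""
    tokens = re.findall(r"\w+", text.lower())
    return frozenset(t for t in tokens if t not in STOP_WORDS)

def simple_match(term, snippet):
    """Return True if term and snippet consist of the same content words."""
    return simple_key(term) == simple_key(snippet)

print(simple_match("Appendix Inflammation", "inflammation of the appendix"))
```

Because no stemming is applied in this mode, a term such as "Inflamed Appendix" would not match the same snippet in this sketch.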

3.9.4.2. Configuration Example 2: useStemLookup activated

Now we activate the parameter "useStemLookup". The mapping then applies stemming, which reduces inflected (or sometimes derived) words to their word stem, base or root form:

  • Mapping Mode: Stemming
  • Case Sensitivity: Upper and lower case, some punctuation and the occurrence of stop words (e.g. 'of', 'the', 'a', ',') are ignored.
  • Word Order: The word order in text and terminology is not important for a match.

Example: the term "Inflamed Appendix" is mapped to the text snippet "inflammation of the appendix".

3.9.4.3. Configuration Example 3: useSegmentLookup activated

The segment lookup mode uses a dictionary-based approach to decompose compound words into their word components. The term "decompounding" is often used for this purpose. It can be helpful in so-called agglutinating languages, which combine many words into new compound words. Conversely, however, there is a risk that only parts of words in texts are mapped to a term, resulting in false positive hits. Therefore, the segment mode should only be used in exceptional cases.

  • Mapping Mode: Segmenting (Decompounding)
  • Case Sensitivity: Upper and lower case, some punctuation and the occurrence of stop words (e.g. 'of', 'the', 'a', ',') are ignored.
  • Word Order: The word order in text and terminology is not important for a match.
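Dictionary-based decompounding can be sketched as follows. This is a toy illustration under stated assumptions: the lexicon is a hypothetical mini dictionary (German "Blinddarm" = appendix, "Entzündung" = inflammation), linking elements between word parts are not handled, and the function `decompound` is not part of Health Discovery.

```python
def decompound(word, lexicon):
    """Split `word` into lexicon entries; return None if no complete split exists."""
    word = word.lower()
    if not word:
        return []
    for i in range(len(word), 0, -1):  # try the longest prefix first
        head = word[:i]
        if head in lexicon:
            rest = decompound(word[i:], lexicon)
            if rest is not None:
                return [head] + rest
    return None

# Hypothetical mini lexicon, for illustration only.
LEXICON = {"blinddarm", "entzündung", "blind", "darm"}

print(decompound("Blinddarmentzündung", LEXICON))  # ['blinddarm', 'entzündung']
```

The lexicon also contains the shorter parts "blind" and "darm". A lookup that accepts such parts on their own would match terms that the text does not actually mention, which is the false-positive risk described above.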

3.9.4.4. Activating the "Exact mode" in terminology administration

Like the simple mode, the exact mode is automatically active in every pipeline, so the GenericTerminologyAnnotator has no parameter for it. Instead, the user controls which terms are mapped in exact mode: the mode is applied only to those terms for which the user has actively set the mapping mode to "EXACT" in the terminology editor. The "EXACT" mapping mode ensures that a term is found only in exactly the same spelling, i.e. with the same upper and lower case letters, in the same word order, and without any preprocessing such as stemming:

  • Mapping Mode: Exact
  • Case Sensitivity: Upper and lower case is considered, stop words are preserved.
  • Word Order: The word order in text and terminology must be the same.
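As a contrast to the other modes, exact matching can be sketched as a verbatim search. This is a simplified illustration, not the product's implementation; the function name `exact_match` and the word-boundary handling are assumptions.

```python
import re

def exact_match(term, snippet):
    """Return True only if `term` occurs verbatim: same case, same word order."""
    # The lookarounds ensure that the term is not matched inside a longer word.
    pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
    return re.search(pattern, snippet) is not None

print(exact_match("HIV", "Patient tested for HIV."))   # same spelling
print(exact_match("HIV", "patient tested for hiv."))   # differs in case
```

The first call finds the term; the second does not, because upper and lower case are taken into account.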

3.9.5. Web Service Example

Text Example: "Appendizitis"

{
   "begin": 0,
   "end": 12,
   "type": "de.averbis.types.health.Concept",
   "coveredText": "Appendizitis",
   "id": 303,
   "matchedTerm": "Appendizitis",
   "dictCanon": "Appendizitis",
   "conceptID": "2",
   "source": "test_1.0",
   "uniqueID": "test_1.0:2"
}
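Since the uniqueID has the format 'terminologyId:conceptID', a client can split it back into its two parts. The sketch below assumes that the terminology identifier itself contains no colon; the helper name `split_unique_id` is hypothetical.

```python
def split_unique_id(unique_id):
    """Split 'terminologyId:conceptID' at the first colon (assumption: no colon in the terminology ID)."""
    terminology_id, sep, concept_id = unique_id.partition(":")
    if not sep:
        raise ValueError(f"not of the form 'terminologyId:conceptID': {unique_id!r}")
    return terminology_id, concept_id

print(split_unique_id("test_1.0:2"))
```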


3.10. Gleason Score

3.10.1. Description

This component recognizes Gleason score annotations.

3.10.2. Input

Above this annotator, the following annotators must be included in the pipeline:

3.10.3. Output

Annotation Type: de.averbis.types.health.GleasonScore


Table ?: GleasonScore Features
Attribute | Description | Type

score

The combined score.

String

primaryGrade

The primary grade (not always available).

String

secondaryGrade

The secondary grade (not always available).

String

3.10.4. Web Service Example

Text Example: "Gleason Pattern 3(60%) + 4(40%) = 7"

    {
      "begin": 0,
      "end": 35,
      "type": "de.averbis.types.health.GleasonScore",
      "coveredText": "Gleason Pattern 3(60%) + 4(40%) = 7",
      "id": 1855,
      "score": "7",
      "primaryGrade": "3",
      "secondaryGrade": "4"
    }
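The combined Gleason score is the sum of the primary and the secondary grade, which allows a simple plausibility check on the client side. The following is a minimal sketch; the function name `gleason_is_consistent` is hypothetical, and the check is only possible when both grades are available.

```python
def gleason_is_consistent(annotation):
    """Return True or False if both grades are present, None if a grade is missing."""
    primary = annotation.get("primaryGrade")
    secondary = annotation.get("secondaryGrade")
    if primary is None or secondary is None:
        return None  # grades are not always available
    return int(annotation["score"]) == int(primary) + int(secondary)

example = {"score": "7", "primaryGrade": "3", "secondaryGrade": "4"}
print(gleason_is_consistent(example))
```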


3.11. GvHD

3.11.1. Description

This component recognizes information about the occurrence of a GvHD (Graft-versus-Host-Disease).

3.11.2. Input

Above this annotator, the following annotators must be included in the pipeline:

3.11.3. Output

Annotation Type: de.averbis.types.health.GvHD


Table ?: GvHD Features
Attribute | Description | Type

dictCanon

Preferred term of the concept.

String

matchedTerm

The matching synonym of the GvHD concept in the terminology.

String

uniqueID

Unique identifier of the concept of the format 'terminologyId:conceptID'.

String

conceptID

The ID of the concept.

String

source

The name of the terminology source.

String

confidence

For approaches using machine learning (e.g. "DocumentClassification"), the confidence indicates how likely it is that the respective annotation has been generated correctly.

Possible value range: 0-1

Note: Annotations generated with non-machine-learning approaches such as terminology mappings (approach = "DictionaryLookup") have a confidence value of 0.

Double

negatedBy

Specifies the negation word, if one exists.

String

approach

Information about the text mining approach used to generate the annotation.

Possible values: DictionaryLookup | SimilarityMatching | DocumentClassification | DerivedByLabValue

String

onsetDate

The onset date of the diagnosis, if given in the text.

Format: YYYY-MM-DD: Year-Month-Day with leading zeros (e.g. 2020-02-17)

Please note: The onsetDate is only annotated if the PEAR component "Disease Onset Date" is integrated in the text analysis pipeline used. The preconfigured pipelines do not contain this component, so the value of the onsetDate feature is returned as null.

String

verificationStatus

Verification status of the GvHD diagnosis.

Possible values (default is underlined): null | NEGATED | ASSURED | SUSPECTED | DIFFERENTIAL

String

clinicalStatus

Clinical status of the GvHD diagnosis.

Possible values (default is underlined): null | ACTIVE | RESOLVED

String

kind

The kind of the diagnosis.

Possible values (default is underlined): null | main | secondary

String

side

The laterality of the diagnosis.

Possible values (default is underlined): null | RIGHT | LEFT | BOTH

String

laterality

The laterality of the diagnosis.

Possible values (default is underlined): null | RIGHT | LEFT | BOTH

WARNING: This feature is deprecated and will be removed in V6 of Health Discovery. It will be replaced by the equivalent attribute 'side'.

String

belongsTo

Indicates whether the diagnosis belongs to a donor or recipient (e.g. in case of transplantations) or to a family member.

Possible values (default is underlined): null | DONOR | FAMILY | RECIPIENT

String

continuanceStatus

GvHD status.

Possible values (default is underlined): null | ACUTE | CHRONIC

String

grade

Grade of the GvHD diagnosis.

Possible values (default is underlined): null | I | II | III | IV

String

stage

Stage of the GvHD diagnosis.

Possible values (default is underlined): null | 1 | 2 | 3 | 4 | LIMITED | EXTENDED

String

organ

Organ diagnosed with GvHD.

Possible values (default is underlined): null | SKIN | LIVER | INTESTINAL | EYE | LUNG | CONNECTIVE TISSUE | MUCOSA | VAGINAL

String

date

The date of the diagnosis.

Format: YYYY-MM-DD: Year-Month-Day with leading zeros (e.g. 2020-02-17)

String

GvHD is a subtype of Diagnosis, i.e. it inherits all of its features.

3.11.4. Web Service Example

Text Example: "Akute Transplantat-gegen-Wirt Erkrankung Stadium 3 der Haut, Schweregrad III"

    {
      "begin": 0,
      "end": 76,
      "type": "de.averbis.types.health.GvHD",
      "coveredText": "Akute Transplantat-gegen-Wirt Erkrankung Stadium 3 der Haut, Schweregrad III",
      "id": 1346,
      "date": null,
      "organ": "SKIN",
      "negatedBy": null,
      "side": null,
      "matchedTerm": "Akute Transplantat-gegen-Wirt Erkrankung Stadium 3 der Haut",
      "verificationStatus": null,
      "kind": null,
      "confidence": 0,
      "onsetDate": null,
      "source": "ICD10GM_2021",
      "clinicalStatus": null,
      "approach": "DictionaryLookup",
      "laterality": null,
      "stage": "3",
      "dictCanon": "Stadium 3 der akuten Haut-Graft-versus-Host-Krankheit",
      "grade": "III",
      "continuanceStatus": "ACUTE",
      "conceptID": "L99.13*",
      "belongsTo": null,
      "uniqueID": "ICD10GM_2021:L99.13*"
    }


3.12. Health Measurements

3.12.1. Description

This component detects measurements in medical texts.

3.12.2. Input

Above this annotator, the following annotator must be included in the pipeline:

When a measurement annotation is generated, a NumericValue and a unit are combined. A LaboratoryParameter annotation allows a measurement to be generated even when the unit is missing, e.g. Hb 11.

The HealthPreprocessing pipeline block provides most of the prerequisite annotation types to ensure the proper functionality of the Health Measurements annotator. In order to use the positive effect of available LaboratoryParameter annotations, this annotator is included in LabValues, but it can also be used separately.

3.12.3. Output

Annotation Type: de.averbis.types.health.Measurement


Table ?: Measurement Features
Attribute | Description | Type

unit

The unit of the measurement.

String

normalizedUnit

Normalized string value of the unit.

String

normalizedValue

Normalized value of the measurement.

This value is the result of the transformation of the numeric value according to the transformation of the unit to its standard unit.

Double

value

The numeric value of the measurement.

Double

dimension

The dimension of the unit, e.g. [M] standing for mass in the example below.

String

3.12.4. Web Service Example

Health Measurements are only returned in the context of LabValues and Medications.


3.13. Health Preprocessing

3.13.1. Description

This pipeline block preprocesses the input documents and prepares the minimal set of annotations that the subsequent components require as input. Among other things, it recognizes and annotates words, sentences, abbreviations, temporal expressions and numerical values. Additionally, it filters out stop words (i.e., commonly used words that carry little meaning) and corrects sentence boundaries that are disturbed by abbreviations.

For the optimal functionality of the subsequent components, it is recommended to run the Health Preprocessing beforehand.

3.13.2. Input

Above this annotator, one of the following annotators must be included in the pipeline:

3.13.3. Output

This component generates annotations which will be processed by the subsequent components, e.g. words, sentences, abbreviations, temporal expressions and numerical values.

3.13.4. Web Service Example

The annotations generated by the preprocessing pipeline block are not returned in the web service.


3.14. HealthPostProcessing

3.14.1. Description

This pipeline block contains annotators that can be used for postprocessing annotations of previous pipeline components. The first element provided in postprocessing is the "Blacklist Removal Annotator", which is described in more detail below.

3.14.2. Input

This pipeline block requires that the annotators whose output is changed by a postprocessing component are included earlier in the pipeline.

3.14.3. Output

The output of this pipeline block depends on the components which are used as part of the postprocessing.

3.14.4. Web Service Example

The annotations generated by the postprocessing pipeline block are not returned in the web service.

3.14.5. Blacklist Removal Annotator

3.14.5.1. Description

This annotator is a component of the HealthPostProcessing pipeline block. It can be used to remove annotations using a blacklist. The blacklist consists of several blacklist terms that can be set via parameters in the pipeline configuration on the Health Discovery User Interface.

By default, only the following annotation types are removed by the annotator:

  • CodingCandidate
  • Diagnosis
  • Drug
  • Ingredient
  • LaboratoryParameter
  • LaboratoryValue
  • Medications

3.14.5.2. Input

This component does not require any specific annotators. It relies on the assumption that the affected annotation types (see above) are defined in the type system.

3.14.5.3. How to use the Blacklist Removal Annotator

Step 1: Go to "Pipeline Configuration"

Step 2: Stop the pipeline in which you would like to use the Blacklist Removal Annotator and click the button "edit pipeline".

Step 3: From the list of available annotators in the right panel, select the component "HealthPostProcessing" and add it to your pipeline as the last component.

Step 4: Click on the HealthPostProcessing component in your pipeline to display the components it contains.

Step 5: Click on "BlacklistAnnotationRemover" and fill in the text passages you would like to remove from the text analysis output.


Figure ?: How to use the Blacklist Removal Annotator

Step 6: Save and restart the pipeline.

Please note: 1) Choose the terms you enter carefully, because they change the output of the pipeline for all documents that are analyzed with this pipeline. 2) The parameter "ignoreCase" defines whether the terms are matched case-sensitively or not. If you want some terms to be handled differently, add a second HealthPostProcessing block to your pipeline.

3.14.5.4. Output

This component only removes annotations; no new annotations are added.

3.14.5.5. Web Service Example

This component removes annotations, thus there is no return in the web service.
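The effect of the blacklist and of the "ignoreCase" parameter can be pictured with the following sketch. It is an illustration under assumptions, not the annotator's implementation: the function name `remove_blacklisted` is hypothetical, and comparing the complete covered text with each blacklist term is an assumption about the matching rule.

```python
def remove_blacklisted(annotations, blacklist, ignore_case=True):
    """Drop annotations whose coveredText equals a blacklist term (assumed matching rule)."""
    def norm(text):
        return text.lower() if ignore_case else text

    blocked = {norm(term) for term in blacklist}
    return [a for a in annotations if norm(a["coveredText"]) not in blocked]

annotations = [
    {"type": "de.averbis.types.health.Diagnosis", "coveredText": "Bronchitis"},
    {"type": "de.averbis.types.health.Diagnosis", "coveredText": "asthma bronchiale"},
]

kept = remove_blacklisted(annotations, ["bronchitis"], ignore_case=True)
print([a["coveredText"] for a in kept])
```

With `ignore_case=False`, the term "bronchitis" would no longer remove the annotation covering "Bronchitis" in this sketch.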


3.15. HLA

3.15.1. Description

This component annotates information about HLA (human leukocyte antigen).

3.15.2. Input

Above this annotator, the following annotators must be included in the pipeline:

3.15.3. Output

Annotation Type: de.averbis.types.health.HLA


Table ?: HLA Features
Attribute | Description | Type

dictCanon

Preferred term of HLA.

String

matchedTerm

The matching synonym of the HLA concept in the terminology.

String

uniqueID

Unique identifier of the concept of the format 'terminologyId:conceptID'.

String

conceptID

The ID of the concept.

String

source

The name of the terminology source.

String

negatedBy

Specifies the negation word, if one exists.

String

date

Date of observation. (Format: YYYY-MM-DD)

String

samplingDate

Date of sampling (Format: YYYY-MM-DD: Year-Month-Day with leading zeros (e.g. 2009-02-17))

String

receiptDate

Date of receipt of sample. (Format: YYYY-MM-DD)

String

belongsTo

Indicates whether the HLA annotation belongs to a donor or recipient (e.g. in case of transplantations).

Possible values (default is underlined): null | DONOR | RECIPIENT

String

male

Paternal HLA manifestation.

HLAValue

female

Maternal HLA manifestation.

HLAValue

Annotation Type: de.averbis.types.health.HLAValue


Table ?: HLAValue Features
Attribute | Description | Type

alleleGroup

Allele group of actual HLA.

String

protein

Specific protein of actual HLA.

String

synonymousDNA

Synonymous DNA substitution within the coding region.

String

noncodingRegionVariant

Differences in non-coding region.

String

expressionNote

Suffix to code changes in expression.

String


3.15.4. Web Service Example

Example of HLA in a medical record (table view):

HLA-A
Patient | 0101, 6801
Donor | 0101, 6802

Example of HLA table converted to text:

HLA-A

Patient

0101,6801

Donor

0101,6802


Output when the text is sent to the web service:

    {
      "begin": 0,
      "end": 5,
      "type": "de.averbis.types.health.HLA",
      "coveredText": "HLA-A",
      "id": 1272,
      "date": null,
      "negatedBy": null,
      "matchedTerm": "HLA-A",
      "dictCanon": "HLA-A",
      "receiptDate": null,
      "conceptID": "LP18319-1",
      "source": "Averbis-Lab-Terminology_2.0",
      "female": {
        "begin": 19,
        "end": 23,
        "type": "de.averbis.types.health.HLAValue",
        "coveredText": "6801",
        "id": 1274,
        "alleleGroup": "68",
        "noncodingRegionVariant": null,
        "protein": "01",
        "synonymousDNA": null,
        "expressionNote": null
      },
      "samplingDate": null,
      "belongsTo": "RECIPIENT",
      "uniqueID": "Averbis-Lab-Terminology_2.0:LP18319-1",
      "male": {
        "begin": 14,
        "end": 18,
        "type": "de.averbis.types.health.HLAValue",
        "coveredText": "0101",
        "id": 1273,
        "alleleGroup": "01",
        "noncodingRegionVariant": null,
        "protein": "01",
        "synonymousDNA": null,
        "expressionNote": null
      }
    },
    {
      "begin": 0,
      "end": 5,
      "type": "de.averbis.types.health.HLA",
      "coveredText": "HLA-A",
      "id": 1275,
      "date": null,
      "negatedBy": null,
      "matchedTerm": "HLA-A",
      "dictCanon": "HLA-A",
      "receiptDate": null,
      "conceptID": "LP18319-1",
      "source": "Averbis-Lab-Terminology_2.0",
      "female": {
        "begin": 35,
        "end": 39,
        "type": "de.averbis.types.health.HLAValue",
        "coveredText": "6802",
        "id": 1277,
        "alleleGroup": "68",
        "noncodingRegionVariant": null,
        "protein": "02",
        "synonymousDNA": null,
        "expressionNote": null
      },
      "samplingDate": null,
      "belongsTo": "DONOR",
      "uniqueID": "Averbis-Lab-Terminology_2.0:LP18319-1",
      "male": {
        "begin": 30,
        "end": 34,
        "type": "de.averbis.types.health.HLAValue",
        "coveredText": "0101",
        "id": 1276,
        "alleleGroup": "01",
        "noncodingRegionVariant": null,
        "protein": "01",
        "synonymousDNA": null,
        "expressionNote": null
      }
    }
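In the output above, the four-digit values are split into an allele group (first two digits) and a protein (last two digits). The following sketch reproduces this split for four-digit values only; the function name `split_hla_value` is hypothetical, and longer notations with further fields are not covered.

```python
def split_hla_value(value):
    """Split a four-digit HLA value such as '6801' into allele group and protein."""
    if len(value) != 4 or not value.isdigit():
        raise ValueError(f"expected four digits, got {value!r}")
    return {"alleleGroup": value[:2], "protein": value[2:]}

print(split_hla_value("6801"))
```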


3.16. Irradiation

3.16.1. Description

This component recognizes information about a previous irradiation therapy.

3.16.2. Input

Above this annotator, the following annotators must be included in the pipeline:

3.16.3. Output

Annotation Type: de.averbis.types.health.Irradiation


Table ?: Irradiation Features
Attribute | Description | Type

dictCanon

Preferred term of the Irradiation concept.

String

matchedTerm

Matching synonym of the Irradiation concept.

String

uniqueID

Unique identifier of the Irradiation concept of the format 'terminologyId:conceptID'.

String

conceptID

The concept id.

String

source

The name of the terminology source.

String

negatedBy

Specifies the negation word, if one exists.

String

irradiationDose

The irradiation dose.

IrradiationDose

dateInterval

Temporal information (date interval) about the irradiation therapy.

DateInterval

Annotation Type: de.averbis.types.health.IrradiationDose


Table ?: IrradiationDose Features
Attribute | Description | Type

kind

The irradiation dose kind.

Possible values (default is underlined): null | FRACTIONAL

String

dose

The dose.

Measurement


3.16.4. Web Service Example

Text Example (Irradiation): "Fraktionierte Ganzkörperbestrahlung (TBI) über opponierende Felder mit einer Gesamtdosis von 12 Gy vom 18.11. bis 20.11.2008"

    {
      "begin": 0,
      "end": 41,
      "type": "de.averbis.types.health.Irradiation",
      "coveredText": "Fraktionierte Ganzkörperbestrahlung (TBI)",
      "id": 1854,
      "negatedBy": null,
      "matchedTerm": "Ganzkörperbestrahlung",
      "dictCanon": "Bestrahlung",
      "irradiationDose": {
        "begin": 93,
        "end": 98,
        "type": "de.averbis.types.health.IrradiationDose",
        "coveredText": "12 Gy",
        "id": 1855,
        "dose": {
          "begin": 93,
          "end": 98,
          "type": "de.averbis.types.health.Measurement",
          "coveredText": "12 Gy",
          "id": 1856,
          "unit": "Gy",
          "normalizedUnit": "m²/s²",
          "normalizedValue": 12,
          "value": 12,
          "dimension": "[L]²/[T]²"
        },
        "kind": "FRACTIONAL"
      },
      "conceptID": "10037794",
      "source": "Averbis-Therapy_1.0",
      "uniqueID": "Averbis-Therapy_1.0:10037794",
      "dateInterval": {
        "begin": 99,
        "end": 124,
        "type": "de.averbis.types.health.DateInterval",
        "coveredText": "vom 18.11. bis 20.11.2008",
        "id": 1857,
        "endDate": "2008-11-20",
        "kind": "DATEINTERVAL",
        "value": "[2008-11-18, 2008-11-20]",
        "startDate": "2008-11-18"
      }
    }
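Because startDate and endDate are returned in the format YYYY-MM-DD, they can be parsed directly with the Python standard library. The sketch below computes the number of days between the two dates of a dateInterval; the function name `interval_days` is hypothetical.

```python
from datetime import date

def interval_days(date_interval):
    """Return the number of days between startDate and endDate of a DateInterval."""
    start = date.fromisoformat(date_interval["startDate"])
    end = date.fromisoformat(date_interval["endDate"])
    return (end - start).days

print(interval_days({"startDate": "2008-11-18", "endDate": "2008-11-20"}))  # 2
```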


3.17. Lab Values

3.17.1. Description

This component detects laboratory values and vital signs, such as blood pressure levels, ECOG (Eastern Cooperative Oncology Group) and NYHA (New York Heart Association) performance status, left ventricular ejection fraction and many more.

The annotation of measurements is already integrated in this pipeline block. If measurements are needed for other components (e.g. for Medications), they should be executed afterwards. For more details of measurements see Health Measurements.

3.17.2. Input

Above this annotator, the following annotators must be included in the pipeline:

3.17.3. Output

Annotation Type: de.averbis.types.health.LaboratoryValue


Table ?: LaboratoryValue Features
Attribute | Description | Type

parameter

Parameter of actual laboratory value.

LaboratoryParameter

fact

Measurement of actual laboratory value.

Measurement

factAssessment

An optional relative assessment of the fact.

String

lowerLimit

Lower reference value of actual laboratory value.

Measurement

upperLimit

Upper reference value of actual laboratory value.

Measurement

interpretation

Interpretation of fact depending on reference values or interpretation in text (also possible without fact).

Possible values (default is underlined): null | normal | abnormal | high | low

String

qualitativeValue

Qualitative value of the actual laboratory value.

QualitativeValue

belongsTo

Indicates whether the laboratory value belongs to a donor or recipient (e.g. in case of transplantations) or to a family member.

Possible values (default is underlined): null | DONOR | FAMILY | RECIPIENT

String


Annotation Type: de.averbis.types.health.LaboratoryParameter


Table ?: LaboratoryParameter Features
Attribute | Description | Type

dictCanon

Preferred term of the LaboratoryParameter concept.

String

matchedTerm

Matching synonym of the LaboratoryParameter concept.

String

uniqueID

Unique identifier of the LaboratoryParameter concept of the format 'terminologyId:conceptID'.

String

conceptID

The concept id.

String

source

The name of the terminology source.

String

negatedBy

Specifies the negation word, if one exists.

String


Annotation Type: de.averbis.types.health.QualitativeValue


Table ?: QualitativeValue Features
Attribute | Description | Type

value

Qualitative statement on a laboratory value.

Possible values (default is underlined): null | 1- | - - | 2- | - - - | 3- | 1+ | ++ | 2+ | +++ | 3+ | APPROPRIATE | BORDERLINE | EVIDENCE | NEGATIVE | NO_EVIDENCE | NOT_QUANTIFIABLE | PARTIAL | POSITIVE | SPECKLED | STAINING | UNKOWN

String

modifier

Describes the characteristic of a qualitative value.

Possible values (default is underlined): null | ABNORMAL | ALTERNATING | BORDERLINE | CENTROMERE | CIRCULAR | CONTINUOUS | CYTOPLASMATIC | DEMONSTRABLE | HOMOGENEOUS | MODERATE | NOT | NOT_QUANTIFIABLE | NUCLEOLAR | PERINUCLEOLAR | QUANTIFIABLE | QUALITATIVE | REPEATED | STRONG | WEAK

String

NEW: Standard Features

This type now has all standard features: 'begin', 'end', 'type', 'coveredText' and 'id'.



Annotation Type: de.averbis.types.health.BloodPressure


Table ?: BloodPressure Features
Attribute | Description | Type

systolic

Measurement of systolic blood pressure.

Measurement

diastolic

Measurement of diastolic blood pressure.

Measurement

interpretation

Interpretation of systolic and diastolic values depending on named interpretations in the text.

Possible values (default is underlined): null | normal | abnormal | high | low

String


Annotation Type: de.averbis.types.health.ECOG


Table ?: ECOG Features
Attribute | Description | Type

stage

Stage of the ECOG (Eastern Cooperative Oncology Group) Performance Status (numeric scale).

String



Annotation Type: de.averbis.types.health.NYHA


Table ?: NYHA Features
Attribute | Description | Type

stage

Stage of the NYHA (New York Heart Association) classification (numeric scale).

String


Annotation Type: de.averbis.types.health.Organism


Table ?: Features of Organism
Attribute | Description | Type

matchedTerm

Matching synonym of the organism concept found in the text.

String

dictCanon

Preferred term of the organism concept.

String

kind

The kind of the organism, e.g. 'Bacterium', 'Virus' or 'Fungus'.

String

conceptID

The ID of the concept.

String

source

The name of the terminology source.

String

uniqueID

Unique identifier of the concept of the format 'terminologyId:conceptID'.

String

negatedBy

Specifies the negation word, if one exists.

String

3.17.4. Terminology Binding


Table ?: Terminology Bindings
Name | Languages | Version | Identifier | Comment

Averbis Lab Terminology

EN, DE

2.0

Averbis-Lab-Terminology_2.1

Laboratory and vital signs parameters, ID based on LOINC codes (LOINC parts) composed by Averbis.

SNOMED-CT Bacteria

EN, DE

2020

SNOMED-CT-Bacteria_2020

Terminology of bacteria, ID based on SNOMED-CT codes composed and enriched by Averbis.

SNOMED-CT Fungus

EN, DE

2020

SNOMED-CT-Fungus_2020

Terminology of fungi, ID based on SNOMED-CT codes composed and enriched by Averbis.

SNOMED-CT Virus

EN, DE

2020

SNOMED-CT-Virus_2020

Terminology of viruses, ID based on SNOMED-CT codes composed and enriched by Averbis.

3.17.5. Web Service Example

Text Example 1 (LabValue with interpretation): "Uric acid 9.6 mg/dl (3.5-7.0)"

    {
      "begin": 0,
      "end": 29,
      "type": "de.averbis.types.health.LaboratoryValue",
      "coveredText": "Uric acid 9.6 mg/dl (3.5-7.0)",
      "id": 2247,
      "factAssessment": null,
      "fact": {
        "begin": 10,
        "end": 19,
        "type": "de.averbis.types.health.Measurement",
        "coveredText": "9.6 mg/dl",
        "id": 2248,
        "unit": "mg/dL",
        "normalizedUnit": "kg/m³",
        "normalizedValue": 0.096,
        "value": 9.6,
        "dimension": "[M]/[L]³"
      },
      "interpretation": "high",
      "parameter": {
        "begin": 0,
        "end": 9,
        "type": "de.averbis.types.health.LaboratoryParameter",
        "coveredText": "Uric acid",
        "id": 2246,
        "negatedBy": null,
        "matchedTerm": "Uric Acid",
        "dictCanon": "Urate",
        "conceptID": "LP15935-7",
        "source": "Averbis-Lab-Terminology_2.1",
        "uniqueID": "Averbis-Lab-Terminology_2.1:LP15935-7"
      },
      "upperLimit": {
        "begin": 25,
        "end": 28,
        "type": "de.averbis.types.health.Measurement",
        "coveredText": "7.0",
        "id": 2250,
        "unit": "mg/dL",
        "normalizedUnit": "kg/m³",
        "normalizedValue": 0.07,
        "value": 7,
        "dimension": "[M]/[L]³"
      },
      "qualitativeValue": null,
      "lowerLimit": {
        "begin": 21,
        "end": 24,
        "type": "de.averbis.types.health.Measurement",
        "coveredText": "3.5",
        "id": 2249,
        "unit": "mg/dL",
        "normalizedUnit": "kg/m³",
        "normalizedValue": 0.035,
        "value": 3.5,
        "dimension": "[M]/[L]³"
      },
      "belongsTo": null
    }
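The normalized values in this example follow from converting mg/dL to the standard unit kg/m³ (1 mg = 1e-6 kg, 1 dL = 1e-4 m³). The sketch below reproduces this arithmetic and derives an interpretation by comparing the fact with the reference limits. The function `interpret` is a hypothetical illustration of such a comparison, not the rule set used by the annotator.

```python
import math

# 1 mg = 1e-6 kg and 1 dL = 1e-4 m³, so 1 mg/dL corresponds to 0.01 kg/m³.
MG_PER_DL_TO_KG_PER_M3 = 1e-6 / 1e-4

def interpret(value, lower, upper):
    """Hypothetical comparison of a value with its reference range."""
    if value < lower:
        return "low"
    if value > upper:
        return "high"
    return "normal"

normalized = 9.6 * MG_PER_DL_TO_KG_PER_M3
print(math.isclose(normalized, 0.096))  # compare with a tolerance, not with ==
print(interpret(9.6, 3.5, 7.0))
```

The comparison uses the original values here; since all three measurements share the same unit, comparing the normalized values gives the same result.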


Text Example 2 (QualitativeValue): "CMV antibody strong positive"

    {
      "begin": 0,
      "end": 28,
      "type": "de.averbis.types.health.LaboratoryValue",
      "coveredText": "CMV antibody strong positive",
      "id": 838,
      "factAssessment": null,
      "fact": null,
      "interpretation": null,
      "parameter": {
        "begin": 0,
        "end": 12,
        "type": "de.averbis.types.health.LaboratoryParameter",
        "coveredText": "CMV antibody",
        "id": 837,
        "negatedBy": null,
        "matchedTerm": "CMV antibody",
        "dictCanon": "Cytomegalovirus Ab",
        "conceptID": "LP37878-3",
        "source": "Averbis-Lab-Terminology_2.1",
        "uniqueID": "Averbis-Lab-Terminology_2.1:LP37878-3"
      },
      "upperLimit": null,
      "qualitativeValue": {
        "begin": 13,
        "end": 28,
        "type": "de.averbis.types.health.QualitativeValue",
        "coveredText": "strong positive",
        "id": 839,
        "modifier": "STRONG",
        "value": "POSITIVE"
      },
      "lowerLimit": null,
      "belongsTo": null
    }


Text Example 3 (BloodPressure): "BP 129/61 mmHg"

    {
      "begin": 0,
      "end": 14,
      "type": "de.averbis.types.health.BloodPressure",
      "coveredText": "BP 129/61 mmHg",
      "id": 1072,
      "systolic": {
        "begin": 3,
        "end": 6,
        "type": "de.averbis.types.health.Measurement",
        "coveredText": "129",
        "id": 1073,
        "unit": "mmHg",
        "normalizedUnit": "kg/(m·s²)",
        "normalizedValue": 17198.538,
        "value": 129,
        "dimension": "[M]/([L]·[T]²)"
      },
      "diastolic": {
        "begin": 7,
        "end": 14,
        "type": "de.averbis.types.health.Measurement",
        "coveredText": "61 mmHg",
        "id": 1074,
        "unit": "mmHg",
        "normalizedUnit": "kg/(m·s²)",
        "normalizedValue": 8132.642,
        "value": 61,
        "dimension": "[M]/([L]·[T]²)"
      },
      "interpretation": null
    }


Text Example 4 (ECOG Performance Status): "Patient's performance status is ECOG 2."

    {
      "begin": 32,
      "end": 38,
      "type": "de.averbis.types.health.ECOG",
      "coveredText": "ECOG 2",
      "id": 1008,
      "stage": "2"
    }
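In the examples, 'begin' and 'end' behave like zero-based character offsets into the input text, with 'end' being exclusive. A minimal sketch of how a client could check this for the ECOG example is shown below. It assumes that the offsets count characters the same way Python string indices do, which holds for the texts in this section; the helper name is hypothetical.

```python
def covered_text(text: str, annotation: dict) -> str:
    """Return the slice of `text` addressed by an annotation's offsets."""
    return text[annotation["begin"]:annotation["end"]]


text = "Patient's performance status is ECOG 2."
ecog = {"begin": 32, "end": 38, "coveredText": "ECOG 2", "stage": "2"}

# The slice should equal the coveredText reported by the web service.
matches = covered_text(text, ecog) == ecog["coveredText"]
print(matches)
```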


Text Example 5 (NYHA Classification): "NYHA Class II"

    {
      "begin": 0,
      "end": 13,
      "type": "de.averbis.types.health.NYHA",
      "coveredText": "NYHA Class II",
      "id": 675,
      "stage": "2"
    }


Text Example 6 (Organism, LabValue): "Klebsiella pneumoniae positiv"

    {
      "begin": 0,
      "end": 29,
      "type": "de.averbis.types.health.LaboratoryValue",
      "coveredText": "Klebsiella pneumoniae positiv",
      "id": 832,
      "factAssessment": null,
      "fact": null,
      "interpretation": null,
      "parameter": {
        "begin": 0,
        "end": 21,
        "type": "de.averbis.types.health.LaboratoryParameter",
        "coveredText": "Klebsiella pneumoniae",
        "id": 833,
        "negatedBy": null,
        "matchedTerm": "Klebsiella pneumoniae",
        "dictCanon": "Klebsiella pneumoniae",
        "conceptID": "56415008",
        "source": "SNOMED-CT-Bacteria_2020",
        "uniqueID": "SNOMED-CT-Bacteria_2020:56415008"
      },
      "upperLimit": null,
      "qualitativeValue": {
        "begin": 22,
        "end": 29,
        "type": "de.averbis.types.health.QualitativeValue",
        "coveredText": "positiv",
        "id": 834,
        "modifier": null,
        "value": "POSITIVE"
      },
      "lowerLimit": null,
      "belongsTo": null
    },
    {
      "begin": 0,
      "end": 21,
      "type": "de.averbis.types.health.Organism",
      "coveredText": "Klebsiella pneumoniae",
      "id": 835,
      "negatedBy": null,
      "matchedTerm": "Klebsiella pneumoniae",
      "dictCanon": "Klebsiella pneumoniae",
      "kind": "Bacterium",
      "conceptID": "56415008",
      "source": "SNOMED-CT-Bacteria_2020",
      "uniqueID": "SNOMED-CT-Bacteria_2020:56415008"
    }


3.18. Language Detection

3.18.1. Description

This component detects the language of a text and sets it accordingly. It currently supports German and English. In contrast to the LanguageSetter, this component determines the language individually for each document.

If no language can be detected, the language is set to 'German'.

3.18.2. Input

The component does not expect any annotations.

3.18.3. Output

The component sets the parameter 'documentLanguage' in the type

de.averbis.types.health.DocumentAnnotation

3.18.4. Web Service Example

Text Example: "this is a sample text."

    {
      "begin": 0,
      "end": 22,
      "type": "de.averbis.types.health.DocumentAnnotation",
      "coveredText": "this is a sample text.",
      "id": 678,
      "language": "en",
      "version": null
    }
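A client could read the detected language from the response roughly as follows. This is a sketch that assumes the annotations are already available as a list of Python dictionaries (for example after parsing the JSON response); the function name is hypothetical.

```python
from typing import Optional

DOCUMENT_ANNOTATION = "de.averbis.types.health.DocumentAnnotation"


def document_language(annotations: list) -> Optional[str]:
    """Return the language of the first DocumentAnnotation, if there is one."""
    for annotation in annotations:
        if annotation.get("type") == DOCUMENT_ANNOTATION:
            return annotation.get("language")
    return None


# The annotation from the example above, as a Python dictionary.
annotations = [
    {
        "begin": 0,
        "end": 22,
        "type": DOCUMENT_ANNOTATION,
        "coveredText": "this is a sample text.",
        "id": 678,
        "language": "en",
        "version": None,
    }
]
print(document_language(annotations))
```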


3.19. LanguageSetter

3.19.1. Description

A language setter sets the text language in a document. It should only be used if the language is the same for all documents that are sent to this pipeline.

3.19.2. Input

The component does not expect any annotations.

3.19.3. Output

The component sets the parameter documentLanguage.

3.19.4. Configuration


Table ?: Configuration LanguageSetter
Name | Description | Type | MultiValued | Mandatory
language | The document language to set if not already set in CAS. | String | false | true
overwriteExisting | If true, an existing document language will be overwritten. | Boolean | false | true
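The interplay of the two parameters can be sketched as a small decision function. This only illustrates the behavior described in the table and is not the component's actual implementation; the function and argument names are hypothetical.

```python
from typing import Optional


def set_language(current: Optional[str], language: str,
                 overwrite_existing: bool) -> str:
    """Return the document language after applying the two parameters.

    current: language already set in the CAS, or None if none is set.
    language: value of the 'language' parameter.
    overwrite_existing: value of the 'overwriteExisting' parameter.
    """
    if current is None or overwrite_existing:
        return language
    return current


print(set_language(None, "de", False))   # no language set yet
print(set_language("en", "de", False))   # existing language is kept
print(set_language("en", "de", True))    # existing language is overwritten
```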

3.19.5. Web Service Example

The language is currently not returned in the web service.


3.20. Laterality

3.20.1. Description

This component annotates the laterality or body site of different annotation types, e.g. Diagnosis, Procedure and Ophthalmology.

3.20.2. Input

Above this annotator, the following annotators must be included in the pipeline:

This annotator must be included above the annotators whose feature 'side' it sets.

3.20.3. Output

This annotator sets the feature 'side' in above mentioned annotation types.

3.20.4. Web Service Example

As a standalone component, this doesn’t return anything in the web service.


3.21. Medications

3.21.1. Description

This component detects medications, which are a combination of the active ingredient or preparation, a strength, a dose frequency, the dose form, the route of administration and date intervals or a single date.

3.21.2. Input

Above this annotator, the following annotators must be included in the pipeline:

For the annotation of measurements either the LabValues block or the HealthMeasurements block should be executed beforehand.

3.21.3. Output

Annotation Type: de.averbis.types.health.Medication


Table ?: Medication Features
Attribute | Description | Type
drugs | Drug or multi drug of the actual medication. | Multi-Value Field (Drug)
doseFrequency | Dose frequency of the actual medication. Possible forms are a general DoseFrequency or the more detailed DayTimeDoseFrequency, WeekDayDoseFrequency, TimeMeasurementDoseFrequency etc. | DoseFrequency
doseForm | Dose form of the actual medication. | DoseForm
date | Temporal information (date or date interval) about the actual medication. Format: YYYY-MM-DD: Year-Month-Day with leading zeros (e.g. 2020-02-17) | Date or DateInterval
administrations | The routes of administration of this medication, presented as String(s). Please see Web Service Example 2 for more details. | Multi-Value Field (Administration)
rateQuantity | Amount of medication per unit of time, e.g., 2 doses. | Double
status | Status of the medication. Possible values (default is underlined): null, ADMISSION, ALLERGY, INPATIENT, DISCHARGE, NEGATED, CONSIDERED, INTENDED, FAMILY, CONDITIONING_TREATMENT | String
termTypes | Additional information on clinical drug, e.g. semantic clinical drug (RxNorm TermType). | Multi-Value Field (TTY)


Annotation Type: de.averbis.types.health.Date


Table ?: Date Features
Attribute | Description | Type
kind | Kind of the date information, here: "DATE". | String
value | Value of the date. Format: YYYY-MM-DD: Year-Month-Day with leading zeros (e.g. 2020-02-17) | String


Annotation Type: de.averbis.types.health.DateInterval


Table ?: DateInterval Features
Attribute | Description | Type
kind | Kind of the date information, here: "DATEINTERVAL". | String
value | Value of the date interval. Dates use the format YYYY-MM-DD: Year-Month-Day with leading zeros (e.g. 2020-02-17) | String
startDate | First date of the date interval. | String
endDate | Second date of the date interval. | String
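Because startDate and endDate use the YYYY-MM-DD format, they can be parsed with the Python standard library. The sketch below uses the values from the DateInterval in the Medications web service example; treating the interval as a plain dictionary is an assumption about how a client holds the parsed response.

```python
from datetime import date

# Feature values taken from the DateInterval in the Medications example.
interval = {
    "kind": "DATEINTERVAL",
    "value": "[2018-01-01, 2018-01-30]",
    "startDate": "2018-01-01",
    "endDate": "2018-01-30",
}

start = date.fromisoformat(interval["startDate"])
end = date.fromisoformat(interval["endDate"])
days = (end - start).days   # number of days between the first and second date
print(start, end, days)
```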


Annotation Type: de.averbis.types.health.Drug


Table ?: Drug Features
Attribute | Description | Type
ingredient | Ingredient of the drug. | Ingredient
strength | Strength of the drug. | Strength


Drugs with more than one ingredient (multi drugs) are also detected and consist of multiple Drug-annotations.


Annotation Type: de.averbis.types.health.Ingredient


Table ?: Features of Ingredient
Attribute | Description | Type
dictCanon | Preferred term of Ingredient. | String
matchedTerm | Matching synonym of Ingredient. | String
uniqueID | Unique identifier of Ingredient of the format 'terminologyId:conceptID'. | String
conceptID | The concept id. | String
source | The name of the terminology source. | String
negatedBy | Specifies the negation word, if one exists. | String


Annotation Type: de.averbis.types.health.Strength


Table ?: Features of Strength
Attribute | Description | Type
dictCanon | Preferred term of the strength. | String
matchedTerm | Matching synonym of the strength. | String
uniqueID | Unique identifier of the strength of the format 'terminologyId:conceptID'. | String
conceptID | The concept id. | String
source | The name of the terminology source. | String
negatedBy | Specifies the negation word, if one exists. | String
unit | The unit of the measurement. | String
normalizedUnit | Normalized string value of the unit. | String
dimension | The dimension of the unit, e.g. [M] standing for mass in the example below. |
value | The numeric value of the measurement. | String
normalizedValue | Normalized value of the measurement. This value is the result of the transformation of the numeric value according to the transformation of the unit to its standard unit. | String


Annotation Type: de.averbis.types.health.DoseForm


Table ?: DoseForm Features
Attribute | Description | Type
dictCanon | Preferred term of the dose form. | String
matchedTerm | Matching synonym of the dose form. | String
uniqueID | Unique identifier of the dose form concept of the format 'terminologyId:conceptID'. | String
conceptID | The concept id. | String
source | The name of the terminology source. | String
negatedBy | Specifies the negation word, if one exists. | String


Annotation Type: de.averbis.types.health.DoseFrequency


Table ?: DoseFrequency Features
Attribute | Description | Type
dictCanon | Preferred term of the dose frequency (optional). | String
matchedTerm | The matched term of the dose frequency concept (optional). | String
uniqueID | Unique identifier of the dose frequency of the format 'terminologyId:conceptID' (optional). | String
conceptID | The concept id (optional). | String
source | The name of the terminology source (optional). | String
negatedBy | Specifies the negation word, if one exists. | String
interval | The taking interval of a medication, e.g. day, week, month etc. | String
totalCount | Total count of taken drug units per interval. | Double
totalDose | Total dose of taken drug per interval. | Measurement
morning / midday / evening / atNight | Only available for DayTimeDoseFrequency: the count of drug units to be taken at the different times of day. | Double
monday / tuesday / … / sunday | Only available for WeekTimeDoseFrequency: the count of drug units to be taken on the different week days. | Double


Annotation Type: de.averbis.types.health.TTY


Table ?: Features of TermTypes (TTY)
Attribute | Description | Type
code | Term type code for the medication. | String
kind | The kind of the TTY, e.g. "IN" for ingredient, "SCDC" for ingredient and drug. | String
description | Term type description for the medication. | String

NEW: Standard Feature

This type now has all standard features: 'begin', 'end', 'type', 'coveredText' and 'id'.

3.21.4. Terminology Binding


Table ?: Terminology Bindings
Country | Name | Version | Identifier | Comment
United States | RxNorm Ingredients | 2020 | RxNorm-Ingredients_2020_08 | Subset of RxNorm, a US-specific terminology in medicine that contains all medications available on the US market in 2020, enriched with synonyms by Averbis. This subset contains only the ingredients.
United States | RxNorm Strength | 2019 | RxNormStrength_2019071 | Subset of RxNorm, a US-specific terminology in medicine that contains all medications available on the US market in 2020, enriched with synonyms by Averbis. This subset contains only the strengths.
United States | Averbis-Dose-Frequency | 1.0 | Averbis-Dose-Frequency_1.0 | Terminology of dose frequencies, ID based on SNOMED-CT codes composed and enriched by Averbis.
United States / Germany | Averbis Dose Form | 1.0 | Averbis-Dose-Form_1.0 | Terminology of dose forms, composed and enriched by Averbis. Based on SNOMED-CT, RxNorm and Abdamed.
Germany | Abdamed-Averbis | 2017 | Abdamed-Averbis_2017 | Database of pharmaceutical and medication terminology in Germany, 2017, enriched with synonyms by Averbis.

3.21.5. Web Service Example

Text Example 1: "Medication on discharge: Aspirin 100 mg 1-0-1 TAB from 01/01 to 01/30/2018"

    {
      "begin": 25,
      "end": 49,
      "type": "de.averbis.types.health.Medication",
      "coveredText": "Aspirin 100 mg 1-0-1 TAB",
      "id": 2421,
      "date": {
        "begin": 50,
        "end": 74,
        "type": "de.averbis.types.health.DateInterval",
        "coveredText": "from 01/01 to 01/30/2018",
        "id": 2430,
        "endDate": "2018-01-30",
        "kind": "DATEINTERVAL",
        "value": "[2018-01-01, 2018-01-30]",
        "startDate": "2018-01-01"
      },
      "administrations": [],
      "drugs": [
        {
          "begin": 25,
          "end": 39,
          "type": "de.averbis.types.health.Drug",
          "coveredText": "Aspirin 100 mg",
          "id": 2422,
          "ingredient": {
            "begin": 25,
            "end": 32,
            "type": "de.averbis.types.health.Ingredient",
            "coveredText": "Aspirin",
            "id": 2423,
            "negatedBy": null,
            "matchedTerm": "Aspirin",
            "dictCanon": "Aspirin",
            "conceptID": "1191",
            "source": "RxNorm_2020_08",
            "uniqueID": "RxNorm_2020_08:1191"
          },
          "strength": {
            "begin": 33,
            "end": 39,
            "type": "de.averbis.types.health.Strength",
            "coveredText": "100 mg",
            "id": 2424,
            "negatedBy": null,
            "unit": "mg",
            "matchedTerm": "100 MG",
            "dictCanon": "100 MG",
            "conceptID": "STR4",
            "normalizedUnit": "kg",
            "source": "RxNormStrength_2019071",
            "normalizedValue": 0.0001,
            "value": 100,
            "dimension": "[M]",
            "uniqueID": "RxNormStrength_2019071:STR4"
          }
        }
      ],
      "termTypes": null,
      "doseForm": {
        "begin": 46,
        "end": 49,
        "type": "de.averbis.types.health.DoseForm",
        "coveredText": "TAB",
        "id": 2427,
        "negatedBy": null,
        "matchedTerm": "Tabs",
        "dictCanon": "Oral tablet (qualifier value)",
        "conceptID": "SCT421026006",
        "source": "AverbisDoseForm_1.0",
        "uniqueID": "AverbisDoseForm_1.0:SCT421026006"
      },
      "rateQuantity": "NaN",
      "doseFrequency": {
        "begin": 40,
        "end": 45,
        "type": "de.averbis.types.health.DayTimeDoseFrequency",
        "coveredText": "1-0-1",
        "id": 2428,
        "negatedBy": null,
        "midday": 0,
        "matchedTerm": null,
        "source": null,
        "totalCount": 2,
        "atNight": "NaN",
        "morning": 1,
        "totalDose": {
          "begin": 40,
          "end": 45,
          "type": "de.averbis.types.health.Measurement",
          "coveredText": "1-0-1",
          "id": 2429,
          "unit": "mg",
          "normalizedUnit": null,
          "normalizedValue": "NaN",
          "value": 200,
          "dimension": "[M]"
        },
        "dictCanon": null,
        "conceptID": null,
        "interval": "daytime",
        "evening": 1,
        "uniqueID": null
      },
      "status": "DISCHARGE"
    },
    {
      "begin": 0,
      "end": 74,
      "type": "de.averbis.types.health.ClinicalSection",
      "coveredText": "Medication on discharge: Aspirin 100 mg 1-0-1 TAB from 01/01 to 01/30/2018",
      "id": 2418,
      "label": "DischargeMedication",
      "keyword": {
        "begin": 0,
        "end": 23,
        "type": "de.averbis.types.health.ClinicalSectionKeyword",
        "coveredText": "Medication on discharge",
        "id": 2419,
        "negatedBy": null,
        "matchedTerm": "Medication on discharge",
        "dictCanon": "Medication on discharge",
        "conceptID": "10183-2",
        "source": "clinical_sections_en",
        "uniqueID": "clinical_sections_en:10183-2"
      }
    }
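The DayTimeDoseFrequency values in this example can be retraced with simple arithmetic: the scheme "1-0-1" yields morning 1, midday 0 and evening 1, the total count is 2, and 2 units of 100 mg give a total dose of 200 mg. The sketch below is only an illustration of that arithmetic and not the annotator's implementation. The function name is hypothetical, and the assumption that a slot missing from the scheme stays NaN is read off the "atNight" value in the example.

```python
import math


def parse_day_scheme(scheme: str) -> dict:
    """Split a scheme like '1-0-1' into daytime slots (illustrative only)."""
    slots = ["morning", "midday", "evening", "atNight"]
    counts = [float(part) for part in scheme.split("-")]
    # Slots that the scheme does not mention stay NaN, as in the example.
    result = {slot: math.nan for slot in slots}
    result.update(zip(slots, counts))
    result["totalCount"] = sum(counts)
    return result


frequency = parse_day_scheme("1-0-1")
strength_mg = 100                                   # from "Aspirin 100 mg"
total_dose_mg = frequency["totalCount"] * strength_mg
print(frequency, total_dose_mg)
```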


Text Example 2: "Lisinopril 5 MG tablet Take 5 mg by mouth daily."

    {
      "begin": 0,
      "end": 47,
      "type": "de.averbis.types.health.Medication",
      "coveredText": "Lisinopril 5 MG tablet Take 5 mg by mouth daily",
      "id": 1812,
      "date": null,
      "administrations": [
        "by mouth"
      ],
      "drugs": [
        {
          "begin": 0,
          "end": 15,
          "type": "de.averbis.types.health.Drug",
          "coveredText": "Lisinopril 5 MG",
          "id": 1813,
          "ingredient": {
            "begin": 0,
            "end": 10,
            "type": "de.averbis.types.health.Ingredient",
            "coveredText": "Lisinopril",
            "id": 1814,
            "negatedBy": null,
            "matchedTerm": "Lisinopril",
            "dictCanon": "Lisinopril",
            "conceptID": "29046",
            "source": "RxNorm_2020_08",
            "uniqueID": "RxNorm_2020_08:29046"
          },
          "strength": {
            "begin": 11,
            "end": 15,
            "type": "de.averbis.types.health.Strength",
            "coveredText": "5 MG",
            "id": 1815,
            "negatedBy": null,
            "unit": "mg",
            "matchedTerm": "5 MG",
            "dictCanon": "5 MG",
            "conceptID": "STR133",
            "normalizedUnit": "kg",
            "source": "RxNormStrength_2019071",
            "normalizedValue": 0.000005,
            "value": 5,
            "dimension": "[M]",
            "uniqueID": "RxNormStrength_2019071:STR133"
          }
        }
      ],
      "termTypes": null,
      "doseForm": {
        "begin": 16,
        "end": 22,
        "type": "de.averbis.types.health.DoseForm",
        "coveredText": "tablet",
        "id": 1818,
        "negatedBy": null,
        "matchedTerm": "Tablets",
        "dictCanon": "Tablet dose form (qualifier value)",
        "conceptID": "SCT385055001",
        "source": "AverbisDoseForm_1.0",
        "uniqueID": "AverbisDoseForm_1.0:SCT385055001"
      },
      "rateQuantity": "NaN",
      "doseFrequency": {
        "begin": 42,
        "end": 47,
        "type": "de.averbis.types.health.TimeMeasurementDoseFrequency",
        "coveredText": "daily",
        "id": 1819,
        "negatedBy": null,
        "totalDose": {
          "begin": 42,
          "end": 47,
          "type": "de.averbis.types.health.Measurement",
          "coveredText": "daily",
          "id": 1820,
          "unit": "mg",
          "normalizedUnit": null,
          "normalizedValue": "NaN",
          "value": 5,
          "dimension": "[M]"
        },
        "matchedTerm": "Daily",
        "dictCanon": "Daily (qualifier value)",
        "conceptID": "69620002",
        "interval": "1/day",
        "source": "DoseFrequency_1.0",
        "totalCount": 1,
        "uniqueID": "DoseFrequency_1.0:69620002"
      },
      "status": null
    }
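Since a Medication annotation is deeply nested, clients often want a flat summary. The following sketch shows one possible way to do this for Text Example 2. It assumes the annotation has been parsed into a Python dictionary; the function name and the layout of the summary are hypothetical, and features that are null in the response are tolerated.

```python
def summarize_medication(medication: dict) -> dict:
    """Flatten the nested Medication annotation into a small summary."""
    drugs = medication.get("drugs") or []
    frequency = medication.get("doseFrequency") or {}
    dose_form = medication.get("doseForm") or {}
    return {
        "ingredients": [
            (drug.get("ingredient") or {}).get("dictCanon") for drug in drugs
        ],
        "strengths": [
            (drug.get("strength") or {}).get("coveredText") for drug in drugs
        ],
        "doseForm": dose_form.get("dictCanon"),
        "interval": frequency.get("interval"),
        "routes": medication.get("administrations") or [],
        "status": medication.get("status"),
    }


# Abbreviated version of Text Example 2 (only the features used here).
lisinopril = {
    "type": "de.averbis.types.health.Medication",
    "administrations": ["by mouth"],
    "drugs": [{
        "ingredient": {"dictCanon": "Lisinopril"},
        "strength": {"coveredText": "5 MG"},
    }],
    "doseForm": {"dictCanon": "Tablet dose form (qualifier value)"},
    "doseFrequency": {"interval": "1/day", "totalCount": 1},
    "status": None,
}
summary = summarize_medication(lisinopril)
print(summary)
```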


3.22. Medication Status

3.22.1. Description

The annotator recognizes the status of medications. Possible statuses include, for example, "INTENDED" or "FAMILY".

3.22.2. Input

Above this annotator, the following annotator must be included in the pipeline:

3.22.3. Output

This annotator sets the feature status in annotations of type Medication.

3.22.4. Web Service Example

Text Example: "A very good alternative, if the tumor is ER positive, is treatment with Tamoxifen."

    {
      "begin": 72,
      "end": 81,
      "type": "de.averbis.types.health.Medication",
      "coveredText": "Tamoxifen",
      "id": 1552,
      "date": null,
      "administrations": [],
      "drugs": [
        {
          "begin": 72,
          "end": 81,
          "type": "de.averbis.types.health.Drug",
          "coveredText": "Tamoxifen",
          "id": 1553,
          "ingredient": {
            "begin": 72,
            "end": 81,
            "type": "de.averbis.types.health.Ingredient",
            "coveredText": "Tamoxifen",
            "id": 1554,
            "negatedBy": null,
            "matchedTerm": "Tamoxifen",
            "dictCanon": "Tamoxifen",
            "conceptID": "10324",
            "source": "RxNorm_2020_08",
            "uniqueID": "RxNorm_2020_08:10324"
          },
          "strength": null
        }
      ],
      "termTypes": null,
      "doseForm": null,
      "rateQuantity": "NaN",
      "doseFrequency": null,
      "status": "CONSIDERED"
    }


3.23. Morphology

3.23.1. Description

This component detects morphologies. It is mainly used in pathology reports.

3.23.2. Input

Above this annotator, the following annotators must be included in the pipeline:

3.23.3. Output

The component creates annotations of type:

Annotation Type: de.averbis.types.health.Morphology


Table ?: Morphology Features
Attribute | Description | Type
dictCanon | Preferred term of the Morphology. | String
matchedTerm | Matching synonym of the Morphology. | String
uniqueID | Unique identifier of the Morphology of the format 'terminologyId:conceptID'. | String
conceptID | The concept id. | String
source | The name of the terminology source. | String
negatedBy | Specifies the negation word, if one exists. | String
confidence | The confidence feature denotes the probability of the annotation (Diagnosis/Morphology/Topography concept) to be valid, i.e. the higher the confidence, the closer to a valid annotation. | Double

3.23.4. Terminology Binding


Table ?: Terminology Bindings
Country | Name | Version | Identifier | Comment
United States | ICD-O | 3.1 | ICD-O_3.1 | International Classification of Diseases for Oncology WHO edition, enriched with synonyms by Averbis.
Germany | ICD-O-DE | 3.1 | ICD-O-DE_3.1 | International Classification of Diseases for Oncology German Edition, enriched with synonyms by Averbis.

3.23.5. Web Service Example

Text Example: "Adenocarcinoma of the rectum"

    {
      "begin": 0,
      "end": 14,
      "type": "de.averbis.types.health.Morphology",
      "coveredText": "Adenocarcinoma",
      "id": 974,
      "negatedBy": null,
      "matchedTerm": "Adenocarcinoma",
      "dictCanon": "Adenocarcinoma, NOS",
      "confidence": 0,
      "conceptID": "8140/3",
      "source": "ICD-O-Morphology-EN_3.1",
      "uniqueID": "ICD-O-Morphology-EN_3.1:8140/3"
    }
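The uniqueID feature follows the format 'terminologyId:conceptID', so both parts can be recovered by splitting at the first colon. A minimal sketch, assuming the terminology identifier itself never contains a colon (the function name is hypothetical):

```python
def split_unique_id(unique_id: str) -> tuple:
    """Split 'terminologyId:conceptID' at the first colon."""
    terminology_id, _, concept_id = unique_id.partition(":")
    return terminology_id, concept_id


# uniqueID taken from the Morphology example above.
terminology, concept = split_unique_id("ICD-O-Morphology-EN_3.1:8140/3")
print(terminology, concept)
```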


3.24. Negations

3.24.1. Description

This component detects negated expressions. The negations are detected and assigned to concept annotations that are affected by these expressions. The negation detection component is optimized for medical texts.

3.24.2. Input

Above this annotator, the following annotators must be included in the pipeline:

3.24.3. Output

This component sets the following internal type that is not visible in the annotation editor:

Annotation Type*: de.averbis.types.health.MedicalNegation

If a concept is successfully negated, the feature negatedBy will be set to the corresponding negation term. If the DiagnosisStatus annotator is included after it in the pipeline, the 'verificationStatus' feature is additionally set to NEGATED.

3.24.4. Web Service Example

Text Example: "No Crohn’s disease"

    {
      "begin": 3,
      "end": 18,
      "type": "de.averbis.types.health.Diagnosis",
      "coveredText": "Crohn’s disease",
      "id": 832,
      "negatedBy": "No",
      "side": null,
      "matchedTerm": "Crohn's disease",
      "verificationStatus": "NEGATED",
      "kind": null,
      "confidence": 0,
      "onsetDate": null,
      "source": "ICD10CM_2021",
      "clinicalStatus": null,
      "approach": "DictionaryLookup",
      "laterality": null,
      "dictCanon": "Crohn's disease, unspecified, without complications",
      "conceptID": "K50.90",
      "belongsTo": null,
      "uniqueID": "ICD10CM_2021:K50.90"
    }
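On the client side, negated concepts can be separated from affirmed ones by checking whether negatedBy is set. The sketch below assumes the annotations are available as a list of Python dictionaries; the function name is hypothetical, and the two sample entries are abbreviated from the Negations and Morphology examples in this manual.

```python
def split_by_negation(annotations: list) -> tuple:
    """Return (affirmed, negated) lists based on the negatedBy feature."""
    affirmed, negated = [], []
    for annotation in annotations:
        if annotation.get("negatedBy"):
            negated.append(annotation)
        else:
            affirmed.append(annotation)
    return affirmed, negated


annotations = [
    {"type": "de.averbis.types.health.Diagnosis",
     "conceptID": "K50.90", "negatedBy": "No",
     "verificationStatus": "NEGATED"},
    {"type": "de.averbis.types.health.Morphology",
     "conceptID": "8140/3", "negatedBy": None},
]
affirmed, negated = split_by_negation(annotations)
print(len(affirmed), len(negated))
```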


3.25. Ophthalmology

3.25.1. Description

This component detects indicators for the left and the right eye, the intraocular pressure, mentions of visual acuity and concepts concerning the field of ophthalmology.

3.25.2. Input

Above this annotator, the following annotators must be included in the pipeline:

3.25.3. Output

Annotation Type: de.averbis.types.health.Ophthalmology


Table ?: Ophthalmology Features
Attribute | Description | Type
dictCanon | Preferred term of Ophthalmology. | String
matchedTerm | Matching synonym of Ophthalmology. | String
uniqueID | Unique identifier of Ophthalmology of the format 'terminologyId:conceptID'. | String
conceptID | The concept id. | String
source | The name of the terminology source. | String
negatedBy | Specifies the negation word, if one exists. | String


Annotation Type: de.averbis.types.health.Tensio


Table ?: Tensio Features
Attribute | Description | Type
leftEye | Tensio measurement of left eye. | Measurement
rightEye | Tensio measurement of right eye. | Measurement


Annotation Type: de.averbis.types.health.RelevantVisualAcuity

Best or actual visual acuity, selected from multiple VisualAcuity or VisualAcuityValue annotations.


Annotation Type: de.averbis.types.health.VisualAcuity


Table ?: VisualAcuity Features
Attribute | Description | Type
leftEye | Left eye’s visual acuity. | VisualAcuityValue
rightEye | Right eye’s visual acuity. | VisualAcuityValue


Annotation Type: de.averbis.types.health.VisualAcuityValue


Table ?: VisualAcuityValue Features
Attribute | Description | Type
fact | Normalized value of visual acuity. | String
meter | Visual acuity measured with blackboard. | Boolean
correction | Normalized value of correction during measuring visual acuity. | String
refraction | The measured refraction. | Refraction
pinHole | Visual acuity measured with pin hole. | Boolean
additionalInformation | Kind of comment, e.g. "AR_NOT_POSSIBLE", "DOES_NOT_IMPROVE". | VisualAcuityAdditionalInformation


Annotation Type: de.averbis.types.health.Refraction


Table ?: Refraction Features
Attribute | Description | Type
sphere | The spherical value of the actual refraction. | Double
cylinder | The cylinder value of the actual refraction. | Double
axis | The axis value of the actual refraction. | Double


Annotation Type: de.averbis.types.health.VisualAcuityAdditionalInformation


Table ?: Features of VisualAcuityAdditionalInformation
Attribute | Description | Type
normalized | Normalization of additional information on visual acuity, e.g. "AR_NOT_POSSIBLE" or "DOES_NOT_IMPROVE". | String

3.25.4. Web Service Example

Text Example 1 (Tensio): "Tensio RA 13 mmHg LA 14 mmHg"

    {
      "begin": 7,
      "end": 28,
      "type": "de.averbis.types.health.Tensio",
      "coveredText": "RA 13 mmHg LA 14 mmHg",
      "id": 1580,
      "rightEye": {
        "begin": 10,
        "end": 17,
        "type": "de.averbis.types.health.Measurement",
        "coveredText": "13 mmHg",
        "id": 1582,
        "unit": "mmHg",
        "normalizedUnit": "kg/(m·s²)",
        "normalizedValue": 1733.1860000000001,
        "value": 13,
        "dimension": "[M]/([L]·[T]²)"
      },
      "leftEye": {
        "begin": 21,
        "end": 28,
        "type": "de.averbis.types.health.Measurement",
        "coveredText": "14 mmHg",
        "id": 1581,
        "unit": "mmHg",
        "normalizedUnit": "kg/(m·s²)",
        "normalizedValue": 1866.508,
        "value": 14,
        "dimension": "[M]/([L]·[T]²)"
      }
    }


Text Example 2 (Visual Acuity): "Visus RA 0,16 (AR +1,0 -3,25 84) LA sc 1/35 (AR nicht möglich)"

    {
      "begin": 0,
      "end": 62,
      "type": "de.averbis.types.health.VisualAcuity",
      "coveredText": "Visus RA 0,16 (AR +1,0 -3,25 84) LA sc 1/35 (AR nicht möglich)",
      "id": 3043,
      "rightEye": {
        "begin": 9,
        "end": 32,
        "type": "de.averbis.types.health.VisualAcuityValue",
        "coveredText": "0,16 (AR +1,0 -3,25 84)",
        "id": 3046,
        "additionalInformation": null,
        "pinHole": false,
        "fact": "0.16",
        "refraction": {
          "begin": 14,
          "end": 32,
          "type": "de.averbis.types.health.Refraction",
          "coveredText": "(AR +1,0 -3,25 84)",
          "id": 3047,
          "sphere": 1,
          "cylinder": -3.25,
          "axis": 84
        },
        "meter": false,
        "correction": "AR"
      },
      "leftEye": {
        "begin": 36,
        "end": 62,
        "type": "de.averbis.types.health.VisualAcuityValue",
        "coveredText": "sc 1/35 (AR nicht möglich)",
        "id": 3044,
        "additionalInformation": {
          "begin": 44,
          "end": 62,
          "type": "de.averbis.types.health.VisualAcuityAdditionalInformation",
          "coveredText": "(AR nicht möglich)",
          "id": 3045,
          "normalized": "AR_NOT_POSSIBLE"
        },
        "pinHole": false,
        "fact": "1/35",
        "refraction": null,
        "meter": true,
        "correction": "SC"
      }
    }
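The 'fact' feature is a string that can hold either a decimal value ("0.16") or a fraction ("1/35"). If a client needs a single numeric scale, both forms can be converted as sketched below. Interpreting the fraction as a plain quotient is an assumption made for illustration and is not documented behavior of the annotator; the function name is hypothetical.

```python
from fractions import Fraction


def acuity_to_decimal(fact: str) -> float:
    """Convert a 'fact' string such as '0.16' or '1/35' to a decimal value."""
    # Fraction accepts both decimal strings and 'numerator/denominator'.
    return float(Fraction(fact))


right_eye = acuity_to_decimal("0.16")   # 'fact' of rightEye in the example
left_eye = acuity_to_decimal("1/35")    # 'fact' of leftEye in the example
print(right_eye, left_eye)
```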


Text Example 3 (Relevant Visual Acuity): "Visus RA 0,16 (AR +1,0 -3,25 84) LA sc 1/35 (AR nicht möglich)"

    {
      "begin": 0,
      "end": 62,
      "type": "de.averbis.types.health.RelevantVisualAcuity",
      "coveredText": "Visus RA 0,16 (AR +1,0 -3,25 84) LA sc 1/35 (AR nicht möglich)",
      "id": 3048,
      "rightEye": {
        "begin": 9,
        "end": 32,
        "type": "de.averbis.types.health.VisualAcuityValue",
        "coveredText": "0,16 (AR +1,0 -3,25 84)",
        "id": 3046,
        "additionalInformation": null,
        "pinHole": false,
        "fact": "0.16",
        "refraction": {
          "begin": 14,
          "end": 32,
          "type": "de.averbis.types.health.Refraction",
          "coveredText": "(AR +1,0 -3,25 84)",
          "id": 3047,
          "sphere": 1,
          "cylinder": -3.25,
          "axis": 84
        },
        "meter": false,
        "correction": "AR"
      },
      "leftEye": {
        "begin": 36,
        "end": 62,
        "type": "de.averbis.types.health.VisualAcuityValue",
        "coveredText": "sc 1/35 (AR nicht möglich)",
        "id": 3044,
        "additionalInformation": {
          "begin": 44,
          "end": 62,
          "type": "de.averbis.types.health.VisualAcuityAdditionalInformation",
          "coveredText": "(AR nicht möglich)",
          "id": 3045,
          "normalized": "AR_NOT_POSSIBLE"
        },
        "pinHole": false,
        "fact": "1/35",
        "refraction": null,
        "meter": true,
        "correction": "SC"
      }
    }


Text Example 4 (Ophthalmology): "Kataraktoperation"

    {
      "begin": 0,
      "end": 17,
      "type": "de.averbis.types.health.Ophthalmology",
      "coveredText": "Kataraktoperation",
      "id": 624,
      "negatedBy": null,
      "side": null,
      "matchedTerm": "Kataraktoperation",
      "dictCanon": "Katarakt-Operation",
      "conceptID": "110473004",
      "source": "Ophthalmologie_1.0",
      "uniqueID": "Ophthalmologie_1.0:110473004"
    }


3.26. Organizations

3.26.1. Description

This component detects types of organizations and provides correspondence information, e.g. whether the organization is the sender of a clinical note. Please note: at present, this annotator is used exclusively to identify whether the sender of a record is a German hospital and to assign a hospital type (e.g. university hospital, general hospital) to this sender.

3.26.2. Input

Above this annotator, the following annotators must be included in the pipeline:

3.26.3. Output

Annotation Type: de.averbis.types.health.Organization


Figure ?: Features of Organization
Attribute | Description | Type
organizationType | Type of the organization. Possible values: general hospital, university hospital, specialist clinic, rehabilitation clinic, physician's admitting hospital | String
correspondence | Contains "sender" if the organization is detected as sending party of the record.* | String

*Please note: at present, organizations are extracted only if detected as "sender".

3.26.4. Terminology Binding

Country | Name | Version | Comment
DE | Hospital | 1.0 | List of German hospital names enriched with synonyms by Averbis.

3.26.5. Web Service Example

Text Example (organizations): "Zentrum für Psychiatrie Emmendingen"

    {
      "begin": 0,
      "end": 35,
      "type": "de.averbis.types.health.Organization",
      "coveredText": "Zentrum für Psychiatrie Emmendingen",
      "id": 765,
      "organizationType": "specialist clinic",
      "correspondence": "sender"
    }

3.26.6. Departments

3.26.6.1. Description

This component is part of the Organizations Annotator and annotates medical departments in clinical notes, e.g. Paediatrics, Neurology, Orthodontics...

3.26.6.2. Input

See annotator "Organizations".

3.26.6.3. Output

Annotation Type: de.averbis.types.health.Department


Table ?: Features
Attribute | Description | Type
dictCanon | Preferred term of the department (concept) as defined in the terminology. | String
matchedTerm | The term that matched a department concept in the terminology. | String
conceptID | The ID of the matched department concept in the terminology. | String
departmentType | Additional information about the department type, currently limited to whether the department is the sending department of a clinical note. Note: there may be more than one sending department, e.g. "Division of Hematology and Oncology". Possible values (default is underlined): null, sender | String
source | The name of the terminology source. | String
uniqueID | Unique identifier of the department concept in the format 'terminologyId:conceptID'. | String
negatedBy | Specifies the negation word, if one exists. | String
3.26.6.4. Terminology Binding

Table ?: Terminology Binding
Country | Languages | Version | Identifier | Comment
United States, Germany | EN, DE | 1.0 | Averbis-SpecialistDepartment_1.0 | Terminology of department names, composed by Averbis enriched with terms from SNOMED-CT.
3.26.6.5. Web Service Examples

Example Text (departments): Service: "Division of Hematology and Oncology"

    {
      "begin": 0,
      "end": 22,
      "type": "de.averbis.types.health.Department",
      "coveredText": "Division of Hematology",
      "id": 923,
      "negatedBy": null,
      "matchedTerm": "Hematology",
      "dictCanon": "Haematology",
      "conceptID": "394916005",
      "source": "SpecialistDepartment_1.0",
      "departmentType": "sender",
      "uniqueID": "SpecialistDepartment_1.0:394916005"
    },
    {
      "begin": 27,
      "end": 35,
      "type": "de.averbis.types.health.Department",
      "coveredText": "Oncology",
      "id": 924,
      "negatedBy": null,
      "matchedTerm": "Oncology",
      "dictCanon": "Medical oncology",
      "conceptID": "394593009",
      "source": "SpecialistDepartment_1.0",
      "departmentType": "sender",
      "uniqueID": "SpecialistDepartment_1.0:394593009"
    }


3.27. Pathology Documentation

3.27.1. Description

The component aggregates relevant information from pathology reports (e.g. diagnosis, topography, morphology, TNM, grading and others) into one annotation type. To do so, the PathologyDocumentation Annotator determines a single valid value for each pathology feature per pathology report, based on various rule-based and machine-learning-based methods. This information can be used, for example, to encode reports in cancer registries.

This component is available in German only.

3.27.2. Input

Above this annotator, the following annotators must be included in the pipeline:

3.27.3. Output

Annotation Type: de.averbis.types.health.PathologyDocumentation


Table ?: PathologyDocumentation Features
Attribute | Description | Type
diagnosis | The code of the relevant diagnosis based on ICD-10, e.g. "C50.9" for malignant neoplasm of breast of unspecified site. | diagnosis
topography | Topography code based on ICD-O, e.g. "C50.9". | topography
morphology | Morphology code based on ICD-O, e.g. "8520/3". | morphology
tumor | T-Value according to TNM classification. | String
node | N-Value according to TNM classification. | String
metastasis | M-Value according to TNM classification. | String
location | Further specification of the metastasis: Pulmonary (PUL), Bone marrow (MAR), Osseous (OSS), Pleura (PLE), Hepatic (HEP), Peritoneum (PER), Brain (BRA), Adrenals (ADR), Lymph nodes (LYM), Skin (SKI), Others (OTH) | String
grading | The grading of the tumor. Possible values (default is underlined): null, U, 1, 2, 3, 4 | String
resultDate | Date of result. | String
rClass | Residual classification (R-Classification) of the tumor. Possible values (default is underlined): null, Rx, R0, R1, R2 | String
side | The laterality of the pathology item, e.g. R, L, B, U | String
lymphnodesTested | The number of tested lymph nodes. | Integer
lymphnodesAffected | The number of affected lymph nodes. | Integer
sentinelLymphnodesTested | The number of tested sentinel lymph nodes. | Integer
sentinelLymphnodesAffected | The number of affected sentinel lymph nodes. | Integer
category | Classification of the pathology report by organ entity, e.g. "MAMMA", "COLON", "PROSTATE" | category
thickness | The thickness of the tumor. | thickness
lymphaticInvasion | Invasion of cancer cells in the lymphatic system. Possible values (default is underlined): null, L0, L1 | String
vascularInvasion | Invasion of cancer cells in the blood vascular system. Possible values (default is underlined): null, V0, V1, V2 | String
perineuralInvasion | Perineural invasion of cancer cells. Possible values (default is underlined): null, Pn0, Pn1 | String
pathologyScores | Grouping of pathological scores. | PathologyScore


For the annotation types Diagnosis, Morphology and Topography, the following features are referenced in the PathologyDocumentation type.


Table ?: Features of Diagnosis, Morphology, Topography linked in PathologyDocumentation
Attribute | Description | Type
dictCanon | The preferred term of the Diagnosis/Morphology/Topography concept as defined in the terminology. | String
matchedTerm | Matching synonym of the section concept. |
conceptID | The conceptID. | String
uniqueID | Unique identifier of a concept of the format 'terminologyId:conceptId'. | String
confidence | The confidence feature denotes the probability of the annotation (Diagnosis/Morphology/Topography concept) to be valid, i.e. the higher the confidence, the closer to a valid annotation. | Double
source | The name of the terminology source. | String
negatedBy | Specifies the negation word, if one exists. | String


For the tumor thickness the following features are referenced in the PathologyDocumentation type.


Table ?: Features of type Thickness linked in PathologyDocumentation
Attribute | Description | Type
value | The value of the tumor thickness. | Double
unit | The unit of the measurement. By default, the tumor thickness is presented in millimeters (mm). | String

NEW: Standard Feature

This type now has all standard features: 'begin', 'end', 'type', 'coveredText' and 'id'.


The category has the following features referenced in the PathologyDocumentation type:


Table ?: Features of type category linked in PathologyDocumentation
Attribute | Description | Type
label | The label of the category, e.g. MAMMA, PROSTATA, COLON, MELANOMA, BASALIOM | String
confidence | The confidence feature denotes the probability of the label to be valid. | Double

NEW: Standard Feature

This type now has all standard features: 'begin', 'end', 'type', 'coveredText' and 'id'.


The pathologyScores has the following features referenced in the PathologyDocumentation type.


Table ?: Features of pathologyScores linked in PathologyDocumentation
Attribute | Description | Type
gleasonScore | The Gleason grading system is used to determine the prognosis of men with prostate cancer using sample results from a prostate biopsy. | String

3.27.4. Web Service Example

Text Example (German only): "Klinische Angaben: Mammakarzinom links, cT1a bei 0,5 cm Größe. Tumorklassifikation: TNM (7.Aufl.): pT1b, pN1 (1/1 sn), L0, V0, Pn0 Grading: G2, R-Klassifikation (lokal): R0, ICD-O: 8500/3, ICD-10: C50.9."

    {
      "begin": 0,
      "end": 203,
      "type": "de.averbis.types.health.PathologyDocumentation",
      "coveredText": "Klinische Angaben: Mammakarzinom links, cT1a bei 0,5 cm Größe. Tumorklassifikation: TNM (7.Aufl.): pT1b, pN1 (1/1 sn), L0, V0, Pn0 Grading: G2, R-Klassifikation (lokal): R0, ICD-O: 8500/3, ICD-10: C50.9.",
      "id": 6775,
      "side": "L",
      "grading": "2",
      "morphology": {
        "begin": 181,
        "end": 187,
        "type": "de.averbis.types.health.Morphology",
        "coveredText": "8500/3",
        "id": 6781,
        "negatedBy": null,
        "matchedTerm": "8500/3",
        "dictCanon": "Invasives duktales Karzinom",
        "confidence": 0.999591052532196,
        "conceptID": "8500/3",
        "source": "ICD-O-DE_3.1",
        "uniqueID": "ICD-O-DE_3.1:8500/3"
      },
      "thickness": null,
      "topography": {
        "begin": 197,
        "end": 202,
        "type": "de.averbis.types.health.Topography",
        "coveredText": "C50.9",
        "id": 6779,
        "negatedBy": null,
        "matchedTerm": "C50.9",
        "dictCanon": "Brust",
        "confidence": 0.9736150503158569,
        "conceptID": "C50.9",
        "source": "ICD-O-DE_3.1",
        "uniqueID": "ICD-O-DE_3.1:C50.9"
      },
      "diagnosis": {
        "begin": 0,
        "end": 203,
        "type": "de.averbis.types.health.Diagnosis",
        "coveredText": "Klinische Angaben: Mammakarzinom links, cT1a bei 0,5 cm Größe. Tumorklassifikation: TNM (7.Aufl.): pT1b, pN1 (1/1 sn), L0, V0, Pn0 Grading: G2, R-Klassifikation (lokal): R0, ICD-O: 8500/3, ICD-10: C50.9.",
        "id": 6778,
        "negatedBy": null,
        "side": null,
        "matchedTerm": null,
        "verificationStatus": null,
        "kind": null,
        "confidence": 0.9736150503158569,
        "onsetDate": null,
        "source": "ICD10GM_2021",
        "clinicalStatus": null,
        "approach": null,
        "laterality": null,
        "dictCanon": "Bösartige Neubildung: Brustdrüse, nicht näher bezeichnet",
        "conceptID": "C50.9",
        "belongsTo": null,
        "uniqueID": "ICD10GM_2021:C50.9"
      },
      "lymphnodesTested": 1,
      "resultDate": null,
      "rClass": "R0",
      "sentinelLymphnodesTested": 1,
      "node": "pN1",
      "sentinelLymphnodesAffected": 1,
      "lymphnodesAffected": 1,
      "metastasis": "pMu",
      "tumor": "pT1b",
      "lymphaticInvasion": "L0",
      "vascularInvasion": "V0",
      "location": null,
      "category": {
        "begin": 0,
        "end": 203,
        "type": "de.averbis.types.health.TumorCategory",
        "coveredText": "Klinische Angaben: Mammakarzinom links, cT1a bei 0,5 cm Größe. Tumorklassifikation: TNM (7.Aufl.): pT1b, pN1 (1/1 sn), L0, V0, Pn0 Grading: G2, R-Klassifikation (lokal): R0, ICD-O: 8500/3, ICD-10: C50.9.",
        "id": 6776,
        "confidence": 0.9995457743445415,
        "label": "MAMMA"
      },
      "perineuralInvasion": "Pn0"
    }
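
Because PathologyDocumentation nests the linked Diagnosis, Morphology and Topography annotations, downstream systems such as a cancer registry import often need a flat record. The following Python sketch shows one possible way to flatten such an annotation. It assumes the annotation is already available as a parsed JSON object (a Python dict); the function name and the keys of the resulting record are hypothetical and not part of Health Discovery.

```python
def to_registry_record(doc):
    """Flatten a PathologyDocumentation annotation (dict) into a flat record.

    The record keys below are illustrative only (assumption).
    """
    def code(feature):
        # Linked annotations (diagnosis, morphology, topography) may be null.
        linked = doc.get(feature)
        return linked["conceptID"] if linked else None

    tnm_parts = (doc.get("tumor"), doc.get("node"), doc.get("metastasis"))
    return {
        "icd10": code("diagnosis"),
        "icdo_topography": code("topography"),
        "icdo_morphology": code("morphology"),
        "tnm": " ".join(part for part in tnm_parts if part),
        "grading": doc.get("grading"),
        "rClass": doc.get("rClass"),
        "side": doc.get("side"),
    }


# Reduced version of the web service example above.
example = {
    "type": "de.averbis.types.health.PathologyDocumentation",
    "side": "L",
    "grading": "2",
    "rClass": "R0",
    "tumor": "pT1b",
    "node": "pN1",
    "metastasis": "pMu",
    "thickness": None,
    "diagnosis": {"conceptID": "C50.9", "source": "ICD10GM_2021"},
    "topography": {"conceptID": "C50.9", "source": "ICD-O-DE_3.1"},
    "morphology": {"conceptID": "8500/3", "source": "ICD-O-DE_3.1"},
}

record = to_registry_record(example)
print(record)
```

Features that are null in the annotation (for example thickness in the example above) simply stay None in the flat record.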


3.28. Patient Information

3.28.1. Description

This component detects various pieces of information about the patient, such as admission and discharge dates, the gender of the patient, and whether the patient is deceased. In addition, if a list of patient names has been imported into Averbis Health Discovery as a terminology, these patient names can be extracted as well.

3.28.2. Input

Above this annotator, the following annotators must be included in the pipeline:

The HealthPreprocessing pipeline block provides the prerequisite annotation types to ensure the proper functionality of this annotator.

3.28.3. Annotation of patient names

Patient names can only be extracted from clinical notes if they exist as entries in a terminology called "patientnames". The following preparations are therefore necessary to annotate patient names.

Step 1: Create a terminology in the "Terminology Administration" with the Terminology-ID "patientnames", Concept-Type "de.averbis.textanalysis.types.health.PatientNameConcept" and language "Miscellaneous". Label and Version can be set freely. See Create your own Terminology for more details on how to create a terminology.


Figure ?: Add terminology named "patientnames"

Step 2: Import your list of patient names into the terminology using the OBO format, or enter the patient names manually using the "Terminology Editor". To distinguish between first names and last names, the terms must use the syntax Firstname[semicolon]Lastname, e.g. John;Doe.

Your OBO-file with patient names may look like:

[Term]
id: 1
name: Sue;Miller

[Term]
id: 2
name: John;Doe

....
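
If the patient names are available as a list, the term stanzas shown above can be generated with a small script. The following Python sketch is only an illustration: the function name build_obo_terms is hypothetical, and the output mirrors only the stanzas of the example above. Any additional header lines your OBO import may require are not covered here.

```python
def build_obo_terms(names):
    """Build OBO [Term] stanzas from (first_name, last_name) tuples.

    Each name becomes one term in the form Firstname;Lastname.
    """
    stanzas = []
    for term_id, (first, last) in enumerate(names, start=1):
        # The semicolon separates first and last name, so it must not
        # occur inside either part.
        if ";" in first or ";" in last:
            raise ValueError(f"semicolon not allowed in name: {first!r} {last!r}")
        stanzas.append(f"[Term]\nid: {term_id}\nname: {first};{last}\n")
    return "\n".join(stanzas)


obo_text = build_obo_terms([("Sue", "Miller"), ("John", "Doe")])
print(obo_text)
```

The resulting text can be written to a file and imported as described in Step 2.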

Step 3: View the results of your import/editing in the "Terminology Editor" to make sure everything worked out smoothly. The imported terminology/OBO-file should contain the patients' first and last name as preferred term. Synonyms do not need to be added.


Figure ?: Entries in terminology "patientnames", view in Terminology Editor

Step 4: Switch to the "Terminology Administration" and submit the terminology to the text analytics module.


Figure ?: Submit terminology for use in text analytic pipelines.

Step 5: Reuse an existing pipeline where "Patient Information" is included or create a pipeline and include the following annotators:

Step 6: (Re)Start the pipeline. After completing steps 1 through 5, the pipeline is now ready to annotate the imported patient names.

3.28.4. Output

Annotation Type: de.averbis.types.health.PatientInformation


Table ?: Patient Information Features
Attribute | Description | Type
name | Matching preferred term in the terminology "patientnames". | PatientName
gender | Gender of the patient. Possible values (default is underlined): null, female, male | String
deathdate | Date of death of the patient. | Date
deceased | Information as to whether the patient is deceased. Possible values (default is underlined): false, true | Boolean



Table ?: Patient Name Features
Attribute | Description | Type
firstName | The first part (before the semicolon) of the matching preferred term in the terminology "patientnames". | String
lastName | The last part (after the semicolon) of the matching preferred term in the terminology "patientnames". | String



Table ?: Features of Death Date
Attribute | Description | Type
Date | Date of death of the patient. Format: YYYY-MM-DD: Year-Month-Day with leading zeros (e.g. 2020-02-17) | String


Annotation Type: de.averbis.types.health.Hospitalisation


Table ?: Hospitalisation Features
Attribute | Description | Type
admissionDate | Date of admission to hospital. Format: YYYY-MM-DD: Year-Month-Day with leading zeros (e.g. 2020-02-17) | String
dischargeDate | Date of discharge from hospital. Format: YYYY-MM-DD: Year-Month-Day with leading zeros (e.g. 2020-02-17) | String

3.28.5. Terminology Binding for patientnames


Table ?: Terminology Binding
Country | Name | Version | Identifier | Comment
All | <define your name> | <define your version> | patientnames | To annotate the patient's name, a terminology with the ID "patientnames" has to be created and filled with an individual list of the patient names that should be annotated. See chapter Annotate Patient Names for more details.

3.28.6. Web Service Example

Example text (Hospitalisation, PatientInformation): "We're reporting on the patient John Doe. He stayed in our hospital from 1/01/2018 until 2/01/2018."

    {
      "begin": 72,
      "end": 97,
      "type": "de.averbis.types.health.Hospitalisation",
      "coveredText": "1/01/2018 until 2/01/2018",
      "id": 1648,
      "admissionDate": "2018-01-01",
      "dischargeDate": "2018-02-01"
    },
    {
      "begin": 0,
      "end": 98,
      "type": "de.averbis.types.health.PatientInformation",
      "coveredText": "We're reporting on the patient John Doe. He stayed in our hospital from 1/01/2018 until 2/01/2018.",
      "id": 1650,
      "firstName": null,
      "lastName": null,
      "deceased": false,
      "gender": "male",
      "deathDate": null
    }
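
Since admissionDate and dischargeDate are delivered in the format YYYY-MM-DD, they can be parsed directly with standard date libraries. The following Python sketch computes the length of stay from a Hospitalisation annotation. It assumes the annotation is available as a parsed JSON object (a Python dict); the function name length_of_stay is hypothetical.

```python
from datetime import date


def length_of_stay(annotation):
    """Return the number of days between admission and discharge, or None."""
    admission = annotation.get("admissionDate")
    discharge = annotation.get("dischargeDate")
    if not admission or not discharge:
        # One of the dates was not detected in the text.
        return None
    return (date.fromisoformat(discharge) - date.fromisoformat(admission)).days


# Hospitalisation annotation taken from the example above.
hospitalisation = {
    "begin": 72,
    "end": 97,
    "type": "de.averbis.types.health.Hospitalisation",
    "coveredText": "1/01/2018 until 2/01/2018",
    "id": 1648,
    "admissionDate": "2018-01-01",
    "dischargeDate": "2018-02-01",
}

days = length_of_stay(hospitalisation)
print(days)
```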


Example text (death date): "The patient died on 2/01/2018 in the course of a multiorgan failure."

    {
      "begin": 0,
      "end": 68,
      "type": "de.averbis.types.health.PatientInformation",
      "coveredText": "The patient died on 2/01/2018 in the course of a multiorgan failure.",
      "id": 1285,
      "firstName": null,
      "lastName": null,
      "deceased": true,
      "gender": null,
      "deathDate": "2018-02-01"
    }


3.29. PHI

3.29.1. Description

Protected health information (PHI), also referred to as personal health information, generally refers to demographic information, medical histories, test and laboratory results, mental health conditions, insurance information, and other data that a healthcare professional collects to identify an individual and determine appropriate care.

This component identifies protected health information like names, dates, locations, IDs, contact information, professions and others.

3.29.2. Input

Above this annotator, the following annotators must be included in the pipeline:

3.29.3. Output

Annotation Type: de.averbis.types.health.Age

Age mentioned in the document.

Annotation Type: de.averbis.types.health.Date

All dates in the document.

Annotation Type: de.averbis.types.health.Name

Attribute | Description | Type
kind | Possible values: PATIENT, DOCTOR, OTHER | String


Annotation Type: de.averbis.types.health.Location

Attribute | Description | Type
kind | Possible values: STREET, CITY, ZIP, COUNTRY (country or nationality), STATE, HOSPITAL, ORGANIZATION (other organizations besides the hospital, e.g. the employer of the patient), OTHER | String


Annotation Type: de.averbis.types.health.Id

Attribute | Description | Type
kind | Possible values: PATIENTID, OTHER | String


Annotation Type: de.averbis.types.health.Contact

Attribute | Description | Type
kind | Possible values: PHONE, FAX, URL, EMAIL | String


Annotation Type: de.averbis.types.health.Profession

The profession of the patient or his relatives.


Annotation Type: de.averbis.types.health.PHIOther

Other protected health information.


Annotation Type: de.averbis.types.health.DeidentifiedDocument

This type returns the deidentified text. By default it replaces the recognized PHI concepts with "X".

Attribute | Description | Type
kind | Defines the deidentification method, e.g. "crossout" replaces the PHI concepts with "X", while "tag" replaces the PHI information with a tag such as <DATE/>, <CITY/> etc. Possible values: crossout (default), tag. Please note: the kind can be set via a parameter within the PHIDeidentifier compound of the PHI Annotator. | String
deidentifiedText | The deidentified text. | String

3.29.4. Configuration

The PHI module consists of two compounds which can be configured to influence the output.


Figure ?: Compounds of PHI Annotator
3.29.4.1. Configuration of PHIAnnotator

The PHI Annotator is provided with configuration parameters, as shown in the figure below. Please keep the default values of the configuration parameters and do not make any changes.


Figure ?: Configuration parameters of PHI Annotator (default settings)
3.29.4.2. Configuration of PHIDeidentifier

The PHI Deidentifier compound is provided with a parameter called "deidentificationMethod".


Figure ?: parameter of PHIDeidentifier compound

The default value for the deidentification method is "crossout". The second method is named "tag"; it replaces each identified PHI with a tag name.

3.29.5. Web Service Example

Text Example for Deidentification: "Mr. Jim Jack was born on 25/10/1990 in Boston."

Method: crossout

{
    "begin": 0,
    "end": 46,
    "type": "de.averbis.types.health.DeidentifiedDocument",
    "coveredText": "Mr. Jim Jack was born on 25/10/1990 in Boston.",
    "id": 5133,
    "kind": "crossout",
    "deidentifiedText": "Mr. XXX XXXX was born on XXXXXXXXXX in XXXXXX."
}


Method: tag

{
    "begin": 0,
    "end": 46,
    "type": "de.averbis.types.health.DeidentifiedDocument",
    "coveredText": "Mr. Jim Jack was born on 25/10/1990 in Boston.",
    "id": 4933,
    "kind": "tag",
    "deidentifiedText": "Mr. <NAME/> was born on <DATE/> in <LOCATION/>."
}
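
To make the difference between the two methods concrete, the following Python sketch reproduces both outputs for the example text above. It is not the implementation used by the PHIDeidentifier; it only illustrates the observable behavior. The function name deidentify, the input shape of the spans, and the character offsets of the three PHI spans are assumptions made for this illustration.

```python
import re


def deidentify(text, phi_spans, method="crossout"):
    """Return text with the given PHI spans masked.

    phi_spans: list of dicts with "begin", "end" and "tag" (assumed shape).
    """
    if method not in ("crossout", "tag"):
        raise ValueError(f"unknown method: {method}")
    parts, cursor = [], 0
    for span in sorted(phi_spans, key=lambda s: s["begin"]):
        parts.append(text[cursor:span["begin"]])
        covered = text[span["begin"]:span["end"]]
        if method == "crossout":
            # Mask every non-whitespace character, keeping the text length.
            parts.append(re.sub(r"\S", "X", covered))
        else:
            # Replace the whole span with a single tag.
            parts.append(f"<{span['tag']}/>")
        cursor = span["end"]
    parts.append(text[cursor:])
    return "".join(parts)


text = "Mr. Jim Jack was born on 25/10/1990 in Boston."
spans = [  # offsets determined by hand for this example (assumption)
    {"begin": 4, "end": 12, "tag": "NAME"},
    {"begin": 25, "end": 35, "tag": "DATE"},
    {"begin": 39, "end": 45, "tag": "LOCATION"},
]

crossed_out = deidentify(text, spans, "crossout")
tagged = deidentify(text, spans, "tag")
print(crossed_out)
print(tagged)
```

Note that "crossout" preserves the length of the text, so the offsets of other annotations remain valid, whereas "tag" changes the text length.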

Text Example for further PHI-Types:

"Universitätsklinik Denzlingen, Abteilung Innere Medizin, Elzstraße 165, 54679 Denzlingen, Telefon: +49 6789 - 1234 00, Telefax: +49 6789 - 1234 01, IBAN DE89 4568 2145 5698 4565 12, http://www.uniklinik-denzlingen.de

An:
Dr. med. Markus Bernoulli
Philipp-Furtwängler-Straße 89
32568 Waldkirch


betrifft Patienten:
Hr. Fieseler, Benjamin
geboren am 5.10.53
Libellenallee 34
65432 Reute


Denzlingen, den 28. Januar 2016

Sehr geehrter Kollege,

Wir bedanken uns für die Überweisung von Herrn Benjamin Fieseler (PatientenID: 123456789) zur Entnahme der Schilddrüse und berichten Ihnen nachfolgend über dessen Aufenthalt in unserem Hause."

{
    "begin": 164,
    "end": 182,
    "type": "de.averbis.types.health.Contact",
    "coveredText": "+49 6789 - 1234 00",
    "id": 61978,
    "kind": "PHONE"
},
{   
    "begin": 211,
    "end": 245,
    "type": "de.averbis.types.health.Contact",
    "coveredText": "http://www.uniklinik-denzlingen.de",
    "id": 62076,
    "kind": "URL"
},
{
    "begin": 192,
    "end": 210,
    "type": "de.averbis.types.health.Contact",
    "coveredText": "+49 6789 - 1234 01",
    "id": 62032,
    "kind": "FAX"
},
{
    "begin": 382,
    "end": 389,
    "type": "de.averbis.types.health.Date",
    "coveredText": "5.10.53",
    "id": 43922
},
{
    "begin": 437,
    "end": 452,
    "type": "de.averbis.types.health.Date",
    "coveredText": "28. Januar 2016",
    "id": 43936
},
{
    "begin": 558,
    "end": 567,
    "type": "de.averbis.types.health.ID",
    "coveredText": "123456789",
    "id": 44649,
    "kind": "PATIENTID"
},
{
    "begin": 0,
    "end": 29,
    "type": "de.averbis.types.health.Location",
    "coveredText": "Universitätsklinik Denzlingen",
    "id": 67425,
    "kind": "HOSPITAL"
},
{
    "begin": 123,
    "end": 136,
    "type": "de.averbis.types.health.Location",
    "coveredText": "Elzstraße 165",
    "id": 69052,
    "kind": "STREET"
},
{
    "begin": 138,
    "end": 143,
    "type": "de.averbis.types.health.Location",
    "coveredText": "54679",
    "id": 70568,
    "kind": "ZIP"
},
{
    "begin": 144,
    "end": 154,
    "type": "de.averbis.types.health.Location",
    "coveredText": "Denzlingen",
    "id": 70772,
    "kind": "CITY"
},
{
    "begin": 280,
    "end": 309,
    "type": "de.averbis.types.health.Location",
    "coveredText": "Philipp-Furtwängler-Straße 89",
    "id": 69194,
    "kind": "STREET"
},
{
    "begin": 310,
    "end": 315,
    "type": "de.averbis.types.health.Location",
    "coveredText": "32568",
    "id": 70598,
    "kind": "ZIP"
},
{
    "begin": 316,
    "end": 325,
    "type": "de.averbis.types.health.Location",
    "coveredText": "Waldkirch",
    "id": 70790,
    "kind": "CITY"
},
{
    "begin": 390,
    "end": 406,
    "type": "de.averbis.types.health.Location",
    "coveredText": "Libellenallee 34",
    "id": 69280,
    "kind": "STREET"
},
{
    "begin": 407,
    "end": 412,
    "type": "de.averbis.types.health.Location",
    "coveredText": "65432",
    "id": 70612,
    "kind": "ZIP"
},
{
    "begin": 413,
    "end": 418,
    "type": "de.averbis.types.health.Location",
    "coveredText": "Reute",
    "id": 70804,
    "kind": "CITY"
},
{
    "begin": 421,
    "end": 431,
    "type": "de.averbis.types.health.Location",
    "coveredText": "Denzlingen",
    "id": 70818,
    "kind": "CITY"
},
{
    "begin": 263,
    "end": 279,
    "type": "de.averbis.types.health.Name",
    "coveredText": "Markus Bernoulli",
    "id": 65723,
    "kind": "DOCTOR"
},
{
    "begin": 352,
    "end": 370,
    "type": "de.averbis.types.health.Name",
    "coveredText": "Fieseler, Benjamin",
    "id": 65785,
    "kind": "PATIENT"
}
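
When working with a longer result list like the one above, it is often convenient to group the PHI annotations by annotation type and kind. The following Python sketch shows one way to do this. It assumes the annotations are available as a list of parsed JSON objects (Python dicts); the function name group_phi is hypothetical.

```python
from collections import defaultdict


def group_phi(annotations):
    """Group covered texts by (short type name, kind)."""
    groups = defaultdict(list)
    for ann in annotations:
        # "de.averbis.types.health.Location" -> "Location"
        short_type = ann["type"].rsplit(".", 1)[-1]
        # Types such as Date carry no "kind" feature; use None in that case.
        groups[(short_type, ann.get("kind"))].append(ann["coveredText"])
    return dict(groups)


# A few annotations taken from the example above.
annotations = [
    {"type": "de.averbis.types.health.Location", "coveredText": "Denzlingen", "kind": "CITY"},
    {"type": "de.averbis.types.health.Location", "coveredText": "Waldkirch", "kind": "CITY"},
    {"type": "de.averbis.types.health.Name", "coveredText": "Markus Bernoulli", "kind": "DOCTOR"},
    {"type": "de.averbis.types.health.Date", "coveredText": "5.10.53"},
]

grouped = group_phi(annotations)
print(grouped)
```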


3.30. Physical Therapies

3.30.1. Description

The component annotates physical therapies (e.g. cryotherapy, occupational therapy) from clinical notes.

3.30.2. Input

Above this annotator, the following annotators must be included in the pipeline:

3.30.3. Output

Annotation Type: de.averbis.types.health.PhysicalTherapy


Table ?: Features
Attribute | Description | Type
dictCanon | Preferred term of the physical therapy. | String
matchedTerm | The matching synonym of the physical therapy. | String
uniqueID | Unique identifier of a concept of the format 'terminologyId:conceptID'. | String
conceptID | The concept id of the physical therapy. | String
source | The identifier of the terminology. | String
negatedBy | Specifies the negation word, if one exists. | String
status | Describes the status of the physical therapy. Possible values (default is underlined): null, PLANNED, CANCELED, COMPLETED, NEGATED | String

3.30.4. Terminology Binding


Table ?: Terminology Bindings
Country | Name | Version | Identifier | Comment
EN, DE | Averbis-Therapy | 1.0 | Averbis-Therapy_1.0 | Averbis' own multilingual terminology for physical and related therapies.

3.30.5. Web Service Example

Text Example: "Ergotherapy terminated at patient's request."

    {
      "begin": 0,
      "end": 11,
      "type": "de.averbis.types.health.PhysicalTherapy",
      "coveredText": "Ergotherapy",
      "id": 949,
      "negatedBy": null,
      "matchedTerm": "ergotherapy",
      "dictCanon": "ergotherapy",
      "conceptID": "PT000003",
      "source": "AverbisTherapies_1.0",
      "uniqueID": "AverbisTherapies_1.0:PT000003",
      "status": "CANCELED"
    },


3.31. Procedures

3.31.1. Description

The component annotates surgical procedures in clinical notes. Optionally, the additional annotation type ProcedureCandidate can be activated and visualized as well. It specifically detects procedure candidates in order to optimize DRG coding.

This component is currently only available in English.

3.31.2. Input

Above this annotator, the following annotators must be included in the pipeline:

To get the full functionality, the following annotators should also be included below this annotator in the given order:

3.31.3. Output

Annotation Type: de.averbis.types.health.Procedure


Table ?: Features
Attribute | Description | Type
dictCanon | Preferred term of the procedure. | String
matchedTerm | The matching synonym of the procedure. | String
uniqueID | Unique identifier of a concept of the format 'terminologyId:conceptID'. | String
conceptID | The concept id of the procedure. | String
source | The identifier of the terminology. | String
negatedBy | Specifies the negation word, if one exists. | String
status | Describes the status of the procedure. Possible values (default is underlined): null, PLANNED, CANCELED, COMPLETED, NEGATED | String
side | The laterality of the procedure. Possible values (default is underlined): null, RIGHT, LEFT, BOTH | String
date | The date of the procedure. Format: YYYY-MM-DD: Year-Month-Day with leading zeros (e.g. 2020-02-17) | String


Annotation Type (optional): de.averbis.types.health.ProcedureCandidate


Table ?: Features
Attribute | Description | Type
dictCanon | Preferred term of the procedure. | String
conceptID | The concept id of the procedure. | String
approach | Information about the text mining approach used to generate the annotation. Possible values: DictionaryLookup, SimilarityMatching, DocumentClassification, DerivedByLabValue | String
confidence | For approaches using machine learning (e.g. "DocumentClassification"), the confidence that the respective annotation has been generated correctly. Possible value range: 0-1. Note: annotations generated with non-machine-learning approaches such as terminology mappings (approach = "DictionaryLookup") are given a confidence value of 0. | Double

3.31.4. Terminology Binding


Table ?: Terminology Bindings
Country | Name | Version | Identifier | Comment
United States | SNOMED-CT-US | 2018-09-01 | SNOMED-CT-US_2018-09-01 | The SNOMED CT United States (US) Edition, subtree of concept "387713003 Surgical procedure (procedure)"

3.31.5. Web Service Example

Text example for Procedure: "Hysterectomy was not performed due to hypertension of the patient."

    {
      "begin": 0,
      "end": 12,
      "type": "de.averbis.types.health.Procedure",
      "coveredText": "Hysterectomy",
      "id": 1052,
      "date": null,
      "negatedBy": null,
      "side": null,
      "matchedTerm": "Hysterectomy",
      "dictCanon": "Hysterectomy (procedure)",
      "conceptID": "236886002",
      "source": "SNOMED-CT-Procedures_20180901",
      "uniqueID": "SNOMED-CT-Procedures_20180901:236886002",
      "status": "CANCELED"
    }


Text example for ProcedureCandidate: "Hysterectomy was not performed due to hypertension of the patient."

    {
      "begin": 0,
      "end": 12,
      "type": "de.averbis.types.health.ProcedureCandidate",
      "coveredText": "Hysterectomy",
      "id": 1052,
      "dictCanon": "Hysterectomy (procedureCandidate)",
      "conceptID": "236886002",
      "approach": "DocumentClassification",
      "confidence": 0
    }
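
Because dictionary-based candidates always carry a confidence of 0, a plain threshold on the confidence would discard them. The following Python sketch shows one possible way to select ProcedureCandidate annotations for review. It is only an illustration: the function name select_candidates, the threshold of 0.5 and the sample values are assumptions, and only "DictionaryLookup" is treated as unscored because it is the only approach for which this is stated above.

```python
CANDIDATE_TYPE = "de.averbis.types.health.ProcedureCandidate"


def select_candidates(annotations, min_confidence=0.5,
                      unscored_approaches=("DictionaryLookup",)):
    """Keep candidates that are unscored or reach the confidence threshold."""
    selected = []
    for ann in annotations:
        if ann.get("type") != CANDIDATE_TYPE:
            continue
        # For unscored approaches the confidence of 0 carries no information.
        if ann["approach"] in unscored_approaches or ann["confidence"] >= min_confidence:
            selected.append(ann)
    return selected


# Hypothetical sample values for illustration.
annotations = [
    {"type": CANDIDATE_TYPE, "conceptID": "A", "approach": "DictionaryLookup", "confidence": 0},
    {"type": CANDIDATE_TYPE, "conceptID": "B", "approach": "DocumentClassification", "confidence": 0.91},
    {"type": CANDIDATE_TYPE, "conceptID": "C", "approach": "DocumentClassification", "confidence": 0.12},
    {"type": "de.averbis.types.health.Procedure", "conceptID": "D"},
]

selected_ids = [ann["conceptID"] for ann in select_candidates(annotations)]
print(selected_ids)
```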

3.32. Receptors

3.32.1. Description

This component detects different receptors important in oncology. Currently the annotation is restricted to the receptors HER2 (human epidermal growth factor receptor 2), Progesterone and Estrogen. Additionally, an interpretation and the percentage will be extracted, if available.

3.32.2. Input

Before this annotator, the following annotator must be included in the pipeline:

3.32.3. Output

Annotation Type: de.averbis.types.health.HER2


Table ?: HER2 Features
Attribute | Description | Type
status | The status of the HER2 expression. Possible values (default is underlined): -, +, ++, +++ | String
percentage | The percentage represented as a measurement. | Measurement


Annotation Type: de.averbis.types.health.EstrogenReceptor


Table ?: Features
Attribute | Description | Type
status | The status of the Estrogen expression. Possible values (default is underlined): -, +, ++, +++ | String
percentage | The percentage represented as a measurement. | Measurement


Annotation Type: de.averbis.types.health.ProgesteroneReceptor


Table ?: Features
Attribute | Description | Type
status | The status of the Progesterone expression. Possible values (default is underlined): -, +, ++, +++ | String
percentage | The percentage represented as a measurement. | Measurement

3.32.4. Web Service Example

Example Text (Her2): "Her2 positive (45%)"

    {
      "begin": 0,
      "end": 18,
      "type": "de.averbis.types.health.HER2",
      "coveredText": "Her2 positive (45%",
      "id": 1119,
      "percentage": {
        "begin": 15,
        "end": 18,
        "type": "de.averbis.types.health.Measurement",
        "coveredText": "45%",
        "id": 1120,
        "unit": "%",
        "normalizedUnit": "",
        "normalizedValue": 0.45,
        "value": 45,
        "dimension": ""
      },
      "status": "++"
    }


Example Text (Progesterone): "PROGESTERONE RECEPTOR: POSITIVE (10%)"

    {
      "begin": 0,
      "end": 36,
      "type": "de.averbis.types.health.ProgesteroneReceptor",
      "coveredText": "PROGESTERONE RECEPTOR: POSITIVE (10%",
      "id": 1222,
      "percentage": {
        "begin": 33,
        "end": 36,
        "type": "de.averbis.types.health.Measurement",
        "coveredText": "10%",
        "id": 1223,
        "unit": "%",
        "normalizedUnit": "",
        "normalizedValue": 0.1,
        "value": 10,
        "dimension": ""
      },
      "status": "++"
    }


3.33. RutaEngine

3.33.1. Description

The RutaEngine is a generic annotator that interprets and executes UIMA Ruta, a rule-based scripting language for Apache UIMA. Due to its generic nature, the annotator is able to create and modify all available types of annotations.

Detailed documentation on the use of Ruta can be found in the official Apache UIMA manual.

3.33.2. Input

This general RutaEngine annotator does not expect any annotations.

If special characters are to be annotated, the following annotator must be included in the pipeline before this annotator:

In addition, the typesystem "de.averbis.textanalysis.typesystems.AverbisTypeSystem" should be imported as well (see figure below).


Figure ?: Example of a Ruta rule with special characters


3.33.3. Output

The Entity type is described here as an exemplary and recommended placeholder for the types of annotations that can be created by this annotator. Entity is a generic type whose semantics are specified by its features label and value.

Annotation Type: de.averbis.extraction.types.Entity


Table ?: Features of Entity
Attribute | Description | Type
value | This feature provides the text of the annotated mention. | String
label | The type of the entity; e.g., PERSON, LOCATION etc. | String

3.33.4. Configuration

Name | Description | Type | MultiValued | Mandatory
rules | A String parameter representing the rule that should be applied by the analysis engine. If set, it replaces the content of the file specified by the mainScript parameter. | String | false | false

3.33.5. Web Service Example

Example Ruta Script:

"pack years|py|pack year" -> Keyword; 
(n:NUM k:Keyword){-> CREATE(Entity, "label" = k.ct, "value" = n.ct)};
 

Example Text (Ruta Script applied): "40 pack years"

{
    "begin": 0,
    "end": 13,
    "type": "de.averbis.types.health.Entity",
    "coveredText": "40 pack years",
    "id": 626,
    "label": "pack years",
    "value": "40" 
}
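
For readers who are not familiar with Ruta, the effect of the example script can be approximated with a regular expression. The following Python sketch is not how the RutaEngine works internally and does not use UIMA Ruta; it only mimics the result of the two rules above for simple inputs. The function name extract_entities is hypothetical, and the Ruta NUM token is approximated here by a sequence of digits.

```python
import re

# Same alternatives as in the Ruta keyword rule above.
KEYWORD = r"pack years|py|pack year"
# A number followed by a keyword, roughly corresponding to (n:NUM k:Keyword).
PATTERN = re.compile(rf"(?P<num>\d+)\s*(?P<kw>{KEYWORD})")


def extract_entities(text):
    """Return Entity-like dicts with label (keyword) and value (number)."""
    return [
        {
            "begin": match.start(),
            "end": match.end(),
            "coveredText": match.group(0),
            "label": match.group("kw"),
            "value": match.group("num"),
        }
        for match in PATTERN.finditer(text)
    ]


entities = extract_entities("40 pack years")
print(entities)
```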


3.34. Specimen 

3.34.1. Description

During surgical procedures or biopsies, tissue or fluid samples are often taken. In pathology reports, these samples, so-called specimens, are often listed and described individually. This annotator extracts specimen information from pathology reports. The Specimen annotator is a so-called complex annotator and contains several individual annotators such as Morphology, Topography and Diagnoses.

3.34.2. Input

Before this annotator, the following annotator must be included in the pipeline:

3.34.3. Output

Annotation Type: de.averbis.types.health.Specimen


Table ?: Features of Specimen
| Attribute | Description | Type |
| --- | --- | --- |
| identifier | Identifier of the specimen. Possible values: SINGLE (if only one specimen is identified) or the number of the specimen (e.g. 1, 2, 3, ...). | String |
| morphology | The morphology of the specimen, according to ICD-O (see Morphology for more details). | Morphology |
| topography | The topography of the specimen, according to ICD-O (see Topography for more details). | Topography |
| diagnosis | Diagnosis information of the specimen (see Diagnoses for more details). | Diagnosis |
| laterality | Laterality of the specimen. Possible values (default is underlined): null, RIGHT, LEFT, BOTH | String |
| descriptions | Description text, if available. | StringArray |

3.34.4. Web Service Example

Example Text: "SKIN, LEFT FOREARM (BIOPSY): BASAL CELL CARCINOMA EXTENDING TO THE EXCISION MARGINS"

    {
      "begin": 0,
      "end": 83,
      "type": "de.averbis.types.health.Specimen",
      "coveredText": "SKIN, LEFT FOREARM (BIOPSY): BASAL CELL CARCINOMA EXTENDING TO THE EXCISION MARGINS",
      "id": 1444,
      "identifier": "SINGLE",
      "morphology": {
        "begin": 29,
        "end": 49,
        "type": "de.averbis.types.health.Morphology",
        "coveredText": "BASAL CELL CARCINOMA",
        "id": 1443,
        "negatedBy": null,
        "matchedTerm": "Basal cell carcinoma",
        "dictCanon": "Basal cell carcinoma, NOS",
        "confidence": 0,
        "conceptID": "8090/3",
        "source": "ICD-O-Morphology-EN_3.1",
        "uniqueID": "ICD-O-Morphology-EN_3.1:8090/3"
      },
      "topography": {
        "begin": 0,
        "end": 18,
        "type": "de.averbis.types.health.Topography",
        "coveredText": "SKIN, LEFT FOREARM",
        "id": 1446,
        "negatedBy": null,
        "matchedTerm": "Skin left forearm",
        "dictCanon": "Skin of upper limb and shoulder",
        "confidence": 0,
        "conceptID": "C44.6",
        "source": "ICD-O-Topography-EN_3.1",
        "uniqueID": "ICD-O-Topography-EN_3.1:C44.6"
      },
      "diagnosis": {
        "begin": 0,
        "end": 83,
        "type": "de.averbis.types.health.Diagnosis",
        "coveredText": "SKIN, LEFT FOREARM (BIOPSY): BASAL CELL CARCINOMA EXTENDING TO THE EXCISION MARGINS",
        "id": 1445,
        "negatedBy": null,
        "side": null,
        "matchedTerm": null,
        "verificationStatus": null,
        "kind": null,
        "confidence": 0,
        "onsetDate": null,
        "source": "ICD10CM_2021",
        "clinicalStatus": null,
        "approach": null,
        "laterality": null,
        "dictCanon": "Basal cell carcinoma of skin of left upper limb, including shoulder",
        "conceptID": "C44.619",
        "belongsTo": null,
        "uniqueID": "ICD10CM_2021:C44.619"
      },
      "descriptions": null,
      "laterality": "LEFT"
    }
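Because a Specimen annotation nests complete Morphology, Topography and Diagnosis annotations, client code usually needs to flatten it. The sketch below shows one possible way to do this in Python once the web service response has been parsed into a dict. The function name `summarize_specimen` and the shape of the returned record are assumptions made for illustration only.

```python
def summarize_specimen(specimen):
    """Flatten a parsed Specimen annotation into a small record.

    Nested annotations may be null in the JSON (None in Python),
    so every lookup tolerates a missing value.
    """
    def concept(key):
        nested = specimen.get(key) or {}
        return nested.get("conceptID"), nested.get("dictCanon")

    return {
        "identifier": specimen.get("identifier"),
        "laterality": specimen.get("laterality"),
        "morphology": concept("morphology"),
        "topography": concept("topography"),
        "diagnosis": concept("diagnosis"),
        "descriptions": specimen.get("descriptions") or [],
    }


# Abbreviated version of the example annotation above.
example = {
    "identifier": "SINGLE",
    "morphology": {"conceptID": "8090/3", "dictCanon": "Basal cell carcinoma, NOS"},
    "topography": {"conceptID": "C44.6", "dictCanon": "Skin of upper limb and shoulder"},
    "diagnosis": None,
    "descriptions": None,
    "laterality": "LEFT",
}
print(summarize_specimen(example))
```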


3.35. TNM

3.35.1. Description

This component detects and annotates abbreviated notations and free-text remarks of the TNM classification. It identifies the tumor (T), node (N) and metastasis (M) categories as well as the tumor grading.

3.35.2. Input

Above this annotator, the following annotators must be included in the pipeline:

3.35.3. Output

The following TNM types exist:


Table ?: Types of TNM
| Attribute | Description | Type |
| --- | --- | --- |
| tumor | The tumor annotations. | TNMTumor |
| node | The node annotations. | TNMNode |
| metastasis | The metastasis annotations. | TNMMetastasis |
| grading | The grading annotations. | TNMGrading |
| LymphaticInvasion | The annotation of lymphatic invasion. | TNMAdditional |
| VascularInvasion | The annotation of vascular invasion. | TNMAdditional |
| PerineuralInvasion | The annotation of perineural invasion. | TNMAdditional |


The TNM types "TNMTumor", "TNMNode", "TNMMetastasis" and "TNMGrading" have the following attribute:


Table ?: Attribute TNM Types
| Attribute | Description | Type |
| --- | --- | --- |
| value | The value of the TNM classification or grading. | String |


The TNM type "TNMAdditional" has the following attributes:


Table ?: Attributes of type "TNMAdditional"
| Attribute | Description | Type |
| --- | --- | --- |
| label | The label of the additional TNM classification. Possible values are: LymphaticInvasion, VascularInvasion, PerineuralInvasion | String |
| value | The value of L, V or Pn. Possible values are (default is underlined): null, L0, L1 for lymphatic invasion; null, V0, V1, V2 for vascular invasion; null, Pn0, Pn1 for perineural invasion | String |

3.35.4. Web Service Example

Text Example: "pTis, N0, Mx, G3, L1, V0, Pn0"

   {
      "begin": 18,
      "end": 20,
      "type": "de.averbis.types.health.TNMAdditional",
      "coveredText": "L1",
      "id": 3995,
      "label": "LymphaticInvasion",
      "value": "L1"
    },
    {
      "begin": 22,
      "end": 24,
      "type": "de.averbis.types.health.TNMAdditional",
      "coveredText": "V0",
      "id": 4003,
      "label": "VascularInvasion",
      "value": "V0"
    },
    {
      "begin": 26,
      "end": 29,
      "type": "de.averbis.types.health.TNMAdditional",
      "coveredText": "Pn0",
      "id": 4011,
      "label": "PerineuralInvasion",
      "value": "Pn0"
    },


    {
      "begin": 14,
      "end": 16,
      "type": "de.averbis.types.health.TNMGrading",
      "coveredText": "G3",
      "id": 1903,
      "value": "G3"
    },
    {
      "begin": 10,
      "end": 12,
      "type": "de.averbis.types.health.TNMMetastasis",
      "coveredText": "Mx",
      "id": 1904,
      "value": "Mx"
    },
    {
      "begin": 6,
      "end": 8,
      "type": "de.averbis.types.health.TNMNode",
      "coveredText": "N0",
      "id": 1905,
      "value": "N0"
    },
    {
      "begin": 0,
      "end": 4,
      "type": "de.averbis.types.health.TNMTumor",
      "coveredText": "pTis",
      "id": 1906,
      "value": "Tis"
    }
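Since each TNM component is returned as a separate annotation, a client typically collects them into one record per document. The following Python sketch shows one way to do that under the assumption that the annotations have already been parsed from JSON into dicts. The function name `collect_tnm` and the choice of result keys are hypothetical.

```python
def collect_tnm(annotations):
    """Group TNM annotations into a single dict.

    TNMTumor, TNMNode, TNMMetastasis and TNMGrading are stored under
    'tumor', 'node', 'metastasis' and 'grading'. TNMAdditional
    annotations are stored under their label.
    """
    summary = {}
    for ann in annotations:
        short_name = ann["type"].rsplit(".", 1)[-1]
        if short_name == "TNMAdditional":
            summary[ann["label"]] = ann["value"]
        elif short_name.startswith("TNM"):
            summary[short_name[3:].lower()] = ann["value"]
    return summary


# Abbreviated annotations from the example above.
annotations = [
    {"type": "de.averbis.types.health.TNMAdditional", "label": "LymphaticInvasion", "value": "L1"},
    {"type": "de.averbis.types.health.TNMGrading", "value": "G3"},
    {"type": "de.averbis.types.health.TNMMetastasis", "value": "Mx"},
    {"type": "de.averbis.types.health.TNMNode", "value": "N0"},
    {"type": "de.averbis.types.health.TNMTumor", "value": "Tis"},
]
print(collect_tnm(annotations))
```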
        

3.36. TumorStage

3.36.1. Description

The annotator extracts stage information concerning tumors like 'end-stage' or 'stage II-B'.

3.36.2. Input

3.36.3. Output

Annotation Type: de.averbis.types.health.TumorStage


Table ?: TumorStage Features
| Attribute | Description | Type |
| --- | --- | --- |
| stage | Numeric value of the tumor stage. Possible values: NULL, 1, 2, 3, 4 | String |
| modifier | Modifier of the tumor stage, e.g. a, b, c | String |

3.36.4. Web Service Example

Example Text: "Mamma Ca stage IIa"

    {
      "begin": 9,
      "end": 18,
      "type": "de.averbis.types.health.TumorStage",
      "coveredText": "stage IIa",
      "id": 931,
      "stage": "2",
      "modifier": "a"
    }
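The normalization shown in the example (Roman numeral "II" becomes stage "2", the trailing letter becomes the modifier) can be illustrated with a short Python sketch. It is not the annotator's implementation: the regular expression, the function name `parse_stage` and the restriction to the modifiers a, b and c are assumptions, and free-text stages such as 'end-stage' are not covered.

```python
import re

ROMAN_TO_ARABIC = {"I": "1", "II": "2", "III": "3", "IV": "4"}

# Assumption: 'stage', then a Roman or Arabic numeral from 1 to 4,
# optionally followed by a hyphen and a modifier letter a, b or c.
STAGE_PATTERN = re.compile(
    r"\bstage\s+(IV|III|II|I|[1-4])\s*-?\s*([a-c])?\b", re.IGNORECASE
)


def parse_stage(text):
    """Return a TumorStage-like dict for the first match, or None."""
    m = STAGE_PATTERN.search(text)
    if m is None:
        return None
    raw_stage, modifier = m.group(1).upper(), m.group(2)
    return {
        "begin": m.start(),
        "end": m.end(),
        "coveredText": m.group(0),
        "stage": ROMAN_TO_ARABIC.get(raw_stage, raw_stage),
        "modifier": modifier.lower() if modifier else None,
    }


print(parse_stage("Mamma Ca stage IIa"))
```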


3.37. Topography

3.37.1. Description

This component detects topography. It is mainly used in pathology reports.

3.37.2. Input

Above this annotator, the following annotators must be included in the pipeline:

3.37.3. Output

Annotation Type: de.averbis.types.health.Topography


Table ?: TopographyConcept Features
| Attribute | Description | Type |
| --- | --- | --- |
| dictCanon | Preferred term of the Topography. | String |
| matchedTerm | Matching synonym of the Topography. | String |
| uniqueID | Unique identifier of the Topography in the format 'terminologyId:conceptID'. | String |
| conceptID | The concept id. | String |
| source | The name of the terminology source. | String |
| negatedBy | Specifies the negation word, if one exists. | String |
| confidence | The probability that the annotation (Diagnosis/Morphology/Topography concept) is valid: the higher the confidence, the more likely the annotation is valid. | Double |

3.37.4. Terminology Binding


Table ?: Terminology Bindings
| Country | Name | Version | Identifier | Comment |
| --- | --- | --- | --- | --- |
| United States | ICD-O | 3.1 | ICD-O_3.1 | International Classification of Diseases for Oncology WHO edition, enriched with synonyms by Averbis. |
| Germany | ICD-O-DE | 3.1 | ICD-O-DE_3.1 | International Classification of Diseases for Oncology German Edition, enriched with synonyms by Averbis. |

3.37.5. Web Service Example

Text Example: "Adenocarcinoma of the Rectum"

    {
      "begin": 22,
      "end": 28,
      "type": "de.averbis.types.health.Topography",
      "coveredText": "Rectum",
      "id": 981,
      "negatedBy": null,
      "matchedTerm": "Rectum",
      "dictCanon": "Rectum, NOS",
      "confidence": 0,
      "conceptID": "C20.9",
      "source": "ICD-O-Topography-EN_3.1",
      "uniqueID": "ICD-O-Topography-EN_3.1:C20.9"
    }
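The uniqueID feature combines the terminology source and the concept id in the format 'terminologyId:conceptID'. A minimal Python sketch for splitting it is shown here. The function name `split_unique_id` is hypothetical, and the sketch assumes that the terminology identifier itself contains no colon.

```python
def split_unique_id(unique_id):
    """Split a 'terminologyId:conceptID' value at the first colon."""
    terminology_id, separator, concept_id = unique_id.partition(":")
    if not separator:
        raise ValueError(f"not a 'terminologyId:conceptID' value: {unique_id!r}")
    return terminology_id, concept_id


print(split_unique_id("ICD-O-Topography-EN_3.1:C20.9"))
```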


3.38. WordlistAnnotator

3.38.1. Description

The WordlistAnnotator allows users to directly embed simple wordlists into pipelines. It identifies words from the wordlist in texts and creates an annotation of type Entity. Optionally, a 'label' and a 'value' can be specified in columns 2 and 3 of the wordlist to fill the corresponding attributes of type Entity (see example below).

3.38.2. Input

Above this annotator, the following annotator must be included in the pipeline:

3.38.3. Configuration


Table ?: Configuration Parameters
| Name | Description | Type | MultiValued | Mandatory |
| --- | --- | --- | --- | --- |
| delimiter | The separator used in the wordlist to separate the searched term from its features. | String | false | true |
| ignoreCase | Option to ignore the case of the terms in the wordlist. Possible values (default is underlined): ACTIVE, INACTIVE | boolean | false | true |
| onlyLongest | Option to filter matches that are part of a longer match. Example: 'diabetes mellitus' but not 'diabetes'. Possible values (default is underlined): ACTIVE, INACTIVE | boolean | false | true |
| wordlist | The wordlist (dictionary) content. The first line contains the complete package name of type Entity. If columns 2 and 3 are filled, the first line must also contain the attribute names 'label' and 'value'. The remaining lines contain the words of the wordlist (column 1) and optionally 'label' and 'value' values (columns 2 and 3). | String | false | false |

Example Wordlist:

de.averbis.extraction.types.Entity;label;value
Lip;Organ;C00
Tongue;Organ;C01

3.38.4. Output

The annotator creates an annotation of type Entity.

Exemplary Annotation Type: de.averbis.extraction.types.Entity


Table ?: Features
| Attribute | Description | Type |
| --- | --- | --- |
| label | Represents the string in the feature "label" of the matched term in the wordlist. | String |
| value | Represents the string in the feature "value" of the matched term in the wordlist. | String |

3.38.5. Web Service Example

Text Example: "The lip"

{
    "begin": 4,
    "end": 7,
    "type": "de.averbis.types.health.Entity",
    "coveredText": "lip",
    "id": 306,
    "label": "Organ",
    "value": "C00"
}
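How the parameters delimiter, ignoreCase and onlyLongest interact can be sketched in Python. This is a simplified illustration under stated assumptions and not the WordlistAnnotator implementation: the function names are hypothetical, matching is done on word boundaries with regular expressions, and the first wordlist line is assumed to be the header described above.

```python
import re


def parse_wordlist(wordlist, delimiter=";"):
    """Parse the header line (type and feature names) and the term lines."""
    lines = [line for line in wordlist.splitlines() if line.strip()]
    type_name, *feature_names = lines[0].split(delimiter)
    entries = {}
    for line in lines[1:]:
        term, *values = line.split(delimiter)
        entries[term] = dict(zip(feature_names, values))
    return type_name, entries


def annotate(text, wordlist, delimiter=";", ignore_case=True, only_longest=True):
    """Return Entity-like dicts for all wordlist terms found in the text."""
    type_name, entries = parse_wordlist(wordlist, delimiter)
    flags = re.IGNORECASE if ignore_case else 0
    matches = []
    for term, features in entries.items():
        for m in re.finditer(rf"\b{re.escape(term)}\b", text, flags):
            matches.append({"begin": m.start(), "end": m.end(), "type": type_name,
                            "coveredText": m.group(0), **features})
    if only_longest:
        # Drop every match that lies inside a strictly longer match.
        matches = [a for a in matches if not any(
            b["begin"] <= a["begin"] and a["end"] <= b["end"]
            and b["end"] - b["begin"] > a["end"] - a["begin"]
            for b in matches)]
    return sorted(matches, key=lambda a: (a["begin"], a["end"]))


WORDLIST = "de.averbis.extraction.types.Entity;label;value\nLip;Organ;C00\nTongue;Organ;C01"
print(annotate("The lip", WORDLIST))
```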

3.39. AnnotationMapper

3.39.1. Description

The AnnotationMapper is an extension of WordlistAnnotator and allows the user to perform simple annotation mapping based on the predefined configuration parameter “mappingList”. The searched term and its features should be separated by a “delimiter”. Annotations of the type given by the parameter “sourceType” are investigated. If the value of their feature “sourceFeature” is present in the “mappingList”, then a new annotation of the type “targetType” is created and its features specified in “targetFeatures” are filled with values defined in the remaining columns (in the same order).


Figure ?: AnnotationMapper


3.39.2. Input

The component expects the annotations of the types specified by the user in the configuration parameters “sourceType” and optionally “sourceFeature”.

Above this annotator, the following annotator must be included in the pipeline:

3.39.3. Configuration



Table ?: Configuration Parameters

| Name | Description | Type | MultiValued | Mandatory |
| --- | --- | --- | --- | --- |
| delimiter | The separator used in the mapping list to separate the searched term from its features. | String | false | true |
| mappingList | The dictionary content specifying the mapping. The first column defines the source feature value. The remaining columns specify the target feature values. | String | false | false |
| sourceFeature | The feature name for the source annotations. Annotations with the feature values given in mappingList will be mapped. | String | false | true |
| sourceType | The type name of the source annotations. | String | false | true |
| targetFeature | The feature names for the target annotations. These features will be filled for the newly created annotations according to the mappingList. | String | true | true |
| targetType | The type name of the target annotations. | String | false | true |
| ignoreCase | Option to ignore the case of terms in the dictionary. | Boolean | false | true |
| ignorePattern | A regular expression for text occurrences that should be ignored by the dictionary lookup. | String | false | true |

3.39.4. Output

The component creates concepts of the types which have been set in the configuration parameter “targetType”.


Exemplary Annotation Type: de.averbis.extraction.types.Concept



Table ?: TargetFeatures

| Attribute | Description | Type |
| --- | --- | --- |
| conceptID | Represents the string in the feature "label" of the matched term in the wordlist. | String |

3.39.5. Web Service Example

Text Example: "Headache"


{
      "begin": 4,
      "end": 39,
      "type": "de.averbis.extraction.types.Concept",
      "coveredText": "Headache",
      "id": 817,
      "conceptID": "M30.1"
}
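The mapping logic described in this section can be sketched in Python. This is an illustration under stated assumptions, not the AnnotationMapper implementation: the function name `map_annotations` is hypothetical, annotations are represented as plain dicts, the mapping list content is invented for the example, and ignorePattern is not covered.

```python
def map_annotations(annotations, mapping_list, source_type, source_feature,
                    target_type, target_features, delimiter=";", ignore_case=False):
    """Create target annotations for source annotations found in the mapping list."""
    # Column 1 is the source feature value, remaining columns the target values.
    mapping = {}
    for line in mapping_list.splitlines():
        if line.strip():
            key, *values = line.split(delimiter)
            mapping[key.lower() if ignore_case else key] = values

    created = []
    for ann in annotations:
        value = ann.get(source_feature)
        if ann.get("type") != source_type or value is None:
            continue
        key = value.lower() if ignore_case else value
        if key in mapping:
            target = {"begin": ann["begin"], "end": ann["end"],
                      "type": target_type, "coveredText": ann["coveredText"]}
            # Target features are filled in the order of the remaining columns.
            target.update(zip(target_features, mapping[key]))
            created.append(target)
    return created


source = [{"begin": 0, "end": 8, "type": "de.averbis.extraction.types.Entity",
           "coveredText": "Headache", "value": "Headache"}]
print(map_annotations(source, "Headache;M30.1",
                      source_type="de.averbis.extraction.types.Entity",
                      source_feature="value",
                      target_type="de.averbis.extraction.types.Concept",
                      target_features=["conceptID"]))
```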

4. Available Text Mining Pipelines

The respective components are described in detail in Annotators.

4.1. deid Pipeline

4.1.1. Description

This experimental pipeline identifies protected health information (PHI) like names, dates, locations, IDs, contact information, professions and others. The resulting annotations can be used for deidentification procedures.

4.1.2. Components

The following components are part of the deid pipeline:

4.2. Discharge Pipeline

4.2.1. Description

This pipeline extracts the basic medical information in physician letters. Since these letters mainly originate when patients are discharged from the hospital or transferred to another doctor, they are called discharge letters. After some preprocessing, this pipeline annotates information concerning diagnoses, laboratory values and medications. The resulting annotations undergo postprocessing that considers enumerations, negations, disambiguation and possible status.

4.2.2. Components

The following components are part of the discharge pipeline:

4.3. Ophthalmology Pipeline

4.3.1. Description

This pipeline extracts information concerning diagnoses, laboratory values, medications, negations, visual acuity, tensio and further information in the field of ophthalmology.

4.3.2. Components

The following components are part of the ophthalmology pipeline:

4.4. Pathology Pipeline

4.4.1. Description

This pipeline extracts information from pathology reports. After some preprocessing, this pipeline annotates information concerning diagnoses, morphology, topography and TNM classification.

4.4.2. Components

The following components are part of the pathology pipeline:

4.5. Transplantation Pipeline

4.5.1. Description

This pipeline extracts information concerning diagnoses, laboratory values, medications, graft-versus-host-disease, conditioning regimens and negations from physician letters after a transplantation.

4.5.2. Components

The following components are part of the transplantation pipeline:

5. GUI Overview

5.1. Welcome Screen

Users with administration rights can create new users and projects. When these users are logged in, they can see the "Project administration" and "User administration" areas.


Figure 1: Home page of an administration user


5.2. Project administration

In the project administration area, you first see a list with all projects that are currently available in the system.


Figure 2: Overview of available projects


  • Name: name of the project. The name also functions as a link to the corresponding project. The link goes to the project’s overview page.

  • Description: description of the project.

  • Operations | Edit project: this allows you to modify the name and the description of the project.

  • Operations | Delete project: this allows you to delete a project.

Below the table is a button that you can use to create a new project.

5.3. User administration

In the user administration area, you first see a list with all local user accounts that are currently available in the system. This list can be filtered using the text box on the top left.



Figure 3: Overview of registered users.


  • Username: the user’s login name.

  • Lastname: the user’s last name.

  • Firstname: the user’s first name.

  • Email: the user’s email address.

  • Blocked: if a user is temporarily blocked, a padlock icon is displayed here.

  • Administrator: if the user is an administrator, a checkmark is displayed here.

  • Local Account: Indicates if this user is a local user.
  • Operations | Rights: using this button you can see an overview of the rights that the user currently has. Rights cannot be edited here. Editing rights is done using the corresponding button in each project.

  • Operations | Edit: in the Edit dialog, you can edit the user profile data (firstname, lastname, email address). You can also use this dialog to block a user.

  • Operations | Change password: this allows you to enter a new user password.

  • Operations | Delete user: this allows you to delete the user.

Below the table is a button that you can use to create a new user.

5.3.1. Add and/or edit users

Use the 'Create new user' or 'Edit user' button to open a dialog and edit the user’s metadata.



Figure 4: Create new user.


In addition to editing the profile metadata, you can also assign an initial password when creating the user (to edit the password of an already existing user, please use the corresponding 'Change password' button in the user administration overview table).

You can also use this dialog to block the user.

5.3.2. Change password

Using the 'Change password' button, you can open a dialog which allows you to enter a new password.



Figure 5: Changing the password of an existing user.


5.4. LDAP Configuration

Health Discovery supports LDAP to authenticate users and groups against directory services, e.g. Microsoft Active Directory. The LDAP configuration must be done from a local user account with admin privileges. It includes the following configuration parameters:

  • Display Name: An identifier that is displayed on the login page.
  • LDAP URL: The URL of the LDAP server.
  • Search Base: Defines the starting point for the search in your directory tree.
  • Manager Account Name: The distinguished name (DN) of the manager account. This account needs at least read access to the LDAP groups you want to integrate into Health Discovery.
  • Manager Account Password: The password of the manager account.
  • User Attribute: The unique identifier for users that will be used on the login page.
  • User Filter: The filter that selects the users from LDAP. The Test Query button can be used to validate the filter.
  • Admin Filter: The filter that selects the user with administrator privileges from LDAP. The Test Query button can be used to validate the filter.
  • Enabled: Activates or deactivates an LDAP configuration.

If you're using LDAP over SSL (LDAPS), please make sure to add your certificate to the Java trust store. This can be done using the following command:

keytool -import -alias yourCertificate.pem -file yourCertificate.pem -keystore $JAVA_HOME/lib/security/cacerts -storepass changeit

The users in LDAP are required to have an attribute named distinguishedName. Without this attribute, LDAP users will not be able to log on to Health Discovery.


Health Discovery supports multiple LDAP configurations for multiple domains.

One configuration can be selected as default, which means that it will be pre-selected on the login page:

5.5. General guidelines

When a user without global administration rights opens the application, his/her home page contains an overview of the projects assigned to this user (My projects). The project names act as links to the corresponding projects. On the project overview page, the user can find all the functions for which he/she has the relevant project rights.



Figure 6: Home page of a non-administrator user


After selecting a project, a page is displayed with a list of all the modules in the project. This list is also available on other pages with the project navigation menu in the upper right area.



Figure 7: Overview page of a project with buttons for opening each module.


5.6. Language and web interface localization

The web interface is currently available in German and English. The language is recognized automatically from the browser or the system settings of your operating system and the content of the user interface is displayed in the corresponding language.

5.7. Outer navigation bars

The top and left side outer navigation bars can be hidden when required. This saves space when the navigation tools are not required. To show/hide the navigation bars, click the small menu icon on the upper right edge of the application.




Figure 8: Menu icon to show/hide the outer navigation bars

5.8. Keyboard Shortcuts

To simplify working with the application, some functions are implemented with keyboard shortcuts. Press Shift + ? to display a summary of the defined shortcuts.


Figure 9: Summary of all defined keyboard shortcuts. Open with Shift + ?


5.9. Flash messages

Flash messages provide information about the progress and outcome of processes and display general information. They are standardized across all applications. The background color of a flash message depends on the message category: information messages are blue, success messages are green, and error messages are red. Flash messages disappear automatically after a few seconds. Error messages, however, remain displayed until the user closes them manually.



Figure 10: Flash messages that display errors are closed by clicking the cross mark in the top right corner.


5.10. Documentation

Complete user documentation is available that describes the functionality of each component. This documentation can be accessed directly from the help menu in the navigation bar on the left side of the web interface.

5.11. Embedded help

In addition to the complete online help, you can find information in several places directly embedded in the interface. You can access this wherever you see a blue question mark on a white background. Move the mouse cursor over the question mark.



Figure 11: Embedded help


6. Connector Management & Document Import

6.1. Managing Standard Connectors

Connectors are used to import documents into the system. A connector monitors a specific resource (like a file system or a database), automatically imports new documents and updates changes so that imported documents are kept in sync with the document source. Connectors can also be scheduled to certain times of day, for example to import and update documents only at night and reduce system load during office hours.

Connectors can be created and administered on the connector management page. The figure below shows the connector management with the list of all connectors that have been created within the current project:




Figure 1: Overview of all connectors.


  • Connector: The name of the connector.

  • Type: The connector type. For example file connector or database connector.

  • Active: Indicates whether the connector is active. Only active connectors import and update documents.

  • Schedules: Displays the periods of time in which the connector is active. 0-24 means that the connector is active 24 hours a day.

  • Statistics: The statistics show the following values

    • Documents whose URLs have been reported by the connector.

    • Documents that have already been requested by the connector and whose contents have been received.

    • Documents that have already been enriched with metadata.

    • Documents that have already been saved.

  • Actions | Start connector : Starts the connector.

  • Actions | Stop connector : Stops the connector.

  • Actions | Reset connector : If you reset a connector, all documents from this connector are re-imported.

  • Actions | Edit connector : Opens the edit connector dialog. All parameters except the connector name can be edited.

  • Actions | Edit mapping : Opens the edit mapping dialog where connector metadata fields like title and content can be mapped to document fields.

  • Actions | Schedule connector : Opens the schedule dialog.

  • Actions | Delete documents of connector : Deletes all documents that have been imported by the connector.

  • Actions | Delete connector : Deletes the connector. All documents that have been imported by the connector will be deleted as well.

In order to create a new connector, the connector type has to be selected first. After clicking the Create connector button the connector can be configured in the create new connector dialog. Please refer to the connector specific documentation for further details.


6.1.1. File System Connector

A file system connector imports documents from file system resources. It monitors one or multiple directories (including sub-directories) and imports documents from files in these directories. The following file types are supported:

  • .txt
  • .pdf
  • .doc/docx

  • .ppt/pptx
  • .xls/xlsx
  • .html

There are currently two implementations: FileConnectorType and AverbisFileConnectorType. The AverbisFileConnectorType remembers the current position when stopping, so that it does not start from the beginning when restarting.

A file system connector can be configured using the following parameters:

  • Name: Name of the connector. This name can be chosen freely and serves e.g. as a label within the connector overview. It must not contain spaces, special characters or underscores.

  • Start paths: For each line, you can specify a file system path that is taken into account by the connector. The connector runs through these directories recursively, i. e. all subdirectories are considered.

  • Exclude pattern: Here you can specify patterns to exclude certain files or file types (Black List).

  • Include pattern (optional): Here you can specify patterns to include certain files or file types only (White List).
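The interplay of start paths, exclude patterns and include patterns can be illustrated with a short Python sketch. It is not the connector's implementation. In particular, the pattern syntax used by the connector is not specified here, so the sketch assumes shell-style wildcards matched against the file name. The function name `select_files` is hypothetical.

```python
import fnmatch
import os


def select_files(start_paths, include=(), exclude=()):
    """Walk all start paths recursively and apply the black and white lists.

    Assumption: patterns are shell-style wildcards such as '*.pdf'.
    A file is skipped if it matches any exclude pattern. If include
    patterns are given, a file must match at least one of them.
    """
    selected = []
    for start in start_paths:
        for root, _dirs, files in os.walk(start):
            for name in files:
                if any(fnmatch.fnmatch(name, pattern) for pattern in exclude):
                    continue
                if include and not any(fnmatch.fnmatch(name, pattern) for pattern in include):
                    continue
                selected.append(os.path.join(root, name))
    return sorted(selected)
```

For example, `select_files(["/data"], include=["*.txt", "*.pdf"], exclude=["*.tmp"])` would return the text and PDF files below /data, including those in subdirectories.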

The file system connector can only read local file systems. For docker deployments, the directories from which data is to be imported have to be mounted into the gcm container. This can be done by adding an additional volume to the gcm service in the docker-compose.yml file. In the example below the /externalData directory on the docker host will be mounted to /data in the gcm container.

gcm:
  image: registry.averbis.com/gcm/gcm:X.X.X
  ...
  volumes:
    - gcmVol-hd:/opt/resources/connector-manager/home
    - gcmvoljdbchd:/opt/apache-karaf/lib/ext
    - /externalData:/data

Please make sure to restart the docker containers with docker-compose up -d to apply the changes.


6.1.2. Database Connector

With a database connector, structured data can be imported from a database connection. The database connector supports JDBC compliant databases and can crawl database tables using SQL queries. Each row from the SQL query result is treated as a separate document. The database connector keeps track of changes that are made to the database tables and synchronizes these changes automatically into Health Discovery.

In order to use the database connector, the database JDBC driver has to be provided to the Tomcat server instance that is running Health Discovery. Please ask your system administrator to put the database JDBC driver library into Tomcat's lib directory.


The database connector can be configured using the following parameters:

  • Name: Name of the connector. This name can be chosen freely and serves e.g. as a label within the connector overview. It must not contain spaces, special characters or underscores.

  • JDBC Driver Classname: Fully qualified class name of the database JDBC driver. E.g. com.mysql.jdbc.Driver

  • JDBC Connection URL: JDBC connection URL to the database. E.g. jdbc:mysql://localhost:3306/documentDB

  • Username: Database username.

  • Password: Database password.

  • Traversal SQL Query: SQL select query. E.g. SELECT id, title, content FROM documents

  • Primary Key Fields: Name of the column that represents the primary key and identifies a table row. E.g. id

The default field mapping of the database connector concatenates all queried columns (like id, title and content) and maps them to the document field named content. The field mapping can be configured in the connector field mapping dialog (see section Editing field mappings for further details). The figure below shows a custom field mapping that maps the database columns to document fields. The id column is mapped to the document_name field; title and content are mapped to document fields of the same name.




Figure 2: Database connector custom field mapping.
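The row-to-document behavior described above can be sketched in Python with the standard sqlite3 module standing in for a JDBC connection. This is only an illustration: the function name `crawl`, the `key` entry in the result and the use of a single space as the concatenation separator are assumptions, and change tracking is not covered.

```python
import sqlite3


def crawl(conn, query, primary_key, field_mapping=None):
    """Turn every row of the query result into one document dict.

    Without a field mapping, all queried columns are concatenated into
    the 'content' field (assumption: separated by a single space).
    With a mapping of column name to document field, the columns are
    copied to the given fields instead.
    """
    cursor = conn.execute(query)
    columns = [description[0] for description in cursor.description]
    documents = []
    for row in cursor:
        record = dict(zip(columns, row))
        doc = {"key": record[primary_key]}  # identifies the table row
        if field_mapping is None:
            doc["content"] = " ".join(str(record[c]) for c in columns)
        else:
            for column, field in field_mapping.items():
                doc[field] = record[column]
        documents.append(doc)
    return documents


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT, content TEXT)")
conn.execute("INSERT INTO documents VALUES (1, 'Report', 'No findings.')")
query = "SELECT id, title, content FROM documents"
print(crawl(conn, query, "id"))
print(crawl(conn, query, "id", {"id": "document_name", "title": "title", "content": "content"}))
```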

6.1.3. Editing field mappings

Connectors read different sources to extract structured data from them. The extracted data is then written to fields of a Solr core. Field mappings define which information from the original documents is written to which fields of the Solr index.

Specific default mappings can be specified for each index and connector throughout the system. These are automatically taken into account when a new connector is created.

When editing the field mappings, select a connector field on the left. On the right, select the core field in which you want the connector to write this data. All core fields that have been activated in the Solr schema configuration and are writable are available here. In addition to editing the default mappings, you can also specify further mappings or remove existing ones.

You can also specify a sequence for the mappings. This order is relevant when mapping multiple connector fields to a core field. If the core field can contain more than one value, the values are written to the field in the order specified here. If the core field can only contain one value, it will be the value that is the lowest in the mapping sequence.

After you have edited a field mapping, you must reset the connector so that the changes to the mapping are taken into account.



Figure 1: Editing field mappings.


There are currently three different mapping types:

  • Copy Mapping: The default type. The connector field is mapped 1:1 to the specified document field.

  • Constant Mapping: Instead of a connector field, a constant value can be mapped to a document field.

  • Split Mapping: The value of a connector field is divided into several values by a character to be entered. This can be used to convert comma-separated lists into multi valued document fields.
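The three mapping types can be illustrated with a small Python sketch. It is a simplified model and not the product's implementation: the function name `apply_mappings` and the dict-based mapping descriptions are hypothetical, all target fields are treated as multi-valued, and the special handling of single-valued core fields is not covered.

```python
def apply_mappings(record, mappings):
    """Apply copy, constant and split mappings in the given sequence."""
    document = {}
    for mapping in mappings:
        kind = mapping["type"]
        if kind == "copy":
            values = [record[mapping["source"]]]
        elif kind == "constant":
            values = [mapping["value"]]
        elif kind == "split":
            parts = record[mapping["source"]].split(mapping["separator"])
            values = [part.strip() for part in parts]
        else:
            raise ValueError(f"unknown mapping type: {kind}")
        # Values land in the target field in the order of the mapping sequence.
        document.setdefault(mapping["target"], []).extend(values)
    return document


record = {"title": "Discharge letter", "keywords": "diabetes, hypertension"}
mappings = [
    {"type": "copy", "source": "title", "target": "title"},
    {"type": "constant", "value": "import-2021", "target": "batch"},
    {"type": "split", "source": "keywords", "separator": ",", "target": "keyword"},
]
print(apply_mappings(record, mappings))
```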

6.2. Document Import

In addition to defining connectors that can monitor and search different document sources, it is also possible to import pre-structured data into a search engine index. Unlike connectors, this data is imported once, i. e. no subsequent synchronization takes place.

6.2.1. Manage document imports

Any number of document sets can be imported in the application and deleted if necessary. For each set of imported documents, known as import batches, you see a row in the overview table. In addition to the name of the import batch, you can also see how many documents are part of the batch. The status indicates whether the import is still running, whether it was successful, or whether it has failed.



Figure 2: Overview of all previously imported document batches.


Below the overview table you will find the form elements to import a new document set. To do this, enter a name and click the Browse button. A window opens in which the local file system is displayed.

You can import single files as well as ZIP archives with several files. Make sure that there are no (hidden) subdirectories in such a ZIP file and that the files have the correct file extensions.


These import formats are currently available:

Text Importer

Text importers can be used to import any plain text files. The complete content of the file is imported into a field. The file name is available later as metadata.

CAS Importer

Allows the import of serialized UIMA CAS (currently as XMI). This means that, for example, documents can be imported as gold standards.

Please note that the type system of this CAS has to be compatible with the type system of Health Discovery.


Solr XML Importer

A simple XML format that allows the import of pre-structured data. During the import, the fields defined in the XML file are written to the search index fields with the same name. Please make sure that the field names in the XML file correspond to the field names of the search index associated with your project.

A special feature is that images can be imported along with the documents and displayed together with them. To upload an image, you have to pack the XML document(s) together with the images into a ZIP archive. You can add as many image_reference fields to each document as you like. Relative paths to the image are expected. Images can be stored in any subfolders within the ZIP archive. Supported image formats are .gif, .png, .jpg and .tif.

...
<field name="image_reference">images/image.png</field>
<field name="image_reference">./images/pics/picture.png</field>
...

An example of the supported import format is shown below:

<?xml version='1.0' encoding='UTF-8'?>
<!--Averbis Solr Import file generated from: medline15n0771.xml.gz-->
<update>
  <add>
    <doc>
      <field name="id">24552733</field>
      <field name="title">Treatment of sulfate-rich and low pH wastewater by sulfate reducing bacteria with iron shavings in a laboratory.		</field>
      <field name="content">Sulfate-rich wastewater is an indirect threat to the environment especially at low pH. Sulfate reducing bacteria (SRB) could use sulfate as the terminal electron acceptor for the degradation of organic compounds and hydrogen transferring SO(4)(2-) to H2S. However their acute sensitivity to acidity leads to a greatest limitation of SRB applied in such wastewater treatment. With the addition of iron shavings SRB could adapt to such an acidic environment, and 57.97, 55.05 and 14.35% of SO(4)(2-) was reduced at pH 5, pH 4 and pH 3, respectively. Nevertheless it would be inhibited in too acidic an environment. The behavior of SRB after inoculation in acidic synthetic wastewater with and without iron shavings is presented, and some glutinous substances were generated in the experiments at pH 4 with SRB culture and iron shavings.</field>
      <field name="tag">Hydrogen-Ion Concentration; Iron; Oxidation-Reduction; Sulfur-Reducing Bacteria; Waste Water; Water Purification</field>
      <field name="author">Liu X, Gong W, Liu L</field>
      <field name="descriptor">Evaluation Studies; Journal Article; Research Support, Non-U.S. Gov't</field>
    </doc>
    <doc>
      <field name="id">24552734</field>
      <field name="title">Environmental isotopic and hydrochemical characteristics of groundwater from the Sandspruit Catchment, Berg River Basin, South Africa.</field>
      <field name="content">The Sandspruit catchment (a tributary of the Berg River) represents a drainage system, whereby saline groundwater with total dissolved solids (TDS) up to 10,870 mg/l, and electrical conductivity (EC) up to 2,140 mS/m has been documented. The catchment belongs to the winter rainfall region with precipitation seldom exceeding 400 mm/yr, as such, groundwater recharge occurs predominantly from May to August. Recharge estimation using the catchment water-balance method, chloride mass balance method, and qualified guesses produced recharge rates between 8 and 70 mm/yr. To understand the origin, occurrence and dynamics of the saline groundwater, a coupled analysis of major ion hydrochemistry and environmental isotopes (d(18)O, d(2)H and (3)H) data supported by conventional hydrogeological information has been undertaken. These spatial and multi-temporal hydrochemical and environmental isotope data provided insight into the origin, mechanisms and spatial evolution of the groundwater salinity. These data also illustrate that the saline groundwater within the catchment can be attributed to the combined effects of evaporation, salt dissolution, and groundwater mixing. The salinity of the groundwater tends to vary seasonally and evolves in the direction of groundwater flow. The stable isotope signatures further indicate two possible mechanisms of recharge; namely, (1) a slow diffuse type modern recharge through a relatively low permeability material as explained by heavy isotope signal and (2) a relatively quick recharge prior to evaporation from a distant high altitude source as explained by the relatively depleted isotopic signal and sub-modern to old tritium values. </field>
      <field name="tag">Groundwater; Isotopes; Rivers; Salinity; South Africa; Water Movements</field>
      <field name="author">Naicker S, Demlie M</field>
      <field name="descriptor">Journal Article; Research Support, Non-U.S. Gov't</field>
    </doc>
  </add>
</update>
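
A file in this format can also be generated programmatically. The following Python sketch builds the update/add/doc/field structure shown above with the standard library module `xml.etree.ElementTree`. The function name `build_import_xml` and the sample field values are assumptions; the field names you use must match the search index of your project.

```python
import xml.etree.ElementTree as ET


def build_import_xml(documents):
    """Serialize documents into the <update><add><doc> structure.

    `documents` is a list of documents; each document is a list of
    (field_name, value) pairs so that a field name such as
    image_reference may occur more than once.
    """
    update = ET.Element("update")
    add = ET.SubElement(update, "add")
    for document in documents:
        doc = ET.SubElement(add, "doc")
        for name, value in document:
            field = ET.SubElement(doc, "field", name=name)
            field.text = value  # special characters are escaped on output
    return ET.tostring(update, encoding="unicode")


xml_text = build_import_xml([
    [("id", "1"), ("title", "Example title"), ("content", "Example text.")],
])
print(xml_text)
```

To write a file that includes the XML declaration, `ET.ElementTree(update).write(path, encoding="UTF-8", xml_declaration=True)` can be used on the root element instead of `ET.tostring`.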


7. Text Analysis

7.1. Pipeline Configuration

The text analysis annotators and pipelines used in Health Discovery can be graphically administered and monitored in a centralized way. This is done in the Pipeline configuration module.



Figure 1: Link for opening the graphical configuration of text analysis components.


The overview page lists all the text analysis pipelines available in the project. The following information and operations are provided in the table.

  • "Pipeline Name": name of the pipeline.

  • "Status": Status of the pipeline: STOPPED, STARTING or STARTED. As soon as the pipeline has been started, it reserves system resources. It accepts analysis requests only after it has started.

  • "Instances": Number of pipeline instances, by default set to 1. Increasing the value to n means that the pipeline can process n requests from the web interface in parallel. Note: memory requirements increase as well.
  • "Throughput": here, two indicators for the pipeline throughput are given: the total number of processed texts, and the average number of processed texts per second. The statistics are reinitialized each time the pipeline stops/starts.

  • "Operations | Initialize pipeline" : this is used to initialize a pipeline. As soon as it has been initialized, it can process texts.

  • "Operations | Stop pipeline" : to save system resources, pipelines can also be stopped.

  • "Operations | Edit pipeline" : this is used to configure a pipeline, for example to add other components to it, to remove them or to modify their configuration parameters. Pipelines can only be edited when they are stopped.

  • "Operations | Update pipeline" : this is used to update the statistics (throughput) and status of the pipeline.

  • "Operations | Delete pipeline" : this allows pipelines to be permanently deleted, if they are no longer needed.



Figure 2: Overview of all available text analysis pipelines in the project.


To create new pipelines, use the 'Create pipeline' button below the overview table.

To copy an existing pipeline, click on the "Clone pipeline" button on the right button bar of the pipeline.

7.2. Pipeline details

With the pencil icon in the taskbar of the overview table, you can access the details page of the pipeline. At the top left, all annotators are displayed in the order in which they are used in the pipeline.

To the right of each annotator name (green panel), you can see the annotator-specific throughput data, indicating the total number of processed texts and the average number of texts per second. By clicking the relevant annotator, you can show all its configuration parameters.



Figure 3: Viewing the details of a preconfigured pipeline


7.3. Editing a pipeline

There are some general rules for pipeline configuration:

  1. Health Discovery comes with a set of preconfigured pipelines (see section Available Text Analysis Pipelines). Unlike self-created pipelines, these preconfigured pipelines cannot be edited.
  2. As long as a pipeline is running, it cannot be edited.

Pipelines can be edited in the details page.



Figure 4: Editing a pipeline.


Left panel in green: your pipeline configuration

The arrow buttons and the x-button on the right can be used to move annotators to another position within the pipeline or to remove them. Individual configuration parameters of the annotators are now also editable.

Right panel in blue: available annotators

A list of available annotators, which can also be added to the pipeline by clicking on the horizontal arrow button.

7.4. Managing / Adding new text analysis annotators

The application allows you to add new text analysis annotators at runtime. There is no need to reinstall or redeploy the application. For this purpose, so-called UIMA™ PEAR components (Processing Engine ARchive) are used. PEAR is a packaging format that allows text analysis components to be shipped alongside all needed resources in a single artifact.

You can find a list of all available PEAR components in the Pipeline Configuration, where you configure your text analysis pipeline. New annotators are added in the Textanalysis: Annotators module.



Figure 5: Show and import UIMA PEAR components.

7.5. Text Analysis Processes

Any number of text analysis results can be generated and stored for all known document sources in Health Discovery. Text analysis results can be created either automatically through pipelines or manually. This way, you can obtain different semantic views of the same document, which enables you to evaluate several views side by side.



Figure 1: Overview of all currently created text analysis tasks.


The table contains the following columns:

  • "Type": indicates whether this is a manual or automatic text analysis.

  • "Name": name of the process. For example Demo - anatomy

  • "Status": Status of the process. It is either RUNNING or IDLE.

  • "Document source": the document source to which the task refers. In parentheses after the name is the number of processed fields. For example if two fields, contents and title, are processed in a corpus of 3000 documents, then at the end of the task, 6000 will be indicated here.

  • "Pipeline": in the case of an automatic text analysis, the pipeline that was used for the text analysis is indicated here.

Buttons:

  • Download: Download the whole result as a set of UIMA XMI files.

  • Refresh: Button to refresh the current state of the process (e.g. to verify the processing is finished)
  • Delete: Delete whole process and all results.


By clicking on the process name, e.g. "Demo_Process", you can jump to the analysis results displayed in the Annotation Editor module.


When you create a new text analysis process, you can select whether it is a manual or an automatic text analysis.



Figure 2: Creating a new text analysis task: manual or automatic text analysis.


If you choose automatic text analysis, you are asked to give your text mining process a name and to specify the document source and the pipeline.



Figure 3: Creating a new automated text analysis process: Give your process a name and enter the document source and the pipeline you want to use.

7.6. Annotation Editor: Viewing and Editing Annotations

To be able to make a judgment about text analysis components, it is frequently essential to have the results displayed graphically. You may also want to correct text analysis results manually or annotate documents completely manually, for example to create gold standards, which are then used to evaluate text analysis components. For all these purposes, the Annotation Editor can be used.

7.6.1. Viewing annotations inside a document source

The Annotation Editor can be used to display text analysis results graphically. Using the annotation editor, all documents from a document source can be easily viewed, section by section, and all annotations can be graphically highlighted.

In Annotation Editor, you first select a document source (1). You then select the text analysis process that you wish to view (2).  If document names have been given to the documents in the source, the name of the first document in the source is displayed (3).

Once you have selected the source and the text analysis, the first document in the corpus is displayed. The document is displayed section by section. Above the text, there is a selector for each available annotation type, which lets you highlight the annotations of that type graphically (4).

In the main window (5), you can see the corresponding section of the document with the currently activated highlights. Below the main window, there are buttons for navigating through the individual sections of a document (6). Above it there are similar buttons, which you can use to navigate between the individual documents in a source (7).


Figure 1: Displaying the annotations in the documents of a document source.


A table with a list of all the currently highlighted annotations can be displayed on the right of the main window.



Figure 2: Overview table of annotations.

Clicking on a position in the text shows the annotations found at this position in the details view.


Figure 3: Especially emphasizing individual annotations.


The overview table is also used to view the individual attributes of the annotation. By expanding the annotation in the table, you can obtain a list of all the annotation’s attributes.


Figure 4: Show annotations' attributes.


7.6.2. Configuring section sizes

As described above, the documents are displayed section by section. By default, 5 sentences are displayed on each page. This setting can be configured in the interface by clicking on the wheel icon at the top right.

In principle, you can combine character-based sectioning with annotation-based sectioning. While character-based sectioning is the standard, annotation-based sectioning has the advantage that you don’t miss annotations that cross section boundaries. When both are combined, the sections are always shown with a slight overlap: the end of section n is displayed again at the beginning of section n+1 so that the section is not taken out of context. Furthermore, when sectioning by characters, the sectioning automatically ensures that sections are not split in the middle of a word.
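
The idea of character-based sectioning with overlap and word-safe splits can be sketched in a few lines of Python. This is an illustration only, not the Annotation Editor's algorithm: the function name, the parameters `max_chars` and `overlap_words`, and the normalization of whitespace are all assumptions.

```python
def split_into_sections(text, max_chars=200, overlap_words=2):
    """Split text into sections of at most max_chars characters.

    Sections end at word boundaries, and each section starts with the
    last words of the previous one. Whitespace is normalized to single
    spaces, which is a simplification.
    """
    words = text.split()
    sections = []
    start = 0
    while start < len(words):
        end = start + 1  # always take at least one word
        length = len(words[start])
        while end < len(words) and length + 1 + len(words[end]) <= max_chars:
            length += 1 + len(words[end])
            end += 1
        sections.append(" ".join(words[start:end]))
        if end == len(words):
            break
        # Repeat the tail of this section at the start of the next one,
        # but always move forward by at least one word.
        start = max(end - overlap_words, start + 1)
    return sections


print(split_into_sections("alpha beta gamma delta epsilon", max_chars=16, overlap_words=1))
```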

Any change to the section size in the graphical configuration is applied immediately after closing the window. Using the reset button, you can restore the configured default values.



Figure 5: Annotation Editor settings window.


7.6.3. Manually editing, adding and deleting annotations

The annotation editor can also be used to add annotations manually or to edit them. Using the button on the right (1), you can switch to edit mode.

In edit mode, a combo box appears above the main window where you can select the annotation type you want to annotate (2). After you select the type, you can create annotations of this type in the text. To create annotations of this type, simply highlight an area of text in the main window using the mouse. A quick way of adding an annotation is to simply click a word. An annotation of the corresponding type is then created for the whole word. 

Edit mode also allows you to edit and delete existing annotations (3). To do this, click the cross mark in the overview table of annotations on the right.

After you have made changes to the document, these can be saved or discarded by clicking the buttons (4).


Figure 6: Editing Annotations.


7.6.4. Displayed and editable annotation types, attributes and colours

Currently, the user cannot configure which annotation types and attributes are visible in the annotation editor, which colors are assigned to these annotation types, and which attributes are editable. This is currently preset by Averbis.

7.7. Text Analysis Evaluation

The results of various text analysis tasks can be evaluated against each other, e.g., to compare a text mining process against gold standards.

To do this, you may first choose the document’s source (1) which serves as the basis of the evaluation. Then, you choose the reference view (2) in the left part of the window, and, on the right side (3), you choose the text analysis process that you wish to evaluate.

If you choose a source and two text analysis processes, you can evaluate the results visually, one against the other, in a split view with two separate annotation editors. The representation of the sections in the right window is thereby coupled to the sections in the left window. Matching annotations are indicated by a green background, non-matching annotations are marked red.



Figure 1: The image shows the example of a DoseFrequenceyConcept annotation on the left that does not match on the right: 3 TIMES DAILY


7.7.1. "Matches" and "Partial Matches"

When evaluating, it is possible to distinguish between exact and partial matches. Annotations are marked as an exact match if their type, characterizing attributes and position in the text are identical.

To obtain an extra level between a hit and a no-hit, it is also possible to define a partial match. Annotations that are not exactly identical, but still meet these criteria, are marked accordingly both in the graphical and table presentation. In the graphical presentation they are indicated with a yellow background.




Figure 2: Displaying a partial match.


7.7.2. Configuring the match criteria

The definition of what should be considered as a match, partial match and mismatch can be configured by the user in the interface.

The general rule is that two annotations are considered as a match when they are of the same type and are found at exactly the same place in the document. For each annotation type you can then define which annotation attributes also have to match. If we use a concept, this could be the concept’s unique ID. This means that two concepts would be identified as a match only if this attribute was identical in both annotations.

It is also possible to configure for each annotation type, when two annotations of this type should be considered as a partial match. Here you can choose between four different options:

  • "No partial matches": only exact matches are allowed.

  • "Annotations must overlap": a partial match is given whenever the annotations overlap.

  • "Allow fixed offset": at the beginning and end of the annotations, a configurable offset is allowed.

  • "Are within the same annotation of a specific type": a partial match is found whenever the annotations are within the same larger annotation. For example, if they are inside the same sentence.
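
The following Python sketch shows how such match criteria could be expressed. It is a simplified model under stated assumptions, not the evaluation code of Health Discovery: the class `Annotation`, the function `classify` and the mode names are hypothetical, a single attribute (`concept_id`) stands in for the configurable attributes, attributes are assumed to have to match for partial matches as well, and the option "within the same annotation of a specific type" is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    type: str
    begin: int
    end: int
    concept_id: str = ""


def classify(reference, candidate, mode="overlap", offset=0):
    """Return "match", "partial" or "mismatch" for two annotations."""
    if reference.type != candidate.type:
        return "mismatch"
    if reference.concept_id != candidate.concept_id:
        return "mismatch"
    if (reference.begin, reference.end) == (candidate.begin, candidate.end):
        return "match"
    if mode == "overlap":
        # Annotations must share at least one character position.
        overlaps = reference.begin < candidate.end and candidate.begin < reference.end
        return "partial" if overlaps else "mismatch"
    if mode == "offset":
        # Begin and end may each differ by at most `offset` characters.
        close = (abs(reference.begin - candidate.begin) <= offset
                 and abs(reference.end - candidate.end) <= offset)
        return "partial" if close else "mismatch"
    return "mismatch"  # mode "none": only exact matches are allowed
```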



Figure 3: Graphical configuration of the match criteria.


7.7.3. Corpus evaluation

Using the Evaluate metrics button, a window can be opened, displaying the precision, recall, F1 score and standard deviation for either a single document or the whole corpus. The numbers are split by annotation type.
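
For reference, precision, recall and F1 score follow the usual definitions based on the counts of true positives, false positives and false negatives. The Python sketch below shows these standard formulas; how Health Discovery counts partial matches in these metrics is not specified here, so the function only takes the three counts as input.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Compute precision, recall and F1 score from raw counts."""
    predicted = true_positives + false_positives
    expected = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / expected if expected else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# 8 matching annotations, 2 spurious ones, 4 missed ones.
print(precision_recall_f1(8, 2, 4))
```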



Figure 4: Evaluation at corpus level.


In the Settings panel, you can configure which types are to be taken into account in the corpus evaluation.


Figure 5: Selecting the annotation types to be taken into account in the corpus evaluation.

7.8. Annotation Overview

For the quality assessment and improvement of text analysis pipelines, an aggregated overview of the assigned annotations is often helpful. For this purpose, the Annotation overview is used. You can create any number of these overviews. To do this, you first select a source and an existing text analysis process. Next, you select the annotation type to be analyzed.

After pressing the green button, the aggregation is calculated. Depending on the scope of the selected source, this may take some time. All overviews are listed in the table. As soon as an overview has been calculated, the results can be displayed via the list symbol.



Figure 1: Listing and management of the available annotation overviews.


7.8.1. Aggregation and Context

If you select an overview from the table using the list symbol, you will see an aggregated list of the annotations found for the corresponding type. By default, the list is sorted in descending order by frequency. By clicking on an annotation in the table, you can display some example text in which the annotations occur. In addition to the analysis, the overview is also suitable for directly improving the results. In this way, false positives as well as false negatives can be identified and corrected.
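
The kind of aggregation described here, a list of annotations sorted by descending frequency, can be reproduced on exported data with a few lines of Python. The sketch assumes that you already have the covered texts of the annotations as a list of strings; the function name and the lowercase normalization are assumptions.

```python
from collections import Counter


def aggregate_annotations(covered_texts):
    """Count annotation texts and return them sorted by frequency."""
    counts = Counter(text.strip().lower() for text in covered_texts)
    return counts.most_common()  # most frequent entries first


print(aggregate_annotations(["Fever", "fever", "Cough"]))
```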

Currently, the attributes that appear in the list for each annotation, are preconfigured by Averbis. This setting cannot yet be made graphically via the GUI.


8. Terminologies/Lexical resources

In this module, you can manage the lexical resources, which are used within the text analysis components.

8.1. Terminology Administration

This module lists all available terminologies within the current project. The following subsections describe the operations that are available here.

8.1.1. Create a new terminology

When creating a new terminology, you can specify the following parameters:

Terminology-ID

A unique identifier. E.g. MeSH_2017.

Label

A label. E.g. MeSH.

Version

A version number. E.g. 2017.

Concept type

The concept type when being used within text analysis. E.g. de.averbis.extraction.types.Concept.

Hierarchical

When unchecking this box, the terminology will not contain any hierarchical relations (flat list).

Encrypted export

ConceptAnnotator dictionaries can be exported in encrypted form to avoid having sensitive data on the disk.

This parameter only affects Concept Dictionary XML Exports. Other exports are still unencrypted.

Besides, you can specify, which languages are available within that terminology.



Figure 1: Add a new terminology.


8.1.1.1. Available languages

Your terminology can contain terms for all languages that are selected here. There is no need to use all languages for all terms, so there can be concepts that only have terms in a subset of those languages. Since in some situations one cross-lingual preferred term has to be computed, it must be decided which language to use if there are no terms in specific languages. For this purpose, you can specify a language priority by moving the languages up or down in this list. If English is at the top, followed by German, the English preferred term is displayed. If no English preferred term is available, the German one is displayed.

There is one special language, called Diverse. Terms in that language are mapped in every language. You can mark language independent terms with that language (e.g. Roman numerals).
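
The language priority described above amounts to a simple fallback lookup, as the following Python sketch illustrates. The function name and the language codes are assumptions made for this example.

```python
def preferred_term(terms_by_language, language_priority):
    """Return the preferred term of the first language that has one."""
    for language in language_priority:
        term = terms_by_language.get(language)
        if term:
            return term
    return None  # no preferred term in any of the prioritized languages


# English is preferred, German is the fallback.
print(preferred_term({"de": "Blinddarm"}, ["en", "de"]))
```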

8.1.2. Edit a terminology's metadata

You can edit the metadata that you specified when creating the terminology via the edit button.

8.1.3. Delete a terminology

The delete button allows you to delete a terminology, provided that no import or export is currently running.

8.1.4. Import content into a terminology

You can import content from OBO files (versions 1.2 [1] and 1.4 [2]) into an existing terminology. If you have multilingual terminologies, version 1.4 needs to be used. Optionally, a mapping mode for each synonym can be imported, too.

The source file may be zipped to support large files.


The minimal structure of your OBO terminology looks like this:

Example of an OBO terminology

synonymtypedef: DEFAULT_MODE "Default Mapping Mode"  //OPTIONAL - only if using mapping modes
synonymtypedef: EXACT_MODE "Exact Mapping Mode" //OPTIONAL - only if using mapping modes
synonymtypedef: IGNORE_MODE "Ignore Mapping Mode" //OPTIONAL - only if using mapping modes

[Term]
id: 1
name: First Concept
synonym: "First Concept" DEFAULT_MODE []
synonym: "First Synonym" IGNORE_MODE []
synonym: "Second Synonym" EXACT_MODE []

[Term]
id: 2
name: First Child
is_a: 1 ! First Concept

To import terms with mapping modes, the OBO terminology begins with the synonym type definitions, as shown in the first three lines of the OBO terminology in the example above.

Each concept begins with the flag "[Term]", followed by an "id" and a preferred name with the flag "name". After that you can add as many synonyms as you like with the flag "synonym", optionally followed by the desired mapping mode. Note: if you would like to define a mapping mode for your concept name (flag "name"), you have to add the term as a synonym as well, as shown in the example for "First Concept".

Furthermore, if your terminology contains a hierarchy, you can use "is_a" to refer to other concepts of your terminology.
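
If your concepts are available in another form, such as a spreadsheet or a database, the term stanzas can be generated with a short script. The Python sketch below writes one stanza per concept in the layout described above. The function name `format_obo_term` is hypothetical, and the sketch does not escape quotation marks inside synonyms.

```python
def format_obo_term(concept_id, name, synonyms=(), parents=()):
    """Format one [Term] stanza.

    synonyms: iterable of (text, mapping_mode) pairs, for example
              ("First Synonym", "IGNORE_MODE").
    parents:  iterable of (parent_id, parent_name) pairs for is_a lines.
    """
    lines = ["[Term]", f"id: {concept_id}", f"name: {name}"]
    for text, mode in synonyms:
        lines.append(f'synonym: "{text}" {mode} []')
    for parent_id, parent_name in parents:
        lines.append(f"is_a: {parent_id} ! {parent_name}")
    return "\n".join(lines)


print(format_obo_term("2", "First Child", parents=[("1", "First Concept")]))
```

Stanzas are separated by an empty line, and the synonym type definitions have to be placed at the top of the file when mapping modes are used.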

To import a terminology like the one shown above, proceed as follows:

  1. In "Project Overview", click on "Terminology Administration".

  2. Click on "Create New Terminology". Fill in the dialog as described in Add Terminology.

  3. Once you have created a terminology, click the up arrow icon to the right of the terminology.

  4. In the "Import Terminology" dialog, select the terminology you want to import from the file system. Click on "Import".

    1. By clicking on the "Refresh" button to the right of the terminology you can check the progress of the import. When the terminology has been fully imported, the status changes to "Completed".

    2. To browse your terminology, switch to the "Terminology Editor" by going to the "Project Overview" page and clicking on "Terminology Editor".



Figure 2: Import content into existing terminology


After an import has started, the current status is shown in the overview.



Figure 3: Status of currently running processes.


Besides, you can see some details of the latest import (including error messages).



Figure 4: Detailed information regarding the latest process.


After successful terminology import, terms, hierarchies and mapping modes can be checked in the Terminology Editor.




Figure 5: Terminology Editor showing imported terminology

8.1.5. Submit terminologies for use in text mining pipelines

To use a terminology within the text analysis, it must be handed over to the text analytic module via the "submit terminology to text analytics" button.


Figure 6: Submit terminology for use in text analytic module


After submitting the terminology to the text analytic module, you need to stop and restart the pipelines, which use this terminology.

8.1.6. Download terminology

To download a terminology in OBO format, open the Terminology Administration and perform the following steps:

Step 1: Click the button "Preparing for OBO download"

The preparation time depends on the size of your terminology. Once the download is ready, a notification appears in the bell symbol in the upper menu bar.


Step 2: After the preparation step is completed, refresh the terminology. This will enable the button "Download OBO file"


Step 3: Click the button "Download OBO file". Depending on your local browser settings, the download will start automatically or the download prompt will open.


8.2. Terminology Editor

The Terminology Editor allows you to edit the content of terminologies.

8.2.1. Free text search and autosuggest

The centered search bar at the top of the Terminology Editor is meant for doing a free text search across multiple terminologies. You can include or exclude terminologies from the search by checking them within the drop down menu next to the search bar. While entering a search term, the system suggests different possible matches via autosuggest, grouped by terminology.



Figure 1: Terminology auto suggest.


When doing a free text search, you can use the asterisk symbol (*) for truncation (e.g. Appendi*). The results of a free text search are listed within the upper right section. Results are grouped by their terminologies.
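
The effect of truncation with the asterisk can be illustrated with Python's standard `fnmatch` module. This is only an analogy to show which terms such a pattern selects, not the search implementation of the Terminology Editor; the case-insensitive comparison is an assumption, and `fnmatch` additionally treats `?` and `[]` as wildcards.

```python
from fnmatch import fnmatchcase


def search_terms(pattern, terms):
    """Return all terms that match a pattern with * as truncation symbol."""
    needle = pattern.lower()
    return [term for term in terms if fnmatchcase(term.lower(), needle)]


print(search_terms("Appendi*", ["Appendix", "Appendicitis", "Pendant"]))
```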

The settings menu on the top right allows you to customize some search and autosuggest settings. You can specify whether Concept IDs are included within the search, and define the number of hits to be displayed.



Figure 2: Configuration of search and autosuggest.


8.2.2. Displaying concepts hierarchically

The tree view in the Terminology Editor shows the position of a concept in the terminology hierarchy. To display it, click on a concept within the list of search results.



Figure 3: Displaying concepts hierarchically.


You can configure whether the Concept ID shall be shown in the tree as well, and whether the tree view shall show the siblings of a concept along its hierarchy.



Figure 4: Tree with and without strictly focusing on the selected concept.


8.2.3. Terms

In the lower right corner of the window you see the concept’s details. The first tab shows concept synonyms. You can edit, add or delete synonyms here as well.



Figure 5: Adding new terms.


8.2.4. Mapping Mode

Every term has a so-called Mapping Mode. Mapping Modes are an efficient way of increasing the accuracy of terminology-based annotations. They allow you to ignore certain synonyms that are irrelevant or lead to false positive hits (IGNORE). Synonyms can also be restricted to EXACT matches, which is especially useful for acronyms and abbreviations (AIDS != aid).

Currently, there are 3 Mapping Modes

DEFAULT

Term is preprocessed the same way the pipeline is configured.

EXACT

Term is only mapped when the string matches the text exactly, without any modification by preprocessing (including case).

IGNORE

Term will be ignored. It won’t be used within the text analysis.
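
The difference between the three modes can be illustrated with a small Python sketch. It is a simplified model, not the matching logic of the ConceptAnnotator: the function name is hypothetical, and lowercasing stands in for whatever preprocessing the pipeline is configured to apply.

```python
def term_matches(term, mode, text_token, preprocess=str.lower):
    """Decide whether a terminology term maps to a piece of text."""
    if mode == "IGNORE":
        return False  # the term is never used
    if mode == "EXACT":
        return term == text_token  # no preprocessing, case-sensitive
    # DEFAULT: term and text are preprocessed in the same way.
    return preprocess(term) == preprocess(text_token)


print(term_matches("AIDS", "EXACT", "aids"))
print(term_matches("AIDS", "DEFAULT", "aids"))
```

With EXACT, the acronym is only found in its original spelling, whereas DEFAULT also accepts the lowercase variant.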


8.2.5. Relations

The second tab shows all relations known for that concept. You can use this view to add or delete relations, too. Currently, only hierarchical relations are supported. When adding a new relation, you get an autosuggest to find the correct concept that you want to relate.

8.2.6. Mapping Mode and comment

In the third tab, you can add a comment to a concept. In addition, you can set a concept-wide Mapping Mode. Terms that do not have a specific Mapping Mode inherit it from this concept-wide Mapping Mode.


9. Document Search

9.1. Solr Core Administration

As soon as the Solr Admin module is used, the application has a default Solr Core. This core is displayed in the administration panel.

Health Discovery uses Solr to create a search index and to make documents searchable. Choose "Solr Core Administration" on the project overview to create the basic settings.

9.1.1. Indexing pipeline

Documents that are imported or crawled go through a text analysis pipeline in order to add metadata to the search index.

The corresponding pipeline is selected here - a separate indexing pipeline can be used for each project.



Figure 1: Choosing the indexing pipeline.


If you choose an indexing pipeline, all documents that are imported or crawled in the future will be processed by it. If you want to use a different pipeline for processing search queries, you can set it in the Solr Core Management section.

You can also switch the indexing pipeline within a project. To avoid a heterogeneous set of metadata, all documents are re-processed.

9.1.2. Query Pipeline

Here you can select which of the available pipelines should be used for analyzing the search query. By default, the same pipeline is used here as selected for indexing the documents.



Figure 2: Initial state in which no query pipeline is selected.



Figure 3: Choose a query pipeline.


9.1.3. Solr Core Overview

A so-called "Solr Core" is available for each project, the administration of which can be accessed via the "Solr Core Management" button on the project page.



Figure 4: Key figures and information on the search index of a project.


  • "Core Name": The name of the Solr instance (generated automatically)

  • "Path to solrconfig.xml": This is the path to the configuration file of this Solr instance. Expert settings can be made in this configuration file. After editing this file, the Solr instance must be restarted in order for the changed settings to take effect.

  • "Path to schema.xml": The index fields are configured in this configuration file. This file should only be edited manually in exceptional cases and by experts.

  • "Indexed documents": Number of documents currently in the index.

  • "Pending documents": Number of documents that are currently in the processing queue of the Solr instance.

After pending documents have been processed by Solr, a commit must take place before these documents are actually available in the index. Since a commit is quite resource-intensive, the number of commits is kept low. By default, a commit therefore only takes place every 15 minutes, so processed documents appear under the indexed documents with a delay.


  • "Operations": At the level of the Solr core, there are three operations available:

    • "Refresh" : You can update the displayed key figures by clicking on this icon.

    • "Commit" : This command executes a commit on the Solr core, which makes processed documents that were not yet visible available in the index. By default, this happens every 30 minutes in the background.

    • "Delete all documents from the index" : With a click on this icon, all documents are deleted from the index.

9.1.4. Configuration of the search index schema

The configuration of the schema of the current search index can be reached via the module "Solr schema configuration".

9.1.4.1. Overview of all schema fields

Each Solr core has a schema that defines which information is stored in which kinds of fields. The Solr schema configuration lists all available fields in alphabetical order. The following information and operations are available for each field in the index:

  • "Field name": Name of the field as defined in the Solr schema. These names are often technical and hard for people to read. If a field is a system field, that is, a field whose values must not be overwritten by the user, a small lock symbol is displayed to the right of the field name.

  • "Type": The type specifies the contents of this field. In addition to an abstract description (e. g. string) the complete class name of the field is specified in parentheses.

  • "Active": This button controls whether the field contains information to be displayed or used elsewhere in the application. These fields are then available, for example, to be displayed in the search result, to form facets or to be used via query builder for the formulation of complex, field-based search restrictions. Fields that are not activated can still be used by the system, but they are not available for manual configuration to the users. If a field is activated, the line is highlighted in green.

  • "Label": The field name itself is often not suitable for displaying because it is not legible, and it is not localized. Therefore, you can define meaningful display names for all fields in different languages. These names are used wherever the user accesses or displays field contents. If no corresponding display name is defined for the user’s language, the illegible field name is displayed.



Figure 5: Overview of the Solr core's schema.
9.1.4.2. Dynamic fields

In the overview, dynamically generated Solr fields are also displayed as soon as they have been created (that is, as soon as they have been filled with values once). As soon as the field has data, it remains permanently in the overview, even if all documents containing values in this field have been deleted in the meantime.

9.2. Manage and use search interface

The functionality and appearance of the search interface can be influenced by configuration.

9.2.1. Configuring the display of search results

Starting from the overview page of a project, the display of search results can be configured by using the "Field Layout Configuration" module. You can specify which fields/contents of the indexed documents are to be displayed in the interface. This applies to both the fields on the results overview page and the fields on the detail page of the documents (accessible by clicking on the title information of the result). Fields that are only displayed on the overview page of the search results are highlighted in green. In addition to selecting the fields, you can also configure whether the field title should be displayed, as well. If this option is activated, the display name created in the Solr schema management for the language of the respective user is displayed.

In addition, the length of content of a particular field can be specified, as well as some style settings.



Figure 1: Configuring the display of search results.


9.2.2. Configure Facets

So-called facets provide the user with additional filter options. They are displayed on the left side of the search page. The configuration of facets can be accessed via the module "Facet Configuration" on the project overview page.

On the configuration page, you can select and configure the facet fields displayed in the user interface. When selecting a facet, you can configure whether the entries within a facet are AND- or OR-linked. In the case of AND facets, only documents that combine all the terms selected in this facet are displayed. OR facets, on the other hand, offer the option of finding documents that contain only individual terms (e. g. documents of "Category 1" OR "Category 2").
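The difference between AND- and OR-linked facets can be illustrated with a minimal sketch. It is not the application's implementation; the function name `filter_by_facet` and the sample documents are assumptions made for illustration only.

```python
from typing import Dict, Iterable, List, Set


def filter_by_facet(
    docs: Dict[str, Set[str]], selected: Iterable[str], mode: str = "AND"
) -> List[str]:
    """Return the ids of documents matching the selected facet entries.

    docs maps a document id to the set of facet values it carries.
    mode "AND": a document must carry every selected value.
    mode "OR": a document must carry at least one selected value.
    """
    wanted = set(selected)
    if not wanted:
        return sorted(docs)  # nothing selected: no restriction applies
    if mode == "AND":
        return sorted(d for d, values in docs.items() if wanted <= values)
    if mode == "OR":
        return sorted(d for d, values in docs.items() if wanted & values)
    raise ValueError(f"unknown facet mode: {mode}")


# Hypothetical sample data
docs = {
    "doc1": {"Category 1"},
    "doc2": {"Category 2"},
    "doc3": {"Category 1", "Category 2"},
}
and_hits = filter_by_facet(docs, ["Category 1", "Category 2"], "AND")
or_hits = filter_by_facet(docs, ["Category 1", "Category 2"], "OR")
print(and_hits)  # ['doc3']
print(or_hits)   # ['doc1', 'doc2', 'doc3']
```

With the AND link only the document carrying both categories remains, while the OR link keeps every document that carries at least one of them.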

In addition, you can configure how many entries are to be displayed within each facet. The order of the facets can be determined with the arrows. The facets appear in the search interface in the same order as in the administration panel. The display name of a facet is selected according to the labels assigned in the Solr schema configuration (see above).



Figure 2: Configure Facets.


9.2.3. Configuring auto-completion

Settings for automatic completion of search terms can be made via the "Autosuggest" module that you access on the project overview page. There are various methods for offering users suggestions that complete their search queries in a meaningful way. Currently, four methods are available to choose from, and they can be freely combined as needed.

The proposals are grouped by their mode in the search interface. The order of the groups corresponds to the order in which the modes are listed here (if more than one mode is used). Use the arrow keys to change the order.

In addition to the number of proposals per group, you can also specify a description for each group, which is displayed in the search interface above the respective proposal block.

Changes will take effect immediately after saving for all users of the search.

If one of the two concept-based methods is used, an additional field appears where you select which Solr field is to be used for the lookup. All fields that are recognized as concept-based fields are available for selection.



Figure 3: Configuring auto-completion.


The methods are characterized as follows:

"Prefixed Facet Mode"

  • The proposals for completing the search query come from the documents in the search index. No external sources are therefore used for the proposals.

  • The suggestions are intended to complete the term currently entered, no additional term is proposed (no multiple word suggestions).

  • The current search restrictions (e. g. via facets) are taken into account in the proposals. Therefore, only those terms are suggested for which there are also hits in the body, taking into account all active search restrictions.

  • The proposals are not based on the order of the terms in the documents. If you enter a search query that consists of several partial words, the proposed word does not have to directly follow the preceding query term in the documents.
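The behaviour described for the prefixed facet mode can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual Solr-based implementation: the function `suggest`, the whitespace tokenization, and the ranking by document frequency are all hypothetical.

```python
from collections import Counter
from typing import Dict, List


def suggest(docs: Dict[str, str], query: str, limit: int = 5) -> List[str]:
    """Suggest completions for the last word of the query.

    Earlier words act as filters: only documents containing all of them
    contribute suggestions, regardless of word order or distance.
    """
    words = query.lower().split()
    if not words:
        return []
    *filters, prefix = words
    counts: Counter = Counter()
    for text in docs.values():
        tokens = set(text.lower().split())
        if all(f in tokens for f in filters):
            # count each matching term once per document
            counts.update(t for t in tokens if t.startswith(prefix))
    # most frequent first, ties broken alphabetically
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:limit]]


# Hypothetical indexed documents
docs = {
    "d1": "hospital admission for acute appendicitis",
    "d2": "appendix removed after appendicitis",
    "d3": "outpatient visit for appendicitis",
}
print(suggest(docs, "appen"))           # ['appendicitis', 'appendix']
print(suggest(docs, "hospital appen"))  # ['appendicitis']
```

Because the suggestions are drawn only from documents that satisfy the active restrictions, every proposed term is guaranteed to produce at least one hit.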

"Shingled Prefixed Facet Mode"

  • The proposals for completing the search query come from the documents in the search index. No external sources are therefore used for the proposals.

  • Unlike simple prefixed facet mode, suggestions can consist of several words. In addition to completing the term currently entered, this mode also suggests terms that often appear directly next to or close to this term in the documents. Entering Appen in this mode could therefore lead to suggestions such as treating appendicitis.

  • The current search restrictions (e. g. via facets) are taken into account in the proposals. Therefore, only those terms are suggested for which there are also hits in the body, taking into account all active search restrictions.

  • If the query consists of several words, the word order of the suggestions is based on the last of these words. All terms before this last word are still used as filters. The entry Hospital Appendi could therefore also lead to the suggestion Hospital Treat Appendicitis, even if Treat Appendicitis is not in the immediate vicinity of Hospital in the text.

Concept Mode with guaranteed hits (concepts_hit)

  • The suggestions for completing the search query are taken from synonyms of the stored terminology.

  • Proposals show the wording of the synonym and the title of the terminology as well as the preferred name of the concept in the user’s language.

  • If you select a proposal (synonym), a search with the associated concept is executed.

  • Documents that contain the exact synonym text (that is, documents that cannot be found using another synonym) are given a higher weighting and are displayed higher up in the results list.

  • Only proposals that guarantee at least one hit are displayed.

Concept Mode without guaranteed hits (concepts_all).

This mode differs from the conventional concept mode in that proposals are also displayed that do not lead to a hit. All terms from the stored terminology are displayed.

The activation of the concept modes is not completely implemented via the GUI. Please contact support.


9.2.4. Search restrictions

Switch to the "Search" module of the project to get to the search page of the application. All search terms entered remain visible to the user at all times, so you can easily see which search terms have led to the currently presented result set. The current search restrictions are listed next to each other on the left side of the search bar. They are highlighted in the same color as the corresponding highlighting in the text. If the restriction by a term originates from a facet, the name of the facet is listed before the search term (see screenshot below).

If there are too many search restrictions to be displayed in the search bar, they are shown in a collapsible pop-up menu on the left of the search bar. The small cross symbol next to each search restriction removes this restriction and updates the search results accordingly. With the cross button to the right of the search bar you can also remove all current search restrictions at once.



Figure 4: Display of the current search restriction.


9.2.5. Faceted search

Facets represent one of the core functionalities of the search. With the help of the facets, the search results can be quickly limited to relevant results. In the admin panel you can configure for which categories facets should be displayed.

Within the facets, the most frequent terms from the respective category appear, which are contained in the indexed documents. The number after the faceted entries indicates how many documents are contained in the index (or current search result set) that match the corresponding term.

The faceted entries can be clicked on, whereupon the search result will be limited accordingly. Different terms can be combined here - even across facets. This allows a high degree of flexibility in restricting the search results.



Figure 5: Concept facet with selected restriction to 'Diagnosis'.


9.2.6. AND-linked facets

By default, all selected facet entries are AND-linked. This means that only documents matching all selected criteria are listed. The currently selected filters are highlighted in orange. The restriction can be removed by clicking on the faceted entry again.

9.2.7. OR-linked facets

OR-linked facets yield result sets in which at least one of the selected criteria appears, that is, documents in which only one or only a few of the selected terms appear are also listed. In the case of these OR-linked facets, a checkbox is displayed in front of each entry.

9.2.8. Querybuilder / Expert Search

The query builder provides a convenient mechanism for creating complex search queries. It allows you to combine different criteria into a single query using any fields from the index.

The Querybuilder can be opened using the magic wand icon in the search bar.



Figure 6: The magic wand on the right of the search bar opens the query builder.


The input mask allows you to add search restrictions on all activated schema fields. Depending on the type of the selected schema field, different comparison operators are available. Text fields allow the operators contains and contains not. Any text can be entered as a restricting value. The asterisk * is used as a wildcard.

Date fields offer the comparison operators >= and <=. Numerical fields offer the comparison operators =, <>, >= and <=. By combining two conditions on the same date or number field, the search can also be restricted to periods or ranges.



Figure 7: Input mask of the query builder


Concept-based fields allow the operators contains and contains not like text fields.

Any number of conditions can be added. These are linked with each other using the boolean operators AND and/or OR. The criteria can also be grouped together to create any logical combinations. In addition to the graphical display, you can also find the logical expression that results from the current compilation of search restrictions in the upper area of the query builder. Once the complex search query has been created, it can be activated using the Apply button. The search results are calculated accordingly. In addition, the magic wand icon in the search bar turns orange to indicate that a complex search restriction is active. The search query can be reloaded by clicking on this button and can be edited until the result matches your expectations.
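How such grouped conditions are evaluated can be sketched in a few lines. This is only an illustration of the logic described above, not the query that the application sends to Solr: the data structures, the field names `diagnosis` and `admission_date`, and the case-insensitive matching are assumptions.

```python
import re
from datetime import date
from typing import Any, Dict, Union

Condition = tuple  # (field, operator, value)
Group = Dict[str, Any]  # {"op": "AND" | "OR", "items": [Condition | Group, ...]}


def _contains(text: str, pattern: str) -> bool:
    # '*' is the only wildcard; every other character is matched literally.
    # Case-insensitive matching is an assumption of this sketch.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.search(regex, text, flags=re.IGNORECASE) is not None


OPERATORS = {
    "contains": lambda field_value, v: _contains(str(field_value), v),
    "contains not": lambda field_value, v: not _contains(str(field_value), v),
    "=": lambda field_value, v: field_value == v,
    "<>": lambda field_value, v: field_value != v,
    ">=": lambda field_value, v: field_value >= v,
    "<=": lambda field_value, v: field_value <= v,
}


def matches(doc: Dict[str, Any], node: Union[Condition, Group]) -> bool:
    """Evaluate one condition or a nested AND/OR group against a document."""
    if isinstance(node, dict):
        results = (matches(doc, item) for item in node["items"])
        return all(results) if node["op"] == "AND" else any(results)
    field, operator, value = node
    return OPERATORS[operator](doc[field], value)


# A text restriction combined with a date range (two conditions on one field)
query = {
    "op": "AND",
    "items": [
        ("diagnosis", "contains", "append*"),
        {
            "op": "AND",
            "items": [
                ("admission_date", ">=", date(2021, 1, 1)),
                ("admission_date", "<=", date(2021, 12, 31)),
            ],
        },
    ],
}
doc_a = {"diagnosis": "Acute appendicitis", "admission_date": date(2021, 6, 8)}
doc_b = {"diagnosis": "Fracture", "admission_date": date(2021, 6, 8)}
print(matches(doc_a, query))  # True
print(matches(doc_b, query))  # False
```

Nesting groups inside groups corresponds to the grouping of criteria in the graphical display of the query builder.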

The query created using the Querybuilder is applied in addition to any other search restrictions, such as those made by free text search or facet restriction.

9.2.9. Document details and original document

The title field of a document serves as a link to a detail page containing additional information about the document (see "Solr Schema Configuration" module on the project overview page).

In addition to the detailed view, you can also download the underlying original documents (e.g. PDF, office document etc.) if they are available. You can recognize this by a small icon on the right of the document title. The symbol differs depending on the document category. Clicking on the file icon starts the download of the original document.

9.3. Export search results

Documents in the system can be exported - both individual documents and complete search result sets.

9.3.1. Selection of documents to be exported

If the user has the necessary permissions to export documents, checkboxes are provided on the search results page to mark individual documents. There is also a checkbox to mark all currently displayed documents. In addition, the button "Export search results" is displayed above the search results, where the selected documents can be exported.

Another option is to export all documents that meet the current search restrictions. In this case, all checkboxes have to be deselected.



Figure 1: Controls to mark and export documents.


9.3.2. Selection of the exporter and the fields to be exported

After selecting the documents to be exported, a dialog box appears in which the exporter type can be selected. Currently, there is one exporter, which exports selected fields of the documents to an Excel document.

After selecting the fields to be included in the export and confirming with the "Export" button, the export starts. Once the export is complete, the result is offered for download.



Figure 2: Selection of the exporter and the fields to be exported.


10. Document Classification

10.1. Manage classification

10.1.1. Administration of the label system

The target categories for the automatic classification of documents are called the label system. It can be edited and maintained in the module "Label System". In a new project, the label system is initially empty.

Clicking on "Create new label" at the bottom left adds a new label. The pen symbol on the right-hand side is used to rename the label. The plus symbol to its right adds a new label as a child of the current label. It is therefore used to create hierarchically organized label systems. Clicking on the red cross symbol deletes labels (only labels that have no children can be deleted).

In a hierarchical labeling system, the hierarchical arrangement can also be edited via drag & drop.



Figure 1: Labels can be added, edited, moved or deleted in the label system administration.


10.1.2. Administration of different classification sets

The starting point for the automatic classification of documents are so-called classification sets.



Figure 2: Menu item for managing classification sets.


10.1.2.1. Create a new classification set

Any number of classification sets can be created for each project. This means that you can classify the same document source with different classification parameters.

There is only one label system per project. The same label system is used for each classification set. Please make sure that the label system has been created before you create a classification set.


To be able to view the results of the classification in the interface, you should select an indexing pipeline in Solr Core Management before you create classification sets.


When creating a new classification set, the following settings can be adjusted:

  • Name: Name under which this classification set is referenced.

  • Document fields: From all document fields known to the system, you can select those that are used for training the classifier (so-called features).

  • High confidence threshold: The system distinguishes between documents with high and low confidence for automatically classified documents. This parameter can be used to define the value above which the confidence is interpreted as "high".

  • Classifier: In principle, different implementations can be used for classification. At present, the implementation offered is a support vector machine.

    • SVM: Support vector machine

  • Single/multi-label: This parameter determines how many categories can be assigned to a single document. With Single, only one label is assigned. With Multi, a document can be assigned to several classes.

  • Classification method: The classification method determines how the machine selects from several candidates. Depending on whether it is a single-label or multi-label scenario, different options and configuration parameters are available:

    • Single-Label

      • Best Labels: With Single-Label-Classification there is only one classification method: the Best Labels method chooses the class with the highest confidence.

        • Threshold : The threshold value can be used to determine that only classes that have a certain minimum confidence are taken into account. This allows for filtering out assignments about which the machine is very uncertain.

    • Multi-Label: For Multi-Label Classification several methods are available (for a deeper theoretical background, see Matthew R. Boutell: Learning multi-label scene classification ):

      • All Labels: This method simply selects the available instance labels in a decreasing confidence order.

      • T-criterion: Using the T-criterion, instances first get filtered by a minimum confidence threshold of 0.5. If the confidences are too low, i.e. no labels are assigned, another filter step is used. The second step checks if the entropy of the confidences is lower than the minimum entropy threshold, i.e. confidences are distributed unevenly. If this is the case, the labels are assigned based on a lower minimum confidence threshold.

        • Entropy: 1.0 (default minimum entropy)

        • Threshold value: 0.1 (default minimum confidence)

      • C-criterion: This method ensures the selection of the best prediction values depending on the configuration parameters (i.e. Percentage and Threshold values). It first selects the label with the highest confidence (larger than the threshold value) and continues to assign labels whose confidence is at least at 75% of the highest confidence value.

        • Percentage value: 0.75

        • Threshold value: 0.1 (minimal default confidence).

      • Top n labels: This method selects those categories that have the highest confidence.

        • n: the number of classes to be assigned
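The selection methods listed above can be sketched as follows. This is a minimal illustration based only on the descriptions in this section, not the classifier's actual code. In particular, the entropy definition (Shannon entropy with natural logarithm over confidences normalised to sum to 1) and all function names are assumptions.

```python
import math
from typing import Dict, List


def best_label(conf: Dict[str, float], threshold: float = 0.0) -> List[str]:
    """Single-label: the class with the highest confidence, if it reaches the threshold."""
    if not conf:
        return []
    label = max(conf, key=conf.get)
    return [label] if conf[label] >= threshold else []


def top_n(conf: Dict[str, float], n: int) -> List[str]:
    """The n classes with the highest confidence."""
    return sorted(conf, key=conf.get, reverse=True)[:n]


def c_criterion(
    conf: Dict[str, float], percentage: float = 0.75, threshold: float = 0.1
) -> List[str]:
    """Best label (if above the threshold) plus all labels within percentage of it."""
    if not conf:
        return []
    ranked = sorted(conf, key=conf.get, reverse=True)
    best = conf[ranked[0]]
    if best <= threshold:
        return []
    return [label for label in ranked if conf[label] >= percentage * best]


def entropy(conf: Dict[str, float]) -> float:
    """Assumed definition: Shannon entropy of the normalised confidences."""
    total = sum(conf.values())
    if total <= 0:
        return 0.0
    probs = [c / total for c in conf.values() if c > 0]
    return -sum(p * math.log(p) for p in probs)


def t_criterion(
    conf: Dict[str, float], min_entropy: float = 1.0, threshold: float = 0.1
) -> List[str]:
    """Filter at 0.5 first; fall back to the lower threshold if entropy is low."""
    ranked = sorted(conf, key=conf.get, reverse=True)
    labels = [label for label in ranked if conf[label] >= 0.5]
    if labels:
        return labels
    if entropy(conf) < min_entropy:  # confidences are distributed unevenly
        return [label for label in ranked if conf[label] >= threshold]
    return []


# Hypothetical confidences for one document
conf = {"A": 0.45, "B": 0.40, "C": 0.05}
print(best_label(conf))   # ['A']
print(top_n(conf, 2))     # ['A', 'B']
print(c_criterion(conf))  # ['A', 'B']
print(t_criterion(conf))  # ['A', 'B']
```

In this example no label reaches 0.5, so the T-criterion falls back to the lower threshold because the confidences are concentrated on two labels. With four equally confident labels the entropy would exceed the default of 1.0 and no label would be assigned.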

The classification configuration can be changed on the classification administration page by clicking on the edit button.

After changing the parameters of an existing classification set, re-training and re-classification are necessary for all changes to take effect.


Before documents can be automatically classified, the machine requires appropriate training material. This refers to a small set of intellectually classified documents used by the machine to train a model.

Training data can be created in two ways. Either by manually assigning classes via the graphical user interface (please see "Browse classifications" below) or by importing a CSV file that contains appropriate assignments.

10.1.2.2. Import of training material

The import button opens a dialog for importing a CSV file with training material. The CSV file must contain the name of the document in the first column (referred to as document_name in the system). The subsequent columns contain the label assignments (one column for each label in a multi-label scenario). The columns must be separated by semicolons. The values of the columns can be enclosed in double quotation marks if required (mandatory if the values contain semicolons).

Example :
trainset.csv

doc1;label_1;label_2
doc2;label_1;
doc3;label_1;label_3
...

The document name, which is used to identify the document in the list, must contain the value that is entered in the field document_name in the application.


If a training file contains several labels per document, but the selected training set is a single-label classification, only the first label is used.


If the document names or labels contain semicolons, the values must be enclosed in double quotation marks to avoid incorrectly interpreting the semicolon as a field separator.


Only values that are part of the label system in the application (or project) are allowed as labels (all others are ignored).


When you import training material, any labels that may already be assigned to the documents in the list are deleted.
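The import rules above can be checked against a file before uploading it. The following sketch reads the format with Python's standard csv module. The function `read_training_csv` and the sample data are hypothetical, and the order in which the single-label rule and the label-system filter are applied is an assumption.

```python
import csv
import io
from typing import Dict, List, Set


def read_training_csv(
    text: str, label_system: Set[str], multi_label: bool = True
) -> Dict[str, List[str]]:
    """Parse semicolon-separated training material into {document_name: [labels]}."""
    assignments: Dict[str, List[str]] = {}
    reader = csv.reader(io.StringIO(text), delimiter=";", quotechar='"')
    for row in reader:
        if not row or not row[0].strip():
            continue  # skip blank lines
        name, *labels = (cell.strip() for cell in row)
        labels = [label for label in labels if label]  # drop empty cells
        if not multi_label:
            labels = labels[:1]  # single-label set: only the first label is used
        # labels that are not part of the label system are ignored
        assignments[name] = [label for label in labels if label in label_system]
    return assignments


sample = 'doc1;label_1;label_2\ndoc2;label_1;\n"doc;3";label_1;label_9\n'
label_system = {"label_1", "label_2", "label_3"}

multi = read_training_csv(sample, label_system)
single = read_training_csv(sample, label_system, multi_label=False)
print(multi)
print(single)
```

The third row shows a document name containing a semicolon, which is only parsed correctly because it is enclosed in double quotation marks. Its second label is dropped because it is not part of the label system.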

10.1.2.3. Train a model

As soon as the system has access to training material, either through an imported training list or through manually assigned labels, a model can be trained using the corresponding button. Refresh the overview to update the information on "State" and "Model": the training has finished when "State" is IDLE and "Model" is READY.

10.1.2.4. Quality of the current model

After each training session, an evaluation is carried out to assess the current quality of the model. For this purpose, the machine uses the set of documents with intellectually confirmed labels. This set is split into a training set (90%) and a test set (10%). The test set is classified by the machine using a model that has been trained on the training set. The results of the automatic classification are then compared with the intellectually assigned labels. To smooth the results, the machine repeats this 10 times with different splits into test and training sets. The results can be viewed as diagrams using the corresponding button. The diagrams show the following metrics per label, which are derived from the number of correct assignments (true positives - TP), correct non-assignments (true negatives - TN), false assignments (false positives - FP), and missing assignments (false negatives - FN):

Accuracy: The ratio of all correct assignments (and correct non-assignments) to the total number of observations:

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN}$$


Precision: The ratio of correct assignments to all assignments:

$$\text{Precision} = \frac{TP}{TP + FP}$$

If one attaches great importance to the fact that there are no misallocations, this value is of particular relevance.


Recall: The ratio of correct assignments to the sum of all existing correct assignments:

$$\text{Recall} = \frac{TP}{TP + FN}$$

If you are willing to accept some misallocations in order to increase the number of hits, this value is of particular relevance.


F1-Score: The harmonic mean of Precision (P) and Recall (R):

$$F_1 = 2 \times \frac{P \times R}{P + R}$$
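As a worked example, the four metrics can be computed from the raw counts as follows. The function name and the sample counts are made up for illustration, and returning 0.0 for an empty denominator is a choice of this sketch, not documented behaviour of the application.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 from the four counts."""

    def ratio(numerator: float, denominator: float) -> float:
        # avoid division by zero when a label has no assignments at all
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Hypothetical counts for one label in a test set of 100 documents
metrics = classification_metrics(tp=8, fp=2, fn=2, tn=88)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

For these counts the accuracy is 0.96, while precision, recall and F1 are all 0.80. This shows why accuracy alone can look optimistic for a label that is rarely assigned.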

 

10.1.2.5. Automatic classification of all unclassified documents

As soon as an initial model has been created, all previously unclassified documents can be automatically classified on the basis of this model via the corresponding button on the classification configuration page.

Once the classification is complete, the results can be viewed in the graphical user interface. The assigned classes are displayed above each document (see "Browse classifications" below).

10.1.2.6. Status information

The overview table shows the current status of the classification set:

  • IDLE: No process is currently running.

  • TRAINING: A training is in progress. During this time, no other processes can be started on this classification set.

  • CLASSIFYING: Documents are currently being classified. During this time, no other processes can be started on this classification set.

  • ABORTING: A process (training or classification) is being aborted. During this time, no processes can be started on this classification set.

The resulting model of a classification set comes with additional information:

  • NONE: No model has been trained yet.

  • READY: A valid model exists and a classification process can be started.

  • OUTDATED: Since the last training, manual classifications have been added or automatic classifications have been confirmed or rejected. The model should be re-trained in order to make changes take effect.

  • INVALID: Changes were made to the label system or a manually assigned label was deleted, which invalidates the current model. The model has to be re-trained.
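A client that monitors a classification set could interpret the two status values along the following lines. This is a sketch of the rules stated above, with hypothetical helper names. Allowing classification only for a READY model is a conservative assumption, since the text does not say whether an OUTDATED model can still be used.

```python
from enum import Enum


class State(Enum):
    IDLE = "IDLE"
    TRAINING = "TRAINING"
    CLASSIFYING = "CLASSIFYING"
    ABORTING = "ABORTING"


class Model(Enum):
    NONE = "NONE"
    READY = "READY"
    OUTDATED = "OUTDATED"
    INVALID = "INVALID"


def can_start_process(state: State) -> bool:
    """No new process can be started while another one is running or aborting."""
    return state is State.IDLE


def training_finished(state: State, model: Model) -> bool:
    """Training has finished when the state is IDLE and the model is READY."""
    return state is State.IDLE and model is Model.READY


def needs_retraining(model: Model) -> bool:
    """OUTDATED models should, INVALID models must be re-trained."""
    return model in (Model.OUTDATED, Model.INVALID)


def can_classify(state: State, model: Model) -> bool:
    # Assumption: only a READY model is treated as usable for classification
    return can_start_process(state) and model is Model.READY


print(training_finished(State.IDLE, Model.READY))  # True
print(can_classify(State.TRAINING, Model.READY))   # False
print(needs_retraining(Model.OUTDATED))            # True
```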

10.2. Index, evaluate and manually classify documents

For all classification sets, you can use a graphical user interface to navigate through the documents, review results, confirm or delete automatically assigned classes, and assign classes manually. You can access this browser view by clicking on "Classification" on the project overview page.

10.2.1. Structure of the interface

The interface is similar to the search interface, both in terms of its structure and functionality. The classification page has three predefined facets on the left side of the screen, that can be used to filter documents according to the assigned class (Label), the assigned confidences (Confidence) or the assignment status on the document level (Status).

This makes it very easy to display, for example, only those documents that have been automatically classified (Status = Autoclassified) and that have labels with low confidence (Confidence = low). By correcting or confirming the labels of the resulting documents, the classification model can be improved: the system learns exactly where it is currently most uncertain (so-called Active Learning).

To the right of the search input field, the classification set on which you want to work can be chosen. If you have created several classification sets, you can quickly switch between them.

10.2.2. Confirm or reject automatically assigned labels

The labels that have been assigned to each document are shown below the title information of each document. Manually assigned labels are displayed in blue; automatically assigned labels are displayed in red (low confidence) or green (high confidence).

Automatically assigned labels have a button to confirm and to delete the label. By confirming an automatically assigned label, it changes its color and will be considered for the next training session to improve the model.

As soon as you confirm, delete or add labels, the model is considered OUTDATED. This means that since the last training session, new data has been collected to improve the model and re-training is necessary.


10.2.3. Execute actions on several selected documents

Similar to the conventional search interface, there are several document-centered actions for classification. In general, actions either refer to

  • exactly one document,

  • a selection of documents

  • all documents of the project or

  • all documents corresponding to the current search restrictions.

For any of these actions, there is a small button with a distinctive icon under the document title. Use this button to apply the action exactly to the corresponding document.

The same icons are displayed on larger buttons below the search bar ("Label document(s)", "Classify document(s)", "Export classifications"). Clicking on one of these buttons applies the action to all documents that are marked with the checkbox to the left of their title. All documents on the current search result page are selected by clicking the uppermost checkbox on the page.

If no particular documents are selected at all, the action is applied to all documents that correspond to the current search restrictions. Since the result set can be very large, a window opens for approving the current selection before the corresponding process starts in the background.

10.2.4. Manually label documents

In addition to confirming or rejecting automatically assigned labels, categories can be assigned manually. The button attached to each document serves this purpose. The button opens a window in which you can select the desired label(s). You can also manually label several documents at the same time by using the checkboxes to the left of the document titles in conjunction with the uppermost button.

When manually assigning labels, a window opens with labeling information:

  • "Not selected": This label has not been assigned to any of the selected documents.

  • "Partially selected": This label has already been assigned for some (not all) selected documents (gray stripes).

  • "Completely selected": All selected documents already have this label (grey).

When assigning a label manually, automatically assigned labels of the same type are automatically overwritten, if existing.

As an example, if you select 100 documents to assign label A and 10 of them already have an automatically assigned label A, the status of these 10 documents is switched to "Approved". An automatically assigned label B would not be replaced by this procedure (except in a single-label classification scenario, where only one label is allowed).

10.2.5. Classify documents automatically

The same selection mechanism as for manual labeling also applies to automatic classification (single documents, a selection of documents or the current search result set). The button "Classify document(s)" with the icon automatically classifies documents that are not manually categorized.

As a result, automatically assigned category labels are displayed in red (low confidence) or green (high confidence). The corresponding facet filters on the left (Label, Confidence and Status) will change when refreshing the page.

If documents are automatically classified, all unconfirmed automatically assigned classes from previous runs are deleted for these documents.

10.2.6. Export labels

The assignment of (confirmed or manual) labels can be exported from the interface to a CSV file (button "Export classifications"). The format has the same structure as the input format that is allowed for importing training material.

10.2.7. Training and classifying directly from the search page

With the button on the top right of the page, a new model can be trained based on all previously manually classified or confirmed documents. Similarly, another button on the top right is used to classify all unclassified documents based on the current model.


11. Application Interface: REST API

11.1. Overview

The REST API gives third-party applications access to Health Discovery functionality. The API is HTTP-based, so it can be used from any programming language that has an HTTP library, or from command-line tools such as curl.

11.1.1. Base URL

All API endpoints are relative to the base URL. For example, assuming Health Discovery is available at http://localhost:8080/health-discovery, the REST API base URL for all endpoints is:

http://localhost:8080/health-discovery/rest/

11.1.2. API Versions

The Health Discovery REST API has multiple versions. You specify the version in the request URL directly after the REST API base URL. For example, here is a call to API version 1, indicated by the v1 path segment:

curl -X GET 'http://localhost:8080/health-discovery/rest/v1/buildInfo'
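
As a minimal sketch of how the base URL, the version segment, and an endpoint path fit together, the following Python helper assembles request URLs. The function name endpoint_url is an assumption made for illustration and is not part of the product.

```python
from urllib.parse import quote


def endpoint_url(base_url: str, version: str, *segments: str) -> str:
    """Join the REST base URL, an API version and path segments (hypothetical helper)."""
    # Percent-encode every segment so that names with spaces or slashes stay intact.
    path = "/".join(quote(segment, safe="") for segment in segments)
    return f"{base_url.rstrip('/')}/{version}/{path}"


# Base URL as given in the section above.
BASE_URL = "http://localhost:8080/health-discovery/rest/"

print(endpoint_url(BASE_URL, "v1", "buildInfo"))
# prints http://localhost:8080/health-discovery/rest/v1/buildInfo
```

Keeping the version as a separate argument makes it easy to switch API versions without touching the rest of the path.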

11.1.3. Response

Requests to the REST API are typically answered with a JSON response. The response object consists essentially of a payload property, which contains the actual result data, and an errorMessages property, which contains any error messages. Successful API requests are answered with HTTP status code 200.

{
	"payload": {},
    "errorMessages": [
      "string"
    ]
}
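
A client will usually want to separate the payload from the error messages. The following Python sketch does this under the assumption that a non-empty errorMessages list should be treated as a failure. The function name unwrap is hypothetical.

```python
import json


def unwrap(response_body: str):
    """Return the payload of a response envelope, or raise if errors are reported."""
    envelope = json.loads(response_body)
    errors = envelope.get("errorMessages") or []
    if errors:
        # Assumption of this sketch: any reported message means the call failed.
        raise RuntimeError("; ".join(errors))
    return envelope.get("payload")


print(unwrap('{"payload": "MyPipeline", "errorMessages": []}'))
# prints MyPipeline
```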


11.1.4. API Tokens

The REST API (starting with API version 1) uses API tokens to protect resources against unauthorized use. Users can create personalized API tokens and use them to authenticate API calls.

An API call is authorized as the user to whom the API token belongs. The API token must be sent in the api-token header. The following example shows a REST API request using an API token:

curl -X GET --header 'api-token: 235907816cd27cc1411633bea37fc5c7af38030f6ce22888d0d49872b8b74ad6' 'http://localhost:8080/health-discovery/rest/v1/buildInfo'
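
The same request can be prepared in Python with the standard library. This is a sketch only: the helper name authorized_request and the environment variable HD_API_TOKEN are assumptions for illustration, and the actual network call is left commented out so that the snippet runs without a server.

```python
import os
import urllib.request


def authorized_request(url: str, api_token: str, method: str = "GET") -> urllib.request.Request:
    """Build a request that carries the API token in the api-token header."""
    return urllib.request.Request(url, method=method, headers={"api-token": api_token})


# HD_API_TOKEN is a hypothetical variable name; keep real tokens out of source code.
token = os.environ.get("HD_API_TOKEN", "<your-api-token>")
request = authorized_request("http://localhost:8080/health-discovery/rest/v1/buildInfo", token)

# urllib normalizes the header name to "Api-token"; HTTP header names are case-insensitive.
print(request.get_method(), request.full_url)

# To send the request against a running instance:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```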

11.1.5. Error Handling

Errors are indicated by standard HTTP error codes. The following error codes are used by the REST API:

Code  Description
400   Bad request. Please check the error message for further details.
401   Unauthorized request. Please supply a valid API token in the api-token header.
403   Forbidden. The user does not have the required privileges to access the resource.
404   Resource could not be found.
405   Request method not supported.
500   Internal server error.


Additional information may be provided by the JSON response that contains more details about the error. In this case, this additional information will be contained in the errorMessages property.

{
    "payload": null,
    "errorMessages": [
        "Pipeline \"MyPipeline\" has not been initialized"
    ]
}
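
The following Python sketch shows one way a client might combine the HTTP status code with the errorMessages property. The helper names describe_error and call are hypothetical, and pairing status 400 with the message from the example above is an assumption made only for the demonstration.

```python
import json
import urllib.error
import urllib.request

# Short hints taken from the status code table above.
STATUS_HINTS = {
    400: "Bad request",
    401: "Unauthorized request",
    403: "Forbidden",
    404: "Resource could not be found",
    405: "Request method not supported",
    500: "Internal server error",
}


def describe_error(status_code: int, body: str) -> str:
    """Combine the status hint with any errorMessages found in the response body."""
    hint = STATUS_HINTS.get(status_code, "Unexpected status")
    try:
        messages = json.loads(body).get("errorMessages") or []
    except (ValueError, AttributeError):
        messages = []  # body was empty, not JSON, or not a JSON object
    detail = "; ".join(messages)
    return f"{status_code} {hint}" + (f": {detail}" if detail else "")


def call(request: urllib.request.Request):
    """Send a request and return the payload; raise a readable error otherwise."""
    try:
        with urllib.request.urlopen(request) as response:
            return json.loads(response.read().decode("utf-8"))["payload"]
    except urllib.error.HTTPError as error:
        raise RuntimeError(describe_error(error.code, error.read().decode("utf-8"))) from error


body = '{"payload": null, "errorMessages": ["Pipeline \\"MyPipeline\\" has not been initialized"]}'
print(describe_error(400, body))
# prints 400 Bad request: Pipeline "MyPipeline" has not been initialized
```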


11.1.6. Browser Interface

Health Discovery comes with a built-in browser interface for the REST API, based on Swagger UI. It lets you get an overview of the API and submit sample requests directly from the browser. The browser interface is available at:

http://localhost:8080/health-discovery/swagger-ui.html

11.2. Text Analysis

11.2.1. Create Pipeline

This function creates a new text analysis pipeline using a given pipeline configuration.

POST /v1/textanalysis/projects/{projectName}/pipelines
11.2.1.1. Request Parameters
Name                      Parameter Type  Data Type  Description
api-token                 header          string     The API token for your user.
projectName               path            string     The name of the project.
pipelineConfigurationDto  body            string     A JSON object that encapsulates the pipeline configuration.
11.2.1.1.1. Example pipelineConfigurationDto
{
    "schemaVersion": "1.2",
    "name": "MyPipeline",
    "description": "A very simple pipeline",
    "analysisEnginePoolSize": 1,
    "casPoolSize": 2,
    "fixedFlow": [
      {
        "refs": "LanguageSetter"
      },
      {
        "refs": "SentenceAndTokenAnnotator"
      }
    ],
    "collectionReader": null,
    "components": [
      {
        "analysisEngines": [
          {
            "name": "SentenceAndTokenAnnotator",
            "template": "de.averbis.textanalysis.components.jtokannotator.JTokAnnotator",
            "resourceRefs": [],
            "parameters": [
              {
                "name": "genre",
                "value": "patent"
              },
              {
                "name": "addParagraphs",
                "value": "false"
              }
            ]
          },
          {
            "name": "LanguageSetter",
            "template": "de.averbis.textanalysis.components.languagesetter.LanguageSetter",
            "resourceRefs": [],
            "parameters": [
              {
                "name": "language",
                "value": "en"
              },
              {
                "name": "overwriteExisting",
                "value": "false"
              }
            ]
          }
        ],
        "aggregatedAnalysisEngines": [],
        "resources": []
      }
    ]
  }


curl -X POST "http://localhost:8080/health-discovery/rest/v1/textanalysis/projects/MyProject/pipelines" -H "accept: */*" -H "api-token: 6bd9edf94e9ea854933fa29f89008f1b9b66b44cbfd058f1f381490dcd4304a5" -H "Content-Type: application/json" -d "{ \"schemaVersion\": \"1.2\", \"name\": \"MyPipeline\", \"description\": \"A very simple pipeline\", \"analysisEnginePoolSize\": 1, \"casPoolSize\": 2, \"fixedFlow\": [ { \"refs\": \"LanguageSetter\" }, { \"refs\": \"SentenceAndTokenAnnotator\" } ], \"collectionReader\": null, \"components\": [ { \"analysisEngines\": [ { \"name\": \"SentenceAndTokenAnnotator\", \"template\": \"de.averbis.textanalysis.components.jtokannotator.JTokAnnotator\", \"resourceRefs\": [], \"parameters\": [ { \"name\": \"genre\", \"value\": \"patent\" }, { \"name\": \"addParagraphs\", \"value\": \"false\" } ] }, { \"name\": \"LanguageSetter\", \"template\": \"de.averbis.textanalysis.components.languagesetter.LanguageSetter\", \"resourceRefs\": [], \"parameters\": [ { \"name\": \"language\", \"value\": \"en\" }, { \"name\": \"overwriteExisting\", \"value\": \"false\" } ] } ], \"aggregatedAnalysisEngines\": [], \"resources\": [] } ] }"

11.2.1.2. Response
{
  "payload": "MyPipeline",
  "errorMessages": []
}
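
Building the JSON body programmatically avoids the quote escaping needed on the command line. The Python sketch below prepares the same kind of POST request. The helper name create_pipeline_request is hypothetical, and the configuration is a trimmed variant of the example above that keeps only the LanguageSetter; whether your instance accepts such a reduced pipeline is an assumption.

```python
import json
import urllib.request
from urllib.parse import quote


def create_pipeline_request(base_url: str, project: str, config: dict, api_token: str):
    """Prepare the POST request that creates a pipeline from a configuration dict."""
    url = f"{base_url.rstrip('/')}/v1/textanalysis/projects/{quote(project, safe='')}/pipelines"
    return urllib.request.Request(
        url,
        data=json.dumps(config).encode("utf-8"),
        method="POST",
        headers={"api-token": api_token, "Content-Type": "application/json"},
    )


# Trimmed configuration (assumption: one annotator is enough for a demonstration).
config = {
    "schemaVersion": "1.2",
    "name": "MyPipeline",
    "description": "A very simple pipeline",
    "analysisEnginePoolSize": 1,
    "casPoolSize": 2,
    "fixedFlow": [{"refs": "LanguageSetter"}],
    "collectionReader": None,  # serialized as JSON null
    "components": [{
        "analysisEngines": [{
            "name": "LanguageSetter",
            "template": "de.averbis.textanalysis.components.languagesetter.LanguageSetter",
            "resourceRefs": [],
            "parameters": [
                {"name": "language", "value": "en"},
                {"name": "overwriteExisting", "value": "false"},
            ],
        }],
        "aggregatedAnalysisEngines": [],
        "resources": [],
    }],
}

request = create_pipeline_request(
    "http://localhost:8080/health-discovery/rest", "MyProject", config, "<your-api-token>"
)
print(request.get_method(), request.full_url)
```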


11.2.2. Get Pipeline

The get pipeline function provides access to pipeline details and status information.

GET /v1/textanalysis/projects/{projectName}/pipelines/{pipelineName}
11.2.2.1. Request Parameters
Name          Parameter Type  Data Type  Description
api-token     header          string     The API token for your user.
projectName   path            string     The name of the project.
pipelineName  path            string     The name of the text analysis pipeline.


curl -X GET --header 'Content-Type: application/json' --header 'api-token: 6bd9edf94e9ea854933fa29f89008f1b9b66b44cbfd058f1f381490dcd4304a5' 'http://localhost:8080/health-discovery/rest/v1/textanalysis/projects/MyProject/pipelines/MyPipeline'

11.2.2.2. Response
{
  "payload": {
    "id": 63506,
    "name": "MyPipeline",
    "description": "A simple pipeline",
    "pipelineState": "STARTED",
    "pipelineStateMessage": null,
    "preconfigured": false,
    "scaleOuted": false
  },
  "errorMessages": []
}
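
Because the pipelineState field reflects the current status, a client can poll this endpoint until a pipeline reaches the desired state. The Python sketch below shows such a polling loop with a timeout. The function wait_for_state and the fetch_state callback are hypothetical; in practice the callback would call the get pipeline function and return payload["pipelineState"]. Only the state STARTED appears in this manual, so the other value in the demonstration is a placeholder.

```python
import time
from typing import Callable


def wait_for_state(fetch_state: Callable[[], str], wanted: str,
                   timeout_s: float = 60.0, interval_s: float = 1.0) -> bool:
    """Poll fetch_state() until it returns the wanted state or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if fetch_state() == wanted:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Demonstration with a fake status source instead of a real server.
fake_states = iter(["<some other state>", "<some other state>", "STARTED"])
print(wait_for_state(lambda: next(fake_states), "STARTED", timeout_s=5, interval_s=0))
# prints True
```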

11.2.3. Get Pipeline Configuration

This function retrieves the detailed pipeline configuration. It can be used to clone pipeline configurations to other projects or instances.

GET /v1/textanalysis/projects/{projectName}/pipelines/{pipelineName}/configuration
11.2.3.1. Request Parameters
Name          Parameter Type  Data Type  Description
api-token     header          string     The API token for your user.
projectName   path            string     The name of the project.
pipelineName  path            string     The name of the text analysis pipeline.

curl -X GET "http://localhost:8080/health-discovery/rest/v1/textanalysis/projects/MyProject/pipelines/MyPipeline/configuration" -H "accept: */*" -H "api-token: 6bd9edf94e9ea854933fa29f89008f1b9b66b44cbfd058f1f381490dcd4304a5"

11.2.3.2. Response
{
  "payload": {
    "schemaVersion": "1.2",
    "name": "MyPipeline",
    "description": "A very simple pipeline",
    "analysisEnginePoolSize": 1,
    "casPoolSize": 2,
    "fixedFlow": [
      {
        "refs": "LanguageSetter"
      },
      {
        "refs": "SentenceAndTokenAnnotator"
      }
    ],
    "collectionReader": null,
    "components": [
      {
        "analysisEngines": [
          {
            "name": "SentenceAndTokenAnnotator",
            "template": "de.averbis.textanalysis.components.jtokannotator.JTokAnnotator",
            "resourceRefs": [],
            "parameters": [
              {
                "name": "genre",
                "value": "patent"
              },
              {
                "name": "addParagraphs",
                "value": "false"
              }
            ]
          },
          {
            "name": "LanguageSetter",
            "template": "de.averbis.textanalysis.components.languagesetter.LanguageSetter",
            "resourceRefs": [],
            "parameters": [
              {
                "name": "overwriteExisting",
                "value": "false"
              },
              {
                "name": "language",
                "value": "en"
              }
            ]
          }
        ],
        "aggregatedAnalysisEngines": [],
        "resources": []
      }
    ]
  },
  "errorMessages": []
}
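
To clone a pipeline, the payload of this response can presumably serve as the request body of the create pipeline function in another project. The Python sketch below extracts the configuration and optionally renames it. The helper name config_for_clone is hypothetical, and reusing the payload unchanged as a create body is an assumption of this sketch.

```python
import copy


def config_for_clone(get_config_response: dict, new_name=None) -> dict:
    """Extract the configuration from a response envelope without modifying the input."""
    config = copy.deepcopy(get_config_response["payload"])
    if new_name is not None:
        config["name"] = new_name
    return config


# Shortened stand-in for the response shown above.
response = {
    "payload": {"schemaVersion": "1.2", "name": "MyPipeline", "fixedFlow": []},
    "errorMessages": [],
}
clone = config_for_clone(response, new_name="MyPipelineCopy")
print(clone["name"], response["payload"]["name"])
# prints MyPipelineCopy MyPipeline
```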


11.2.4. Change Pipeline Configuration

This function changes the configuration of an existing pipeline.

PUT /v1/textanalysis/projects/{projectName}/pipelines/{pipelineName}/configuration
11.2.4.1. Request Parameters
Name                      Parameter Type  Data Type  Description
api-token                 header          string     The API token for your user.
projectName               path            string     The name of the project.
pipelineName              path            string     The name of the text analysis pipeline.
pipelineConfigurationDto  body            string     A JSON object that encapsulates the pipeline configuration.
11.2.4.1.1. Example pipelineConfigurationDto
{
    "schemaVersion": "1.2",
    "name": "MyPipeline",
    "description": "A very simple pipeline",
    "analysisEnginePoolSize": 2,
    "casPoolSize": 4,
    "fixedFlow": [
      {
        "refs": "LanguageSetter"
      },
      {
        "refs": "SentenceAndTokenAnnotator"
      },
      {
        "refs": "SnowballStemAnnotator"
      }
    ],
    "collectionReader": null,
    "components": [
      {
        "analysisEngines": [
          {
            "name": "SnowballStemAnnotator",
            "template": "de.averbis.textanalysis.components.snowballstemannotator.SnowballStemAnnotator",
            "resourceRefs": [],
            "parameters": []
          },
          {
            "name": "SentenceAndTokenAnnotator",
            "template": "de.averbis.textanalysis.components.jtokannotator.JTokAnnotator",
            "resourceRefs": [],
            "parameters": [
              {
                "name": "genre",
                "value": "patent"
              },
              {
                "name": "addParagraphs",
                "value": "false"
              }
            ]
          },
          {
            "name": "LanguageSetter",
            "template": "de.averbis.textanalysis.components.languagesetter.LanguageSetter",
            "resourceRefs": [],
            "parameters": [
              {
                "name": "overwriteExisting",
                "value": "false"
              },
              {
                "name": "language",
                "value": "en"
              }
            ]
          }
        ],
        "aggregatedAnalysisEngines": [],
        "resources": []
      }
    ]
  }         

curl -X PUT "http://localhost:8080/health-discovery/rest/v1/textanalysis/projects/MyProject/pipelines/MyPipeline/configuration" -H "accept: */*" -H "api-token: 6bd9edf94e9ea854933fa29f89008f1b9b66b44cbfd058f1f381490dcd4304a5" -H "Content-Type: application/json" -d "{ \"schemaVersion\": \"1.2\", \"name\": \"MyPipeline\", \"description\": \"A very simple pipeline\", \"analysisEnginePoolSize\": 2, \"casPoolSize\": 4, \"fixedFlow\": [ { \"refs\": \"LanguageSetter\" }, { \"refs\": \"SentenceAndTokenAnnotator\" }, { \"refs\": \"SnowballStemAnnotator\" } ], \"collectionReader\": null, \"components\": [ { \"analysisEngines\": [ { \"name\": \"SnowballStemAnnotator\", \"template\": \"de.averbis.textanalysis.components.snowballstemannotator.SnowballStemAnnotator\", \"resourceRefs\": [], \"parameters\": [] }, { \"name\": \"SentenceAndTokenAnnotator\", \"template\": \"de.averbis.textanalysis.components.jtokannotator.JTokAnnotator\", \"resourceRefs\": [], \"parameters\": [ { \"name\": \"genre\", \"value\": \"patent\" }, { \"name\": \"addParagraphs\", \"value\": \"false\" } ] }, { \"name\": \"LanguageSetter\", \"template\": \"de.averbis.textanalysis.components.languagesetter.LanguageSetter\", \"resourceRefs\": [], \"parameters\": [ { \"name\": \"overwriteExisting\", \"value\": \"false\" }, { \"name\": \"language\", \"value\": \"en\" } ] } ], \"aggregatedAnalysisEngines\": [], \"resources\": [] } ] }"

11.2.4.2. Response
{
  "payload": null,
  "errorMessages": []
}
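
The example configuration above adds a SnowballStemAnnotator to the pipeline from the create example. A change of this kind can be sketched in Python as follows. The helper name add_analysis_engine is hypothetical, and the sketch assumes that the position of an entry in analysisEngines does not matter because fixedFlow defines the execution order.

```python
import copy


def add_analysis_engine(config: dict, name: str, template: str, parameters=None) -> dict:
    """Return a copy of the configuration with one more engine at the end of the flow."""
    updated = copy.deepcopy(config)
    updated["components"][0]["analysisEngines"].append({
        "name": name,
        "template": template,
        "resourceRefs": [],
        "parameters": parameters or [],
    })
    updated["fixedFlow"].append({"refs": name})
    return updated


# Shortened stand-in for an existing configuration.
current = {
    "name": "MyPipeline",
    "fixedFlow": [{"refs": "LanguageSetter"}, {"refs": "SentenceAndTokenAnnotator"}],
    "components": [{"analysisEngines": [], "aggregatedAnalysisEngines": [], "resources": []}],
}
changed = add_analysis_engine(
    current,
    "SnowballStemAnnotator",
    "de.averbis.textanalysis.components.snowballstemannotator.SnowballStemAnnotator",
)
print([step["refs"] for step in changed["fixedFlow"]])
# prints ['LanguageSetter', 'SentenceAndTokenAnnotator', 'SnowballStemAnnotator']
```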

11.2.5. Start Pipeline

The start pipeline function triggers a pipeline start asynchronously. Starting a pipeline may take some time; the pipeline status can be queried with the get pipeline function.

PUT /v1/textanalysis/projects/{projectName}/pipelines/{pipelineName}/start
11.2.5.1. Request Parameters
Name          Parameter Type  Data Type  Description
api-token     header          string     The API token for your user.
projectName   path            string     The name of the project.
pipelineName  path            string     The name of the text analysis pipeline.

curl -X PUT --header 'Content-Type: application/json' --header 'api-token: 6bd9edf94e9ea854933fa29f89008f1b9b66b44cbfd058f1f381490dcd4304a5' 'http://localhost:8080/health-discovery/rest/v1/textanalysis/projects/MyProject/pipelines/MyPipeline/start'

11.2.5.2. Response
{
  "payload": null,
  "errorMessages": []
}

11.2.6. Stop Pipeline

The stop pipeline function triggers a pipeline shutdown asynchronously. Stopping a pipeline may take some time; the pipeline status can be queried with the get pipeline function.

PUT /v1/textanalysis/projects/{projectName}/pipelines/{pipelineName}/stop
11.2.6.1. Request Parameters
Name          Parameter Type  Data Type  Description
api-token     header          string     The API token for your user.
projectName   path            string     The name of the project.
pipelineName  path            string     The name of the text analysis pipeline.

curl -X PUT --header 'Content-Type: application/json' --header 'api-token: 6bd9edf94e9ea854933fa29f89008f1b9b66b44cbfd058f1f381490dcd4304a5' 'http://localhost:8080/health-discovery/rest/v1/textanalysis/projects/MyProject/pipelines/MyPipeline/stop'

11.2.6.2. Response
{
  "payload": null,
  "errorMessages": []
}
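
The start and stop calls differ only in the last path segment, so a client can share one helper for both. The following Python sketch prepares the PUT request; the function name pipeline_action_request is hypothetical. Because both calls are asynchronous, a null payload only means that the request was accepted, and the state still has to be checked with the get pipeline function.

```python
import urllib.request
from urllib.parse import quote


def pipeline_action_request(base_url: str, project: str, pipeline: str,
                            action: str, api_token: str) -> urllib.request.Request:
    """Prepare the PUT request that starts or stops a pipeline."""
    if action not in ("start", "stop"):
        raise ValueError(f"unsupported action: {action!r}")
    url = (f"{base_url.rstrip('/')}/v1/textanalysis/projects/{quote(project, safe='')}"
           f"/pipelines/{quote(pipeline, safe='')}/{action}")
    return urllib.request.Request(url, method="PUT", headers={"api-token": api_token})


start = pipeline_action_request(
    "http://localhost:8080/health-discovery/rest", "MyProject", "MyPipeline",
    "start", "<your-api-token>",
)
print(start.get_method(), start.full_url)
```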

11.2.7. Analyse Text

The analyse text function analyses plain text with a pipeline.

POST /v1/textanalysis/projects/{projectName}/pipelines/{pipelineName}/analyseText
11.2.7.1. Request Parameters
Name             Parameter Type  Data Type  Description
api-token        header          string     The API token for your user.
projectName      path            string     The name of the project.
pipelineName     path            string     The name of the text analysis pipeline.
text             body            string     The text that will be analyzed.
language         query           string     Optional parameter to specify the language of the text. Can be omitted if the pipeline has built-in language detection.
annotationTypes  query           string     Optional parameter to specify which kinds of annotations (such as sentences, concepts, diagnoses) should be returned. Takes a comma-separated list of annotation type class names as specified in the type system. Wildcards are supported.


curl -X POST --header 'Content-Type: text/plain' --header 'api-token: beb71a3a6b9dc0a7c535b0e38b3d86166b179e5c9a4dc49b4355fe179f34d519' -d 'Some sample text' 'http://localhost:8080/health-discovery/rest/v1/textanalysis/projects/MyProject/pipelines/MyPipeline/analyseText?language=en&annotationTypes=de.averbis.types.Sentence%2C*.Token'

11.2.7.2. Response
{
    "payload": [
        {
            "begin": 0,
            "end": 16,
            "type": "de.averbis.types.Sentence",
            "coveredText": "Some sample text",
            "id": 13
        },
        {
            "begin": 0,
            "end": 4,
            "type": "de.averbis.types.Token",
            "coveredText": "Some",
            "id": 19
        },
        {
            "begin": 5,
            "end": 11,
            "type": "de.averbis.types.Token",
            "coveredText": "sample",
            "id": 39
        },
        {
            "begin": 12,
            "end": 16,
            "type": "de.averbis.types.Token",
            "coveredText": "text",
            "id": 59
        },
        {
            "begin": 0,
            "end": 4,
            "type": "de.averbis.extraction.types.Token",
            "coveredText": "Some",
            "id": 19,
            "tokenClass": "FIRST_UPPER_CASE",
            "componentId": "JTokAnnotator",
            "normalized": "some",
            "confidence": 0.0,
            "lemma": null,
            "ignoreByConceptMapper": false,
            "isStopword": false,
            "segments": null,
            "concepts": null,
            "entities": null,
            "posTag": null,
            "isAbbreviation": false,
            "isInvariant": false,
            "diacriticsFreeVersions": null,
            "stem": {
                "begin": 0,
                "end": 4,
                "type": "de.averbis.extraction.types.Stem",
                "coveredText": "Some",
                "id": 79,
                "componentId": "SnowballStemAnnotator",
                "confidence": 0.0,
                "value": "Some"
            },
            "abbreviations": null
        },
        {
            "begin": 5,
            "end": 11,
            "type": "de.averbis.extraction.types.Token",
            "coveredText": "sample",
            "id": 39,
            "tokenClass": "ALL_LOWER_CASE",
            "componentId": "JTokAnnotator",
            "normalized": "sample",
            "confidence": 0.0,
            "lemma": null,
            "ignoreByConceptMapper": false,
            "isStopword": false,
            "segments": null,
            "concepts": null,
            "entities": null,
            "posTag": null,
            "isAbbreviation": false,
            "isInvariant": false,
            "diacriticsFreeVersions": null,
            "stem": {
                "begin": 5,
                "end": 11,
                "type": "de.averbis.extraction.types.Stem",
                "coveredText": "sample",
                "id": 86,
                "componentId": "SnowballStemAnnotator",
                "confidence": 0.0,
                "value": "sampl"
            },
            "abbreviations": null
        },
        {
            "begin": 12,
            "end": 16,
            "type": "de.averbis.extraction.types.Token",
            "coveredText": "text",
            "id": 59,
            "tokenClass": "ALL_LOWER_CASE",
            "componentId": "JTokAnnotator",
            "normalized": "text",
            "confidence": 0.0,
            "lemma": null,
            "ignoreByConceptMapper": false,
            "isStopword": false,
            "segments": null,
            "concepts": null,
            "entities": null,
            "posTag": null,
            "isAbbreviation": false,
            "isInvariant": false,
            "diacriticsFreeVersions": null,
            "stem": {
                "begin": 12,
                "end": 16,
                "type": "de.averbis.extraction.types.Stem",
                "coveredText": "text",
                "id": 93,
                "componentId": "SnowballStemAnnotator",
                "confidence": 0.0,
                "value": "text"
            },
            "abbreviations": null
        }
    ],
    "errorMessages": []
}
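
The optional query parameters have to be URL-encoded, which is easy to get wrong by hand. The Python sketch below prepares the analyse text request with the standard library. The helper name analyse_text_request is hypothetical, and sending the text as UTF-8 is an assumption, since the example above does not state a character set.

```python
import urllib.request
from urllib.parse import quote, urlencode


def analyse_text_request(base_url: str, project: str, pipeline: str, text: str,
                         api_token: str, language=None, annotation_types=None):
    """Prepare the POST request for the analyseText endpoint."""
    url = (f"{base_url.rstrip('/')}/v1/textanalysis/projects/{quote(project, safe='')}"
           f"/pipelines/{quote(pipeline, safe='')}/analyseText")
    query = {}
    if language:
        query["language"] = language
    if annotation_types:
        # The endpoint expects a single comma-separated list.
        query["annotationTypes"] = ",".join(annotation_types)
    if query:
        # Keep "*" literal so wildcards look the same as in the curl example.
        url += "?" + urlencode(query, safe="*")
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),  # assumption: the server reads the body as UTF-8
        method="POST",
        headers={"api-token": api_token, "Content-Type": "text/plain"},
    )


request = analyse_text_request(
    "http://localhost:8080/health-discovery/rest", "MyProject", "MyPipeline",
    "Some sample text", "<your-api-token>",
    language="en", annotation_types=["de.averbis.types.Sentence", "*.Token"],
)
print(request.full_url)
```

Omitting both optional arguments yields the plain endpoint URL without a query string.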

11.2.8. Analyse HTML

The analyse HTML function analyses HTML-structured text with a pipeline.

POST /v1/textanalysis/projects/{projectName}/pipelines/{pipelineName}/analyseHtml
11.2.8.1. Request Parameters
Name             Parameter Type  Data Type  Description
api-token        header          string     The API token for your user.
projectName      path            string     The name of the project.
pipelineName     path            string     The name of the text analysis pipeline.
text             body            string     The HTML structured text that will be analyzed.
language         query           string     Optional parameter to specify the language of the text. Can be omitted if the pipeline has built-in language detection.
annotationTypes  query           string     Optional parameter to specify which kinds of annotations (such as sentences, concepts, diagnoses) should be returned. Takes a comma-separated list of annotation type class names as specified in the type system. Wildcards are supported.

curl -X POST --header 'Content-Type: text/plain' --header 'api-token: beb71a3a6b9dc0a7c535b0e38b3d86166b179e5c9a4dc49b4355fe179f34d519' -d '<html><body>Some sample text</body></html>' 'http://localhost:8080/health-discovery/rest/v1/textanalysis/projects/MyProject/pipelines/MyPipeline/analyseHtml?language=en&annotationTypes=de.averbis.types.Sentence%2C*.Token'

11.2.8.2. Response
{
    "payload": [
        {
            "begin": 0,
            "end": 16,
            "type": "de.averbis.types.Sentence",
            "coveredText": "Some sample text",
            "id": 73
        },
        {
            "begin": 0,
            "end": 4,
            "type": "de.averbis.types.Token",
            "coveredText": "Some",
            "id": 79
        },
        {
            "begin": 5,
            "end": 11,
            "type": "de.averbis.types.Token",
            "coveredText": "sample",
            "id": 99
        },
        {
            "begin": 12,
            "end": 16,
            "type": "de.averbis.types.Token",
            "coveredText": "text",
            "id": 119
        },
        {
            "begin": 0,
            "end": 4,
            "type": "de.averbis.extraction.types.Token",
            "coveredText": "Some",
            "id": 79,
            "tokenClass": "FIRST_UPPER_CASE",
            "componentId": "JTokAnnotator",
            "normalized": "some",
            "confidence": 0.0,
            "lemma": null,
            "ignoreByConceptMapper": false,
            "isStopword": false,
            "segments": null,
            "concepts": null,
            "entities": null,
            "posTag": null,
            "isAbbreviation": false,
            "isInvariant": false,
            "diacriticsFreeVersions": null,
            "stem": {
                "begin": 0,
                "end": 4,
                "type": "de.averbis.extraction.types.Stem",
                "coveredText": "Some",
                "id": 139,
                "componentId": "SnowballStemAnnotator",
                "confidence": 0.0,
                "value": "Some"
            },
            "abbreviations": null
        },
        {
            "begin": 5,
            "end": 11,
            "type": "de.averbis.extraction.types.Token",
            "coveredText": "sample",
            "id": 99,
            "tokenClass": "ALL_LOWER_CASE",
            "componentId": "JTokAnnotator",
            "normalized": "sample",
            "confidence": 0.0,
            "lemma": null,
            "ignoreByConceptMapper": false,
            "isStopword": false,
            "segments": null,
            "concepts": null,
            "entities": null,
            "posTag": null,
            "isAbbreviation": false,
            "isInvariant": false,
            "diacriticsFreeVersions": null,
            "stem": {
                "begin": 5,
                "end": 11,
                "type": "de.averbis.extraction.types.Stem",
                "coveredText": "sample",
                "id": 146,
                "componentId": "SnowballStemAnnotator",
                "confidence": 0.0,
                "value": "sampl"
            },
            "abbreviations": null
        },
        {
            "begin": 12,
            "end": 16,
            "type": "de.averbis.extraction.types.Token",
            "coveredText": "text",
            "id": 119,
            "tokenClass": "ALL_LOWER_CASE",
            "componentId": "JTokAnnotator",
            "normalized": "text",
            "confidence": 0.0,
            "lemma": null,
            "ignoreByConceptMapper": false,
            "isStopword": false,
            "segments": null,
            "concepts": null,
            "entities": null,
            "posTag": null,
            "isAbbreviation": false,
            "isInvariant": false,
            "diacriticsFreeVersions": null,
            "stem": {
                "begin": 12,
                "end": 16,
                "type": "de.averbis.extraction.types.Stem",
                "coveredText": "text",
                "id": 153,
                "componentId": "SnowballStemAnnotator",
                "confidence": 0.0,
                "value": "text"
            },
            "abbreviations": null
        }
    ],
    "errorMessages": []
}
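
The annotation list in a response like the one above can be post-processed with a few lines of Python. The following is a minimal sketch rather than part of the product: the helper name `stems_by_token` is hypothetical, and it assumes the annotations have already been parsed from the JSON response into a list of dictionaries.

```python
from typing import Any, Dict, List

# Type name copied from the response above.
TOKEN_TYPE = "de.averbis.extraction.types.Token"


def stems_by_token(annotations: List[Dict[str, Any]]) -> Dict[str, str]:
    """Map each extraction token's covered text to the value of its stem."""
    stems: Dict[str, str] = {}
    for annotation in annotations:
        if annotation.get("type") != TOKEN_TYPE:
            continue  # skip sentences and other annotation types
        stem = annotation.get("stem")
        if stem is not None:  # guard in case a token carries no stem
            stems[annotation["coveredText"]] = stem["value"]
    return stems


# Two annotations from the response above, shortened to the fields used here.
sample_annotations = [
    {"begin": 0, "end": 16, "type": "de.averbis.types.Sentence",
     "coveredText": "Some sample text", "id": 73},
    {"begin": 5, "end": 11, "type": "de.averbis.extraction.types.Token",
     "coveredText": "sample", "id": 99,
     "stem": {"type": "de.averbis.extraction.types.Stem", "value": "sampl"}},
]

print(stems_by_token(sample_annotations))
```

Filtering on the `type` field keeps the helper independent of the order in which annotators write their results.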



11.2.9. Result Format (XML)

The response of the web service is returned in XML format and contains the text analysis results for the input data. For more information about the data format, see the chapter Available Text Mining Annotators & Web Service Specification.


11.3. Document Classification

11.3.1. Classify Document

The classify document function classifies documents automatically.

POST /classification/projects/{projectName}/classificationSets/{classificationSetName}/classifyDocument
11.3.1.1. Request Parameters
Name | Parameter Type | Data Type | Description
projectName | path | string | The name of the project.
classificationSetName | path | string | The name of the classification configuration.
type | query | string | The document format type. The only supported value is Solr XML Importer.
requestBody | body | string | The document content.
Accept | header | string | Specifies the response format. Supported values are application/json and application/xml.


curl -X POST --header 'Content-Type: text/plain' --header 'Accept: application/json' -d '<update><add><doc><field name="document_name">24552733</field><field name="title">Machine learning for automatic text classification</field><field name="content">Machine learning is a subset of artificial intelligence in the field of computer science.</field></doc></add></update>' 'http://localhost:8080/information-discovery/rest/classification/projects/MyProject/classificationSets/MyClassificationConfiguration/classifyDocument?type=Solr%20XML%20Importer'
curl -X POST --header 'Content-Type: text/plain' --header 'Accept: application/json' -d '<update><add><doc><field name="document_name">24552733</field><field name="title">Machine learning for automatic text classification</field><field name="content">Machine learning is a subset of artificial intelligence in the field of computer science.</field></doc></add></update>' 'http://localhost:8080/health-discovery/rest/classification/projects/MyProject/classificationSets/MyClassificationConfiguration/classifyDocument?type=Solr%20XML%20Importer'
curl -X POST --header 'Content-Type: text/plain' --header 'Accept: application/json' -d '<update><add><doc><field name="document_name">24552733</field><field name="title">Machine learning for automatic text classification</field><field name="content">Machine learning is a subset of artificial intelligence in the field of computer science.</field></doc></add></update>' 'http://localhost:8080/patent-monitor/rest/classification/projects/MyProject/classificationSets/MyClassificationConfiguration/classifyDocument?type=Solr%20XML%20Importer'

11.3.1.2. Response (JSON)
{
    "classifications": [
        {
            "documentIdentifier": "24552733",
            "success": true,
            "labels": [
                {
                    "confidence": 0.537,
                    "name": "Irrelevant"
                }
            ]
        }
    ]
}
11.3.1.3. Response (XML)
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<response>
  <classifications>
    <classification documentIdentifier="24552733" success="true">
      <labels>
        <label confidence="0.537">Irrelevant</label>
      </labels>
    </classification>
  </classifications>
</response>
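
Both response formats carry the same information, so a client can reduce either one to the best label per document. The sketch below is illustrative only: the function names are hypothetical, and it assumes one classification per response, as in the examples above.

```python
import json
import xml.etree.ElementTree as ET

# Example responses copied from the sections above.
JSON_RESPONSE = """
{"classifications": [{"documentIdentifier": "24552733", "success": true,
  "labels": [{"confidence": 0.537, "name": "Irrelevant"}]}]}
"""
XML_RESPONSE = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<response>
  <classifications>
    <classification documentIdentifier="24552733" success="true">
      <labels>
        <label confidence="0.537">Irrelevant</label>
      </labels>
    </classification>
  </classifications>
</response>
"""


def top_label_from_json(body: str) -> tuple:
    """Return (document id, label name, confidence) of the most confident label."""
    classification = json.loads(body)["classifications"][0]
    best = max(classification["labels"], key=lambda label: label["confidence"])
    return classification["documentIdentifier"], best["name"], best["confidence"]


def top_label_from_xml(body: str) -> tuple:
    """Same as top_label_from_json, but for the XML response format."""
    root = ET.fromstring(body.strip().encode("utf-8"))
    classification = root.find("classifications/classification")
    best = max(classification.iter("label"),
               key=lambda label: float(label.get("confidence")))
    return (classification.get("documentIdentifier"), best.text,
            float(best.get("confidence")))


print(top_label_from_json(JSON_RESPONSE))
print(top_label_from_xml(XML_RESPONSE))
```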

11.4. Document Search

11.4.1. Select

The select function is used to search for documents. It supports Apache Solr query syntax.

GET /v1/search/projects/{projectName}/select
11.4.1.1. Request Parameters
Name | Parameter Type | Data Type | Description
api-token | header | string | The API token for your user.
debugQuery | query | boolean | Request additional debugging information in the response.
facet | query | boolean | If set to true, enables faceting.
facet.field | query | string | Identifies a field to be treated as a facet.
fl | query | string | Limits the information included in a query response to a specified list of fields.
fq | query | string | Applies a filter query to the search results.
projectName | path | string | The name of the project.
q | query | string | Defines a query using standard query syntax.
rows | query | integer | Controls how many rows of responses are displayed at a time.
sort | query | string | Sorts the response to a query in either ascending or descending order based on the response’s score or another specified characteristic.
start | query | integer | Specifies an offset (by default, 0) into the responses at which Solr should begin displaying content.


curl -X GET "http://localhost:8080/information-discovery/rest/v1/search/projects/NewProject/select?fl=title&q=*&rows=3" -H "accept: */*" -H "api-token: 1746847c8798c4eada1008eab95efc56b9acaef1ee1505ed6a0deb6ec0a90914"
curl -X GET "http://localhost:8080/health-discovery/rest/v1/search/projects/NewProject/select?fl=title&q=*&rows=3" -H "accept: */*" -H "api-token: 1746847c8798c4eada1008eab95efc56b9acaef1ee1505ed6a0deb6ec0a90914"
curl -X GET "http://localhost:8080/patent-monitor/rest/v1/search/projects/NewProject/select?fl=title&q=*&rows=3" -H "accept: */*" -H "api-token: 1746847c8798c4eada1008eab95efc56b9acaef1ee1505ed6a0deb6ec0a90914"

11.4.1.2. Response
{
  "payload": {
    "solrResponse": {
      "responseHeader": {
        "status": 0,
        "QTime": 3
      },
      "response": {
        "numFound": 3000,
        "start": 0,
        "docs": [
          {
            "title": "Impact of the reconstruction method on delayed gastric emptying after pylorus-preserving pancreaticoduodenectomy: a prospective randomized study."
          },
          {
            "title": "Biomonitoring of cadmium, chromium, nickel and arsenic in general population living near mining and active industrial areas in Southern Tunisia."
          },
          {
            "title": "Vegetation response to hydrologic and geomorphic factors in an arid region of the Baja California Peninsula."
          }
        ]
      },
      "highlighting": {
        "Medline45b17f83-6241-4225-a49e-eba3deb9822d": {},
        "Medlinec167e5b9-7495-4b0d-8436-0d9c33fed3c2": {},
        "Medlined8553a84-4917-4d83-b3cf-f058ca4bad82": {}
      }
    },
    "conceptMapping": {},
    "entityMapping": {}
  },
  "errorMessages": []
}

11.4.1.3. Examples

Select the IDs of all documents that contain an ICD-10 R53 diagnosis:

curl -X GET "http://localhost:8080/health-discovery/rest/v1/search/projects/NewProject/select?fl=id&q=R53" -H "accept: */*" -H "api-token: 1746847c8798c4eada1008eab95efc56b9acaef1ee1505ed6a0deb6ec0a90914"
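
The same kind of query can be assembled programmatically. The following sketch only builds the request URL and reads titles from an already parsed response; the helper names are hypothetical, and the base URL is taken from the curl examples and will differ per installation.

```python
from typing import Dict, List
from urllib.parse import quote, urlencode

# Base URL as used in the curl examples; adjust host, port and context path.
BASE_URL = "http://localhost:8080/health-discovery/rest/v1"


def select_url(project: str, params: Dict[str, str]) -> str:
    """Build the URL of a select request; query values are URL-encoded."""
    path = f"{BASE_URL}/search/projects/{quote(project, safe='')}/select"
    return f"{path}?{urlencode(params)}"


def titles(response: dict) -> List[str]:
    """Collect the title field of every document in a parsed select response."""
    docs = response["payload"]["solrResponse"]["response"]["docs"]
    return [doc["title"] for doc in docs if "title" in doc]


print(select_url("NewProject", {"fl": "id", "q": "R53"}))
```

Passing the parameters as a dictionary also allows names such as `facet.field`, which are not valid Python keyword arguments.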



11.5. User Management

11.5.1. Generate API Token

This function generates an API token for a given user.

POST /v1/users/{userName}/apitoken
11.5.1.1. Request Parameters
Name | Parameter Type | Data Type | Description
userName | path | string | The name of the user.
webServiceLoginDto | body | string | A JSON object that encapsulates the user password and an optional LDAP / ActiveDirectory user source name.
11.5.1.1.1. Example webServiceLoginDto with LDAP user
{
  "password": "mySecretPassword",
  "userSourceName": "CompanyLDAP"
}
11.5.1.1.2. Example webServiceLoginDto with local user 
{
  "password": "mySecretPassword",
  "userSourceName": ""
}

curl -X POST "http://localhost:8080/information-discovery/rest/v1/users/admin/apitoken" -H "accept: */*" -H "Content-Type: application/json" -d "{ \"password\": \"admin\", \"userSourceName\": \"\"}"
curl -X POST "http://localhost:8080/health-discovery/rest/v1/users/admin/apitoken" -H "accept: */*" -H "Content-Type: application/json" -d "{ \"password\": \"admin\", \"userSourceName\": \"\"}"
curl -X POST "http://localhost:8080/patent-monitor/rest/v1/users/admin/apitoken" -H "accept: */*" -H "Content-Type: application/json" -d "{ \"password\": \"admin\", \"userSourceName\": \"\"}"

11.5.1.2. Response
{
  "payload": "21234f9c4c4b8e1dd740168e2c5d84db8e9eaa2c3a7cbe61c0ff982aa0743040",
  "errorMessages": []
}
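
In Python, the token request can be prepared with the standard library alone. This is a sketch under assumptions: the function names are hypothetical, the base URL is taken from the curl examples, and a non-empty `errorMessages` list is assumed to indicate a failed call.

```python
import json
import urllib.request

# Base URL as used in the curl examples; adjust host, port and context path.
BASE_URL = "http://localhost:8080/health-discovery/rest/v1"


def build_token_request(user: str, password: str,
                        user_source: str = "") -> urllib.request.Request:
    """Prepare (but do not send) the POST request that generates an API token."""
    body = json.dumps({"password": password, "userSourceName": user_source})
    return urllib.request.Request(
        f"{BASE_URL}/users/{user}/apitoken",
        data=body.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "accept": "*/*"},
    )


def unwrap(response_text: str):
    """Return the payload of a response envelope, or raise if errors are listed."""
    envelope = json.loads(response_text)
    if envelope["errorMessages"]:  # assumption: non-empty list means failure
        raise RuntimeError("; ".join(map(str, envelope["errorMessages"])))
    return envelope["payload"]


request = build_token_request("admin", "admin")
print(request.get_method(), request.full_url)
```

To send the request, pass it to `urllib.request.urlopen` and hand the decoded response body to `unwrap`; the returned string is the value to use in the `api-token` header of subsequent calls.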

11.5.2. Regenerate API Token

This function replaces an existing API token with a new one.

PUT /v1/users/{userName}/apitoken
11.5.2.1. Request Parameters
Name | Parameter Type | Data Type | Description
userName | path | string | The name of the user.
webServiceLoginDto | body | string | A JSON object that encapsulates the user password and an optional LDAP / ActiveDirectory user source name.
11.5.2.1.1. Example webServiceLoginDto with LDAP user
{
  "password": "mySecretPassword",
  "userSourceName": "CompanyLDAP"
}
11.5.2.1.2. Example webServiceLoginDto with local user 
{
  "password": "mySecretPassword",
  "userSourceName": ""
}

curl -X PUT "http://localhost:8080/information-discovery/rest/v1/users/admin/apitoken" -H "accept: */*" -H "Content-Type: application/json" -d "{ \"password\": \"admin\", \"userSourceName\": \"\"}"
curl -X PUT "http://localhost:8080/health-discovery/rest/v1/users/admin/apitoken" -H "accept: */*" -H "Content-Type: application/json" -d "{ \"password\": \"admin\", \"userSourceName\": \"\"}"
curl -X PUT "http://localhost:8080/patent-monitor/rest/v1/users/admin/apitoken" -H "accept: */*" -H "Content-Type: application/json" -d "{ \"password\": \"admin\", \"userSourceName\": \"\"}"

11.5.2.2. Response
{
  "payload": "41ecdf9be70ae524c2431f54140674fcf719213f50ca34d1ebbdf5b2437cfe59",
  "errorMessages": []
}

11.5.3. Invalidate API Token

This function revokes the API token of a given user.

DELETE /v1/users/{userName}/apitoken
11.5.3.1. Request Parameters
Name | Parameter Type | Data Type | Description
userName | path | string | The name of the user.
webServiceLoginDto | body | string | A JSON object that encapsulates the user password and an optional LDAP / ActiveDirectory user source name.

11.5.3.1.1. Example webServiceLoginDto with LDAP user
{
  "password": "mySecretPassword",
  "userSourceName": "CompanyLDAP"
}
11.5.3.1.2. Example webServiceLoginDto with local user
{
  "password": "mySecretPassword",
  "userSourceName": ""
}

curl -X DELETE "http://localhost:8080/information-discovery/rest/v1/users/admin/apitoken" -H "accept: */*" -H "Content-Type: application/json" -d "{ \"password\": \"admin\", \"userSourceName\": \"\"}"
curl -X DELETE "http://localhost:8080/health-discovery/rest/v1/users/admin/apitoken" -H "accept: */*" -H "Content-Type: application/json" -d "{ \"password\": \"admin\", \"userSourceName\": \"\"}"
curl -X DELETE "http://localhost:8080/patent-monitor/rest/v1/users/admin/apitoken" -H "accept: */*" -H "Content-Type: application/json" -d "{ \"password\": \"admin\", \"userSourceName\": \"\"}"

11.5.3.2. Response
{
  "payload": null,
  "errorMessages": []
}

11.5.4. Get API Token Status

This function returns the status of a user's API token.

GET /v1/users/{userName}/apitoken/status
11.5.4.1. Request Parameters
Name | Parameter Type | Data Type | Description
userName | path | string | The name of the user.
webServiceLoginDto | body | string | A JSON object that encapsulates the user password and an optional LDAP / ActiveDirectory user source name.
11.5.4.1.1. Example webServiceLoginDto with LDAP user
{
  "password": "mySecretPassword",
  "userSourceName": "CompanyLDAP"
}
11.5.4.1.2. Example webServiceLoginDto with local user 
{
  "password": "mySecretPassword",
  "userSourceName": ""
}

curl -X POST "http://localhost:8080/information-discovery/rest/v1/users/admin/apitoken/status" -H "accept: */*" -H "Content-Type: application/json" -d "{ \"password\": \"admin\", \"userSourceName\": \"\"}"
curl -X POST "http://localhost:8080/health-discovery/rest/v1/users/admin/apitoken/status" -H "accept: */*" -H "Content-Type: application/json" -d "{ \"password\": \"admin\", \"userSourceName\": \"\"}"
curl -X POST "http://localhost:8080/patent-monitor/rest/v1/users/admin/apitoken/status" -H "accept: */*" -H "Content-Type: application/json" -d "{ \"password\": \"admin\", \"userSourceName\": \"\"}"

11.5.4.2. Response
{
  "payload": "EMPTY",
  "errorMessages": []
}

11.5.5. Change Password

This function is used to change a user's password.

PUT /v1/users/{userName}/changeMyPassword
11.5.5.1. Request Parameters
Name | Parameter Type | Data Type | Description
userName | path | string | The name of the user.
changeMyPasswordDto | body | string | A JSON object that encapsulates the user's old password and the new password.
11.5.5.1.1. Example changeMyPasswordDto
{
  "oldPassword": "admin",
  "newPassword": "myN3wP4ssw0rd"
}

curl -X PUT "http://localhost:8080/information-discovery/rest/v1/users/admin/changeMyPassword" -H "accept: */*" -H "Content-Type: application/json" -d "{ \"oldPassword\": \"admin\", \"newPassword\": \"myN3wP4ssw0rd\"}"
curl -X PUT "http://localhost:8080/health-discovery/rest/v1/users/admin/changeMyPassword" -H "accept: */*" -H "Content-Type: application/json" -d "{ \"oldPassword\": \"admin\", \"newPassword\": \"myN3wP4ssw0rd\"}"
curl -X PUT "http://localhost:8080/patent-monitor/rest/v1/users/admin/changeMyPassword" -H "accept: */*" -H "Content-Type: application/json" -d "{ \"oldPassword\": \"admin\", \"newPassword\": \"myN3wP4ssw0rd\"}"

11.5.5.2. Response
{
  "payload": null,
  "errorMessages": []
}

11.6. Project Management

11.6.1. Create Project

This function is used to create new projects.


POST /v1/projects
11.6.1.1. Request Parameters
Name | Parameter Type | Data Type | Description
api-token | header | string | The API token for your user.
description | query | string | The project description.
name | query | string | The project name.


curl -X POST "http://localhost:8080/information-discovery/rest/v1/projects?description=Some%20meaningful%20project%20description&name=NewProject" -H "accept: application/json;charset=UTF-8" -H "api-token: 1746847c8798c4eada1008eab95efc56b9acaef1ee1505ed6a0deb6ec0a90914"
curl -X POST "http://localhost:8080/health-discovery/rest/v1/projects?description=Some%20meaningful%20project%20description&name=NewProject" -H "accept: application/json;charset=UTF-8" -H "api-token: 1746847c8798c4eada1008eab95efc56b9acaef1ee1505ed6a0deb6ec0a90914"
curl -X POST "http://localhost:8080/patent-monitor/rest/v1/projects?description=Some%20meaningful%20project%20description&name=NewProject" -H "accept: application/json;charset=UTF-8" -H "api-token: 1746847c8798c4eada1008eab95efc56b9acaef1ee1505ed6a0deb6ec0a90914"

11.6.1.2. Response
{
  "payload": {
    "id": 1009,
    "name": "NewProject",
    "description": "Some meaningful project description"
  },
  "errorMessages": []
}
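
Because the project name and description travel as query parameters, they must be URL-encoded. The sketch below prepares the request without sending it; the function name is hypothetical and the base URL is taken from the curl examples.

```python
import urllib.request
from urllib.parse import quote, urlencode

# Base URL as used in the curl examples; adjust host, port and context path.
BASE_URL = "http://localhost:8080/health-discovery/rest/v1"


def build_create_project_request(name: str, description: str,
                                 api_token: str) -> urllib.request.Request:
    """Prepare (but do not send) the POST request that creates a project."""
    # quote_via=quote encodes spaces as %20, matching the curl examples.
    query = urlencode({"description": description, "name": name}, quote_via=quote)
    return urllib.request.Request(
        f"{BASE_URL}/projects?{query}",
        method="POST",
        headers={"accept": "application/json;charset=UTF-8",
                 "api-token": api_token},
    )


project_request = build_create_project_request(
    "NewProject", "Some meaningful project description", "YOUR_API_TOKEN")
print(project_request.full_url)
```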

11.7. Terminology Management

11.7.1. Get All Terminologies

This function is used to get all terminologies in a project.

GET /v1/terminology/projects/{projectName}/terminologies
11.7.1.1. Request Parameters
Name | Parameter Type | Data Type | Description
api-token | header | string | The API token for your user.
projectName | path | string | The name of the project.

curl -X GET "http://localhost:8080/health-discovery/rest/v1/terminology/projects/MyProject/terminologies" -H "accept: */*" -H "api-token: 7116d2bb104c635d379ccca286f2cc9b5ddb4664829922148f7e882c004a6c0b"
curl -X GET "http://localhost:8080/information-discovery/rest/v1/terminology/projects/MyProject/terminologies" -H "accept: */*" -H "api-token: 7116d2bb104c635d379ccca286f2cc9b5ddb4664829922148f7e882c004a6c0b"
curl -X GET "http://localhost:8080/patent-monitor/rest/v1/terminology/projects/MyProject/terminologies" -H "accept: */*" -H "api-token: 7116d2bb104c635d379ccca286f2cc9b5ddb4664829922148f7e882c004a6c0b"

11.7.1.2. Response
{
  "payload": [
    {
      "terminologyName": "MyTerminology",
      "label": "My Terminology",
      "version": "1.0",
      "allowedLanguageCodes": [
        "de",
        "en"
      ],
      "hierarchical": true,
      "conceptType": "de.averbis.extraction.types.Concept"
    }
  ],
  "errorMessages": []
}


11.7.2. Create Terminology

This function is used to create a new terminology.

POST /v1/terminology/projects/{projectName}/terminologies
11.7.2.1. Request Parameters
Name | Parameter Type | Data Type | Description
api-token | header | string | The API token for your user.
projectName | path | string | The name of the project.
webserviceTerminologyDto | body | string | Terminology properties in JSON.
11.7.2.2. Example webserviceTerminologyDto
{
  "conceptType": "de.averbis.extraction.types.Concept",
  "hierarchical": true,
  "terminologyName": "MyTerminology",
  "label": "My Terminology",
  "version": "1.0",
  "allowedLanguageCodes": [
    "en",
    "de"
  ]
}

curl -X POST "http://localhost:8080/health-discovery/rest/v1/terminology/projects/MyProject/terminologies" -H "accept: */*" -H "api-token: 7116d2bb104c635d379ccca286f2cc9b5ddb4664829922148f7e882c004a6c0b" -H "Content-Type: application/json" -d "{ \"conceptType\": \"de.averbis.extraction.types.Concept\", \"hierarchical\": true, \"terminologyName\": \"MyTerminology\", \"label\": \"My Terminology\", \"version\": \"1.0\", \"allowedLanguageCodes\": [ \"en\", \"de\" ]}"
curl -X POST "http://localhost:8080/information-discovery/rest/v1/terminology/projects/MyProject/terminologies" -H "accept: */*" -H "api-token: 7116d2bb104c635d379ccca286f2cc9b5ddb4664829922148f7e882c004a6c0b" -H "Content-Type: application/json" -d "{ \"conceptType\": \"de.averbis.extraction.types.Concept\", \"hierarchical\": true, \"terminologyName\": \"MyTerminology\", \"label\": \"My Terminology\", \"version\": \"1.0\", \"allowedLanguageCodes\": [ \"en\", \"de\" ]}"
curl -X POST "http://localhost:8080/patent-monitor/rest/v1/terminology/projects/MyProject/terminologies" -H "accept: */*" -H "api-token: 7116d2bb104c635d379ccca286f2cc9b5ddb4664829922148f7e882c004a6c0b" -H "Content-Type: application/json" -d "{ \"conceptType\": \"de.averbis.extraction.types.Concept\", \"hierarchical\": true, \"terminologyName\": \"MyTerminology\", \"label\": \"My Terminology\", \"version\": \"1.0\", \"allowedLanguageCodes\": [ \"en\", \"de\" ]}"

11.7.2.3. Response
{
  "payload": {
    "terminologyName": "MyTerminology",
    "label": "My Terminology",
    "version": "1.0",
    "allowedLanguageCodes": [
      "en",
      "de"
    ],
    "hierarchical": true,
    "conceptType": "de.averbis.extraction.types.Concept"
  },
  "errorMessages": []
}
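
Hand-escaping the JSON body on the command line is error-prone. As an alternative, the body can be serialized from a dictionary. This is a minimal sketch; the function name `terminology_dto` is hypothetical, and the field names are taken from the example webserviceTerminologyDto above.

```python
import json
from typing import List


def terminology_dto(name: str, label: str, version: str,
                    language_codes: List[str], hierarchical: bool = True,
                    concept_type: str = "de.averbis.extraction.types.Concept") -> dict:
    """Assemble the terminology properties with the field names shown above."""
    if not language_codes:
        raise ValueError("at least one language code is required")
    return {
        "conceptType": concept_type,
        "hierarchical": hierarchical,
        "terminologyName": name,
        "label": label,
        "version": version,
        "allowedLanguageCodes": list(language_codes),
    }


# json.dumps takes care of quoting and escaping.
terminology_body = json.dumps(
    terminology_dto("MyTerminology", "My Terminology", "1.0", ["en", "de"]))
print(terminology_body)
```

The resulting string can be sent as the request body together with the `Content-Type: application/json` header. The check for an empty language list is a precaution of this sketch, not a documented server rule.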



11.7.3. Import Terminology

This function is used to import content into an existing terminology. Existing terminology content will be replaced.


POST /v1/terminology/projects/{projectName}/terminologies/{terminologyName}/terminologyImports
11.7.3.1. Request Parameters
Name | Parameter Type | Data Type | Description
api-token | header | string | The API token for your user.
projectName | path | string | The name of the project.
terminologyName | path | string | The name of the terminology.
requestBody | body | string | The terminology content.
terminologyImportImporterName | query | string | The importer name. Currently only OBO Importer is supported.


curl -X POST "http://localhost:8080/health-discovery/rest/v1/terminology/projects/MyProject/terminologies/MyTerminology/terminologyImports?terminologyImportImporterName=OBO%20Importer" -H "accept: application/json;charset=UTF-8" -H "api-token: 7116d2bb104c635d379ccca286f2cc9b5ddb4664829922148f7e882c004a6c0b" -H "Content-Type: application/octet-stream" -d $'[Term]\nid: A\nname: Vehicles\nsynonym: "Vehicles" EXACT PREF []\n\n[Term]\nid: B\nname: Auto\nsynonym: "Auto" EXACT PREF []\nsynonym: "Automobile" EXACT []\nsynonym: "Car" EXACT PREF []\nis_a: A ! Vehicles\n\n[Term]\nid: C\nname: Boat\nsynonym: "Boat" EXACT PREF []\nis_a: A ! Vehicles\n\n[Term]\nid: D\nname: Aircraft\nsynonym: "Aircraft" EXACT PREF []\nis_a: A ! Vehicles\n'
curl -X POST "http://localhost:8080/information-discovery/rest/v1/terminology/projects/MyProject/terminologies/MyTerminology/terminologyImports?terminologyImportImporterName=OBO%20Importer" -H "accept: application/json;charset=UTF-8" -H "api-token: 7116d2bb104c635d379ccca286f2cc9b5ddb4664829922148f7e882c004a6c0b" -H "Content-Type: application/octet-stream" -d $'[Term]\nid: A\nname: Vehicles\nsynonym: "Vehicles" EXACT PREF []\n\n[Term]\nid: B\nname: Auto\nsynonym: "Auto" EXACT PREF []\nsynonym: "Automobile" EXACT []\nsynonym: "Car" EXACT PREF []\nis_a: A ! Vehicles\n\n[Term]\nid: C\nname: Boat\nsynonym: "Boat" EXACT PREF []\nis_a: A ! Vehicles\n\n[Term]\nid: D\nname: Aircraft\nsynonym: "Aircraft" EXACT PREF []\nis_a: A ! Vehicles\n'
curl -X POST "http://localhost:8080/patent-monitor/rest/v1/terminology/projects/MyProject/terminologies/MyTerminology/terminologyImports?terminologyImportImporterName=OBO%20Importer" -H "accept: application/json;charset=UTF-8" -H "api-token: 7116d2bb104c635d379ccca286f2cc9b5ddb4664829922148f7e882c004a6c0b" -H "Content-Type: application/octet-stream" -d $'[Term]\nid: A\nname: Vehicles\nsynonym: "Vehicles" EXACT PREF []\n\n[Term]\nid: B\nname: Auto\nsynonym: "Auto" EXACT PREF []\nsynonym: "Automobile" EXACT []\nsynonym: "Car" EXACT PREF []\nis_a: A ! Vehicles\n\n[Term]\nid: C\nname: Boat\nsynonym: "Boat" EXACT PREF []\nis_a: A ! Vehicles\n\n[Term]\nid: D\nname: Aircraft\nsynonym: "Aircraft" EXACT PREF []\nis_a: A ! Vehicles\n'

11.7.3.2. Response
{
  "payload": null,
  "errorMessages": []
}

The terminology import is executed asynchronously and may take some time depending on the size of the terminology.
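
The request body of an import is OBO text, which places each tag-value pair on its own line and separates term stanzas by blank lines. The sketch below generates such text from a simple Python structure. The function name `to_obo` and the input layout are hypothetical; the synonym scopes mirror the ones used in the example request.

```python
from typing import Dict, List


def to_obo(terms: List[Dict]) -> str:
    """Serialize term dictionaries into OBO [Term] stanzas."""
    stanzas = []
    for term in terms:
        lines = ["[Term]", f"id: {term['id']}", f"name: {term['name']}"]
        # Each synonym is a (text, is_preferred) pair.
        for text, preferred in term.get("synonyms", []):
            scope = "EXACT PREF" if preferred else "EXACT"
            lines.append(f'synonym: "{text}" {scope} []')
        # Each parent is an (id, name) pair; the name follows "!" as a comment.
        for parent_id, parent_name in term.get("is_a", []):
            lines.append(f"is_a: {parent_id} ! {parent_name}")
        stanzas.append("\n".join(lines))
    return "\n\n".join(stanzas) + "\n"


vehicles = [
    {"id": "A", "name": "Vehicles", "synonyms": [("Vehicles", True)]},
    {"id": "C", "name": "Boat", "synonyms": [("Boat", True)],
     "is_a": [("A", "Vehicles")]},
]
print(to_obo(vehicles))
```

The returned string can be sent as the body of the import request with the `Content-Type: application/octet-stream` header.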

11.7.4. Retrieve Terminology Import Information

The status and progress of a terminology import can be retrieved with the following function:

GET /v1/terminology/projects/{projectName}/terminologies/{terminologyName}/terminologyImports
11.7.4.1. Request Parameters
Name | Parameter Type | Data Type | Description
api-token | header | string | The API token for your user.
projectName | path | string | The name of the project.
terminologyName | path | string | The name of the terminology.

curl -X GET "http://localhost:8080/health-discovery/rest/v1/terminology/projects/MyProject/terminologies/MyTerminology/terminologyImports" -H "accept: */*" -H "api-token: 7116d2bb104c635d379ccca286f2cc9b5ddb4664829922148f7e882c004a6c0b"
curl -X GET "http://localhost:8080/information-discovery/rest/v1/terminology/projects/MyProject/terminologies/MyTerminology/terminologyImports" -H "accept: */*" -H "api-token: 7116d2bb104c635d379ccca286f2cc9b5ddb4664829922148f7e882c004a6c0b"
curl -X GET "http://localhost:8080/patent-monitor/rest/v1/terminology/projects/MyProject/terminologies/MyTerminology/terminologyImports" -H "accept: */*" -H "api-token: 7116d2bb104c635d379ccca286f2cc9b5ddb4664829922148f7e882c004a6c0b"

11.7.4.2. Response
{
  "payload": {
    "id": 601,
    "terminologyId": 600,
    "state": "COMPLETED",
    "totalNumberOfConcepts": 4,
    "numberOfProcessedConcepts": 4,
    "numberOfSkippedConcepts": 0,
    "numberOfProcessedConceptsWithRelations": 4,
    "startDate": 1584718172887,
    "endDate": 1584718173218,
    "messageDtos": []
  },
  "errorMessages": []
}
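
The payload can be condensed into a short progress summary. The following sketch rests on assumptions: `startDate` and `endDate` are interpreted as milliseconds since the Unix epoch (the example values are consistent with that reading), `endDate` is assumed to be absent or null while an import is still running, and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def summarize_import(payload: dict) -> dict:
    """Derive start time, duration and progress from an import status payload."""
    # Assumption: timestamps are epoch milliseconds.
    started = EPOCH + timedelta(milliseconds=payload["startDate"])
    end = payload.get("endDate")
    duration_ms = end - payload["startDate"] if end is not None else None
    total = payload["totalNumberOfConcepts"]
    progress = payload["numberOfProcessedConcepts"] / total if total else 0.0
    return {
        "state": payload["state"],
        "started": started,
        "duration_ms": duration_ms,
        "progress": progress,
    }


# Payload copied from the response above (fields used by the summary only).
import_payload = {
    "state": "COMPLETED",
    "totalNumberOfConcepts": 4,
    "numberOfProcessedConcepts": 4,
    "startDate": 1584718172887,
    "endDate": 1584718173218,
}
import_summary = summarize_import(import_payload)
print(import_summary["started"].isoformat(), import_summary["duration_ms"])
```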

11.7.5. Export Terminology

This function is used to export a terminology to be used in a text analysis pipeline.

POST /v1/terminology/projects/{projectName}/terminologies/{terminologyName}/terminologyExports
11.7.5.1. Request Parameters
Name | Parameter Type | Data Type | Description
api-token | header | string | The API token for your user.
projectName | path | string | The name of the project.
terminologyName | path | string | The name of the terminology.
terminologyExporterName | query | string | The exporter name. Currently only Concept Dictionary XML Exporter is supported.

curl -X POST "http://localhost:8080/health-discovery/rest/v1/terminology/projects/MyProject/terminologies/MyTerminology/terminologyExports?terminologyExporterName=Concept%20Dictionary%20XML%20Exporter" -H "accept: */*" -H "api-token: 7116d2bb104c635d379ccca286f2cc9b5ddb4664829922148f7e882c004a6c0b"
curl -X POST "http://localhost:8080/information-discovery/rest/v1/terminology/projects/MyProject/terminologies/MyTerminology/terminologyExports?terminologyExporterName=Concept%20Dictionary%20XML%20Exporter" -H "accept: */*" -H "api-token: 7116d2bb104c635d379ccca286f2cc9b5ddb4664829922148f7e882c004a6c0b"
curl -X POST "http://localhost:8080/patent-monitor/rest/v1/terminology/projects/MyProject/terminologies/MyTerminology/terminologyExports?terminologyExporterName=Concept%20Dictionary%20XML%20Exporter" -H "accept: */*" -H "api-token: 7116d2bb104c635d379ccca286f2cc9b5ddb4664829922148f7e882c004a6c0b"

11.7.5.2. Response
{
  "payload": null,
  "errorMessages": []
}

The terminology export is executed asynchronously and may take some time depending on the size of the terminology.

11.7.6. Retrieve Terminology Export Information

The status and progress of a terminology export can be retrieved with the following function:

GET /v1/terminology/projects/{projectName}/terminologies/{terminologyName}/terminologyExports
11.7.6.1. Request Parameters
Name | Parameter Type | Data Type | Description
api-token | header | string | The API token for your user.
projectName | path | string | The name of the project.
terminologyName | path | string | The name of the terminology.

curl -X GET "http://localhost:8080/health-discovery/rest/v1/terminology/projects/MyProject/terminologies/MyTerminology/terminologyExports" -H "accept: */*" -H "api-token: 7116d2bb104c635d379ccca286f2cc9b5ddb4664829922148f7e882c004a6c0b"
curl -X GET "http://localhost:8080/information-discovery/rest/v1/terminology/projects/MyProject/terminologies/MyTerminology/terminologyExports" -H "accept: */*" -H "api-token: 7116d2bb104c635d379ccca286f2cc9b5ddb4664829922148f7e882c004a6c0b"

curl -X GET "http://localhost:8080/patent-monitor/rest/v1/terminology/projects/MyProject/terminologies/MyTerminology/terminologyExports" -H "accept: */*" -H "api-token: 7116d2bb104c635d379ccca286f2cc9b5ddb4664829922148f7e882c004a6c0b"

11.7.6.2. Response
{
  "payload": {
    "id": 620,
    "terminologyId": 600,
    "state": "COMPLETED",
    "totalNumberOfConcepts": 4,
    "numberOfProcessedConcepts": 4,
    "startDate": 1584719372703,
    "endDate": 1584719372921,
    "messageDtos": [],
    "exporterName": "Concept Dictionary XML Exporter",
    "stateMessage": "Submitted terminology to text analysis ( 4 / 4 )",
    "oboDownloadAvailable": false
  },
  "errorMessages": []
}
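
Since exports and imports run asynchronously, a client typically polls the status function until the job has finished. The helper below is a sketch, not an official client: its name is hypothetical, the caller supplies a function that performs the GET request and returns the parsed `payload`, and only the state `COMPLETED` shown above is treated as finished, because other state names are not listed here.

```python
import time
from typing import Callable


def wait_until_completed(fetch_payload: Callable[[], dict],
                         interval_seconds: float = 1.0,
                         max_attempts: int = 60,
                         sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll fetch_payload until its "state" is COMPLETED or attempts run out."""
    for attempt in range(max_attempts):
        payload = fetch_payload()
        if payload["state"] == "COMPLETED":
            return payload
        if attempt < max_attempts - 1:
            sleep(interval_seconds)  # wait before asking again
    raise TimeoutError(f"job not completed after {max_attempts} attempts")


# Usage with a stand-in for the real HTTP call; the first state is a placeholder.
fake_states = iter(["NOT_FINISHED_PLACEHOLDER", "COMPLETED"])
final = wait_until_completed(lambda: {"state": next(fake_states)},
                             sleep=lambda seconds: None)
print(final["state"])
```

Injecting the fetch and sleep functions keeps the polling logic free of network code, so it can be tested without a running server.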

11.7.7. Delete Terminology

This function is used to delete a terminology.

DELETE /v1/terminology/projects/{projectName}/terminologies/{terminologyName}
11.7.7.1. Request Parameters
Name | Parameter Type | Data Type | Description
api-token | header | string | The API token for your user.
projectName | path | string | The name of the project.
terminologyName | path | string | The name of the terminology.

curl -X DELETE "http://localhost:8080/health-discovery/rest/v1/terminology/projects/MyProject/terminologies/MyTerminology" -H "accept: */*" -H "api-token: 7116d2bb104c635d379ccca286f2cc9b5ddb4664829922148f7e882c004a6c0b"
curl -X DELETE "http://localhost:8080/information-discovery/rest/v1/terminology/projects/MyProject/terminologies/MyTerminology" -H "accept: */*" -H "api-token: 7116d2bb104c635d379ccca286f2cc9b5ddb4664829922148f7e882c004a6c0b"

curl -X DELETE "http://localhost:8080/patent-monitor/rest/v1/terminology/projects/MyProject/terminologies/MyTerminology" -H "accept: */*" -H "api-token: 7116d2bb104c635d379ccca286f2cc9b5ddb4664829922148f7e882c004a6c0b"

11.7.7.2. Response
{
  "payload": null,
  "errorMessages": []
}


