Welcome to the tutorial on how to combine Power BI Desktop with Health Discovery to turn text mining results into meaningful dashboards.
- The tutorial assumes that you already have Health Discovery and Power BI Desktop installed. In this tutorial, we use Health Discovery V6.1 and Power BI Desktop Version: 2.88.1144.0 64-bit
- In this tutorial, we will connect Power BI Desktop to the Text Analysis Export API of Health Discovery for which you need a valid API token. If you do not have one, contact your Health Discovery Administrator.
- We use an n2c2 NLP data set (a set of around 1.200 deidentified English medical records, formerly known as i2b2 NLP data set). Due to licensing restrictions, we are unable to provide this data set for this tutorial. More info about the dataset can be found here: https://portal.dbmi.hms.harvard.edu/projects/n2c2-nlp/
- In Health Discovery, we created a project named intern and imported the dataset under the document source name i2b2. We then analyzed the data set using our preconfigured Discharge pipeline. The associated text analysis process is named i2b2_x_discharge.
Connect to data
With Power BI Desktop installed, you're ready to connect to Health Discovery. To see the many types of data sources available, select Get Data > More... in the Power BI Desktop Home tab, and in the Get Data window, search for "Web". In this tutorial, we connect to Health Discovery using the Web data source.
In the From Web dialog box, switch to Advanced and paste the address of your Search API (in this tutorial, https://health-discovery.averbis.com/health-discovery/rest/v1/textanalysis/projects/intern/documentSources/i2b2/processes/i2b2_x_discharge/export) into the URL field. Under HTTP request header parameters, enter the parameter "api-token" and the API token you want to use. Finally, select OK.
How to find out your URL
To figure out which URL to use, you need three pieces of information in addition to the Health Discovery base URL:
- the project name in Health Discovery (in this tutorial: intern)
- the document source name (in this tutorial: i2b2)
- the text analysis process you used to process the documents (in this tutorial: i2b2_x_discharge)
Then you can compose the URL according to the following scheme:
Now the Power Query Editor should open:
Click on the "Record" link to the right of payload. The following table should appear.
Click on List to the right of textAnalysisResultDtos to obtain a list of Records.
Now that you're connected to a data source, you can shape the data to meet your needs. Shaping doesn't affect the original data source, only this particular view of the data.
As a first step, click on the To Table button in the upper left and click Ok in the window that pops up.
Then, click on the little Expand icon in Column1:
Select both columns in the following drop down:
Again, click on the little Expand icon in Column1.annotationDtos:
In the window that pops up, select type and dictCanon. It soon becomes clear why we select these two fields.
The table should expand and now look like this:
By clicking on the down arrow to the right of Column1.annotationDtos.type, you can filter the column to show only diagnoses. To do this, select the entry en.averbis.types.health.Diagnosis.
And finally, give the query and the three columns more descriptive names. In our case we choose Diagnoses for the query and DocumentName, AnnotationType and Diagnosis for the columns.
Now we want to create a second query where we query the drugs. This is slightly more complex since the data model of a drug is a slightly more complex than that of a diagnosis.
Duplicate the query Diagnoses by right-clicking and rename the duplicated query to Drugs.
On the duplicated query Drugs, remove the bottom three Applied Steps that you see on the right side of the page (from bottom to top):
Expand Column1.annotationDtos and click on Load more.
Then, select type and drugs:
Filter the Column1.annotationDtos.type column for de.averbis.types.health.Medication and expand the column Column1.annotationDtos.drugs:
In the window that pops up, select ingredient.
Expand the ingredient column again and select dictCanon:
Then, rename the three columns to Document Name, AnnotationType and Drug. The result looks like this:
For now, we finish Shape Data activities and click on Close & Apply in the upper left:
Now the Power Query Editor should close and the Power BI Desktop window should be visible again. Here, click on the Model icon on the left.
Via drag and drop you can connect the Document Name of Diagnoses with the Document Name of Drugs. The following popup window should appear. Close this window with Ok.
By clicking on the Report icon on the left, switch to the Report view where you can build visualizations and reports. The Report view has six main areas:
- The ribbon at the top, which displays common tasks associated with reports and visualizations.
- The canvas area in the middle, where visualizations are created and arranged.
- The pages tab area at the bottom, which lets you select or add report pages.
- The Filters pane, where you can filter data visualizations.
- The Visualizations pane, where you can add, change, or customize visualizations, and apply drill-through.
- The Fields pane, which shows the available fields in your queries. You can drag these fields onto the canvas, the Filters pane, or the Visualizations pane to create or modify visualizations.
You can expand and collapse the Filters, Visualizations, and Fields panes by selecting the arrows at the tops of the panes. Collapsing the panes provides more space on the canvas to build cool visualizations.
To create a simple visualization, just select any field in the fields list, or drag the field from the Fields list onto the canvas. Let's drag the Diagnosis field from Diagnoses table and the Drug field from the Drugs table onto the canvas, and see what happens.
Now, select the Drug table in the Canvas and convert it to a Clustered Column Chart under Visualizations (1). Drag the Drug field in the Drugs table into the Values field in the Visualizations pane (2). Enlarge the Drug chart in the Canvas and position it to the lower right corner (3). The drugs are sorted alphabetically - if you want to sort them by frequency, use the menu at the top right of the visualization and click Sort By, Count of Diagnoses, and Sort descending (4).
Repeat these 4 steps for Diagnoses so that it looks like this:
We now want to calculate the average number of diagnoses per document. For that, we click on New measure in the ribbon at the top.
We then enter the following formula into the formula editor (1) and drag the measure from the Fields pane into the Canvas (2).
Avg Number of Diagnoses per Doc = COUNT(Diagnoses[Diagnosis])/DISTINCTCOUNT(Diagnoses[DocumentName])
Measures in Power BI are used for defining powerful calculations. This is done using DAX Language (Data Analysis Expression). DAX syntax goes beyond the scope of this tutorial. For more information , see the full Power BI documentation
Change the size and position of this measure in the Canvas (1) and set the visualization to a Card.
Finally, add a logo by clicking Insert (1) and Image (2) and position the logo in the upper right of the Canvas (3).
As we linked the field document name of both tables Drugs and Diagnoses with each other in a previous step, selecting diagnosis values in the left chart (Multiselect is possible by pressing SHIFT + Mouse clicks) (4) also drills down the drug chart on the right (5).
Congrats! You're now ready to create powerful charts using Health Discovery and Power BI Desktop.