Discngine

Streamlining Project Presentations with BIRT: A Chemical Library Enumeration KNIME Workflow Use Case

Every month, each of us has at least one meeting where we have to present the state of ongoing projects. These presentations often share the same structure, and yet hours are spent assembling them with copy-paste and screenshots, which is quite inefficient. Thanks to KNIME’s integration of the Business Intelligence and Reporting Tools (BIRT), we can automate the creation of a PowerPoint presentation, using BIRT for data visualization, dashboards and reports. This makes preparing for meetings a matter of a few clicks, while maintaining the integrity of the data, reducing human error, and increasing efficiency.

To show how to create a report in KNIME, I have created a simple workflow for library enumeration. This workflow involves the enumeration of chemical structures based on two lists of reactants and a reaction schema.

Why did I choose a chemical library enumeration protocol? Simply, because enumeration is often used at the initial stages of drug discovery projects, as it enables professionals to generate – and then screen – libraries of potential hits. This ultimately minimizes the costs and time associated with drug development and makes tools like the one presented here essential to the scientists looking to accelerate their projects.  

Now just brace yourself and follow me while we dive into the KNIME workflow (Figure 1).  

Figure 1. General overview of the whole workflow. 


General Disclaimer 

The workflow presented here is designed to run both as a standalone workflow on your KNIME Analytics Platform (some folks might call it “KNIME Desktop”) and from the Web Portal of the KNIME Server. The Web Portal is the web interface of the enterprise solution KNIME offers, which allows users to interact with workflows directly from their internet browser. Throughout the article I will show you both sides: what the user sees in the Web Portal, and what is under the hood. Just a small reminder: when a workflow is accessed from the Web Portal, every browser page corresponds to a component in the workflow.


Step 1: Importing the Reactants’ Molecular Data

First things first, one needs to ask the user to provide data. But what, and how? 

Let’s start from the “what”. As we aim to enumerate libraries, we need molecules and reactions. In this first step, we will allow the user to import molecules – the reactants – which are the building blocks (A and B) in the reaction that the workflow will perform (A + B --> C). 

Figure 2 Input selection in the workflow. 

Moving on to the “how”. Here I gave the user two options. Based on their choice in the selection widget (Figure 2, left red rectangle), the flow is directed either towards a component configured to upload two SDF files containing one or more reactant molecules (Figure 2, lower part of the middle orange rectangle), or towards a component with molecule sketchers (Figure 2, upper part of the middle orange rectangle). In the first case, the Web Portal shows the user a page with an upload function (Figure 3A); in the second, two sketcher interfaces (one each for reactants A and B) are offered so that users can paste in their molecules (e.g. from a SMILES string they have) or draw them on the spot (Figure 3B).

Figure 3 Graphical interface for data input. A) SDF file loading page and B) Sketcher input page.
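For readers who want to see what the uploaded files look like outside KNIME: an SDF file is plain text in which each molecule record ends with a line containing only `$$$$`, and the first line of each record is the molecule title. The sketch below splits such a text into records using only the Python standard library. It is not what the KNIME upload component does internally; the function names and the two hand-written sample records (with placeholder coordinates, not validated by any toolkit) are assumptions made for illustration.

```python
def split_sdf(text):
    """Split SDF text into records; each record is terminated by a '$$$$' line."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            if any(part.strip() for part in current):
                records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    # Tolerate a final record that is missing its '$$$$' terminator.
    if any(part.strip() for part in current):
        records.append("\n".join(current))
    return records


def record_title(record):
    """Return the first line of a record, which holds the molecule title."""
    return record.split("\n", 1)[0].strip()


# Hypothetical sample: two hand-written records with placeholder coordinates.
SAMPLE_SDF = """\
methylamine
  hand-written example

  2  1  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.5000    0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
M  END
$$$$
acetic acid
  hand-written example

  4  3  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.2500    1.3000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    2.2500   -1.3000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  2  3  2  0
  2  4  1  0
M  END
$$$$
"""

if __name__ == "__main__":
    for record in split_sdf(SAMPLE_SDF):
        print(record_title(record))
```

A quick count of the records returned by `split_sdf` is a cheap way to confirm that a file contains the number of reactants you expect before uploading it.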

Step 2: Provide a Reaction 

Independently of the previous choice, we are now ready to define the reaction we want to perform. The user can use a selection widget to direct the flow either towards a component for uploading an RXN file, or towards a component with a sketcher where the user can draw the desired reaction on the spot. Note that, for the enumeration to work correctly, atoms must be mapped, and aromaticity defined where necessary (Figure 4).

Figure 4 Input of the desired reaction.
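To make the atom-mapping requirement concrete, here is a small sanity check written against reaction SMARTS, a text notation in which a mapped atom is written as `[C:1]`. This is an assumption for illustration only: the workflow itself takes an RXN file or a sketched reaction, and the function names and the amide-coupling example below are not taken from it. The check flags map numbers that appear on the product side without a counterpart on the reactant side, which is one common symptom of an incompletely mapped reaction.

```python
import re

# An atom map is written as ":<number>" just before the closing bracket of an atom.
ATOM_MAP = re.compile(r":(\d+)\]")


def atom_maps(side):
    """Collect the atom-map numbers found on one side of a reaction."""
    return {int(number) for number in ATOM_MAP.findall(side)}


def unmapped_product_atoms(reaction_smarts):
    """Return map numbers used in the products but absent from the reactants."""
    reactants, separator, products = reaction_smarts.partition(">>")
    if not separator:
        raise ValueError("expected a reaction of the form 'reactants>>products'")
    return atom_maps(products) - atom_maps(reactants)


# Hypothetical example: acid (A) + amine (B) --> amide (C).
AMIDE_COUPLING = "[C:1](=[O:2])[OH].[N:3]>>[C:1](=[O:2])[N:3]"

if __name__ == "__main__":
    print(unmapped_product_atoms(AMIDE_COUPLING))
```

An empty set means every mapped product atom can be traced back to a reactant atom. The check says nothing about aromaticity or chemical validity, which still have to be verified in the sketcher.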

Step 3: Library Enumeration 

At this point, we are ready to enumerate our library using the “RDKit Two Component Reaction” node (Figure 5).

Figure 5 Enumeration. 

The workflow uses an “Empty Table Switch” node to check whether the “RDKit Two Component Reaction” node produced an empty table. If so, an error message is created to inform the user. Otherwise, the generated products are redrawn using the “RDKit Generate Coords” node, since the enumeration keeps the 2D coordinates of the inputs (Figure 5).
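The control flow of this step can be sketched in a few lines of Python: every reactant A is paired with every reactant B, the reaction is applied to each pair, and an empty result is turned into an error message instead of being passed downstream. This is a minimal sketch, not the RDKit node itself. The `react` callable, the `toy_coupling` stand-in and the dictionary-based reactants are hypothetical; in a real script `react` would wrap a cheminformatics toolkit.

```python
from itertools import product


def enumerate_library(reactants_a, reactants_b, react):
    """Apply `react` to every (A, B) pair and collect all products."""
    products = []
    for a, b in product(reactants_a, reactants_b):
        products.extend(react(a, b))
    return products


def run_enumeration(reactants_a, reactants_b, react):
    """Mimic the empty-table check: report an error when nothing was produced."""
    products = enumerate_library(reactants_a, reactants_b, react)
    if not products:
        return {
            "status": "error",
            "message": "The reaction produced no products for the given reactants.",
            "products": [],
        }
    return {"status": "ok", "message": "", "products": products}


def toy_coupling(a, b):
    """Hypothetical stand-in for a real reaction: an acid only reacts with an amine."""
    if a["group"] == "acid" and b["group"] == "amine":
        return [f"{a['name']} + {b['name']}"]
    return []


if __name__ == "__main__":
    acids = [{"name": "acetic acid", "group": "acid"}]
    amines = [{"name": "methylamine", "group": "amine"},
              {"name": "aniline", "group": "amine"}]
    print(run_enumeration(acids, amines, toy_coupling))
```

With one product per compatible pair, the library size grows as the number of A reactants times the number of B reactants, which is worth keeping in mind before uploading large SDF files.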

Step 4: Calculate Molecular Properties  

I have added a metanode to calculate some chemical properties of the reaction products: molecular formula, molecular weight, predicted LogP, and the number of H-bond donors and acceptors (Figure 6). These properties provide a profile of each molecule, so that the products are well described when added to the BIRT report.

Figure 6 Calculate molecular properties. 
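As a worked example of the simplest of these properties, molecular weight is the sum of the atomic masses weighted by the atom counts in the molecular formula. The sketch below shows only that arithmetic for a flat formula such as C3H7NO (no parentheses, charges or isotopes). It is an illustration, not the calculation performed inside the metanode, which works on the structures themselves; the function names and the small table of rounded standard atomic masses are assumptions.

```python
import re

# Rounded standard atomic masses for a few common elements (g/mol).
ATOMIC_MASS = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
    "F": 18.998, "S": 32.06, "Cl": 35.45,
}


def parse_formula(formula):
    """Turn a flat formula like 'C3H7NO' into {'C': 3, 'H': 7, 'N': 1, 'O': 1}."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def molecular_weight(formula):
    """Sum atomic masses weighted by atom counts."""
    total = 0.0
    for symbol, count in parse_formula(formula).items():
        if symbol not in ATOMIC_MASS:
            raise ValueError(f"no atomic mass available for element {symbol!r}")
        total += ATOMIC_MASS[symbol] * count
    return total


if __name__ == "__main__":
    # 3 * 12.011 + 7 * 1.008 + 14.007 + 15.999
    print(round(molecular_weight("C3H7NO"), 3))
```

LogP and the H-bond donor and acceptor counts depend on the connectivity of the molecule, so they cannot be derived from the formula alone and need a structure-based tool.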

Step 5: Prepare Data for Reporting 

Once all the data has been generated, we can start building the input for our report. To use the data in the report, we need the “Image to Report” and “Data to Report” nodes in combination with “Renderer to Image” and chart nodes. For that I have prepared the following inputs:

Figure 7 Prepare data for the reporting. 

  1. Renderer to Image 1: I converted the reactant SDF molecules to images using the Indigo renderer. These images are sent to the report using the “Data to Report” node and then used to populate the first slide of the report (Figure 7 point 1).

  2. Renderer to Image 2: I converted the product molecules to images and sent them, together with all the calculated properties, to the report using the “Data to Report” node (Figure 7 point 2).

  3. Image to Report: The last two images to prepare are i) a box plot summarizing the LogP and the number of H-bond donors and acceptors for the enumerated structures, and ii) a bar chart showing the molecular weight of each product. These plots are sent to the reporting interface using the “Image to Report” node (Figure 7 point 3).
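For readers less familiar with box plots, the box and whiskers are drawn from a handful of summary statistics per property: minimum, first quartile, median, third quartile and maximum. The sketch below computes them with the Python standard library. It is illustrative only: the function name and the input values are made up, and the KNIME chart node may use a different quartile convention or whisker rule.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return the statistics a basic box plot is drawn from."""
    data = sorted(values)
    # "inclusive" treats the data as the whole population, so the
    # quartiles always lie within the observed minimum and maximum.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return {"min": data[0], "q1": q1, "median": median, "q3": q3, "max": data[-1]}


if __name__ == "__main__":
    # Hypothetical H-bond acceptor counts for five enumerated products.
    print(five_number_summary([1, 2, 3, 4, 5]))
```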

Tip: When configuring the “Image to Report” and “Data to Report” nodes, remember to give them meaningful names, because these names are used in the report editor interface to identify each dataset.

The report editor interface can be opened by clicking on the “data analysis” icon (see adjacent figure) in the KNIME toolbar, and looks like what is shown in Figure 8.

In the upper right part of the interface (Figure 8 area 1), we have the pages, which you can start populating with the data provided by the workflow. The available data are listed in the Dataset View tab (Figure 8 area 2). Other elements such as Labels, Grids, Pictures and more can be found in the Report Item Tab (Figure 8 area 3).

To add an element, simply drag and drop it onto a page. To customize its appearance, use the Property Editor Report Tab (Figure 8 area 4), which contains all the customizable parameters of a Report Item, e.g. size, font and colour.

Tip 2: When creating a report, I like to use a Grid to hold all the elements of one page. This gives better control over page organization and makes it easier for the report creator to picture the final layout.

Figure 8 BIRT editor interface. 

When running this protocol locally, the report can be generated from the report editor interface by clicking on the “player” icon (see adjacent figure) and selecting the desired format (Figure 9).

Figure 9 BIRT - Available formats. 

From the Web Portal, the report will be automatically shown at the end of the workflow execution. It is also possible to send it automatically as an email attachment, increasing efficiency as well as communication within the teams.  

The result of this workflow can be found here: Download the exported presentation (in .pdf)

Conclusion 

In this blog article, we showcased an example of an automated report built on top of a workflow for the enumeration of chemical libraries, an essential first step in identifying active molecules in drug discovery research projects. The results of the enumeration were used as input for the automatic creation of a PowerPoint presentation. This workflow is just a minimal working example, and it is now up to developers (potentially you) to implement new features. This approach combines a seamless workflow, data-driven reporting and end-to-end analytics while keeping the report flexible and customizable to business needs. The time previously spent copy-pasting and taking screenshots can now be dedicated to deriving actionable insights from the data, driving better decision-making and outcomes, all while improving communication with stakeholders.


If you are working on a KNIME-related project and are seeking professional assistance, feel free to contact Discngine.