Visualize your chemical space by connecting KNIME and TIBCO Spotfire®: Introduction to the SWAK

I recently joined Discngine, and like any newcomer, I'm getting familiar with the different technologies used here by working on small projects. In this article, I will tell you the story of my encounter with the Spotfire Web Application for KNIME (a.k.a. SWAK), a Discngine-built connector between two widely used data analysis tools: KNIME and TIBCO Spotfire®. The purpose was to visualize my dataset’s chemical space, something every cheminformatician does (in some form or another) at least once in their lifetime. To do so, I first built a KNIME workflow to project my dataset onto the chemical space, then visualized the result with TIBCO Spotfire® through the SWAK.

Introduction 

First, let me introduce KNIME and TIBCO Spotfire®, in case you haven’t heard of these tools.

KNIME Analytics Platform is open-source software for data science. You build workflows intuitively as linked "nodes" that represent tasks: either built-in ones such as filtering, sorting, and merging, or ones you create yourself and can then share with an ever-growing community. Additionally, KNIME Server lets you enhance team-based collaboration and deploy your work organization-wide. KNIME Server also allows you to execute long-running jobs remotely and access their results later, without impacting your local resources.

TIBCO Spotfire® is a powerful analytics solution optimized for visualization. It provides interactive dashboards and enables real-time analysis. Its intuitive interface makes it well suited to non-developers.

Finally, both tools can be used jointly through the SWAK. With this arrangement, you can run any KNIME workflow previously built and deployed on your KNIME Server, and visualize the results in TIBCO Spotfire®.

Data preparation with KNIME 

Chemical space is a key concept in cheminformatic fields such as drug discovery. As more and more available compounds are stored in datasets, one can investigate how they compare to each other in the chemical space. By visualizing the projection of a dataset onto the chemical space, one can analyze the diversity of the data, perform an in-silico property analysis of the compounds, and much more.

In a typical graphical representation of a dataset in the chemical space, the closer the molecules, the more similar they are. One way to compute similarity between molecules is to compare their fingerprints. But fingerprints are high-dimensional objects. Since we humans need low-dimensional spaces, such as 2D or 3D, to visualize what is going on, the number of dimensions has to be reduced. A very common method for dimensionality reduction is principal component analysis (PCA). PCA combines sets of correlated variables from the high-dimensional space into a smaller collection of uncorrelated variables, the principal components, which span a lower-dimensional space while retaining as much of the variance as possible.
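To make the reduction step more concrete, here is a minimal, dependency-free sketch of PCA applied to a handful of made-up 8-bit fingerprints. Everything in it is an assumption for illustration: the toy fingerprints, the function names `top_component` and `pca_2d`, and the choice of power iteration with deflation (picked only to avoid external libraries). It is not how the KNIME workflow computes its projection, and real work would rely on a numerical library.

```python
import math


def top_component(matrix, iterations=200):
    """Approximate the dominant eigenvector and eigenvalue of a
    symmetric matrix by power iteration."""
    d = len(matrix)
    # Deterministic, non-uniform start vector so the sketch is reproducible.
    v = [float(i + 1) for i in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left in any direction
            break
        v = [x / norm for x in w]
    # Rayleigh quotient v^T M v gives the matching eigenvalue.
    eigenvalue = sum(
        v[i] * sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)
    )
    return v, eigenvalue


def pca_2d(rows):
    """Project equal-length numeric rows onto their first two principal components."""
    n, d = len(rows), len(rows[0])
    # Center every column on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    components = []
    for _ in range(2):
        v, lam = top_component(cov)
        components.append(v)
        # Deflation: remove the variance explained by this component.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    # Coordinates of each row along the two components.
    return [
        [sum(c[j] * comp[j] for j in range(d)) for comp in components]
        for c in centered
    ]


# Hypothetical fingerprints: the first three share bits, as do the last three.
fingerprints = [
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 1, 1, 0, 1],
    [0, 0, 0, 1, 1, 1, 1, 0],
]

coords = pca_2d(fingerprints)
for x, y in coords:
    print(round(x, 3), round(y, 3))
```

In this toy set, the two groups of similar fingerprints end up on opposite sides of the first axis, which is the behaviour one hopes to see on a chemical space scatter plot. The sign of each axis is arbitrary, so only relative positions are meaningful.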

The first step of my project was to build a workflow with KNIME Analytics Platform, to perform the following operations: 

  • Read my dataset: the Malaria Dataset of the L1 Course for Life Science by KNIME (https://www.knime.com/courses

  • Enable input parameters. 

  • Add Discngine Connector Nodes to prepare the visualization in TIBCO Spotfire®. 

KNIME Workflow developed to visualize the chemical space of the Malaria Dataset with the Discngine SWAK.


Let’s have a detailed look at the data manipulation part of my workflow. I start by computing the MACCS molecular fingerprints of each compound in my set with an RDKit node. Then I use PCA to reduce the number of dimensions, keeping only two for visualization purposes. Because I want a 2D representation of the molecules to be rendered in TIBCO Spotfire®, I also preprocess the images and encode them in base64.
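For readers unfamiliar with the base64 step, here is a small sketch of what such an encoding amounts to, using only Python's standard library. The placeholder bytes and the helper name `to_base64` are assumptions made for this example; the workflow itself does this with KNIME nodes, and the image format it uses is not specified here.

```python
import base64


def to_base64(image_bytes: bytes) -> str:
    """Encode raw image bytes as an ASCII base64 string that can be
    stored in an ordinary text column."""
    return base64.b64encode(image_bytes).decode("ascii")


# Placeholder standing in for a rendered molecule depiction (hypothetical content).
placeholder_image = b'<svg xmlns="http://www.w3.org/2000/svg"></svg>'

encoded = to_base64(placeholder_image)
print(encoded)

# Decoding restores the original bytes, so nothing is lost in transit.
decoded = base64.b64decode(encoded)
```

Encoding the depictions as text means they can travel in the same table as the PCA coordinates and the molecule IDs.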

Expanded view of the Compute fingerprint and PCA metanode.


Once the workflow is done, you simply need to deploy it on your KNIME Server. 

Visualization with TIBCO Spotfire® 

You can find a video of the whole process at the end of this post. 

Using the SWAK is straightforward. Let’s go through the process step by step:

The SWAK interface is divided into two parts. On the left side is the connector, which links you to KNIME’s REST API, and on the right side is a standard Spotfire interface.

SWAK interface with the connector on the left, and TIBCO Spotfire® interface on the right.


To access the workflow you deployed on your KNIME Server from the menu, you first need to add it in this interface. After that, save the settings.


Once you have selected the workflow of your choice, you can specify the required input parameters and submit the job. Next, it will run on your KNIME Server.

The input parameters were added in the connector interface, and the job is now running.


Once the job has fully executed on the server, the resulting visualization appears in the Spotfire interface, and you have access to all of TIBCO Spotfire®’s functionalities.

Visualization of Malaria Dataset within the chemical space.


You can change the column on which the coloring is based, the color code, the size of the markers, the columns represented on the x and y axes, and so on. Depending on how you preprocess your data in your KNIME workflow, you can compute any molecular property and use it for coloring in the SWAK. Of course, every visualization on the Spotfire page is interactive and dynamic: when you select a molecule in the table, it is highlighted on the scatter plot. I can also decide (as shown in the image) that my tooltips will show the molecule ID and its 2D representation.

The limit is your imagination. And with so many possibilities, you can efficiently travel through your chemical space searching for the information you are looking for. 

Visualization of Malaria Dataset within the chemical space after some editing of the visualization.


Changing the coloring and sizing of the initial result is already interesting, but we can go further. As suggested at the beginning of this post, we can also customize the visualization to compare how two different datasets are projected on the same chart.

Visualization of Malaria Dataset (red) in comparison to Leishmaniasis Dataset (blue). Drugs targeting two different parasitic diseases do not necessarily cover the same part of the chemical space.


One last word before the end 

Of course, chemical space visualization is just a small example of what you can accomplish. You could also run much more complex workflows that combine the flexibility of KNIME with the visualization power of TIBCO Spotfire®. Such workflows can belong to any field, not only cheminformatics, so long as you have the data.

If you want to perform a similar job combining KNIME and TIBCO Spotfire®, here are the prerequisites:

  • KNIME Analytics Platform version 4.1 or later 

  • KNIME Server

  • SWAK Discngine Connector 

  • TIBCO Spotfire® version 10.3 or later 

I don't know about you, but I can't wait to see what my data looks like! 

 

For additional information on the SWAK connector please visit this KNIME blog post, and feel free to contact us to try it out. 

Moreover, as Discngine is an official KNIME Partner and an official TIBCO Spotfire Partner, don’t hesitate to contact us in case you want to get more info on these products and on related services Discngine could offer you.