BIOVIA Pipeline Pilot in Docker – Good or Bad Idea? Discngine’s journey of containerizing Pipeline Pilot in Docker

Discngine has been developing software solutions for research organizations since 2004 and now counts more than 70 customers worldwide. Our developers use BIOVIA Pipeline Pilot (PP) almost every day to build tailor-made solutions as well as SaaS offerings. PP is a much-appreciated scientific backend for the speed it brings to development activities, especially when building Minimum Viable Products (MVPs) or Proofs of Concept (POCs): Pipeline Pilot allows Discngine developers to deliver these in a matter of days instead of weeks. In R&D, and especially in the pharmaceutical industry, speed is extremely valuable; it is the key to getting ahead of the competition.

Along with this clear advantage of using BIOVIA’s solution, there was a big downside: the lack of repeatability.

As we grew, this downside became more and more visible. Because of this low repeatability, every installation or upgrade of our Pipeline Pilot servers had to be done manually, a very time-consuming task that is prone to human error.

From Developer to DevOps 

To solve this challenge, Discngine had to move beyond a “developer’s mindset” and adopt a “DevOps” approach.

Exploring the different options, we could have used a configuration management tool or a native cloud configuration, but we chose containers for their versatility and flexibility, while keeping backups and auditability.

In this article, we take a look back and explain how containerizing BIOVIA Pipeline Pilot in Docker allowed us to transform our deployment strategy and develop our new Discngine Cloud Infrastructure.

We will tell the story of our journey through containerization, from building our first images to running non-regression tests and releasing into production.

Since we implemented this DevOps strategy in 2017, Discngine has launched around 2,000 BIOVIA Pipeline Pilot containers, run 7 Dockerized BIOVIA PP versions, and made around 5,000 commits to PP collections.

The important milestones  

The concept of containerizing PP in Docker is quite straightforward. On the one hand, you have the PP runtime inside a container, which brings upgradability and easy monitoring; on the other hand, you keep all your data, logs, licences and XMLDB on external volumes, which brings repeatability.

This is achieved in two simple steps: build your image with Docker, then pull it (download it) and run it on your server.
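
In practice, and as a minimal, hedged sketch only (the image name, registry, port and host paths below are purely illustrative, not the ones we actually use), the two steps look like this:

```
# Step 1: build the image from the Dockerfile and push it to a private registry.
docker build -t registry.example.com/pipeline-pilot:2021 .
docker push registry.example.com/pipeline-pilot:2021

# Step 2: pull and run it on the target server, keeping data, logs, licences
# and the XMLDB on external volumes so the container itself stays disposable.
docker pull registry.example.com/pipeline-pilot:2021
docker run -d --name pp-server \
  -p 9943:9943 \
  -v /srv/pp/xmldb:/opt/pp/xmldb \
  -v /srv/pp/logs:/opt/pp/logs \
  -v /srv/pp/licences:/opt/pp/licences \
  registry.example.com/pipeline-pilot:2021
```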

 

[Figure: Pipeline Pilot deployment before (“From”) and after (“To”) containerization.]


Warning

1. Support: there is no official BIOVIA support (yet) for running Pipeline Pilot in Docker containers, so run it at your own risk.

2. Discngine does not provide Docker images, only Dockerfiles; the PP source code is the property of BIOVIA and can only be distributed by BIOVIA.


We started our journey toward containerization in 2017. It took us approximately nine months, but it could have been done faster (we believe it can be done in a matter of weeks).

Our journey to a complete “migration” followed 5 stages:

  1. First image: it took a skilled Docker developer half a day to create the Pipeline Pilot Dockerfile. Adjusting the build recipe to PP can take a bit more time and trial and error, but this is already covered by the Dockerfile we provide.

  2. Collections development with Git: using a version control system during development is now a common best practice, and versioning your PP protocols and components with Git should be no exception. That requires two things:

    • Being able to easily push your collection modifications to a Git repository. This can be achieved with a dedicated protocol that runs the git commit and git push commands for you (a minimal sketch of such a step follows this list).

    • Training your developers to manage merge conflicts on large (several-MB) files. You will need good Git hygiene if you don’t want to get into trouble and end up with non-functional components/protocols due to badly resolved merge conflicts.

  3. Dedicated development containers: this is a big advantage of running PP in containers. Developers can safely test and break whatever they want without consequences. You can easily automate running Docker containers for all your developers and manage the container lifecycle with a script (choosing ports, container names, etc.); see the lifecycle sketch after this list.

  4. Non-regression tests: you can easily run your non-regression tests against several PP containers running different PP versions. This can be described in a single docker-compose file and executed with a single command (see the compose sketch after this list).

  5. Use in production: although a few restrictions exist, we have been running PP in production without major issues for years now.
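
To illustrate stage 2, here is a minimal sketch of the kind of commands such a dedicated “commit” protocol could run for you; the paths, branch and remote are hypothetical, and in our setup these commands are executed from inside a PP protocol:

```
# Hypothetical sketch: commit the XMLDB (kept on an external volume) to Git.
cd /srv/pp/xmldb                  # illustrative path to the versioned XMLDB
git add .                         # stage modified components/protocols
git commit -m "Update collection protocols"
git push origin main              # remote and branch names are illustrative
```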
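
For stage 3, the lifecycle script can stay very small. The sketch below is one possible shape (developer names, ports, image tag and paths are illustrative):

```
#!/usr/bin/env bash
# Sketch: one Pipeline Pilot container per developer, each on its own port.
set -euo pipefail

DEV="$1"     # developer name, e.g. ./pp-dev.sh alice 10443
PORT="$2"    # host port to expose this developer's PP server on
IMAGE="registry.example.com/pipeline-pilot:2021"

# Recycle any previous container for this developer, then start a fresh one.
docker rm -f "pp-dev-${DEV}" 2>/dev/null || true
docker run -d --name "pp-dev-${DEV}" \
  -p "${PORT}:9943" \
  -v "/srv/pp/dev/${DEV}/xmldb:/opt/pp/xmldb" \
  "${IMAGE}"

echo "Pipeline Pilot for ${DEV} is starting on port ${PORT}"
```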
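
For stage 4, a single docker-compose file can describe several PP versions side by side; again, the image tags, ports and file contents are illustrative:

```
# Sketch: run two PP versions in parallel for non-regression testing.
cat > docker-compose.yml <<'EOF'
version: "3"
services:
  pp-2020:
    image: registry.example.com/pipeline-pilot:2020
    ports: ["10443:9943"]
  pp-2021:
    image: registry.example.com/pipeline-pilot:2021
    ports: ["11443:9943"]
EOF

docker-compose up -d   # a single command starts every test server
```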



While working on this project, four advantages quickly became apparent.

  • First, the ability to build containers with different collections and versions of PP.

  • Second, the configuration is baked into the image (commands exist to change the configuration of data sources, and you can use different entry points; see the blog article). A minimal entry-point override sketch follows this list.

  • Third, provisioning is fast: in a matter of minutes, you can build your image and have it running on a server, which is much faster than setting up a Pipeline Pilot server from scratch every time.

  • Finally, using containers is OS-agnostic; it works on Linux or Windows. However, we have neither built nor tested the Dockerfile for Microsoft Windows, and remember that Linux hosts can usually only run Linux-based containers and vice versa.
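
To illustrate the second point above, Docker lets you override the image entry point at run time; the reconfiguration script shown below is a hypothetical example, not an actual PP command:

```
# Illustrative only: run a (hypothetical) reconfiguration script instead of
# starting the Pipeline Pilot server directly.
docker run --rm \
  --entrypoint /opt/pp/scripts/configure-datasources.sh \
  registry.example.com/pipeline-pilot:2021
```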

The outcomes of using Pipeline Pilot in Docker 

Despite some compliance drawbacks regarding Docker’s best practices, running BIOVIA Pipeline Pilot in Docker containers also brings a lot of advantages to the development team.

Nevertheless, keep in mind that using Pipeline Pilot in Docker remains a non-supported operation!

It is not necessarily compliant with your company's Docker development best practices:

Images are heavy

Depending on your development use cases, you will need to install many PP collections, so the resulting images are heavy (approximately 5 GB). Consequently, they do not fit well with systems that have “cold start” behavior, such as Azure serverless or Fn: if the control plane needs to re-download the image every time a developer asks for a container, we usually consider it too slow. We also advise against Kubernetes, as many small issues arise from the weight of the image. Kubernetes was not designed to run heavy, monolithic “microservices”, although it does work, more or less. In the end, scaling may not be efficient because of the image size, and it still takes longer than running standard HPC jobs on a dedicated server.

Not suitable for microservices ecosystem

Pipeline Pilot is designed to run directly on a server: it sizes itself against the host’s RAM and CPU, and if it reaches those limits (not Docker’s limits), you get a hard crash. When running in Docker (or any other container orchestrator), the PP server still takes the host’s CPU and RAM as its reference, even though Docker may allocate it less. This was the main cause of Pipeline Pilot crashes at the beginning of our PP-in-Docker usage.
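
One way to sketch a mitigation (the values below are illustrative) is to make the container limits explicit with Docker’s --memory and --cpus flags and size them consistently with the host. This does not change what PP sees, but it makes any mismatch between the two visible and deliberate:

```
# Sketch: pin the container's resources explicitly so that what Docker grants
# and what the host exposes to Pipeline Pilot can be kept consistent.
docker run -d --name pp-server \
  --memory 16g --memory-swap 16g \
  --cpus 4 \
  registry.example.com/pipeline-pilot:2021
```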

One could say that running Pipeline Pilot in Docker doesn’t fit with the microservices ecosystem.

Nevertheless, containerization enables running PP as a non-root user, containers are ephemeral, and images can be produced with multi-stage builds, which is a tremendous time saver.
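
As a rough, hypothetical sketch of what this looks like (the base image, installer invocation, paths, user and start command below are placeholders, not BIOVIA’s actual installer interface; the real recipe is in the Dockerfile we provide):

```
# Sketch of a multi-stage, non-root Dockerfile skeleton.
cat > Dockerfile <<'EOF'
FROM centos:7 AS builder
COPY pp_installer/ /tmp/pp_installer/       # BIOVIA installer, obtained from BIOVIA
RUN /tmp/pp_installer/install.sh --silent   # hypothetical silent-install step

FROM centos:7
COPY --from=builder /opt/pp /opt/pp         # keep only the installed tree
RUN useradd -r ppuser && chown -R ppuser /opt/pp
USER ppuser                                 # run Pipeline Pilot as a non-root user
CMD ["/opt/pp/bin/start-server"]            # hypothetical start command
EOF
```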

Versatility

Overall, technical compliance with Docker best practices is quite poor, but you still get a lot of advantages.

The biggest one is versatility.

You can use as many versions of PP as you want, run them in parallel, and even create as many servers as you need, within the limits of your physical resources and network availability. Moreover, PP containers can run on either laptops or VMs, which brings a lot of flexibility to development.

Flexibility

This practice can also be production compliant if the workloads are not too big and if they don’t require too many resources.

Along with rapid provisioning, it allows the DevOps team to adjust the shape of the services running in Docker, which lowers costs and tunes the performance of the platform to actual needs.

Finally, it greatly reduces sysadmin tasks and the costs associated with them.

The lessons learned along the way

The first lesson learned is more of a disclaimer: containerizing Pipeline Pilot is not for everyone! First of all, this practice is not officially supported by BIOVIA. Secondly, technical compliance with Docker best practices is poor. So before starting a similar project, you need to ask yourself whether the expected benefits are worth going in that direction.

For Discngine, the advantages of conducting this journey clearly outweighed the drawbacks.

The biggest values for us are:

  1. The versatility of being able to run any version of PP.

  2. The flexibility of deployment (multi-stage builds, OS-agnostic images) and provisioning.

  3. It is production compliant (for smaller workloads).

  4. It reduces sysadmin tasks (and costs).

Finally, it allowed Discngine to develop a new deployment solution for our customers: the Discngine Cloud Infrastructure, a cloud platform managed and maintained by Discngine to host research software applications at scale and benefit from the most advanced capabilities of the cloud.

If you want to learn more about the Discngine Cloud Infrastructure, visit the dedicated webpage or contact us.

Similarly, if you want more detailed explanations or guidance about the protocol we used, feel free to reach out!

 

Image credit: Flaticon.com