Chapter 3 KNIME

KNIME is a comprehensive tool for analytics and data mining built around an intuitive drag-and-drop workflow. While KNIME is used by professional data analysts, it is also an excellent “low code” platform for learning predictive analytics and data mining. The interface provides a graphical representation of each step taken in an analysis. In this sense the workflow is self-documenting, so it is easy to see how to reproduce your analyses.

KNIME was developed by a group at the University of Konstanz in Germany beginning in 2004, with the first release in 2006. As noted on its web site, KNIME is committed to open source:

“Unlike other open-source products, KNIME Analytics Platform is not a cut-down version and there are no artificial limitations on execution environment or data size: If you have enough local or cloud-based space and compute power, you can run projects with billions of rows, as many KNIME users currently do.” ref: https://www.knime.com/knime-open-source-story

A commercial server version of KNIME is required for deploying KNIME on the web, but it is not needed for learning. Everything else is available in the open-source version.

There are several features which make KNIME stand out from its many competitors:

  1. KNIME is free to use on your machine.
  2. KNIME runs on Windows, Mac, and Linux machines.
  3. KNIME has over 4,000 nodes for data source connections, transformations, machine learning, and visualization.
  4. KNIME includes a broad array of data processing and analysis capabilities, and it is fully extensible through custom nodes written in Python or R.
  5. Many of the capabilities of H2O and WEKA are also integrated and work seamlessly in the drag-and-drop workflow.
  6. A variety of data sources can be used, including CSV files, Excel files, and, with an easily installed extension, databases.
  7. Data can be exported to Excel, Tableau, Spotfire, Power BI, and other reporting platforms.
  8. A large, active community of users can answer questions and provide help.
  9. Many ready-to-use workflows are available which can be easily installed in your own work environment by dragging and dropping from the KNIME site.
  10. Extensive documentation, learning modules, videos, and training events are available.
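
The list above mentions extending KNIME with custom nodes written in Python. As a rough sketch of the kind of logic such a node might hold, the function below adds a derived column to a small table represented as a list of dictionaries. The function name `add_price_per_unit`, the column names, and the sample rows are all hypothetical and are not taken from KNIME. Inside KNIME the table would arrive through the node's input port rather than as a Python literal, and that wiring is omitted here so the sketch runs as plain Python.

```python
from typing import Dict, List, Optional

# A table row: column name -> numeric value (None marks a missing value).
Row = Dict[str, Optional[float]]


def add_price_per_unit(rows: List[Row]) -> List[Row]:
    """Return new rows with a derived 'price_per_unit' column.

    Hypothetical example: 'total_price' and 'units' are made-up column names.
    """
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the input table is left unchanged
        units = row["units"]
        # Avoid division by zero: record a missing value instead.
        new_row["price_per_unit"] = row["total_price"] / units if units else None
        result.append(new_row)
    return result


# Made-up sample data standing in for a table passed between nodes.
sample = [
    {"total_price": 20.0, "units": 4},
    {"total_price": 9.0, "units": 0},
]
enriched = add_price_per_unit(sample)
print(enriched)
```

In a workflow, a step like this would sit between a reader node and a learner or visualization node, in the same way as any built-in transformation.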

Learning to use KNIME takes some time, but extensive, free written and video resources are available. A “Getting Started Guide” is available: Getting started with KNIME.

A series of free, self-paced courses is available online (KNIME Self-Paced Courses). The courses include exercises and solutions. There are four levels of self-paced courses:

  • Level 1 courses
    • KNIME Analytics Platform for Data Scientists: Basics
    • KNIME Analytics Platform for Data Wranglers: Basics
  • Level 2 courses
    • KNIME Analytics Platform for Data Scientists: Advanced
    • KNIME Analytics Platform for Data Wranglers: Advanced
  • Level 3 course
    • KNIME Server Course: Productionizing and Collaboration
  • Level 4 courses
    • Introduction to Big Data with KNIME Analytics Platform
    • Introduction to Text Processing

All of the documentation can be read or downloaded: KNIME Documentation.

Individual documents for specific topics are available as follows:

A free “bootcamp” for KNIME is available from Udemy, with 50 instructional video lectures running over four hours. The course starts with installation and setup and proceeds to demonstrate practical applications in machine learning: Bootcamp for KNIME Analytics Platform.