Machine Learning - Data Mining (Software, Library and Framework)


This section lists software libraries and frameworks that implement machine learning algorithms. See Data Mining.

Data science can't be point and click

List of tools and software for data miners and machine learners.

See also: Natural Language - (Processing|Processor) (NLP)


Data Mining Tool 2

Data Mining Tool

Data Mining Tool 3

Data Tools (O'Reilly Survey 2013)


  • H2O - Open source, fast, scalable machine learning platform for smarter applications (Deep Learning, Gradient Boosting, Random Forest, Generalized Linear Modeling (Logistic Regression, Elastic Net), K-Means, PCA, Stacked Ensembles, Automatic Machine Learning (AutoML), …) - written in Java, with bindings for R and Python
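H2O runs algorithms such as K-Means in a distributed way, but the core of K-Means is small enough to sketch. The following is a minimal pure-Python sketch of Lloyd's k-means, assuming points are equal-length numeric tuples. The deterministic farthest-first initialisation is an assumption chosen for reproducibility (a simplification of k-means++), and the `kmeans` function is hypothetical: it is not H2O's API.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means on a list of equal-length numeric tuples.

    Returns (centroids, labels) where labels[i] is the cluster index of points[i].
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Farthest-first initialisation (assumption): start from the first point,
    # then keep adding the point farthest from every centroid chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centroids, labels


data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = kmeans(data, k=2)
```

The two well-separated groups in `data` end up in different clusters. Real platforms add smarter initialisation, multiple restarts and distributed computation on top of this loop.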


  • MATLAB was built for matrix calculations (linear algebra).
  • The R language is meant for statistics.
  • Python is a good general-purpose language.

However, these languages do not run as fast as languages such as C and Java.


Python has an incredible open source ecosystem. Packages:




R: a nice interactive data analysis tool, especially through environments such as RStudio.

R - Machine Learning Package and Method (Views)


Gorgonia is a library that facilitates machine learning in Go. It lets you write and evaluate mathematical equations involving multidimensional arrays easily. If this sounds like Theano or TensorFlow, it's because the idea is quite similar. Specifically, the library is low-level, like Theano, but has higher goals, like TensorFlow.
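The idea shared by Gorgonia, Theano and TensorFlow is to describe a computation as a graph of operations, evaluate it, and derive gradients from that same graph. Below is a minimal sketch of this idea in Python, working on scalars rather than multidimensional arrays. The `Node` class is hypothetical and does not reflect Gorgonia's (Go) API.

```python
class Node:
    """A node in a computation graph: a value plus the operation that produced it."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # (parent_node, local_gradient) pairs
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Node) else Node(other)
        # d(a + b)/da = 1 and d(a + b)/db = 1
        return Node(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        other = other if isinstance(other, Node) else Node(other)
        # d(a * b)/da = b and d(a * b)/db = a
        return Node(self.value * other.value, [(self, other.value), (other, self.value)])

    __radd__ = __add__
    __rmul__ = __mul__

    def backward(self):
        """Reverse-mode differentiation: accumulate d(self)/d(node) into node.grad."""
        # Topologically order the graph so each node's gradient is complete
        # before it is pushed to its parents.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += local * node.grad


x = Node(3.0)
y = Node(4.0)
z = x * y + x  # building the expression builds the graph
z.backward()   # z.value is 15.0, x.grad is y + 1 = 5.0, y.grad is x = 3.0
```

Libraries in this family apply the same graph-then-differentiate pattern to tensors and add optimisations (graph rewriting, GPU execution) that this sketch omits.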



Machine Learning Center


  • MATLAB/Octave
  • Julia: a newer language for technical computing
  • KNIME: KNIME [naim] is an open source workbench for the entire analysis process
  • RapidMiner


  • jq: a lightweight and flexible command-line JSON processor
  • scrape (Python): HTML extraction using XPath or CSS selectors
  • xml2json: a command that converts XML input to JSON output, using the xml-mapping npm module
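Tools like xml2json and jq are often chained: convert XML to JSON, then pick out a value by path. Here is a minimal standard-library Python sketch of that pipeline. The mapping convention ("@" prefix for attributes, lists for repeated tags) and the helpers `element_to_dict`, `xml_to_json` and `get_path` are hypothetical and do not reproduce xml-mapping's exact output format. The lookup at the end corresponds to the jq filter `.catalog.book[1].title`.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert an XML element to a JSON-compatible value.

    Assumed convention: attributes become "@name" keys, repeated child tags
    become lists, and a leaf element's text becomes a plain string.
    """
    result = {f"@{name}": value for name, value in elem.attrib.items()}
    for child in elem:
        value = element_to_dict(child)
        if child.tag in result:
            # A repeated tag is collected into a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    text = (elem.text or "").strip()
    if not result:
        return text  # leaf element without attributes
    if text:
        result["#text"] = text
    return result


def xml_to_json(xml_string):
    """Parse an XML string and serialise it as a JSON string."""
    root = ET.fromstring(xml_string)
    return json.dumps({root.tag: element_to_dict(root)})


def get_path(data, path):
    """Follow a dotted path such as "catalog.book.1.title"; digits index lists."""
    for key in path.split("."):
        data = data[int(key)] if isinstance(data, list) else data[key]
    return data


xml = ('<catalog><book id="b1"><title>Dune</title></book>'
       '<book id="b2"><title>Emma</title></book></catalog>')
doc = json.loads(xml_to_json(xml))
title = get_path(doc, "catalog.book.1.title")
```

The catch with any XML-to-JSON mapping is that XML has no native list type, so a tag that appears once becomes an object while a repeated tag becomes a list; real converters differ in how they handle this.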


  • Uber Michelangelo (Uber's machine learning platform) consists of a mix of open source systems and components built in-house. The primary open source components used are HDFS, Spark, Samza, Cassandra, MLlib, XGBoost, and TensorFlow.

Documentation / Reference

Task Runner