Machine Learning - Data Mining (Software, Library and Framework)


This section lists software libraries and frameworks that implement machine learning algorithms. See data_mining

Data science can't be point and click

A list of tools and software for data miners and machine learners.

See also: Natural Language - (Processing|Processor) (NLP)



  • H2O - Open Source Fast Scalable Machine Learning Platform For Smarter Applications (Deep Learning, Gradient Boosting, Random Forest, Generalized Linear Modeling (Logistic Regression, Elastic Net), K-Means, PCA, Stacked Ensembles, Automatic Machine Learning (AutoML), …) - written in Java, with bindings for R and Python


  • Matlab was built for matrix calculations (linear algebra).
  • The R language is meant for statistics.
  • Python is a good general-purpose language.

But they generally don’t run as quickly as languages like C and Java.


Python has an incredible open source ecosystem. Packages:

  • scikit-learn (machine learning)
  • NumPy (maths and arrays)
  • SciPy (scientific tools): provides many scientific routines that work on top of NumPy
  • Pandas (Python Data Analysis Library) - handy to manipulate financial data
  • matplotlib (plotting graphics)
  • Keras: The Python Deep Learning library
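
To give a feel for what these packages look like in practice, here is a minimal sketch that uses only NumPy to fit a straight line by ordinary least squares. The function name fit_line and the toy data are assumptions made for illustration; they are not taken from any of the libraries listed above.

```python
import numpy as np


def fit_line(x, y):
    """Fit y ~ slope * x + intercept by ordinary least squares.

    Hypothetical helper for illustration only.
    """
    # Design matrix: one column holding x, one column of ones for the intercept.
    A = np.column_stack([x, np.ones_like(x)])
    # lstsq returns (solution, residuals, rank, singular values);
    # only the solution is needed here.
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    slope, intercept = coef
    return float(slope), float(intercept)


# Toy data (an assumption): points lying exactly on y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

slope, intercept = fit_line(x, y)
print(slope, intercept)
```

On this toy data the recovered slope and intercept should be very close to 2 and 1. For real modelling work, scikit-learn offers ready-made estimators, so a hand-rolled fit like this is mostly useful for understanding what happens underneath.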




R is a nice interactive data analysis tool, especially through front ends like RStudio.

R - Machine Learning Package and Method (Views)


Gorgonia is a library that helps facilitate machine learning in Go. Write and evaluate mathematical equations involving multidimensional arrays easily. If this sounds like Theano or TensorFlow, it's because the idea is quite similar. Specifically, the library is pretty low-level, like Theano, but has higher goals like TensorFlow.




  • MATLAB/Octave
  • Julia: New language
  • KNIME: KNIME [naim] is an open source workbench for the entire analysis process
  • Rapid Miner


  • jq is a lightweight and flexible command-line JSON processor
  • scrape (Python): HTML extraction using XPath or CSS selectors
  • xml2json: a command that converts an XML input to a JSON output, using the xml-mapping npm module


  • Uber Michelangelo consists of a mix of open source systems and components built in-house. The primary open source components used are HDFS, Spark, Samza, Cassandra, MLlib, XGBoost, and TensorFlow.

