(Open|Free) Data Set

About

This page is about open and free data sets: how they are described and where to find them.

A data set is generally available through:

one or several files
and optional metadata describing the content of the files

Syntax

See:

Metadata format

On the web:
- Schema.org:
  - https://schema.org/Dataset
  - https://schema.org/DataCatalog - A collection of datasets.
- Google Search:
  - https://developers.google.com/search/docs/data-types/dataset
  - This markup is used to power Google Dataset Search
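As an illustration of the metadata format above, here is a minimal sketch that writes a schema.org Dataset description as JSON-LD. The property names (`name`, `description`, `url`, `license`, `distribution`, `encodingFormat`, `contentUrl`) come from the schema.org vocabulary; every value, including the example.org URLs and the file name `dataset.jsonld`, is a placeholder assumption and not a real data set.

```shell
# Write a minimal schema.org Dataset description as JSON-LD.
# All values (name, URLs, license) are placeholders, not a real data set.
cat > dataset.jsonld <<'EOF'
{
  "@context": "https://schema.org/",
  "@type": "Dataset",
  "name": "Example data set",
  "description": "One CSV file with hypothetical example measurements.",
  "url": "https://example.org/datasets/example",
  "license": "https://creativecommons.org/publicdomain/zero/1.0/",
  "distribution": [
    {
      "@type": "DataDownload",
      "encodingFormat": "text/csv",
      "contentUrl": "https://example.org/datasets/example.csv"
    }
  ]
}
EOF
```

On a web page, a description like this is typically embedded in a `<script type="application/ld+json">` element so that crawlers can read it.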

Search|Repository

nationaalgeoregister.nl (atom)
UCI Machine Learning Repository - 327 data sets as a service to the machine learning community
http://ropengov.github.io/eurostat/ - Eurostat as an R package
https://registry.opendata.aws/ - Registry of Open Data on AWS
https://datahub.io/ - the data sets of https://github.com/datasets in the Data Package format
Google Cloud Public Datasets

Dutch Directory

The open data portal of the Dutch government

Register	Description	Data Set
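Persons	BSN (citizen service number) and the Gemeentelijke Basisadministratie Persoonsgegevens (GBA, the municipal personal records database)
Cars	Licence plate numbers and the Kentekenregister (vehicle registration register)
Companies	KVK number and the Handelsregister (trade register)	kvk.nl, rechtspraak.nl, the insolvency register (faillissementen.com) and the ANBI register (Belastingdienst)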

Kadaster Open Data

Addresses and buildings

BAG: Basisregistratie Adressen en Gebouwen (key register of addresses and buildings)

The BAG delivery files (about 1.2 GB in total, as a .zip) are refreshed every month: BAG Data Set (1.2 GB)
BAG Layers download link via the nationaalgeoregister catalogue
For testing: BAG Amstelveen (5.6 MB)

CBS data

CBS data:

Province (Provincie)
Grid-square statistics (Vierkantstatistieken) 100 m: RestConnection
Grid-square statistics (Vierkantstatistieken) 500 m
Districts and neighbourhoods (Wijken en Buurten)

Companies

D-U-N-S stands for Data Universal Numbering System and was developed by Dun & Bradstreet. It is a unique nine-digit identification number assigned to more than 204 million companies around the world. It has since become a worldwide standard, used by, among others, the European Commission, the United Nations, and in the United States.
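Because a D-U-N-S number is always nine digits, a company data set can be screened for malformed values before any lookup. The function below is a minimal sketch: the name `is_duns_format` is hypothetical, and accepting the hyphenated display form (for example `12-345-6789`) is an assumption. It checks the format only and cannot tell whether a number is actually assigned to a company.

```shell
# Return success (0) when the argument looks like a D-U-N-S number:
# exactly nine digits once optional hyphens are removed.
# Format check only; it does not verify that the number is registered.
is_duns_format() {
  digits=$(printf '%s' "$1" | tr -d '-')
  case "$digits" in
    ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit character
  esac
  [ "${#digits}" -eq 9 ]
}

# Usage:
#   is_duns_format "12-345-6789" && echo "format ok"
```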

http://opencorporates.com/

List

mtcars

mtcars - The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).

nycflights13

install.packages("nycflights13")
library(nycflights13)

Set	Description
airlines	Airline names
airports	Airport metadata
flights	Flights data
planes	Plane metadata
weather	Hourly weather data

MASS

library(MASS)
?Boston

The Boston dataset contains 506 rows and 14 columns. Available information includes median home price, average number of rooms per dwelling, crime rate by town, etc. More information about this dataset can be found by typing ?Boston or help(Boston) in an R terminal, or at this UCI page.

MovieLens User Ratings

Download the MovieLens 100k data files from the GroupLens datasets page (which also has a README.txt file and an index of the unzipped files):

wget http://files.grouplens.org/datasets/movielens/ml-100k.zip
# or
curl --remote-name http://files.grouplens.org/datasets/movielens/ml-100k.zip
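Once the archive is downloaded, a quick sanity check is to count how many ratings fall on each star value. The sketch below assumes the layout described in the MovieLens 100k README: the file `ml-100k/u.data` holds tab-separated columns for user id, item id, rating and timestamp. The function name `rating_histogram` is hypothetical.

```shell
# Print "rating<TAB>count" for each rating value, sorted by rating.
# Assumes tab-separated columns: user id, item id, rating, timestamp.
rating_histogram() {
  awk -F '\t' '{ count[$3]++ } END { for (r in count) print r "\t" count[r] }' "$1" |
    sort -n
}

# Usage, after downloading ml-100k.zip:
#   unzip -o ml-100k.zip
#   rating_histogram ml-100k/u.data
```

Piping through `sort -n` matters because awk does not guarantee any iteration order over array keys.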

DataSet used in Hive