Easy Ways to Get and Create Datasets
Data analytics depends on data, and you need to get them from somewhere. Your organization’s systems are full of them, but external or self-made datasets can help when you want to make comparisons or tell a bigger story.
This page will be updated frequently
Datasets and Tools for Capturing and Generating Data
Publicly available datasets, and methods, tools, and techniques for creating your own sets.

Datasets
Many datasets are available for data analysts, and they cover all kinds of topics. The best thing is that almost everything is free to download and use.
With hundreds of thousands of datasets available, you can find and show valuable insights, and you can compare or benchmark public figures against your own company’s internal data.
The links are grouped by access type, showing which datasets are free and which require some kind of payment or permission. Whether or not the data are free, there may be restrictions on their use, if only a requirement to cite the source properly, so check each site and dataset carefully when downloading.
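
The benchmarking idea mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the figures, the dictionary names, and the `compare_to_benchmark` helper are all hypothetical, and a real comparison would load the numbers from your own systems and from a downloaded public dataset.

```python
from statistics import mean

# Hypothetical figures, invented purely for illustration.
internal_growth = {"2021": 4.2, "2022": 3.1, "2023": 5.0}     # own company, % per year
external_benchmark = {"2021": 3.5, "2022": 3.8, "2023": 4.1}  # public dataset, % per year


def compare_to_benchmark(internal, external):
    """Return the per-year gap (internal minus external) for the years both sources cover."""
    shared_years = sorted(set(internal) & set(external))
    return {year: round(internal[year] - external[year], 2) for year in shared_years}


gaps = compare_to_benchmark(internal_growth, external_benchmark)
average_gap = round(mean(gaps.values()), 2)

for year, gap in gaps.items():
    print(f"{year}: {gap:+.1f} percentage points versus the benchmark")
print(f"Average gap: {average_gap:+.1f}")
```

Only years present in both sources are compared, so a missing year in either dataset is skipped instead of raising an error.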

Active, Updated Data Sources
Providers of current data for research, benchmarking, or other real-life purposes in business or research contexts.

Free
- Data at WHO
Public data from the World Health Organization
- Datahub
Curated collection of thousands of free datasets
- European Data
Public data from the European Union
- Google Dataset Search
Will find datasets from many different sites
- NASA Earthdata
Public data
- National Bureau of Statistics of China
Public data, mostly in PDF format
- OCHA Humanitarian Data Exchange (HDX)
Public data from the UN organisation for global emergency response
- Open Government
Public data from the government of Canada
- Our World in Data
“Research and data to make progress against the world’s largest problems”
- StatBank Denmark by Statistics Denmark
Public data
- Statistics Sweden Open data
Public data
- Tableau Free Public Datasets for Analysis
A collection of links to sites with datasets
- UN data
Public data from the United Nations
- U.S. Government’s Open Data
Public data
- World Bank Open Data
Public data

Requires payment or permission
- Bright Data Dataset Marketplace
Scraped data from popular websites with products, reviews, etc.
- Datarade
Online dataset marketplace, business-relevant categories
- Techsalerator
Business datasets
- UK Data Service
Public data from the UK, most are “safeguarded” and require a university login

Historical Data
Older datasets, or sets about historical topics. Useful mostly for comparisons over time, or for general research.

Free
- Awesome Public Datasets
Index of hundreds of categorized websites with datasets
- BERD
Curated research data for researchers
- BuzzFeed News
Datasets used for their articles
- FiveThirtyEight (ABC News)
Datasets used for their articles
- Harvard Dataverse
Datasets by researchers from many universities, using the Dataverse platform
- Leipzig Corpora Collection
Downloadable corpora and various tools for text analysis of a large selection of languages
- Rdatasets
3485 datasets originally distributed with R and add-ons

Requires payment or permission
- CESSDA, Consortium of European Social Science Data Archives
Research archive, containing both restricted and free data and links to such data elsewhere
- English-Corpora.org
Search front-end for several English language corpora, including COCA and COHA. It should be possible to download data as well

Demo and Educational
Datasets used for practicing data analytics, or for showing the features of tools. Usually not used for commercial applications.

Free
- Data Is Plural
Newsletter with interesting datasets
- Gigasheet Sample Big Data Files
Online big data AI analytics app with public sample files
- Kaggle
Datasets, courses, competitions, and community
- Stanford Network Analysis Project
Datasets and tools for network analysis
- Tableau Community Projects
Many streams of regular training challenges, with datasets
- TidyTuesday
Weekly project by the Data Science Learning Community

Requires payment or permission
- Coming soon

Tools for Capturing and Generating Data
Even though hundreds of thousands of datasets are available for download and use, you may need to work with a different set of data made specifically for your project.
Different tools, methods, and services exist to help you generate your datasets. The simplest, of course, is to key some data into a spreadsheet or other list, but many other ways exist.

Copy, Extract, Scrape
Getting data from existing databases or other sources.

Copy
- Coming soon

Extract
- Coming soon

Scrape
- Data Miner
Chrome browser extension
- Instant Data Scraper
Chrome browser extension
- Scrapy
Open source Python web crawling and web scraping framework
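
To give a feel for what scraping tools like these do, here is a minimal sketch that pulls the rows out of an HTML table using only Python's standard library. It does not use Scrapy or either browser extension. The `TableScraper` class, the `scrape_table` helper, and the sample page are hypothetical, made up for this illustration, and a real scraper would also need to download the page and respect the site's terms of use.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # row currently being read, or None
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Made-up sample page; the products and prices are not real data.
SAMPLE_HTML = """
<table>
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td>Widget</td><td>19.95</td></tr>
  <tr><td>Gadget</td><td>4.50</td></tr>
</table>
"""


def scrape_table(html):
    """Parse an HTML string and return its table rows as lists of cell text."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


rows = scrape_table(SAMPLE_HTML)
for row in rows:
    print(row)
```

The first row holds the column headers, so the result can be written straight to a CSV file or loaded into a spreadsheet.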

Calculate, Create, Measure
Getting data from sensors, or from manual or automatic generation methods.

Calculate
- Coming soon

Create / Simulate
- Generatedata.com
Generate test data in various formats
- Mockaroo
Generate test data in various formats
- SAS DataMaker
Synthetic data generator
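
For small jobs, test data in more than one format can also be generated with a short script. The sketch below uses only Python's standard library and is not how any of the services above work internally. The field names, value lists, and function names are assumptions chosen for illustration.

```python
import csv
import io
import json
import random

FIRST_NAMES = ["Anna", "Ben", "Carla", "David", "Eva"]  # illustrative values
CITIES = ["Aarhus", "Berlin", "Toronto", "Stockholm"]   # illustrative values


def generate_customers(count, seed=42):
    """Return `count` fake customer records; a fixed seed makes runs repeatable."""
    rng = random.Random(seed)
    return [
        {
            "id": i,
            "name": rng.choice(FIRST_NAMES),
            "city": rng.choice(CITIES),
            "order_total": round(rng.uniform(10, 500), 2),
        }
        for i in range(1, count + 1)
    ]


def to_csv(records):
    """Serialize the records as CSV text with a header row."""
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records):
    """Serialize the records as an indented JSON array."""
    return json.dumps(records, indent=2)


customers = generate_customers(5)
csv_text = to_csv(customers)
json_text = to_json(customers)
print(csv_text)
```

Passing a different `seed` gives a different but equally repeatable dataset, which helps when a demo or test has to produce the same numbers every time.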

Measure and Sample
- balenaCloud
IoT cloud platform connected to an app marketplace, balenaHub, with apps such as balenaSense. The platform can be self-hosted through OpenBalena
- ThingsBoard
Open-source IoT platform

Interview, Observe, Survey
Asking or monitoring people, animals, or the universe.

Interview
- Coming soon

Observe
- Coming soon

Survey
- Enalyzer Platform
Online survey platform with AI guidance
- Google Forms
Surveys
- Microsoft Forms
Surveys, polls, and quizzes
- Qualtrics XM
Surveys tailored to customer experience, employee experience, or strategy and research
