Datasets


Key FRED data

Federal Reserve Economic Data (FRED) is a great one-stop resource for government economic datasets. Some key FRED series:

Census

American FactFinder has been retired. Use Explore Census Data instead. Other Census data:

  • Small Area Income and Poverty Estimates (SAIPE)
  • Business Dynamics Statistics – BDS provides annual measures of business dynamics (such as job creation and destruction, establishment births and deaths, and firm startups and shutdowns) for the economy and aggregated by establishment and firm characteristics.
  • Nonemployer statistics – NES is an annual series that provides subnational economic data for businesses that have no paid employees and are subject to federal income tax. This series includes the number of businesses and total receipts by industry.
  • Economic Census – The economic census serves as the foundation for the measurement of U.S. businesses, including the Island Areas, and their economic impact.
  • A complete list of the Census' economic surveys – The U.S. Census Bureau business surveys and censuses measure the pulse of the U.S. economy, businesses, and governments. They provide data for businesses in the economic sectors such as manufacturing, construction, retail trade, health care, and services industries, as well as for state and local governments, and on imports and exports.
  • Computer and Internet Use – In recent decades, computer usage and Internet access have become increasingly important for gathering information, looking for jobs, and participating in a changing world economy. See also: the NTIA’s Digital Nation Data Explorer, which includes raw data for the Internet and Computer Use Survey.
  • Code Lists, Definitions, and Accuracy – View the detailed codes and definitions for variables, statistical testing, and an explanation of sample design, methodology, and accuracy for the American Community Survey.
  • Surveys & Programs – The U.S. Census Bureau conducts more than 130 surveys and programs each year. This is a list of all of them.
  • Are Millennials making less than their parents? Recent Census data point to a different interpretation. Table P-10 lays out median and mean income for those aged 25 to 34 going back to 1974, adjusted to constant 2018 dollars. It starts at line 104. Those aged 25-34 in 2018 are doing better than any comparable group in the 1980s, since their median income of $37,133 is higher than the 1980s peak of $33,356.
  • The National Survey of Children’s Health (NSCH) examines the physical and emotional health of children ages 0-17. Special emphasis is placed on factors related to the well-being of children. These factors include access to - and quality of - health care, family interactions, parental health, neighborhood characteristics, as well as school and after-school experiences.
  • Old Census data – HathiTrust has a lot of old Census data available.
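
As a quick check on the Table P-10 comparison in the list above, the gap between the two quoted medians can be worked out directly. This is only a sketch using the two figures cited in that bullet; the variable and function names are made up for illustration, and nothing here reads the Census table itself.

```python
# Figures quoted in the Table P-10 bullet above (constant 2018 dollars).
median_2018 = 37_133   # median income, ages 25-34, in 2018
peak_1980s = 33_356    # highest 1980s median quoted for the same age group


def percent_gap(newer: float, older: float) -> float:
    """Return how far `newer` sits above `older`, as a percentage of `older`."""
    return (newer - older) / older * 100


gap_dollars = median_2018 - peak_1980s
gap_percent = percent_gap(median_2018, peak_1980s)

print(f"${gap_dollars:,} higher, or about {gap_percent:.1f}%")
```

On these two numbers alone, the 2018 median sits roughly a tenth above the 1980s peak in real terms.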

The Bureau of Labor Statistics

Bureau of Economic Analysis

Other government data

  • Data.gov – The home of the U.S. Government’s open data.
  • Rural Urban Continuum Codes – The Rural-Urban Continuum Codes form a classification scheme that distinguishes metropolitan counties by the population size of their metro area, and nonmetropolitan counties by degree of urbanization and adjacency to a metro area.
  • The Rural Atlas – If you need to do analysis of counties, the Rural Atlas is your go-to. It is the most up-to-date document pulling together population and economic data from the 5-year ACS.
  • O*NET OnLine – O*NET contains hundreds of standardized and occupation-specific descriptors on almost 1,000 occupations covering the entire U.S. economy. The database, which is available to the public at no cost, is continually updated from input by a broad range of workers in each occupation.
  • USA Spending – USAspending.gov is the official source for spending data for the U.S. Government.
  • About Underlying Cause of Death, 1999-2019
  • A list of construction indices
  • Interagency Data Inventory – The Interagency Data Inventory is a product of the Data Committee of the Financial Stability Oversight Council (FSOC). The inventory catalogs the data collected by FSOC member organizations. The inventory contains information — metadata — about each data collection. It does not contain the underlying datasets. For each data collection, the inventory has basic information, such as a brief description of the collection, collecting organization, and the name and number of the form used to collect the data.
  • The Private School Survey produces data similar to that of the NCES Common Core of Data (CCD) for the public schools. The data are useful for a variety of policy- and research-relevant issues, such as the growth of religiously-affiliated schools, the length of the school year, the number of private high school graduates, and the number of private school students and teachers.
  • The primary purpose of the Common Core of Data (CCD) is to provide basic information on public elementary and secondary schools, local education agencies (LEAs), and state education agencies (SEAs) for each state, the District of Columbia, and the outlying territories with a U.S. relationship.
  • Total factor productivity – From Eli Dourado: “Total factor productivity captures how much output can be produced with a diverse but fixed basket of inputs. As technology and institutions improve, TFP goes up. As they deteriorate, it goes down. In the last decade, TFP has deeply stagnated. While there are numerous estimates of total factor productivity in the US, only the series maintained by the Federal Reserve Bank of San Francisco is quarterly and attempts to adjust for the business cycle. Since this series is not available in FRED, I am making it available here in graphical form.”
  • The Atlanta Fed’s Wage Growth Tracker

State and local government data

  • Doing Business North America – The Doing Business North America (DBNA) project annually provides objective measures of the scale and scope of business regulations in 134 cities across 92 states, provinces, and federal districts of the U.S., Canada, and Mexico. It uses these measures to score and rank cities in regard to how easy or difficult it is to set up, operate, and shut down a business.
  • Correlates of State Policy | IPPSR – The Correlates of State Policy Project aims to compile, disseminate, and encourage the use of data relevant to U.S. state policy research, tracking policy differences across the 50 states and changes over time. We have gathered more than 900 variables from various sources and assembled them into one large, useful dataset. We hope this project will become a “one-stop shop” for academics, policy analysts, students, and researchers looking for variables germane to the study of state policies and politics. Also available as an R package and a Shiny app.
  • The Fiscally Standardized Cities (FiSC) database allows users to create a custom table with fiscal information for selected cities. To create a table, select one or more cities, one or more years, and one or more fiscal variables. The default display options can also be adjusted, and users can choose whether to display data for FiSCs and/or one of the component governments (cities, counties, school districts, and special districts).
  • State expenditure report – This annual report examines spending in the functional areas of state budgets: elementary and secondary education, higher education, public assistance, Medicaid, corrections, transportation, and all other. It also includes data on capital spending by program area, as well as information on general fund and transportation fund revenue collections.
  • Annual Survey of State and Local Government Finances is the only source of nationwide, comprehensive local government finance information. It provides statistics on revenue, expenditure, debt, and assets for the 50 states and D.C.

Compendiums of data

  • Historical data library
  • asdfree – An archive of the Analyze Survey Data for Free website, which provides step-by-step instructions for analyzing major public-use survey data sets, from the website that’s easy to type.
  • Cool Datasets – As they used to say, “A place to find cool datasets.” From Archive.org.
  • Awesome Public Datasets – A list of high-quality, topic-centric public data sources, collected and tidied from blogs, answers, and user responses. Most of the data sets listed are free, but some are not. Other amazingly awesome lists can be found in sindresorhus’s awesome list.
  • I³ Open Innovation Dataset Index – This is the web version of the I³ Open Dataset Index, a collection of innovation datasets, and related tools, platforms and resources used by the broader research community.
  • IPUMS – IPUMS provides census and survey data from around the world integrated across time and space. IPUMS integration and documentation makes it easy to study change, conduct comparative research, merge information across data types, and analyze individuals within family and community contexts.
  • Eurostat – Eurostat is the statistical office of the European Union.
  • NORC at U of Chicago – NORC experts conduct research in a wide range of subjects, bringing insight to topics including education, economics, global development, health, and public affairs. NORC is a solid alternative to Census and BLS surveys.
  • Data USA – Maintained by Deloitte and Datawheel, DataUSA has a lot of databases. I use this site to grab data about cities and states.
  • Damodaran’s financial datasets | industry cap ex, risk premiums, etc – Aswath Damodaran is the GOAT: “Since I teach valuation and corporate finance, I am constantly collecting and analyzing data, and I have found that the data, once analyzed, can be used multiple times. Since I already have the processed data, I could not see any harm from sharing that data with others, thus saving us all some collective time, which we can spend far more productively not just on valuation but also with family and friends.”

Telecom and tech datasets

Other datasets

  • Vertical Farming – [link]
  • Twitterstream from Archive – A simple collection of JSON grabbed from the general Twitter stream, for the purposes of research, history, testing and memory. This is the “Spritzer” version, the most light and shallow of Twitter grabs. Unfortunately, we do not currently have access to the Sprinkler or Garden Hose versions of the stream. [Archive]
  • The California Forest Observatory is a data-driven forest monitoring system that maps wildfire hazard drivers across California, including forest structure, weather, topography, and infrastructure.
  • OpenStreetMap provides a broad range of map data maintained by a worldwide community of geographers and cartographers.
  • The Registry of Open Data on AWS has empowered laboratories, research institutions, and various other organizations to deliver open datasets to developers, startups, and enterprises worldwide since its launch in 2018.
  • NASA Earth Observations offers climate and environmental data for the globe. You can browse and download the satellite data from NASA’s constellation of Earth Observing System satellites. Over 50 different global datasets are represented with daily, weekly, and monthly images available in various formats.
  • A Google BigQuery public dataset is any dataset made available to the general public through the Google Cloud Public Dataset Program.
  • Koordinates is an emerging geospatial data management platform where you can host, manage, share, publish, and access geodata.
  • Natural Earth is a collection of public domain map datasets available in vector or raster formats and various scales.
  • SafeGraph offers some open census data and neighborhood demographics.
  • The Canadian government has its own Open Data Portal.
  • An open dataset of electric vehicles and their specs.
  • Open Zone Map – The largest dataset and only interactive map of Special Economic Zones.