The Dismal State of Federal Agency Data Dissemination
In the heart of the current data
science revolution, the state of data dissemination and aggregation among the Federal
Agencies remains disorganized, disparate, and dismal. It wasn’t always this
way, and it doesn’t have to stay this way. The
Federal Agencies can do more to make their data useful and accessible to the
tax-paying public.
The Origin of the Problem
The origin of the problem must have been a structural change in the
thinking and funding of the economics profession. The
profession must have grown so confident in its
understanding of the world that it no longer viewed data collection and
aggregation as a worthy exercise. This shift can be seen in the slow transformation
of the publications of the National Bureau of Economic Research (NBER), the Federal
Reserve Statistics Division, and the Department of Commerce.
As the organization that officially dates
the U.S. business cycle, the NBER put a tremendous amount of effort into understanding
the cycle; this effort manifested in multi-year projects to collect a wide
swath of real economic and financial data. The trailblazers at the NBER
believed that with more data and longer histories they could identify
stable patterns of peaks and troughs. They believed that these patterns,
with their leads and lags, would help them untangle causation from
correlation.
Macro researchers have never
stopped seeking out new data to understand gaps in business cycle theory and
knowledge. But this data collection comes and goes. Series are
aggregated and then no attempt is made to update them.
For academics and researchers, the
starting point was annual U.S. data. It then became imperative to tease out
correlations, which required collecting higher-frequency measures of the
business cycle. Now we have regressed from monthly data to
low-frequency cross-country datasets (note Piketty's database on inequality
and the Òscar Jordà, Moritz Schularick, and Alan
M. Taylor Global Macro database).
While it was the imperative of the
NBER to collect and study business cycle data, it was never really its
academic responsibility to maintain that data. This contrasts with another
organization, the League of Nations, which collected country-specific financial
and economic data from 1919 through 1946. Much of the data that the
League had collected was later picked up by the IMF in the International Financial Statistics.
Getting back to domestic
organizations
It was also the will of the
Department of Commerce and the Bureau of Economic Analysis (BEA) to compile U.S.
economic and financial data from the respective U.S. Agencies and publish
it. There were two master compilations. One was the Biennial
Supplement to the Survey of Current Business (SCB), which saw its last issue
in June 1992. The other was Banking
and Monetary Statistics (BMS), which was published in 1976. The BMS was produced by the Board of
Governors and was based on the monthly Federal Reserve Bulletin. The Bulletin was officially discontinued in
December 2008. The discontinuation of the SCB supplement and the FRB Bulletin
created major voids.
The reporting Federal Agencies would
produce their monthly reports, and the Federal Reserve or BEA would take those
monthly numbers and aggregate them into long time-series histories. The
resulting publications saved the end user a tremendous amount of time in the
library. After those compilations ended, the Agencies kept producing the monthly
reports but didn't have the foresight to create the long-term series that the Fed and
Commerce Department had previously been aggregating. A black hole was
created, and new data was sucked right in.
It didn't take long for coverage
gaps to open up between where the Survey
of Current Business left off and where archival publications begin. Some
Agencies stepped up to the plate and hit home runs; others have clearly struck
out. The EIA and USDA are great examples of the former; the USGS (Bureau of
Mines) and the Federal housing Agencies represent the latter.
For example, both the EIA and USDA
maintain long-term series covering much of the data compiled earlier by the NBER and in
the SCB. This data is easy to access
- in CSV format and through an easy-to-use API.
The housing Agencies have dropped
the ball. Try finding FHA gross mortgage endorsements from 1994 through
1997 - not a trivial task. The VA never even published its gross
monthly guarantees data publicly. Someone at the BEA had to call a guy in
some back office to run the numbers each month. The Federal Home Loan
Bank doesn't know how much debt it had outstanding in the months between
quarter ends, from the discontinuance of the FRB Bulletin Agency Debt table
through December 2007. The FRB Bulletin tables relied on monthly
summaries from Fannie Mae and Freddie Mac; there are clear coverage and
definition gaps between the archived versions of these releases and the
Statistical Supplement to the Bulletin.
The data dissemination standards of the housing Agencies and the USGS
are particularly egregious. The FHA, FNMA,
FHLMC, and GNMA all offer limited archived PDFs of Excel documents for their
monthly summaries. The USGS offers only the last few months of its
Excel files; in fact, it actually removes old files on a
rolling basis!
The right course of action would be
to aggregate the data - as the USDA and EIA have - into continuous long-term
time series and offer updated values via a text file, csv, or API.
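As a rough illustration of what that aggregation involves, the sketch below stitches overlapping monthly releases into one continuous series and renders it as CSV. It is only a minimal sketch: the function names, the "YYYY-MM" key format, and the figures are hypothetical and do not reflect any Agency's actual files or process. It also assumes that a later release's revised figure should replace an earlier one for the same month.

```python
import csv
import io


def stitch_reports(reports):
    """Merge monthly release snapshots into one continuous series.

    Each report is a dict mapping "YYYY-MM" to a value. Reports are applied
    in the order given, so a later report's revised figure replaces an
    earlier one for the same month (an assumption of this sketch).
    """
    merged = {}
    for report in reports:
        merged.update(report)
    # Zero-padded "YYYY-MM" keys sort chronologically as plain strings.
    return sorted(merged.items())


def series_to_csv(series):
    """Render the stitched series as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["month", "value"])
    writer.writerows(series)
    return buf.getvalue()


# Hypothetical figures, for illustration only.
january_release = {"1994-11": 101.0, "1994-12": 98.5}
february_release = {"1994-12": 99.0, "1995-01": 102.3}  # revises December

series = stitch_reports([january_release, february_release])
csv_text = series_to_csv(series)
```

The same merged series could just as easily be served as a text file or through an API; the point is that the long history is built once, at the source, instead of by every end user.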
The solution to filling coverage
gaps is mostly trivial when the monthly reports exist. But if no monthly
reports were published, you have to hunt down the data. The closest thing we have today to a modern
statistical compendium, and our obvious savior for the distribution of
Agency-collected data, is the St. Louis Fed's FRED. But it is not
FRED's job to aggregate monthly reports, and there are no government
agencies doing the important work of compiling and aggregating. That is where
companies like Capital Markets Data (CMD) step in. CMD is founded on
aggregating the data and bridging the gaps. Clearly, this is the sort of
activity the BEA and Federal Reserve should get back to doing. It
requires a massive effort, even in the age of supercomputers, because many of
the numbers still have to be entered manually.
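Finding where the coverage gaps are, at least, can be done mechanically once the available observations are keyed by month. The following is a minimal sketch under assumed conventions: the helper names and the "YYYY-MM" labels are hypothetical, and the sample months are made up for illustration.

```python
def month_index(label):
    """Convert "YYYY-MM" to a running count of months."""
    year, month = label.split("-")
    return int(year) * 12 + (int(month) - 1)


def month_label(index):
    """Convert a running count of months back to "YYYY-MM"."""
    year, month0 = divmod(index, 12)
    return f"{year:04d}-{month0 + 1:02d}"


def find_missing_months(observed, start, end):
    """Return every month in [start, end] that has no observation."""
    have = {month_index(m) for m in observed}
    first, last = month_index(start), month_index(end)
    return [month_label(i) for i in range(first, last + 1) if i not in have]


# Hypothetical months for which a series has published values.
observed = ["1993-11", "1993-12", "1994-02"]
missing = find_missing_months(observed, "1993-11", "1994-03")
```

A list like `missing` only says which months need to be located; the figures themselves still have to come from archived releases or from the Agency.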
There are steps that the Agencies
should take to remedy the situation. Data collection and dissemination are
a shared burden. Clearly, the individual Agencies should be maintaining
their own long-term, high-frequency databases, but it is not the responsibility
of any individual Agency to collect all of the data in one place. Instead,
they should be feeding their long-term histories directly into an online data
repository, and FRED is the clear choice.
Yet there is another outstanding issue
Aggregating and disseminating data is easy if it exists, but discontinued data presents another problem:
On May 29, 2019, FHFA published its final Monthly Interest Rate Survey (MIRS), due to dwindling participation by financial institutions.
A common way of eliminating a series is to fold it into another line item. This is a favorite tactic of the Board of Governors. In a notable case, the Board eliminated loans to carry securities, a series that had existed in various forms since 1917.
July 1, 2009: Some line items are no longer shown separately: security loans are now included in all other loans and leases;
References:
https://www.nber.org/databases/macrohistory/contents/
Òscar Jordà, Moritz Schularick, and Alan M. Taylor. 2017. “Macrofinancial History and the New Business Cycle Facts.” In NBER Macroeconomics Annual 2016, volume 31, edited by Martin Eichenbaum and Jonathan A. Parker. Chicago: University of Chicago Press.