The Dismal State of Federal Agency Data Dissemination

In the midst of the current data science revolution, the state of data dissemination and aggregation among the federal agencies remains disorganized, disparate, and dismal. It wasn't always this way, and it doesn't have to be. The federal agencies can do more to make their data useful and accessible to the tax-paying public.

The Origin of the Problem

The origin of the problem was likely a structural change in the thinking and funding of the economics profession. The profession appears to have grown much more confident in its understanding of the world, to the point of no longer viewing data collection and aggregation as a worthy exercise. This shift can be seen in the slow transformation of the publications of the National Bureau of Economic Research, the Federal Reserve Statistics Division, and the Department of Commerce.

As the official arbiter of U.S. business cycle dates, the NBER put a tremendous amount of effort into understanding the cycle; this effort took the form of multi-year projects to collect a wide swath of real economic and financial data. The trailblazers at the NBER believed that with more data and longer histories they could identify stable patterns of peaks and troughs, and that these patterns, with their leads and lags, would help them untangle causation from correlation.

Macro researchers have never stopped seeking out new data to fill gaps in business cycle theory and knowledge. But these collection efforts come and go: series are compiled, and then no attempt is made to update them.

For academics and researchers, the starting point was annual U.S. data; teasing out correlations then required collecting higher-frequency measures of the business cycle. Now we have regressed from monthly data to low-frequency, cross-country datasets (note Piketty's database on inequality and Òscar Jordà, Moritz Schularick, and Alan M. Taylor's Macrohistory Database).

While the NBER made it a priority to collect and study business cycle data, it was never really its academic responsibility to maintain that data. This contrasts with another organization, the League of Nations, which collected country-specific financial and economic data from 1919 through 1946. Much of the data the League had collected was later picked up by the IMF in its International Financial Statistics.

Getting back to domestic organizations  

The Department of Commerce's Bureau of Economic Analysis and the Federal Reserve Board likewise set out to compile U.S. economic and financial data from the various agencies and publish it. There were two master compilations. The first was the Biennial Supplement to the Survey of Current Business (SCB), which saw its last issue in June 1992. The second was Banking and Monetary Statistics (BMS), published in 1976. BMS was produced by the Board of Governors and was based on the monthly Federal Reserve Bulletin, which was officially discontinued in December 2008. The discontinuation of the SCB supplement and the FRB Bulletin created major voids.

Reporting agencies would produce monthly reports, and the Federal Reserve or BEA would aggregate those monthly numbers into long time-series histories. The resulting publications saved the end user a tremendous amount of time in the library. When those compilations ended, the agencies kept producing their monthly reports but did not have the foresight to take over the long-term series that the Fed and the Commerce Department had been aggregating. A black hole was created, and new data was sucked right in.

It didn't take long for coverage gaps to open up between where the Survey of Current Business supplement left off and where the agencies' archived publications begin. Some agencies stepped up to the plate and hit home runs; others clearly struck out. The EIA and USDA are great examples of the former; the USGS (which took over the Bureau of Mines' minerals statistics) and the federal housing agencies represent the latter.

For example, both the EIA and USDA maintain long-term histories for many of the series compiled earlier by the NBER and in the SCB. The data are easy to access, in CSV format and through easy-to-use APIs.

The housing agencies have dropped the ball. Try finding FHA gross mortgage endorsements for 1994 through 1997; it is not a trivial task. The VA never even published its gross monthly guarantee data; someone at the BEA had to call a guy in a back office to run the numbers each month. The Federal Home Loan Banks do not know how much debt they had outstanding in the months between quarter ends, from the discontinuance of the FRB Bulletin's agency debt table through December 2007. The FRB Bulletin tables relied on monthly summaries from Fannie Mae and Freddie Mac, and there are clear coverage and definition gaps between the archived versions of those releases and the Statistical Supplement to the Bulletin.

The data dissemination practices of the housing agencies and the USGS are particularly egregious. The FHA, FNMA, FHLMC, and GNMA offer only limited archives of their monthly summaries, as PDFs of Excel documents. The USGS offers only the last few months of its Excel files; in fact, it removes old files on a rolling basis!

The right course of action would be to aggregate the data, as the USDA and EIA have done, into continuous long-term time series and to offer updated values via a text file, CSV, or API.
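As a rough illustration of what aggregating monthly reports into a continuous series involves, here is a minimal Python sketch. It assumes each archived monthly report has already been transcribed into a mapping from "YYYY-MM" to a value, and that a later release supersedes an earlier one for the same month (revisions win). The function names `splice_reports` and `to_csv` and the sample values are hypothetical, not any agency's actual format.

```python
import csv
import io


def splice_reports(reports):
    """Merge monthly report snapshots into one long-term series.

    `reports` is a list of dicts mapping "YYYY-MM" -> value, ordered from
    oldest release to newest. Assumption: a later release supersedes an
    earlier one for the same month (revisions win).
    """
    series = {}
    for report in reports:
        series.update(report)
    # Zero-padded "YYYY-MM" keys sort chronologically as plain strings.
    return dict(sorted(series.items()))


def to_csv(series, name):
    """Render the spliced series as a two-column "date,<name>" CSV string."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["date", name])
    for month, value in series.items():
        writer.writerow([month, value])
    return buf.getvalue()


# Hypothetical values from two overlapping monthly releases.
jan_release = {"1994-01": 10.0, "1994-02": 11.0}
mar_release = {"1994-02": 11.5, "1994-03": 12.0}
print(to_csv(splice_reports([jan_release, mar_release]), "endorsements"))
```

The code is the easy part; the real cost lies in transcribing the archived reports into a consistent form before they can be spliced.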

Filling coverage gaps is mostly trivial when the monthly reports exist; if no monthly reports were published, you first have to find the data. The closest thing we have today to a modern statistical compendium, and our obvious savior for distributing agency-collected data, is the St. Louis Fed's FRED. But it is not FRED's job to aggregate monthly reports, and no government agency is doing the important work of compiling and aggregating them. That is where companies like Capital Markets Data (CMD) step in. CMD was founded to aggregate the data and bridge the gaps. Clearly, this is the sort of work the BEA and Federal Reserve should get back to doing. It requires a massive effort, even in the age of supercomputers, because the numbers often have to be entered by hand.
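Locating the gaps is the mechanical part of the job. A minimal sketch, assuming a monthly series keyed by zero-padded "YYYY-MM" strings (a hypothetical convention), that lists each run of missing months between the first and last observation; `month_range` and `coverage_gaps` are illustrative names only.

```python
def month_range(start, end):
    """Yield every "YYYY-MM" month from start to end, inclusive."""
    year, month = map(int, start.split("-"))
    end_year, end_month = map(int, end.split("-"))
    while (year, month) <= (end_year, end_month):
        yield f"{year:04d}-{month:02d}"
        month += 1
        if month > 12:
            year, month = year + 1, 1


def coverage_gaps(months):
    """Return (first_missing, last_missing) pairs for each run of missing
    months between the earliest and latest observed month."""
    have = set(months)
    if not have:
        return []
    ordered = sorted(have)
    gaps = []
    run_start = run_end = None
    for month in month_range(ordered[0], ordered[-1]):
        if month not in have:
            if run_start is None:
                run_start = month
            run_end = month
        elif run_start is not None:
            # An observed month closes the current run of missing months.
            gaps.append((run_start, run_end))
            run_start = None
    return gaps


# Hypothetical series: reports exist through 1993 and resume in 1998.
observed = ["1993-11", "1993-12", "1998-01", "1998-02"]
print(coverage_gaps(observed))  # [('1994-01', '1997-12')]
```

Each reported run is a list of months that someone has to track down in archived releases, or reconstruct from other sources, before the series is continuous.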

There are steps the agencies should take to remedy the situation. Data collection and dissemination are a shared burden. Clearly, the individual agencies should maintain their own long-term, high-frequency databases, but no individual agency is responsible for collecting all of the data in one place. Instead, they should feed their long-term histories directly into an online data repository, and FRED is the clear choice.

Yet there is another outstanding issue

Aggregating and disseminating data is easy when the data exist, but discontinued series present another problem:

On May 29, 2019, FHFA published its final Monthly Interest Rate Survey (MIRS), due to dwindling participation by financial institutions.

A common way of eliminating a series is to fold it into another line item, a favorite tactic of the Board of Governors. In a notable case, the Board eliminated loans to carry securities, a series that had existed in various forms since 1917:

July 1, 2009: Some line items are no longer shown separately: security loans are now included in all other loans and leases;



References:

NBER Macrohistory Database. https://www.nber.org/databases/macrohistory/contents/

Òscar Jordà, Moritz Schularick, and Alan M. Taylor. 2017. “Macrofinancial History and the New Business Cycle Facts.” In NBER Macroeconomics Annual 2016, volume 31, edited by Martin Eichenbaum and Jonathan A. Parker. Chicago: University of Chicago Press.