FRED Overview#

To request data from FRED programmatically, you first need to request an API key from the FRED website. Once you have one, you can request data from FRED via Python. The Python package I have always used is fredapi. There isn’t any special reason I chose this package over the others I have seen; it was the first one I encountered a long time ago, and it hasn’t failed me yet, so I have felt no urge to change.

Establishing a FRED object#

Step one when using FRED is to import the package and establish a master “fred” object. The fred object is the primary workhorse and will do most of the heavy lifting.

import pandas as pd
import yaml
from fredapi import Fred

# Load the API key from a config file
with open('secrets.yml') as f:
    secrets = yaml.safe_load(f)

fred = Fred(api_key=secrets['fred_api_key'])
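
If you would rather not keep the key in a YAML file, one alternative is to read it from an environment variable. This is only a minimal sketch: the helper name load_fred_api_key and the variable name FRED_API_KEY are my own choices for illustration, not something the code above depends on.

```python
import os


def load_fred_api_key(env_var: str = "FRED_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    Both the function name and the default variable name are hypothetical.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your FRED API key.")
    return key


# Usage (needs a real key, so it is left commented out):
# fred = Fred(api_key=load_fred_api_key())
```

Either way, the point is the same: keep the key out of the notebook itself.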

Requesting Data#

Now, let’s say we would like to see how US nominal GDP has grown over time. The steps are:

1. Find the data on FRED (Google tends to work better than searching directly on FRED).
   - My search: “fred us nominal gdp”
2. Verify that this is the data you want.
3. Copy the series ID, found in the URL or right after the title of the series.
4. Call the ‘get_series’ method on the fred object and pass the series ID as an argument.


Note: There are other arguments you can pass to the ‘get_series’ method, such as start and end dates, but I tend to just request all of the data and do any filtering as needed later on.

# Pull the data:
us_nominal_gdp = fred.get_series('GDP')

# FRED returns the 1946 values as NaN, so drop those:
us_nominal_gdp = us_nominal_gdp.dropna()

# The data prints more nicely in the notebook as a DataFrame:
us_nominal_gdp = us_nominal_gdp.to_frame(name='GDP')
us_nominal_gdp

	GDP
1947-01-01	243.164
1947-04-01	245.968
1947-07-01	249.585
1947-10-01	259.745
1948-01-01	265.742
...	...
2022-07-01	25994.639
2022-10-01	26408.405
2023-01-01	26813.601
2023-04-01	27063.012
2023-07-01	27644.463

307 rows × 1 columns

That’s basically all there is to it!
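
Since I mentioned filtering after the fact, here is a rough sketch of what that looks like. To keep it runnable without an API key, a few of the GDP values from the table above stand in for the result of fred.get_series('GDP'). As far as I know, fredapi’s get_series also accepts observation_start and observation_end if you would rather filter at request time.

```python
import pandas as pd

# A few quarterly values from the table above, standing in for fred.get_series('GDP')
gdp = pd.Series(
    [26408.405, 26813.601, 27063.012, 27644.463],
    index=pd.to_datetime(["2022-10-01", "2023-01-01", "2023-04-01", "2023-07-01"]),
    name="GDP",
)

# Label-based slicing on a DatetimeIndex includes both endpoints
gdp_2023 = gdp.loc["2023-01-01":"2023-12-31"]
print(gdp_2023)
```

Only the three 2023 observations survive the slice.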

Helpful Functions#

There are a few tasks that I do frequently with FRED data after pulling it from the API. Because of that, I created a few helper functions that wrap the API, and these are what I primarily interact with. Some of the “enhancements” that I like:

- Converting from a Series to a DataFrame.
- Providing a better name for the data than the FRED series ID.
- Sending in an iterable of IDs to request data in bulk.
- Merging the data together.

So these are the functions I made:

from typing import List, Optional, Union

import pandas as pd


def get_fred_data_series(series_id: str, name: Optional[str] = None) -> pd.DataFrame:
    '''Pull one series from FRED and return it as a single-column DataFrame.

    The column is named `name` if one is given, otherwise `series_id`.
    '''
    data_name = name if name is not None else series_id
    df = fred.get_series(series_id).to_frame()
    df.columns = [data_name]
    df.index.name = "date"
    return df


def get_fred_data(series_ids: Union[List[str], str],
                  names: Optional[Union[List[str], str]] = None,
                  merge_data: bool = True) -> Union[pd.DataFrame, dict]:
    '''Pull one or more series from FRED.

    A single string ID always returns a DataFrame. For a list of IDs, returns one
    DataFrame (outer join on date) if merge_data is True, otherwise a dict of
    DataFrames keyed by series ID.
    '''
    # If the input was a string, call get_fred_data_series once and return the result
    if isinstance(series_ids, str):
        return get_fred_data_series(series_ids, names)

    names = names if names is not None else series_ids
    ids_and_names = zip(series_ids, names)
    # Now pull the data
    all_data = [get_fred_data_series(series_id, name) for series_id, name in ids_and_names]

    if merge_data:
        result = pd.concat(all_data, axis=1, join='outer')
    else:
        result = dict(zip(series_ids, all_data))

    return result

To get a feel for them and make sure they work, here are a few tests. First, a single data series:

get_fred_data('GDP').tail()

	GDP
date
2022-07-01	25994.639
2022-10-01	26408.405
2023-01-01	26813.601
2023-04-01	27063.012
2023-07-01	27644.463

Now try sending in multiple series IDs that we want to combine (GDP, Real GDP, and CPI):

DATA_TO_REQUEST = ['GDP','GDPC1','CPIAUCSL']
DATA_NAMES = ['GDP','Real GDP','cpi']
data_df = get_fred_data(DATA_TO_REQUEST, DATA_NAMES)
data_df.tail()

	GDP	Real GDP	cpi
date
2023-06-01	NaN	NaN	303.841
2023-07-01	27644.463	22506.365	304.348
2023-08-01	NaN	NaN	306.269
2023-09-01	NaN	NaN	307.481
2023-10-01	NaN	NaN	307.619

Beautiful. Now test how it looks if we don’t want to combine the data.

DATA_TO_REQUEST = ['GDP','GDPC1','CPIAUCSL']
DATA_NAMES = ['GDP','Real GDP','cpi']
# The call below returns a dictionary of DataFrames, keyed by series ID
data_dict = get_fred_data(DATA_TO_REQUEST, DATA_NAMES,merge_data=False)

# Show me the inflation data:
data_dict['CPIAUCSL']

	cpi
date
1947-01-01	21.480
1947-02-01	21.620
1947-03-01	22.000
1947-04-01	22.000
1947-05-01	21.950
...	...
2023-06-01	303.841
2023-07-01	304.348
2023-08-01	306.269
2023-09-01	307.481
2023-10-01	307.619

922 rows × 1 columns
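
One thing worth pointing out about the merged output earlier: the NaN values in the GDP columns are not missing data. GDP is quarterly and CPI is monthly, so the outer join keeps every date and fills the gaps with NaN. Here is a small sketch using a few of the values shown above; whether you drop those rows, forward-fill them, or leave them alone depends on what you plan to do next.

```python
import pandas as pd

# Monthly CPI and quarterly GDP values copied from the output above
cpi = pd.Series(
    [303.841, 304.348, 306.269],
    index=pd.to_datetime(["2023-06-01", "2023-07-01", "2023-08-01"]),
    name="cpi",
).to_frame()
gdp = pd.Series(
    [27644.463],
    index=pd.to_datetime(["2023-07-01"]),
    name="GDP",
).to_frame()

# Outer join: keep the union of dates, with NaN where a series has no observation
merged = pd.concat([gdp, cpi], axis=1, join="outer")

# Keep only the dates where every series has a value (here, the quarterly dates)
quarterly_only = merged.dropna()
print(merged)
print(quarterly_only)
```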

Examples#

Finally, here are a few more examples of how the data might be used to create some informative plots.

US GDP Growth#

We sort of already saw this above, but just to be thorough, let’s examine the quarterly growth rate of Real GDP in the US for as far back as FRED will let us look.

# Request the data and verify that it looks ok:
DATA_TO_REQUEST = ['GDPC1']
gdp_df = get_fred_data(DATA_TO_REQUEST)

# Calculate the Quarterly Growth Rate:
gdp_df['quarterly_growth'] = gdp_df['GDPC1'].pct_change()

# Drop NaNs that were introduced by the pct_change function (we also need to drop the NaNs from 1946):
gdp_df = gdp_df.dropna()

gdp_df.tail()

	GDPC1	quarterly_growth
date
2022-07-01	21851.134	0.006586
2022-10-01	21989.981	0.006354
2023-01-01	22112.329	0.005564
2023-04-01	22225.350	0.005111
2023-07-01	22506.365	0.012644

Looks good, so let’s go ahead and plot the data, for good measure.

import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

fig, ax = plt.subplots(figsize=(12,6))
ax.plot(gdp_df['quarterly_growth'])
ax.yaxis.set_major_formatter(FuncFormatter(percentage_formatter))  # Format the y-axis as percentages
ax.set_title('Quarterly Real US GDP Growth');
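
The plotting code above uses a percentage_formatter helper that is not defined anywhere in this post. Here is a minimal sketch of what such a helper could look like, assuming the values are fractions (0.0125 meaning 1.25%) and one decimal place is enough. The body is my guess at a reasonable implementation, not necessarily the original one.

```python
def percentage_formatter(x, pos):
    """Format a fraction such as 0.05 as a percent string for axis tick labels.

    matplotlib's FuncFormatter calls this with the tick value and the tick
    position; the position is not needed here.
    """
    return f"{x:.1%}"


print(percentage_formatter(0.05, None))      # 5.0%
print(percentage_formatter(-0.07891, None))  # -7.9%
```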

Display some summary stats as well:

gdp_df['quarterly_growth'].describe().to_frame()

	quarterly_growth
count	306.000000
mean	0.007717
std	0.011260
min	-0.078910
25%	0.003105
50%	0.007721
75%	0.012623
max	0.077592
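
As a sanity check on the quarterly_growth column, pct_change is just the current value divided by the previous one, minus one. This sketch redoes that arithmetic in plain Python for the last three Real GDP levels from the table above:

```python
# Real GDP levels copied from the table above
levels = [22112.329, 22225.350, 22506.365]  # 2023-01-01, 2023-04-01, 2023-07-01

# Growth between consecutive quarters: current / previous - 1
growth = [curr / prev - 1 for prev, curr in zip(levels, levels[1:])]
print([round(g, 6) for g in growth])  # [0.005111, 0.012644]
```

Those match the last two rows of quarterly_growth shown earlier.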

Which quarters had the lowest and highest GDP growth?

min_growth = gdp_df['quarterly_growth'].min()
max_growth = gdp_df['quarterly_growth'].max()


min_max_results_df = gdp_df[(gdp_df['quarterly_growth'] == min_growth) | (gdp_df['quarterly_growth'] == max_growth)].copy()

# Add a column to label the min and max values... Just to make it super easy on the eyes:
min_max_results_df.loc[min_max_results_df['quarterly_growth'] == min_growth, 'Label'] = 'Minimum'
min_max_results_df.loc[min_max_results_df['quarterly_growth'] == max_growth, 'Label'] = 'Maximum'

min_max_results_df

	GDPC1	quarterly_growth	Label
date
2020-04-01	19034.830	-0.078910	Minimum
2020-07-01	20511.785	0.077592	Maximum

The minimum and maximum growth rates came in back-to-back quarters of 2020, as the economy first entered and then began exiting the COVID shutdown. Makes sense.
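
If you only need the dates of the extremes rather than the full rows, pandas can hand them over directly with idxmin and idxmax. This is just an alternative sketch, using three growth values copied from the tables above in place of the full gdp_df['quarterly_growth'] column:

```python
import pandas as pd

# Three growth values from the tables above, standing in for the full column
growth = pd.Series(
    [-0.078910, 0.077592, 0.012644],
    index=pd.to_datetime(["2020-04-01", "2020-07-01", "2023-07-01"]),
    name="quarterly_growth",
)

worst_quarter = growth.idxmin()  # index label of the smallest value
best_quarter = growth.idxmax()   # index label of the largest value
print(worst_quarter.date(), best_quarter.date())
```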

Inflation#

Inflation is another hot topic right now. Something interesting to note is that when most news articles or commentators discuss inflation, they are usually referencing the CPI. But the Federal Reserve Board of Governors has stated that it uses the PCE price index as its preferred gauge of inflation. That feels a little weird; why the inconsistency? For now, I’ll ignore the why behind the discrepancy and instead use FRED to show that the two measures track each other closely:

INFLATION_SERIES = ['CPIAUCSL','PCEPI']
DATA_NAMES = ['CPI','PCE']
inflation_df = get_fred_data(INFLATION_SERIES, DATA_NAMES)

# YoY Inflation Rate:
inflation_df['CPI_YoY'] = inflation_df['CPI'].pct_change(12)
inflation_df['PCE_YoY'] = inflation_df['PCE'].pct_change(12)


# Drop nas:
inflation_df = inflation_df.dropna()
inflation_df

	CPI	PCE	CPI_YoY	PCE_YoY
date
1960-01-01	29.370	15.421	0.012410	0.016948
1960-02-01	29.410	15.437	0.014138	0.016997
1960-03-01	29.410	15.446	0.015188	0.016920
1960-04-01	29.540	15.502	0.019324	0.018595
1960-05-01	29.570	15.518	0.018251	0.019111
...	...	...	...	...
2023-06-01	303.841	120.221	0.030920	0.031984
2023-07-01	304.348	120.438	0.032991	0.033705
2023-08-01	306.269	120.877	0.037075	0.034109
2023-09-01	307.481	121.325	0.036899	0.034190
2023-10-01	307.619	121.385	0.032324	0.030066

766 rows × 4 columns
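
In case pct_change(12) looks mysterious: on monthly data it compares each value with the one 12 rows earlier, which is the same month of the previous year. Here is a tiny sketch with made-up index levels (these are not real CPI or PCE numbers):

```python
import pandas as pd

# Made-up monthly index levels: 100 in the first month, 103 twelve months later
index_levels = pd.Series(
    [100.0, 100.2, 100.5, 100.7, 101.0, 101.2, 101.5,
     101.8, 102.0, 102.3, 102.5, 102.8, 103.0],
    index=pd.date_range("2022-01-01", periods=13, freq="MS"),
)

# Year-over-year change: value / value 12 rows earlier - 1
yoy = index_levels.pct_change(12)
print(yoy.tail(2))
```

The first 12 rows are NaN because there is nothing 12 months earlier to compare against, which is why the dropna call above is needed.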

fig, ax = plt.subplots(figsize=(12,6))
ax.plot(inflation_df['CPI_YoY'], label='CPI')
ax.plot(inflation_df['PCE_YoY'], label='PCE')
# Format the y axis as a percent:
ax.yaxis.set_major_formatter(FuncFormatter(percentage_formatter))
ax.set_title('CPI vs PCE')
ax.legend();

So yeah, they track each other pretty closely! To be precise and quantitative about exactly how close “pretty close” is, we can calculate the correlation between the year-over-year rates of inflation from each index:

correlation = inflation_df['CPI_YoY'].corr(inflation_df['PCE_YoY'])
print(f"The correlation between CPI and PCE is: {correlation:.2f}")

The correlation between CPI and PCE is: 0.98
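
For reference, Series.corr computes the Pearson correlation by default. The sketch below spells out that formula on two short made-up series (not real inflation data) and checks it against pandas:

```python
import math

import pandas as pd

# Made-up rates, purely for illustration
a = pd.Series([0.010, 0.020, 0.035, 0.030, 0.050])
b = pd.Series([0.012, 0.018, 0.030, 0.031, 0.045])


def pearson(x: pd.Series, y: pd.Series) -> float:
    """Covariance of x and y divided by the product of their standard deviations."""
    x_dev = x - x.mean()
    y_dev = y - y.mean()
    return (x_dev * y_dev).sum() / math.sqrt((x_dev ** 2).sum() * (y_dev ** 2).sum())


manual = pearson(a, b)
from_pandas = a.corr(b)
print(f"manual: {manual:.6f}, pandas: {from_pandas:.6f}")
```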

VIX#

vix_df = get_fred_data('VIXCLS', 'VIX')

# The VIX data has many missing values (likely days when markets were closed), so interpolate those:
vix_df['VIX'] = vix_df['VIX'].interpolate()

# Rolling 1-month average (about 22 trading days):
vix_df['VIX_1M'] = vix_df['VIX'].rolling(22).mean()

# Longer rolling average. The window counts rows, and there is one row per trading day,
# so 365 rows span well over a year; a true 3-month window would be about 66 rows.
vix_df['VIX_3M'] = vix_df['VIX'].rolling(365).mean()

# Entire time period average:
vix_df['VIX_AVG'] = vix_df['VIX'].mean()


# Plot the data: (Not plotting the 1-month average because it follows the actual number pretty closely and the graph gets too busy)
fig, ax = plt.subplots(figsize=(12,6))
ax.plot(vix_df['VIX'], label='VIX')
ax.plot(vix_df['VIX_3M'], label='365-Trading-Day Moving Average')
ax.plot(vix_df['VIX_AVG'], label='Average', linestyle='--')

ax.set_title('VIX')
ax.legend();

# Display the data:
vix_df

	VIX	VIX_1M	VIX_3M	VIX_AVG
date
1990-01-02	17.24	NaN	NaN	19.591738
1990-01-03	18.19	NaN	NaN	19.591738
1990-01-04	19.22	NaN	NaN	19.591738
1990-01-05	20.11	NaN	NaN	19.591738
1990-01-08	20.26	NaN	NaN	19.591738
...	...	...	...	...
2023-11-24	12.46	15.317273	19.990603	19.591738
2023-11-27	12.69	14.954091	19.949945	19.591738
2023-11-28	12.69	14.564091	19.909260	19.591738
2023-11-29	12.98	14.256364	19.871589	19.591738
2023-11-30	12.92	14.019091	19.835534	19.591738

8848 rows × 4 columns
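
A quick note on the interpolate call used above: by default it fills a gap linearly between the neighboring values. Here is a small sketch with made-up numbers (not actual VIX readings) and one missing day:

```python
import pandas as pd

# Made-up values with a gap in the middle
vix = pd.Series(
    [12.0, None, 14.0],
    index=pd.to_datetime(["2023-11-22", "2023-11-23", "2023-11-24"]),
)

# Default linear interpolation puts the missing value halfway between its neighbors
filled = vix.interpolate()
print(filled.tolist())  # [12.0, 13.0, 14.0]
```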

# Show some basic summary statistics:
vix_df['VIX'].describe().to_frame()

	VIX
count	8848.000000
mean	19.591738
std	7.893034
min	9.140000
25%	13.900000
50%	17.790000
75%	23.010000
max	82.690000
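
One more detail on the rolling averages: an integer window counts rows, not calendar days. With one row per weekday, a month is about 22 rows and three months is somewhere in the mid-60s. The sketch below counts weekdays in a three-month span (holidays are not excluded) and shows how a rolling mean behaves on a toy series:

```python
import pandas as pd

# Number of weekdays (Mon-Fri) in roughly three calendar months
n_rows_3m = len(pd.bdate_range("2023-09-01", "2023-11-30"))
print(n_rows_3m)

# Toy daily series 1..10: the rolling mean is NaN until the window has enough rows
s = pd.Series(
    range(1, 11),
    index=pd.bdate_range("2023-01-02", periods=10),
    dtype="float64",
)
rolling_3 = s.rolling(3).mean()
print(rolling_3)
```

That NaN warm-up is why the VIX_1M and VIX_3M columns are empty at the start of the table above.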

Future To Do?#

Show how to get unrevised data

