Modelling Non-Linear Relationships in Excel

John Hansler
Jul 24
3 min read

You should probably have a basic understanding of statistics before reading this article, but I go over the main points and you can view the file below. Keep in mind, this article is about creating multiple regression models with non-linear independent variables.

Obviously, step 1 is to import your data. I took the data from the FRED here and you can use an API key to do so as well or just download the Excel files. Effectively, we'll use this data to look at relationships between disposable income (DSPI), the consumer price index (CPI), and domestic auto inventories (Auto_Inv), which were selected for some hypothetical relationship (do Auto_Inv and CPI help predict CPI?).

Step 2 is to clean the data. Auto_Inv only goes back as far as 1993, so we filter out just for 1993 using XLOOKUP. Keep in mind that I, for simplicity here, chose data with monthly sampling only (no back-filling, for example, required) and I do not split by regime (which you should do for clients and understanding how interactions change over time and with the macroeconomic environment).

Step 3 is straightforward- go to the data tab, go to data analysis in the right corner, then scroll to regression. We'll select our X and Y values from there and it will create a new sheet with the summary regression statistics. When you have the column titles highlighted, you will need to check "Labels" for this to work.

Spreadsheet showing data columns: observation date, CPI, Auto Inv, DSPI. A "Data Analysis" window with options like Regression is open.

Spreadsheet showing regression analysis summary with tables for Regression Statistics, ANOVA, and Coefficients, highlighting data values.

So, that's cool. Now we know how to create a multiple linear regression model in Excel. Looking at the P-Values, both linear factors are significant and can be included in a model. Next, we'll expand our data set to look for exponential relationships (non-linear relationships) and interactions (see below and it should be fairly clear what I did). Keep in mind, this is pretty much, aside from educational value, a pointless exercise here, as the R^2 value is ~0.83 from the MLR model and, as you'll see, it doesn't get much better.

Spreadsheet showing data columns: observation date, CPI, Auto_Inv, CPI^2, Auto_Inv^2, CPI*Auto_Inv, DSPI, from January 1993 to December 1995.

Excel sheet displaying statistical data, regression analysis, and plots. Conditional formatting options are visible in the toolbar.

Now, you can see the P-Values the relationships. I'm enabling conditional formatting to automatically highlight values under 0.05 (assuming alpha = 0.05, as our condition for significance). You can see below that all factors except Auto_Inv^2 are significant. I've also highlighted the approximated multiple regression model (only include significant factors).

Excel sheet showing regression statistics, ANOVA table, and residual plots. Plots display residuals for CPI, Auto_Inv, and others.

Now, I just want to draw attention to the graphs for a second. Basically, we have residuals checked. Residuals are the error terms and the residual plots show the spread of errors. Preferably there is no difference between the forecasted (or modelled) values for Y and the actual (historical) values for Y and this will be a straight line along the X-axis. The other comment is on the Normal Probability Plot (aka a Q-Q plot), which tells us if the distribution of Y values is normally distributed (historically). We can see the Q-Q plot produces enough deviation from a straight line to suggest this is not normally distributed.

Excel sheet showing regression analysis with coefficients, residuals, and probability outputs. Includes scatter plots and a normal probability plot.

I'm not going to do anything else with this today but you could also calculate skewness and kurtosis, split this into different regimes, and examine other relationships or calculate items such as Akaike information criterion to determine parsimoniousness of the model (reduce overfitting errors).

If you found this helpful or enjoyable, please consider subscribing to our newsletter for more.

You can reach out to us here for consulting services or here to discuss booking a workshop to learn useful skills and help you get more familiar with us.

Home

Modelling Non-Linear Relationships in Excel

Related Posts

Comments