Regression replaces much of the tedious pondering, comparing,
and fitting and justifying required by traditional methods.

– Guide to Appraisal Valuation Modeling

When you’re done with this guide, you’ll be able to create regression-based adjustments, evaluate their accuracy, tweak and refine your analyses, and explain your methodology.

Regression (also called “hedonic” regression) requires some time and experience to get used to, but in not too long you’ll find it to be a useful time-saver and an important supplement to your judgment and expertise.

With a bit of practice you’ll be able to drop in data, select a few variables to get adjustments for, and get data-based results like this:

Note: Much of the below applies to any kind of statistical software. For our examples we’ll use Statwing, which was specifically designed to make regression as straightforward as possible.

Note: This guide was heavily influenced by Guide To Appraisal Valuation Modeling, put out by the Appraisal Institute. For those looking to dive deeper into regression, it’s a great read.

## Step-by-step regression tutorial video

With the exception of the “Matched pairs on steroids” section below, all the content in this guide has been aggregated into a 30-minute step-by-step video tutorial. We’d highly recommend this video as the place to start with regression in Statwing.

## Matched pairs on steroids

Think of regression as “matched pairs on steroids”

- Presentation at 2012 Appraisal Institute Annual Meeting (Source)

This section will give you a bit of intuition about how regression works. Feel free to skip right to how to run a regression.

Imagine you’re using paired sales to analyze the following very similar property sales:

Property   Price   Age   Bathrooms   Square feet
A \$100,000 40 2 2000
B \$110,000 40 2 2400
C \$120,000 30 2 2400
D \$125,000 30 3 2400

From this table you could use paired sales to compare

• Property A to Property B to get a \$25 / square foot adjustment
• Property B to Property C to get a - \$1,000 / year of age adjustment
• Property C to Property D to get a \$5,000 / bathroom adjustment

If you were to stick that same dataset of four properties into a regression analysis, you’d reach the same conclusions:

The result is a mathematical formula that says that

Price     =       - \$1K * Age      +      \$25 * Square feet      +      \$5K * bathrooms      +      \$10K

That model gives us the same results that the paired sales analysis did (e.g., each additional square foot is worth \$25). Regression does the same math that you would do yourself to find a result, but its capable of analyzing many more properties at once.

The problem with paired sales is that it depends on very small sample sizes to make big conclusions. Accordingly, the Appraisal Institute’s Guide to Appraisal Valuation Modeling describes paired sales as “pretending to be objective” (p. 55).

Statistically, paired sales is the same as asking one random man and one random woman their height, finding the man to be 2.5 inches taller, and then concluding that men in general are 2.5 inches taller than women. It’s directionally correct, but if you’d asked even 10 more men and women, your conclusion would be much more likely to be near the true answer (~5 inches). And if you asked 50 more, you’d be really confident in your conclusion.

So we want to add more property sales to our analysis to be confident in our results. But doing the math by hand becomes pretty tricky once you’ve got 5 or 10 or 50 sales. Regression handles all that math for you, and the result is “paired sales on steroids.”

## Data Selection

In general you want to select every sale in the subject property’s market in the last five years.

If you don’t select enough property sales, your results won’t be very reliable statistically. And grabbing more and more data makes it easier and easier to get better adjustments for non-GLA variables like garage or bathroom.

In general, if you’d consider taking a comp from a neighborhood, you should include that neighborhood in the dataset. We’re looking for a bit of variety so regression can understand how that variety affects price. Ideally you’d grab 1,000 or even multiple thousands of sales.

You’ll end up filtering down in Statwing to properties that are more like your subject, but only insomuch as you still have 100 or multiple hundreds of sales left in the regression. So having a wide geographic filter allows you to try narrower filters on other things, like Sold Price, and see which works best.

If you’re skeptical about taking a wider geographic range, try it and compare it to a regression with a narrower geographic focus, and see which one gives you better results.

## Data Import

Statwing uses “templates” to clean up your MLS’s data, hide unnecessary variables, and to extract things like Distressed sales and or remodeled from the public remarks.

Go to Statwing, click “Apply template”…

… and see if there is a template yet for your MLS.

If there is, select it. If not, email us and we’ll set one up for you within 24 hours.

If you have a template available (or after we create one), make sure to follow its directions. It will often indicate you use a specific export from your MLS, or set up a custom export from your MLS. If you don’t follow these steps your data won’t work well in Statwing.

Make sure that your data include column headers in the first row (like “Square Foot” and “Pool”, describing the column of data).

## Check your data

At this point you’ll want to make sure your data is appropriate for a regression. Statwing makes it easy to understand one variable at a time by selecting it, then clicking Describe.

You can also Describe your whole dataset all at once by selecting everything and then Describe.

Scroll through the data and note three things:

### Incorrect data

MLS data frequently includes small dirty data, for example a 0 for a value when that value should be blank. For example, no property should have 0 for GLA:

### Outliers

If one of the property sales was for \$450K and the others were for \$150K-\$300K, that property should probably be excluded. It becomes a bit of a judgment call if your outlier is less extreme, like a \$350K property instead of \$450K, or if you have very few properties to work with.

Related, if you have only a few examples of a characteristic, like whether a property has a pool, you’ll probably want to filter those out (or at the very least you won’t be able to get a good adjustment for them).

You don’t need to worry about actually removing dirty data or outliers quite yet. Write down these issues, though. When you run your regression, you’ll immediately filter out the outliers and dirty data using Statwing’s filter functionality.

## Variable management

Sometimes the data you upload doesn’t quite have everything you need, so you need to change some settings or create a new variable off of the raw data you uploaded. Not everyone needs to do these steps, so click on them below if you’d like to learn more, or just skip on to the next section.

## Run regression

Select the Price variable first. You’ll see a little key icon  appear by it, indicating that that’s the outcome we’re trying to explain.

Then select the other variables that you’ve decided to enter into the model.

Select the “Regression button…

Before you even look at the results, click “Filters” in the upper right of the card…

Then filter out the outliers and dirty data you noticed before.

The last step before looking at your results is to filter down your Sold Price to a relatively narrow range. For all other variables you’re only filtering out outliers and bad data. But Sold Price you want to filter down to a narrower range, like “\$100,000 to \$150,000″ or “\$300,000 to \$500,000″ (if you have less than 100 datapoints after filtering, you should probably widen this filter.

Once you’ve put in place those filters, here’s your result:

If you put your mouse over a given variable, the appropriate adjustment for that variable will be described, like we did with Pool above.

So our adjustment for a pool is \$18,000 and our adjustment per square foot is \$82.

Those adjustments aren’t set in stone. They have confidence intervals around them, so if you put your mouse over the \$82 it might say that the math thinks the adjustment is somewhere between \$70 and \$94. You’d be statistically supported using an adjustment anywhere in that range.

That is, assuming we feel like this model is good enough. Typically regression models require a bit of iteration first, though.

## Evaluate and improve your model

There are two primary ways to evaluate your results.

### Sample size

As you filter your data, you’ll notice your sample size dropping.

### Generally implausible results

#### What is it

The Appraisal Institute’s Guide to Appraisal Valuation Modeling explains that “regression coefficients are usually larger than what an appraiser typically uses.” This is because if there’s an unincluded variable that correlates with both Price and one of the variables you’ve entered, the variable you have entered will to some extent “take credit” for the additional value created by the unincluded variable. This isn’t necessarily a problem, especially since Fannie Mae no longer has arbitrary limits on adjustment size.

But sometimes you’ll get numbers that just don’t feel realistic, like a very high GLA, or maybe a negative value for something like a Garage spaces that should be positive.

Usually the way to deal with this is to filter your data down more. So if Garage spaces is your problem, you might filter down to only 2 – 3 spaces. Or even filter down to only 2-car garages just to eliminate the variable altogether (you won’t be able to get an adjustment for it, but you’ll still have controlled for it, as far as your other variables are concerned.

Or you might instead choose to filter down “Sold Price” a bit more, which is often really effective, since it in effect controls for quality, the most important “missing” variable in the regression.

Or, if the variable isn’t statistically significant, you might just remove it from the equation.

### Other

Statwing provides a few other diagnostic tools that are relevant for other types of data analysis, but aren’t critical for appraisal regression. So you can safely ignore things not covered in this report, including the “AICR” metric, and the “Diagnostics” charts at the bottom of every regression result.

Similarly, Statwing’s regression results include the ability to input a few numbers into the model and get a result, for example setting GLA to 1900 and Age to 15 and Pool to “True” and getting the model’s estimate of what the average property with those values would sell for.

Feel free to play around with those values, but don’t take their results too seriously. Ultimately getting a valuation isn’t about sticking numbers into the model, it’s about taking the results of the model and applying them to your comps, as always.

Regression isn’t a one-click-you’re-done process. As seen above, you’ll frequently need to add variables, subtract variables, and filter down your data, maybe sometimes add more, etc. The first few times you try, it may feel a bit overwhelming, but you’ll very quickly get a feel for it.

## Applying the adjustments

The results of this process are then applied to your comparables, just as you’d apply an adjustment from paired sales, or any other source.

In our example we needed to make adjustments for Pool and GLA. In an ideal world we’d create a single regression model that included both of those. But that’s not always possible, and it’s fine to create your adjustments separately, from separate models with different property sales.

Once you’ve created a model, you can generate some language describing it by clicking the “Export” button in the top right of the analysis card.

That will download an Excel file; when you open the file you’ll see a cell of text describing the regression.

In addition to the exported specifics of the model, you can also use the below generic language to describe your process:

“The comps used in the sales grid are considered the most comparable to the subject property available at the time of this appraisal. Each of these comps were adjusted using a combination of Quantitative Analysis, Qualitative Analysis and Regression Analysis. First, differences are identified as being inferior/similar/superior to that of the subject (for items that appear to be driving value; this based on experience in the market place, and backed by regression analysis). Regression analysis is run on multiple variables in order to find a starting point for adjustments. After the starting point is determined, a sensitivity analysis is completed, “fine tuning” adjustments to fit the comps gridded, and that leave the sales grid with adjustments that make sense and adequately supported for the market.”

And if you’d like, you can print the output using your browser’s built in Print function, under the File menu.

## Step-by-step regression tutorial video

By far the best way to get started with Statwing is to follow along, using the below video. It’s tricky to get going with regression, initially you’ll almost always get really strange results, and it takes a bit of tweaking to get a really good regression model. The video walks you through that process.