That's Going on Your Permanent Record, Mister!

Apr 8, 2007 Michael Wurzer

Alternate Titles: “Why Zillow Is Old School” or “Why Listing Data Standards Are Important”


I’ve previously suggested that listing data standards are very important for the MLS community, because they will make forming a distributed national listing repository faster and easier. A small but important step toward that effort is standardizing the data for uniquely identifying properties (e.g., address, parcel number, geo-codes, etc.). This matters not only for creating a distributed national repository but also for solidifying the “permanent database” already in MLS systems.

What is a “permanent database” you ask? Here’s Greg Swann from the Bloodhound Blog explaining the concept as he has envisioned it being created by Zillow in their attempt to take over the world of residential real estate:

Right under our noses, Zillow.com has effected a similarly radical shift in the way we think about real estate databases. All of the dead-bots-walking I named above (Trulia, PropSmart, Google Base, Realtor.com, and every local MLS system), every one of them treats data in the same way: There is an on-going application that will be effected using temporarily-stored data.

The Zillow.com paradigm is exactly the opposite: We are accumulating a database of permanently-stored data that can be deployed for any number of temporary, interchangeable applications.

The Zillow model is exactly the same paradigm you use with your own Contact Management system: The database is forever. The Christmas Card list is old news as soon as the cards are mailed.

This is a simple enough idea that all of us should be saying, “Duh!” But it remains that, of every kind of real estate database we can name, only Zillow’s, and those of its close competitors, is based on the premise that any particular house will still be the same house the next time it is sold. Want a Zestimate of that house? Done. Want to see how the owner disagrees with the Zestimate? Done. Want to put it up for sale? Done. Want to put a dream price on your home to see if someone wants it even more than you do? Done. A potentially infinite number of applications all from the same one, permanent database.

And it is this last aspect that seems to me to put Zillow so far ahead of everyone they’re competing against. The Realty.bots are toast, period, as is any vendor that treats its real estate database as a temporary nuisance. But the move they’re making into listing, and it is a very strong move, puts Zillow far out in front of the pack of recent AVM (Automated Valuation Model) entrants.

(Emphasis added.)

And here are a few further quotes from the same post that re-emphasize the idea of a “permanent” database:

Whereas Realtor.com [and presumably all the dead-bots-walking and the local MLS system] has every MLS-listed home, Zillow.com has almost every listable home. They have an infinitely improvable franchise, and everyone else has popsicles melting in the sun.

[and . . . ]

Zillow’s applications are on-going against a permanent database that will soon include almost every resellable residential property in the United States. The rationale for going anywhere else will get thinner and thinner with each incremental improvement in that database.

Let me try to summarize this even further:

  • MLS systems and the dead-bots-walking focus on listings, which are temporary, like popsicles, because they are sold, canceled, withdrawn, or expired.
  • Zillow is building a “permanent” record for each property, and all data related to that property (listings, guesstimates, owner information, user questions, user answers, pictures, etc.) is added to that permanent record to build a history of the property.

Are you still uncertain what a “permanent database” is? Since Zillow supposedly is the only example of such a database for real estate, perhaps examining how they started will help clarify the matter. Zillow started with data from local county assessors, property tax authorities, and county GIS departments. This data is the “base” on which the “permanent” record of Zillow is being built. Everything else in Zillow is added onto these records, including their price guesstimates and what they hope are many for-sale advertisements (listings) from owners and agents (or anyone else, apparently).

So, what exactly is in those public records that form this permanent database? At the end of the day, there are only two or three pieces of information in the public records that are important to creating a permanent database: address, parcel number, and hopefully a geo-code of some sort such as latitude and longitude. All the other data (assessed value, tax amounts, owner name, etc.) are not permanent, though they can be stored permanently as part of the history.
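To make that distinction concrete, here is a minimal sketch in Python of what such a record might look like. The field names, parcel number, address, and coordinates are all hypothetical; the point is simply that the identifying fields are the permanent part, and everything else is appended as dated history rather than overwritten.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class PropertyRecord:
    """A 'permanent' record for one property.

    The identifiers below rarely change over the life of the parcel;
    everything else (assessments, listings, sales, photos) is kept as
    dated history entries rather than replacing the record.
    """
    parcel_number: str                 # the stable key from the assessor
    address: str                       # normalized street address
    latitude: Optional[float] = None   # geo-code, if available
    longitude: Optional[float] = None
    history: list = field(default_factory=list)

    def add_event(self, event_date: date, kind: str, **details) -> None:
        """Append a dated fact (assessment, listing, sale, etc.)."""
        self.history.append({"date": event_date, "kind": kind, **details})


# The same house accumulates events over time; the record itself persists.
home = PropertyRecord("123-45-678", "100 Main St, Anytown, MN 55555",
                      44.9778, -93.2650)
home.add_event(date(2006, 5, 1), "assessment", assessed_value=250_000)
home.add_event(date(2007, 4, 1), "listing", list_price=279_900, status="Active")
```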

I’ve written previously about how important parcel numbers are in uniquely identifying individual properties, because the parcel number is the most reliable way for MLS systems to tie together listings, tax records, parcel maps, and other data about the property. In other words, the parcel number is a way to create a “permanent” record of the property so that all data regarding that property can easily be brought together into whatever display or report is desired. This is already being done in many, if not most, of the MLS systems of which I’m aware, though it is not universal and could be done better.
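As an illustration, here is a small sketch (again in Python, with made-up sample rows) of how the parcel number can act as the join key that ties a listing from one source to the tax record from another:

```python
# Hypothetical rows from two different sources, both carrying a parcel number.
tax_records = [
    {"parcel_number": "123-45-678", "assessed_value": 250_000, "owner": "J. Smith"},
    {"parcel_number": "987-65-432", "assessed_value": 410_000, "owner": "A. Jones"},
]
listings = [
    {"mls_number": "4098765", "parcel_number": "123-45-678", "list_price": 279_900},
]

# Index the tax data by parcel number, then attach it to each listing.
tax_by_parcel = {row["parcel_number"]: row for row in tax_records}

merged = [
    {**listing, "tax_record": tax_by_parcel.get(listing["parcel_number"])}
    for listing in listings
]
# Each merged record now combines listing and tax data for the same property,
# ready for whatever display or report is desired.
```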

This is why data standards are so important. They allow disparate data sources to work together. Zillow’s revolution, apparently, is that they are going to be the single source of data entry for all data relating to residential real estate, and so they will become the permanent record. This is one way of going about creating a permanent record: Insist that all data be entered into one database. However, that’s a pretty old-school approach, not very much in tune with what’s going on today in all corners of the world.

Today, data is being created everywhere, not in a single place. The approach to making that data useful to the world (and not just to one company) is to create standards for the data, so that it can easily be shared and linked together. Blogs, which rely on the RSS, URL, and HTML standards, are a great example. These standards allow for easy exchange and linking of blog posts, regardless of blog platform. Extending these same types of data standards to the richer data inside of databases is what web 3.0 (a.k.a. the semantic or database web) is all about: providing richer definition to the data so computers can process it more intelligently, regardless of the source. An example is the recent announcement by Microsoft, Google and Yahoo! that they are adopting GeoRSS as a standard for representing location information. (Thanks to Matt Cohen for the link!)
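For a sense of how lightweight such a standard can be, here is a rough sketch of a GeoRSS-Simple item built with Python’s standard library. The listing title, link, and coordinates are invented for illustration; the only standardized part is the georss:point element carrying "latitude longitude".

```python
import xml.etree.ElementTree as ET

# GeoRSS-Simple attaches a location to an ordinary RSS item by adding a
# georss:point element whose text is "latitude longitude" (WGS84).
GEORSS_NS = "http://www.georss.org/georss"
ET.register_namespace("georss", GEORSS_NS)

item = ET.Element("item")
ET.SubElement(item, "title").text = "New Listing: 100 Main St, Anytown, MN"
ET.SubElement(item, "link").text = "http://example.com/listings/4098765"
ET.SubElement(item, "{%s}point" % GEORSS_NS).text = "44.9778 -93.2650"

print(ET.tostring(item, encoding="unicode"))
```

Any consumer that understands the georss namespace (a map viewer, a feed aggregator, another MLS) can put that item on a map without knowing anything else about the publisher; that is the kind of interoperability a shared standard buys.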

The web has already shown that Zillow’s approach of insisting that everyone enter their data into Zillow’s system won’t work. People want control over their data, and they don’t want to just surrender it to Zillow. They want the data to work well with other data, but they still want control over it. They want to be able to create data, mash it up with other data, push it here and there, pull it this way and that. Data standards enable this freedom, and that’s why they are so important: so we can make sure everything goes on your “permanent” record.