Over on the RETS-Dev mailing list, there has been an ongoing discussion about whether RETS should try to standardize the data. Those against standardizing say that the data is so non-standard now that if we try to standardize it, too much non-standard data will be lost in the process. Those for standardizing say that you have to start somewhere. Of course, this is a never-ending argument, as both things are true.
It is true that data in MLS systems too often is not validated. Some common examples mentioned in the discussion are fields like “age” that vary from a specific year to opinions like “old” and everything in between. Or a field like Approximate Square Feet that allows entries such as “HUGE” or “3000+” or other non-numeric data. These are just some of the many examples. It is tempting to say that this data should just be chucked, as, frankly, it is not very valuable. What is a “huge” house? What is an “old” house? On the other hand, there are cases where the data has some value but is just poorly structured. For example, there may be a phone number that says “123-456-7890 (call only after 5 p.m.)”. The “call only after 5 p.m.” is valid and important data but it isn’t part of a phone number. If a standard is rigidly enforced limiting data to the phone number, that data will be lost.
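One way to picture a middle ground is to pull out the part of an entry that fits the standard while keeping everything else alongside it. The sketch below is only an illustration, not anything defined by RETS or any MLS system: the `ParsedField` structure, the function names, and the regular expressions are all hypothetical, and the phone pattern assumes ten-digit US numbers.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedField:
    """Hypothetical container: standardized value plus whatever did not fit."""
    raw: str               # original entry, always preserved
    value: Optional[str]   # standardized value, or None if none could be extracted
    note: Optional[str]    # leftover text that is not part of the value


# Assumption: ten-digit US phone numbers with optional separators.
PHONE_RE = re.compile(r"\(?(\d{3})\)?[\s.-]?(\d{3})[\s.-]?(\d{4})")
# Assumption: the first run of digits (commas allowed) is the square footage.
SQFT_RE = re.compile(r"(\d[\d,]*)")


def parse_phone(raw: str) -> ParsedField:
    """Split an entry into a normalized phone number and any remaining comment."""
    match = PHONE_RE.search(raw)
    if match is None:
        return ParsedField(raw=raw, value=None, note=raw.strip() or None)
    value = "-".join(match.groups())
    # Everything outside the matched number is kept as a note, minus wrapping parentheses.
    leftover = (raw[:match.start()] + raw[match.end():]).strip(" ()")
    return ParsedField(raw=raw, value=value, note=leftover or None)


def parse_square_feet(raw: str) -> ParsedField:
    """Extract a numeric square footage if one is present; keep qualifiers as a note."""
    match = SQFT_RE.search(raw)
    if match is None:
        # Entries like "HUGE" carry no number, so only the note survives.
        return ParsedField(raw=raw, value=None, note=raw.strip() or None)
    value = match.group(1).replace(",", "")
    leftover = (raw[:match.start()] + raw[match.end():]).strip()
    return ParsedField(raw=raw, value=value, note=leftover or None)


if __name__ == "__main__":
    print(parse_phone("123-456-7890 (call only after 5 p.m.)"))
    print(parse_square_feet("3000+"))
    print(parse_square_feet("HUGE"))
```

With this approach the “call only after 5 p.m.” comment ends up in `note` rather than being thrown away, and an entry like “HUGE” yields no standardized value but is still carried in `raw`.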
Also true, however, is that if we don’t start defining data standards at some point, these bad practices will simply continue. We cannot let the mistakes of the past define our future. This is true not only of controlling the input on specific fields but also of defining a broad and deep set of fields. Many MLSs track only what they have thought of on their own, and I think we, as MLS vendors, need to do a better job of helping our clients learn the “best practices” of the industry. We need to help our clients learn from each other, instead of each continually having to row its own boat.
But this is no easy task. First, we’re battling the adage that “the client is always right.” Second, we’re battling the truth that change is hard. Coupled together, these two forces present a strong barrier to getting new MLS clients to adopt a different way of tracking data. This is one of the reasons we’ve taken an active role in trying to establish national standards on data and that we’re insistent that the MLSs themselves be involved in this process. The client is always right and so the MLS clients need to ask for these changes. Similarly, change is less hard if done for compelling reasons, and nationwide data standards are more compelling than local standards.
Fortunately, some MLSs are already asking for data standards. There are some very large efforts under way right now in California to merge some big data sets together. At the RETS Conference last week, Frank Tadman from REInfoLink showed us a very cool tool for comparing and linking data from the five or six MLS systems they are harmonizing in the San Francisco area. Similar efforts are also occurring in Southern California now and have been ongoing for some time at large regionals like MRIS.
As important and large as these efforts are, however, we also need to recognize that these are only a few MLSs, maybe twenty or twenty-five out of 700. The other 650 or so MLSs undoubtedly have some knowledge and best practices contained in their data sets, too. We need to capture that knowledge and bring it to the table. To that end, we’ve been working on what we’re calling our “best design” at FBS, which involves the painstaking process of reviewing all 100 or so MLS meta-data structures we’ve set up to date and trying to harmonize them. This isn’t just an aggregation effort, though. We’re also trying to refine the data into the form that represents it most accurately. Our hope is that this will add to the collective knowledge and process of defining the data standards for the future.
Without a doubt, this is a transitional stage from lack of standards toward standardization. This transitional stage will be difficult and, historically, has proven to be a desert too far to cross. Not this time, though. The need is too great. We must define a path that allows the industry to move toward data standards. We need to define the standards now with MLSs so they can begin collecting better data. Until that time, we need to allow the disparate data (this house is “HUGE!”) to continue to be transported. If these two needs cannot be met simultaneously, then the need to progress with standards is more important than preserving the disparate data.