An almost perfect real-world hack
Almost immediately after we bought our house, I received a letter in the mail. The local school district was appealing the tax assessment on my house. The issue in question was the value of my property in 2002 (don’t ask; it’s a long story, and a silly law). My property tax is some percentage of the 2002 value of my home, and the district thought my assessed value was too low. Great.
Researching…
We bought our house in 2009 for almost $15,000 less than the previous owners paid for it in 2004. I was amazed that the school district thought our taxes should go up despite the fact that the house’s price had gone down. This was my first piece of evidence, but I needed to understand more about these tax hearings.
It turns out that most people don’t show up to their assessment hearings. The school districts take your 2009 sale price, multiply it by some magical number (say, 90%), and argue that the result is what your house should be assessed at. This strikes me (and many others) as a ridiculous way of determining what a house was worth in 2002. Since few people show up, the districts tend to win, and people’s assessments tend to go up.
To counter their argument, you need to establish what your house was worth in 2002, based on comparable sales that occurred in and around 2002. I decided to do some research. Lucky for me, my county has a website that lets you look up the history of each house, including when it was sold and its measurements (square footage, number of bedrooms and bathrooms, etc.).
The Allegheny Assessment Website
So this website lets you look up the details of each house by address. I went through the history of house after house in my neighborhood, trying, in vain, to find houses that matched mine and had sold in 2001 or 2002. Finding comparables with this website was going to be quite a tedious adventure.
It’s important to note that you can only search by address (or by parcel number). You can’t do things like “show me all the houses that sold in 2002”. That would be too convenient, of course. I wonder how many tax assessment appeals the county and the school districts have won precisely because the website was so difficult to search.
After about 10 minutes of wandering around aimlessly through nearby neighborhoods, I got an idea. This is why God invented computers. Let’s build a web scraper.
Scraping the detail pages
The very first thing I noticed was that if you knew the parcel number of a property, you could look up the property with a simple GET request. Here is an example page (this is Heinz Field, home of the Pittsburgh Steelers).
Ten minutes later, I had some python scripts that, given a parcel number, downloaded the corresponding page and extracted all the vitals: address, square footage, sale dates, sale prices, number of bedrooms and bathrooms, lot size, and so on. The scripts then dumped that information into a database.
So far, so good. Now I just needed to get all the parcel numbers for my area.
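For readers who want to try something similar, here is a minimal sketch of that detail-page step using only the python standard library. It is not the author’s actual script. The URL template, the on-page labels, and the table layout (a label cell ending in a colon, followed by its value cell) are all hypothetical assumptions; a real county page will need its own labels and probably its own parsing.

```python
import sqlite3
import urllib.parse
import urllib.request
from html.parser import HTMLParser

# Hypothetical URL template: the real county detail-page URL is not given here.
DETAIL_URL = "https://assessment.example.gov/parcel?id={parcel_id}"

# Hypothetical on-page labels mapped to our database columns.
LABELS = {
    "address": "address",
    "living area": "sqft",
    "lot area": "lot_sqft",
    "bedrooms": "bedrooms",
    "full baths": "bathrooms",
    "sale date": "sale_date",
    "sale price": "sale_price",
}


class CellCollector(HTMLParser):
    """Collects the whitespace-normalized text of every <td> cell, in order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._buf = None  # None means "not currently inside a <td>"

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "td" and self._buf is not None:
            self.cells.append(" ".join("".join(self._buf).split()))
            self._buf = None

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)


def fetch_detail(parcel_id):
    """Download one detail page with a plain GET request."""
    url = DETAIL_URL.format(parcel_id=urllib.parse.quote(parcel_id))
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def parse_detail(html):
    """Pair each known 'Label:' cell with the cell that follows it."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    fields = {}
    cells = parser.cells
    for label, value in zip(cells, cells[1:]):
        key = LABELS.get(label.rstrip(":").strip().lower())
        if label.endswith(":") and key:
            fields[key] = value
    return fields


def to_number(text):
    """'$95,000.00' -> 95000; returns None when there are no digits."""
    digits = "".join(ch for ch in (text or "").split(".")[0] if ch.isdigit())
    return int(digits) if digits else None


def save(conn, parcel_id, fields):
    """Insert (or overwrite) one house in an assumed 'houses' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS houses (parcel_id TEXT PRIMARY KEY, address TEXT,"
        " sqft INTEGER, lot_sqft INTEGER, bedrooms INTEGER, bathrooms INTEGER,"
        " sale_date TEXT, sale_price INTEGER)"
    )
    numeric = ("sqft", "lot_sqft", "bedrooms", "bathrooms", "sale_price")
    row = {k: to_number(fields.get(k)) if k in numeric else fields.get(k)
           for k in ("address", *numeric, "sale_date")}
    conn.execute(
        "INSERT OR REPLACE INTO houses VALUES (:parcel_id, :address, :sqft, :lot_sqft,"
        " :bedrooms, :bathrooms, :sale_date, :sale_price)",
        {"parcel_id": parcel_id, **row},
    )
```

In use, each parcel number would go through save(conn, pid, parse_detail(fetch_detail(pid))). Keeping parsing separate from fetching makes it easy to test the parser against a saved copy of a page before hammering the live server.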
Getting the parcel number list information
I needed to find the parcel id for every single house in my town.
Their search box wouldn’t let me search for ‘*’. Searching for ‘A’ in the street address field, however, returned a paginated list of every house in that town on a street beginning with the letter ‘A’. These searches were all conducted with a POST to a single URL. All I had to do was figure out how to form these POSTs and how to turn the page. This is where I hit a speed bump.
The “next page” button on the results page was a javascript call that formed a new POST. Their POSTs had some “features” (I believe from some Windows web development tool: __VIEWSTATE and __EVENTVALIDATION were among the arguments in the POST) that made crawling the site non-trivial. The POST required some magical numbers that, as far as I could tell, were formed in javascript.
It probably wouldn’t have been too difficult to grep out the magic numbers and emulate the javascript code in python to formulate a properly formed POST to get the data I needed. I had a quicker solution.
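For comparison, the python route might have looked something like the sketch below. It relies on how ASP.NET WebForms postbacks generally work: the page’s hidden fields (including __VIEWSTATE and __EVENTVALIDATION) are sent back unchanged, and the javascript __doPostBack(target, argument) helper fills in __EVENTTARGET and __EVENTARGUMENT before submitting the form. The control name and the “Page$2” argument used below are hypothetical; you would copy the real ones out of the “next page” link, and a real form may also require its visible search fields to be resubmitted.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFieldParser(HTMLParser):
    """Collects name/value pairs of every <input type="hidden"> on a page."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        a = dict(attrs)
        if (a.get("type") or "").lower() == "hidden" and a.get("name"):
            self.hidden[a["name"]] = a.get("value") or ""


def postback_body(page_html, event_target, event_argument=""):
    """Build the form body that __doPostBack(event_target, event_argument) would submit.

    Only hidden fields are echoed back here; visible inputs (e.g. the search
    box) would have to be added by the caller if the real form needs them.
    """
    parser = HiddenFieldParser()
    parser.feed(page_html)
    parser.close()
    data = dict(parser.hidden)  # __VIEWSTATE, __EVENTVALIDATION, ... unchanged
    data["__EVENTTARGET"] = event_target
    data["__EVENTARGUMENT"] = event_argument
    return urlencode(data)
```

The resulting string would be POSTed as application/x-www-form-urlencoded to the form’s action URL, and the same function applied to each new results page to reach the one after it, since the hidden values change from page to page.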
iMacros
iMacros is a nifty Firefox plugin, which I’ve used from time to time, that turns your browser into a scraper. It’s a heck of a lot slower than doing it from python, but you can “scrape” a page as if you were the user. This works brilliantly for very simple stateful tasks that require things like cookies or javascript. In iMacros, I could tell it to record all the parcel numbers on a particular page, click “next”, and repeat.
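The macro’s loop (grab the parcel numbers on the page, click “next”, repeat) boils down to the sketch below. It is written in python only to make the logic explicit; it is not how iMacros itself is scripted. fetch_page is a hypothetical stand-in for whatever actually retrieves page N of the results (iMacros clicking through, or a hand-built POST), and the parcel-number pattern is an assumed format.

```python
import re

# Assumed parcel-number format (e.g. "0123-A-00045"); adjust to the real one.
PARCEL_RE = re.compile(r"\b\d{4}-[A-Z]-\d{5}\b")


def crawl_results(fetch_page, max_pages=1000):
    """Collect parcel numbers page by page, in first-seen order.

    fetch_page(n) returns the HTML of results page n, or None past the end.
    Stops when a page yields no parcel numbers we have not already seen,
    which also guards against a "next" that keeps returning the last page.
    """
    seen = set()
    ordered = []
    for page in range(1, max_pages + 1):
        html = fetch_page(page)
        if html is None:
            break
        found_new = False
        for pid in PARCEL_RE.findall(html):
            if pid not in seen:
                seen.add(pid)
                ordered.append(pid)
                found_new = True
        if not found_new:
            break
    return ordered
```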
I was only about half an hour into my adventure, and I already had the scripts I needed to scrape their site. Not bad. I let iMacros run for an hour, collecting the parcel number of every property. An hour after that, I had run my python scripts over the parcel number list and had a database filled with every single house in my town.
I had just run about 2 hours of constant web requests against a county server to build a database of about 5000 homes. And now I’ve just posted this information on the internet. I hope I don’t get arrested.
Comparables, Comparables
Once I had my database, it was time to search. One SQL statement later, I had found every house that had at least as much square footage and lot size as mine, at least as many bedrooms and bathrooms, and that had sold in 2000-2002 for less than my current tax assessment. 84 results. Ha ha.
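That one SQL statement might have looked roughly like the query below, run here through python’s built-in sqlite3 module. The table and column names are assumptions (the post doesn’t say what database or schema was used), and sale dates are assumed to be stored as ISO-8601 text so that a plain BETWEEN comparison works.

```python
import sqlite3

# Assumed schema; sale_date is ISO-8601 text such as '2001-05-01',
# so string comparison with BETWEEN orders dates correctly.
SCHEMA = """CREATE TABLE IF NOT EXISTS houses (
    parcel_id TEXT PRIMARY KEY, address TEXT,
    sqft INTEGER, lot_sqft INTEGER, bedrooms INTEGER, bathrooms INTEGER,
    sale_date TEXT, sale_price INTEGER)"""

# "At least as big on every measure, sold 2000-2002, for less than my assessment."
COMPARABLES_SQL = """
SELECT parcel_id, address, sale_date, sale_price
FROM houses
WHERE sqft >= :sqft
  AND lot_sqft >= :lot_sqft
  AND bedrooms >= :bedrooms
  AND bathrooms >= :bathrooms
  AND sale_date BETWEEN '2000-01-01' AND '2002-12-31'
  AND sale_price < :assessment
ORDER BY sale_price
"""


def find_comparables(conn, mine, assessment):
    """mine: dict with sqft, lot_sqft, bedrooms, bathrooms for my own house."""
    params = {k: mine[k] for k in ("sqft", "lot_sqft", "bedrooms", "bathrooms")}
    params["assessment"] = assessment
    return conn.execute(COMPARABLES_SQL, params).fetchall()
```

Using >= on every measure means a comparable is never worse than your house on paper, which is exactly the argument the printouts need to support.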
That’s probably too many, huh? My town is shaped funny, so some of the houses on this list were actually quite far away. Luckily for me, the parcel numbering is fairly sane. I was quickly able to figure out which parcels were near me based on the first portion of the number. I removed anything that was too far away.
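The pruning step amounts to a prefix filter. This sketch assumes parcel numbers look like “0123-A-00045”, with the part before the first hyphen identifying the area, and that you have already worked out which leading portions are near you; both the format and the example prefixes are hypothetical.

```python
def near_me(parcel_ids, nearby_prefixes, sep="-"):
    """Keep parcels whose leading portion (before the first separator) is nearby."""
    nearby = set(nearby_prefixes)
    return [pid for pid in parcel_ids if pid.split(sep, 1)[0] in nearby]
```

For example, near_me(["0123-A-00045", "0124-B-00001", "0987-C-00002"], {"0123", "0124"}) keeps the first two parcels and drops the third.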
I ended up with 31 residential homes within 4 miles of my house that had sold in 2000-2002 for less than my assessment and measured up better than mine on every count. I began printing. Every one of these houses was better than my house (on paper, at least) and very close by, yet sold for less than my assessment. I was going to walk into court and argue that my assessment was too high.
I picked the 7 best to state my case and printed detailed reports on each of them as well. I’d open with these 7 and make my case that my assessment was too high. I’d point out that we bought our house for less than the previous owners paid, more proof that it was too high. And if I needed it, I’d bust out the list of 31.
Court
I spent most of the night before rehearsing my speech for court. I wasn’t rehearsing the part about how they should lower my tax assessment. I was rehearsing the part where I explained how I got such detailed comparables, how it was totally legal, and how they shouldn’t throw me in jail for “hacking”.
I walked into the courtroom with about 100 pages of documentation in a nifty leather notebook: 31 comparables, 7 of them in detail. I had absolutely no idea what they would do. I imagine lawyers or real estate agents have access to databases like this, so they’d probably just assume I knew someone. There was the outside chance that no one had ever done anything like this before.
In the room sat two lawyers (my opponents) and the man in charge (I presume he was a judge). It wasn’t actually a courtroom but an office adjacent to one. This is basically what happened:
Judge: What’s your name?
Me: Louis Brandy.
The judge and the opposing lawyers shuffled a lot of papers around. The two lawyers looked at their papers for a while.
Judge, to lawyers: Alright, what are we doing?
Lawyer: This appeal has been going on for 2 years, with the previous owners. They [me+my wife] are the new owners. They bought it for $15,000 less than the previous owners. That puts their house into agreement with our numbers. There’s no need to change this assessment. We will withdraw the appeal.
Judge, to me: You can go forward, if you want, and I’ll decide what to do with the assessment: up or down. They are willing to withdraw the appeal; is that OK with you?
Ah, crap. I had a decision to make. Should I push forward and risk being embarrassed? What if I printed up the wrong numbers? What if my comparables were garbage? What if they had equal comparables to keep my price at the same level? Accepting the draw was the obvious choice. My only hesitation was that it would ruin my story.
I thought briefly but there wasn’t really a choice. I said yes, their withdrawal was fine by me. I signed the papers and walked out of my hearing.
It took all of five minutes and I said exactly three words. I showed no one the 100 pages of awesomeness in my little notebook. Such a good story was going to be wasted on such an anticlimax. C’est la vie.
August 3rd, 2009 at 12:12 pm
Not a waste! Most exhilarating programming read in who knows how long!
August 3rd, 2009 at 12:18 pm
so, how much did you end up saving?
August 3rd, 2009 at 12:28 pm
In Houston, the process is similar. The database site is easier to use though. http://www.hcad.org/
I did what you did in Excel in about an hour of searching and sorting. Knocked about 12k off my appraised value. The appointment took about 15 minutes.
August 3rd, 2009 at 12:32 pm
An interesting addition would be to use the Google Maps API to get the actual distance from your house to all the other houses in the db.
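A lighter-weight alternative to a routing API is straight-line (great-circle) distance via the haversine formula, which is enough for a “within 4 miles” cut. This sketch assumes each house already has a latitude and longitude, which would still have to come from a geocoder such as the one the commenter mentions, and it measures distance as the crow flies, not driving distance.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_radius(home, houses, radius_miles=4.0):
    """Keep houses (dicts with 'lat' and 'lon') within radius_miles of home=(lat, lon)."""
    lat, lon = home
    return [h for h in houses
            if haversine_miles(lat, lon, h["lat"], h["lon"]) <= radius_miles]
```

As a sanity check, one degree of latitude comes out to roughly 69 miles.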
August 3rd, 2009 at 12:41 pm
Great story! I know about iMacros but haven’t had a chance to use it yet, will have to add it to my toolkit.
August 3rd, 2009 at 12:48 pm
Very neat! I wrote a similar scraper for my county real estate records a while ago to analyze the accuracy of Zillow’s estimates in my neighborhood. I tried using iMacros, but I found it to be somewhat clunky when dealing with the structure of my county’s web site. My county’s system is completely session based, so editing the URL parameters doesn’t work.
I started using HtmlUnit (open source GUI-less browser package for Java) and found that to be my favorite solution. Java normally would not be my first choice for a project like that, but HtmlUnit’s functionality made it worth it. Check it out some time if you ever take on a similar project.
August 3rd, 2009 at 12:50 pm
Cool story, thanks for sharing!
You were also correct about the __VIEWSTATE and __EVENTVALIDATION fields. They’re hidden form fields that are part of Microsoft’s web programming framework, ASP.NET.
Viewstate is an encoded (and optionally encrypted) hidden field used by the server to persist changes to a web form (which, in .NET’s case, is the whole page) across multiple requests.
Good guess
August 3rd, 2009 at 1:39 pm
Interesting. I’m doing almost the exact same thing. This past year I thought my appraisal seemed high. I jumped onto the local county appraisal website that shows similar results yours does and saw that my neighbor immediately next to me (same size house/lot/style) is assessed 13% lower than mine. How could this be?
I wrote a scraper to pull home data for all the homes in my “neighborhood code” (over 700!) and threw the data into a CSV file for importing into a spreadsheet. I sorted by assessed value and compared homes whose house and plot size were the same as or larger than mine. I cherry-picked about 15 homes that fit and found they average just under 20% lower than my assessed value. I could pick more, but I’m not sure how big a dataset I really need to prove a point, since the county gave me a list of 5 comps for homes sold “around” my assessed value. It seems 5 data points is good enough for them.
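The spreadsheet step can also be scripted. The sketch below reads CSV text, keeps homes whose house and plot are at least as large as yours, and reports how far below your assessed value they average, as a percentage. The column names (sqft, lot_sqft, assessed) are assumptions about the export, not the county’s actual headers.

```python
import csv
import io


def average_percent_below(csv_text, my_assessed, my_sqft, my_lot):
    """Return (average % below my assessment, number of comparables used).

    A negative average would mean the comparables are assessed higher than me.
    """
    rows = csv.DictReader(io.StringIO(csv_text))
    comps = [r for r in rows
             if int(r["sqft"]) >= my_sqft and int(r["lot_sqft"]) >= my_lot]
    if not comps:
        return None, 0
    pct = [(my_assessed - int(r["assessed"])) / my_assessed * 100 for r in comps]
    return sum(pct) / len(pct), len(comps)
```

With a file on disk, pass in open(path, newline="").read(); the filter deliberately keeps every qualifying home rather than a hand-picked subset, which makes the resulting average easier to defend.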
Haven’t had my appeal date set yet, but I hope it’s soon.
August 3rd, 2009 at 1:41 pm
Did you end up getting the appraisal you were aiming for? With your material, you could probably have done it.
I took an introductory geography class as an undergrad, and one of the topics we covered was home appraisals and how mapping can be very useful in building a case. Our book was ‘How to Lie with Maps’ (see the following review: http://www.mcwetboy.net/maproom/2006/05/review_how_to_l.php ). There is a whole chapter devoted to arguing for property appraisals. It’s also a pretty entertaining book overall.
I’m almost disappointed that you didn’t go through with the appeal… though I’m happy you feel pretty good about the outcome.
August 3rd, 2009 at 1:41 pm
I am guessing the fact that you showed up changed their minds.
August 3rd, 2009 at 1:55 pm
Awesome.
A long time ago I was a real estate agent. The MLS system didn’t have a way to query for expired listings, so I wrote a scraper that compared today’s listings to yesterday’s. That made me the only agent capable of calling all the expired listings to see if they’d like to try again.
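The core of that scraper is a set difference between two daily snapshots of listing IDs. A minimal sketch, assuming each snapshot has already been scraped into a list of MLS numbers (the IDs in the example are made up). A listing can also disappear because it sold or was withdrawn, so the result is a list of candidates rather than confirmed expirations.

```python
def dropped_listings(yesterday_ids, today_ids):
    """IDs present yesterday but gone today, sorted for stable output."""
    return sorted(set(yesterday_ids) - set(today_ids))
```

For example, dropped_listings(["MLS1", "MLS2", "MLS3"], ["MLS2", "MLS4"]) returns ["MLS1", "MLS3"]; new listings such as MLS4 are ignored.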
Didn’t help me a bit. Eventually I figured out I should be writing software for a living instead of attempting to sell real estate.
August 3rd, 2009 at 1:57 pm
Next time, give http://scrapy.org/ a try. It’s very easy, extensible, and full of python power.
August 3rd, 2009 at 1:59 pm
using python? get the mechanize library.
August 3rd, 2009 at 2:04 pm
You should sell this service to others in your county.
August 3rd, 2009 at 2:18 pm
Any chance you’d post your Perl script? As a fellow Allegheny County citizen, I could see this being potentially useful in the future
August 3rd, 2009 at 2:56 pm
I have been doing the same thing since 2001. Only I live in Cook County, Illinois, the second most populous county in the country, behind Los Angeles County, California.
So my database has upwards of 1,000,000 properties in it (residential, commercial, and industrial).
Funny thing…I took the experiment one step further. I fed every single house in the county through the algorithm, to see who else had a similar appeal case.
Lo and behold, 65% of the county was being overassessed.
So I had two choices:
1) Class action lawsuit on behalf of the county
2) Mass mailing. Gimme $25, I give you the appeal info.
I chose two, and have been mopping up ever since. If you don’t win the appeal (all public record) you get a refund.
August 3rd, 2009 at 3:05 pm
Good idea! I like it
August 3rd, 2009 at 4:14 pm
A less amusing fact is that the county legislature had frozen the assessment changes. The lawyers went ahead anyway.
I got a large hike in my assessment.
I should really do your level of research and appeal.
August 3rd, 2009 at 5:13 pm
Neat! I live in Allegheny County and used a similar tactic, except that I didn’t bother crawling the assessment site; I just looked around for some comparables on the market and dumped their info into a spreadsheet. We challenged our 2002 assessment and got it knocked in half!
August 3rd, 2009 at 5:27 pm
Well done! I was expecting you to really give it to them though haha.
August 3rd, 2009 at 5:43 pm
Remember, the real-world hacker isn’t as interested in the outcome as he is in the success of the hack. You pulled off the data extraction/compilation/categorization from a system that did not want to surrender it easily. If you were actually able to use the data, well, that’s bonus points.
That said, well done! Well done!
August 3rd, 2009 at 6:07 pm
Whoa, lots of comments.
@name, no idea how much we saved. I don’t know what they wanted to raise our assessment to.
@francis, others,
I think I could probably get my assessment lowered with the data I have. I waffled on that a bit myself. Perhaps next year, with a lawyer’s help, I’ll pull the trigger on an appeal.
@python-people,
A lot of people have mentioned mechanize to me. I actually looked at mechanize, but I (mistakenly) believed the magic numbers were coming from javascript, which mechanize didn’t handle (according to its docs). The magic numbers were actually in hidden fields, and mechanize probably would have worked brilliantly.
@ilya,
Here’s the python code for half of it: http://pastebin.com/f7bf7a55b . You’ll need to get the parcel ids first though.
August 3rd, 2009 at 6:09 pm
@cook county,
That’s quite an interesting idea. My first thought is: shouldn’t appraisers/agents/lawyers who have access to the (real, professional) databases already be doing this?
I wonder if I could get the Allegheny assessment data entirely, without having to scrape, since it’s a matter of public record, and possibly build a better search engine for it.
August 3rd, 2009 at 6:35 pm
Could it be the clipboard effect? They saw you had documents, got worried that you had evidence that would knock down the assessment, and caved to avoid an embarrassing loss (one that could have set a precedent).
August 3rd, 2009 at 6:36 pm
I was once a frustrated programmer and studied law. It didn’t work out, so I’m back in IT. The best thing I learnt was to *Always Over-prepare*. For some magical reason the cards mostly fall in your favour. It is for that subliminal reason that
August 3rd, 2009 at 7:48 pm
Yes property taxes and Pennsylvania…
August 3rd, 2009 at 10:06 pm
Awesome twice! For standing up to the county, and for the iMacros link. I wish you’d gone for the reduction (great prep/technique!), but it’s rarely a bad idea to quit while you’re ahead with the legal system.
Go Steelers.
August 3rd, 2009 at 10:29 pm
You really should have stated your case to the judge, although it may not have done much to reduce your assessment. I have dealt with similar cases here in South Florida, and the appeals usually require you to present a property appraisal documenting the reduction in your home’s value in order to reduce its assessed value. Property appraisals can be backdated to show the past value of a property and compare it to its present value, though one will cost you about $350 (depending on your location and property type, it can cost much more). Good luck with that, man.
August 3rd, 2009 at 10:45 pm
Scrape the web? Just go to http://www.zillow.com
August 4th, 2009 at 12:52 am
Please use the introduction as an abstract… Article is toooo long for someone sitting to read it in the morning just to find out that it sucks (I don’t know if it sucks because I don’t want to waste my time reading something that I don’t know what it is about from the introduction!)
August 4th, 2009 at 1:04 am
Hmmm… it sounds like the lawyers were the ones doing the “hacking” here… withdraw the claim if someone shows up and looks half sane. What a wonderful racket.
August 4th, 2009 at 4:38 am
In New Mexico, unlike all the other 49 states, they passed a law making all assessment data private rather than public record. Then they started jacking up assessments like crazy. My property was increased 20 times in value in 1 year, despite the fact that there had not been a single sale on my block in over 30 years, because the properties were useless desert land. Despite their blocks, I was able to get access to assessment data and compare it against deeds in the courthouse. There were massive errors in the computer records, with a large number of sales figures off by a factor of 10, multi-parcel transfers incorrectly entered at full value for each individual parcel, 14,000 entered as 41,000, and many other similar problems. Very little of the data entered into their computer system was accurate at all. In addition, their assessment method itself was fraudulent: it was not based on comparable sales, but on sales data from several miles away, where there were actual roads and houses and electricity. There were many other fraudulent aspects to their entire assessment process.
The assessment board’s number of challenges went from 50 a year to 25,000 a year, and the assessment review board then refused to hear challenges, canceling hearings the day before when people had come in from out of state, and refusing to reschedule them. This is life in New Mexico, real estate fraud capital of the United States. I still haven’t been able to resolve this; New Mexico is run by criminals.
You got off pretty easy by comparison.
August 4th, 2009 at 7:57 am
“And the judge wasn’t gonna look at the 27 8-by-10 colour glossy pictures with the circles and arrows and a paragraph on the back of each one explainin’ what each one was to be used as evidence against us.”
http://www.arlo.net/resources/lyrics/alices.shtml
August 4th, 2009 at 9:09 am
A great read, thanks!
August 4th, 2009 at 9:11 am
Hi,
a year ago I had to collect all German schools by scraping 16 different websites. I found the Cobra toolkit (Java) to be the most useful and flexible for handling sessions and posting forms:
http://lobobrowser.org/cobra/java-html-parser.jsp
August 4th, 2009 at 9:17 am
You have created a fantastic side business – selling 2002 appraisal values. With a little publicity, and perhaps a little advertising, you should be able to sell “comparables” to every homeowner in Allegheny County who wants to appeal their assessment — and plenty of people do!!
August 4th, 2009 at 9:43 am
I’d just like to say “fuck you!”.
how dare you build me up like that for an utter smack down and then end the story by agreeing with the opposition.
Most disappointed I’ve been since Indiana Jones 4.
Shame on you.
August 4th, 2009 at 9:55 am
I do the same thing: scraping the county website using custom python scripts, and dumping the data into MySQL for analysis. It produces good results, but the best data comes from MLS records of recent home sales in our area. A real estate agent can pull those records (up-to-date data is apparently secret). This year, the recent-sales data has made the best case against the supposed 20% value increase that the county is asking for.
For us, property taxes are the primary funding source for public schools. And the schools need it, so taxes keep going up. Unfortunately, it puts an undue burden on homeowners and drives the cost of living up. It also means that you’re arguing about two different things: the schools/city want the money, and you want a fair market estimate.
August 5th, 2009 at 6:59 pm
HAHA! Halfway through your intro, I was thinking, “Gee, this sounds a lot like where I live. My neighbors just went through something like this.” Then, further down, you gave the link to the Allegheny County website! Welcome to the twisted world of Pittsburgh property taxes.
August 13th, 2009 at 8:16 am
You should have fought at that time to have your assessment lowered so you actually could end up paying less in taxes.