HomeHarvest is a real estate scraping library that extracts and formats data in the style of MLS listings.
Not technical? Try out the web scraping tool on our site at tryhomeharvest.com.
Looking to build a data-focused software product? Book a call to work with us.
- Source: Fetches properties directly from Realtor.com.
- Data Format: Structures data to resemble MLS listings.
- Export Flexibility: Options to save as either CSV or Excel.
Video Guide for HomeHarvest - updated for release v0.3.4
pip install -U homeharvest
Python version >= 3.9 required
from homeharvest import scrape_property
from datetime import datetime
# Generate filename based on current timestamp
current_timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
filename = f"HomeHarvest_{current_timestamp}.csv"
properties = scrape_property(
    location="San Diego, CA",
    listing_type="sold",  # or "for_sale", "for_rent", "pending"
    property_type=["single_family"],
    past_days=30,  # sold in the last 30 days (listed in the last 30 days for for_sale/for_rent)
    # date_from="2023-05-01",  # alternative to past_days
    # date_to="2023-05-28",
    # foreclosure=True,
    # mls_only=True,  # only fetch MLS listings
)
print(f"Number of properties: {len(properties)}")
# Export to csv
properties.to_csv(filename, index=False)
print(properties.head())
>>> properties.head()
MLS MLS # Status Style ... COEDate LotSFApx PrcSqft Stories
0 SDCA 230018348 SOLD CONDOS ... 2023-10-03 290110 803 2
1 SDCA 230016614 SOLD TOWNHOMES ... 2023-10-03 None 838 3
2 SDCA 230016367 SOLD CONDOS ... 2023-10-03 30056 649 1
3 MRCA NDP2306335 SOLD SINGLE_FAMILY ... 2023-10-03 7519 661 2
4 SDCA 230014532 SOLD CONDOS ... 2023-10-03 None 752 1
[5 rows x 22 columns]
Required
├── location (str): The location to search, in any of several formats: a zip code, a full address, a city/state, etc.
└── listing_type (option): Choose the type of listing.
    - 'for_rent'
    - 'for_sale'
    - 'sold'
    - 'pending' (for pending/contingent sales)
Optional
├── property_type (list): Choose the types of properties.
│    - 'single_family'
│    - 'multi_family'
│    - 'condos'
│    - 'condo_townhome_rowhome_coop'
│    - 'condo_townhome'
│    - 'townhomes'
│    - 'duplex_triplex'
│    - 'farm'
│    - 'land'
│    - 'mobile'
├── radius (decimal): Radius in miles to find comparable properties based on individual addresses.
│    Example: 5.5 (fetches properties within a 5.5-mile radius if location is set to a specific address; otherwise, ignored)
│
├── past_days (integer): Number of past days to filter properties. Uses 'last_sold_date' for 'sold' listing types, and 'list_date' for others (for_rent, for_sale).
│    Example: 30 (fetches properties listed/sold in the last 30 days)
│
├── date_from, date_to (string): Start and end dates to filter properties listed or sold; both dates are required.
│    (Use these to fetch properties in chunks, since there is a 10,000-result limit.)
│    Format for both must be "YYYY-MM-DD".
│    Example: "2023-05-01", "2023-05-15" (fetches properties listed/sold between these dates)
│
├── mls_only (True/False): If set, fetches only MLS listings (mainly applicable to 'sold' listings).
│
├── foreclosure (True/False): If set, fetches only foreclosures.
│
├── proxy (string): In the format 'http://user:pass@host:port'.
│
├── extra_property_data (True/False): If set, fetches additional property data for general searches (e.g., schools, tax appraisals), at the cost of O(n) additional requests.
│
├── exclude_pending (True/False): If set, excludes 'pending' properties from the 'for_sale' results unless listing_type is 'pending'.
│
└── limit (integer): Limit the number of properties to fetch. Maximum and default: 10,000.
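Because each search is capped at 10,000 results, a long date_from/date_to range may come back truncated. The sketch below assumes that a window returning exactly `limit` rows might have been cut off, and splits such a window in half until every window comes in under the cap. `fetch_in_chunks`, `RESULT_LIMIT` and the `fetch` callback are hypothetical helpers, not part of HomeHarvest; `fetch` is any function that takes date_from and date_to strings and returns something with a length, such as a DataFrame.

```python
from datetime import date, timedelta
from typing import Callable, List, Sized, TypeVar

T = TypeVar("T", bound=Sized)

# Per-search result cap from the parameter notes above (hypothetical constant name).
RESULT_LIMIT = 10_000


def fetch_in_chunks(
    fetch: Callable[[str, str], T],
    start: date,
    end: date,
    limit: int = RESULT_LIMIT,
) -> List[T]:
    """Fetch the inclusive window [start, end], halving any window that hits the cap.

    `fetch` is a hypothetical callback taking (date_from, date_to) as
    "YYYY-MM-DD" strings and returning a sized result (e.g. a DataFrame).
    """
    result = fetch(start.isoformat(), end.isoformat())
    # Under the cap: nothing was cut off. A single day cannot be split further,
    # so it is returned as-is even though it may still be truncated.
    if len(result) < limit or start == end:
        return [result]
    # Possibly truncated: split into two non-overlapping halves and fetch each.
    half = (end - start).days // 2
    midpoint = start + timedelta(days=half)
    return fetch_in_chunks(fetch, start, midpoint, limit) + fetch_in_chunks(
        fetch, midpoint + timedelta(days=1), end, limit
    )
```

With HomeHarvest, `fetch` could be `lambda date_from, date_to: scrape_property(location="San Diego, CA", listing_type="sold", date_from=date_from, date_to=date_to)`, and the returned DataFrames could then be combined with `pandas.concat(chunks, ignore_index=True)`. Re-fetching a window after it hits the cap costs one extra request per split; that trade keeps the logic simple and avoids guessing chunk sizes up front.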
Property
├── Basic Information:
│   ├── property_url
│   ├── property_id
│   ├── listing_id
│   ├── mls
│   ├── mls_id
│   └── status
├── Address Details:
│   ├── street
│   ├── unit
│   ├── city
│   ├── state
│   └── zip_code
├── Property Description:
│   ├── style
│   ├── beds
│   ├── full_baths
│   ├── half_baths
│   ├── sqft
│   ├── year_built
│   ├── stories
│   ├── garage
│   └── lot_sqft
├── Property Listing Details:
│   ├── days_on_mls
│   ├── list_price
│   ├── list_price_min
│   ├── list_price_max
│   ├── list_date
│   ├── pending_date
│   ├── sold_price
│   ├── last_sold_date
│   ├── price_per_sqft
│   ├── new_construction
│   └── hoa_fee
├── Location Details:
│   ├── latitude
│   ├── longitude
│   └── nearby_schools
├── Agent Info:
│   ├── agent_id
│   ├── agent_name
│   ├── agent_email
│   └── agent_phone
├── Broker Info:
│   ├── broker_id
│   └── broker_name
├── Builder Info:
│   ├── builder_id
│   └── builder_name
└── Office Info:
    ├── office_id
    ├── office_name
    ├── office_phones
    └── office_email
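As a small illustration of working with the exported file, the sketch below reads the CSV written by `properties.to_csv(filename, index=False)` using only the standard library and computes the median sold price per square foot for each zip_code. It assumes the CSV headers match the field names listed above (zip_code, sold_price, sqft); the sample output earlier uses different, MLS-style column names, so check your file's header row first. It recomputes the ratio from sold_price and sqft instead of reading price_per_sqft, because the basis of that column is not specified here. `median_price_per_sqft_by_zip` is a hypothetical helper, not part of HomeHarvest.

```python
import csv
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Optional


def _to_float(value: Optional[str]) -> Optional[float]:
    """Parse a CSV cell as a float; blank or missing cells become None."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def median_price_per_sqft_by_zip(lines: Iterable[str]) -> Dict[str, float]:
    """Median of sold_price / sqft per zip_code from exported CSV lines.

    Assumes headers named zip_code, sold_price and sqft (an assumption;
    check your own export). Rows missing either number are skipped.
    """
    ratios: Dict[str, List[float]] = defaultdict(list)
    for row in csv.DictReader(lines):
        price = _to_float(row.get("sold_price"))
        sqft = _to_float(row.get("sqft"))
        if price is None or not sqft:  # also skips sqft == 0
            continue
        ratios[row["zip_code"]].append(price / sqft)
    return {zip_code: median(values) for zip_code, values in ratios.items()}
```

An open file works as the `lines` argument, e.g. `with open(filename, newline="") as f: print(median_price_per_sqft_by_zip(f))`.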
The following exceptions may be raised when using HomeHarvest:
- InvalidListingType - valid options: for_sale, for_rent, sold, pending.
- InvalidDate - date_from or date_to is not in the format YYYY-MM-DD.
- AuthenticationError - Realtor.com token request failed.
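To fail fast before any network request, inputs can be checked locally against the rules listed above. The sketch below is a hypothetical pre-check, not HomeHarvest's own validation: it raises a plain ValueError rather than the library's InvalidListingType or InvalidDate, and its date rule (a strict YYYY-MM-DD pattern that must also be a real calendar date) is an assumption about what the library accepts. `check_search_args` and `VALID_LISTING_TYPES` are illustrative names.

```python
import re
from datetime import datetime
from typing import Optional

# Listing types named in the InvalidListingType description above.
VALID_LISTING_TYPES = ("for_sale", "for_rent", "sold", "pending")
_DATE_PATTERN = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def check_search_args(
    listing_type: str,
    date_from: Optional[str] = None,
    date_to: Optional[str] = None,
) -> None:
    """Hypothetical local pre-check; raises ValueError on bad input."""
    if listing_type not in VALID_LISTING_TYPES:
        raise ValueError(
            f"listing_type must be one of {VALID_LISTING_TYPES}, got {listing_type!r}"
        )
    # The parameter notes say date_from and date_to are required together.
    if (date_from is None) != (date_to is None):
        raise ValueError("date_from and date_to must be given together")
    for name, value in (("date_from", date_from), ("date_to", date_to)):
        if value is None:
            continue
        # Require exactly YYYY-MM-DD, then confirm it is a real calendar date.
        if not _DATE_PATTERN.fullmatch(value):
            raise ValueError(f"{name} must be YYYY-MM-DD, got {value!r}")
        try:
            datetime.strptime(value, "%Y-%m-%d")
        except ValueError:
            raise ValueError(f"{name} is not a valid date: {value!r}") from None
```

Calling `check_search_args("sold", "2023-05-01", "2023-05-28")` just before `scrape_property(...)` surfaces typos such as `"2023-5-1"` or `"rented"` without spending a request.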