Intro to Iggy Place Data
This page gives an overview of the model-ready data and features that Iggy provides. This is meant to accompany the Iggy Data Dictionary.
How Iggy thinks about location features
At Iggy, we think about location-related features in terms of boundaries, data sources, and aggregations. These three components form the core of our data model. Put most simply, each Iggy feature is the result of an aggregation applied to an underlying data source within a boundary.
Many datasets have location fields that link a row of data to a real place on Earth. Depending on the field, that may be a relatively general place (e.g. a metro area or county) or a very specific place (e.g. a quadkey or address). Traditionally, part of the challenge in working with location data is converting from specific to general places. For example, a dataset may have an address field while the available economic data is only reported at the county level. To add features from the economic dataset, each address must first be linked to its county.
We use the term boundary to describe the geographic area over which some data is aggregated. Iggy pre-aggregates features to boundary levels ranging from general (metro area) to specific (quadkey) so that users can pull data at exactly the level they need. For example, if your dataset includes a zip code field, Iggy provides features that have been pre-aggregated at the zip code level, such as the count of restaurants per capita within each zip code.
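As a rough illustration of pulling pre-aggregated features at the zip code level, the sketch below merges a zip-keyed feature table into a user's rows. The table contents, the `add_zip_features` helper, and the feature values are hypothetical and only stand in for whatever format the Iggy data is delivered in.

```python
# Hypothetical feature table keyed by 5-digit zip code (values are made up).
zip_features = {
    "94107": {"poi_count": 1520, "poi_count_per_capita": 0.051},
    "10001": {"poi_count": 2310, "poi_count_per_capita": 0.092},
}

def add_zip_features(rows, features, zip_field="zipcode"):
    """Return copies of rows with the matching zip-level features merged in."""
    enriched = []
    for row in rows:
        # Zip codes parsed as integers lose leading zeros; restore them.
        key = str(row[zip_field]).zfill(5)
        # Rows whose zip code is not in the table are passed through unchanged.
        enriched.append({**row, **features.get(key, {})})
    return enriched

customers = [{"id": 1, "zipcode": "94107"}, {"id": 2, "zipcode": 2139}]
enriched = add_zip_features(customers, zip_features)
```

Treating the zip code as a zero-padded string, not a number, avoids silent mismatches for zip codes that begin with 0.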
Currently Iggy provides features pertaining to the following boundaries, from general to specific:
metro – Census Core Based Statistical Area, identified by CBSA FIPS
county – County, identified by 5-digit FIPS
locality – City, identified by ID from the Who's on First gazetteer
zipcode – Zip Code, identified by 5-digit zip code
census_tract – Census Tract, identified by 11-digit census tract GEOID
cbg – Census Block Group, identified by 12-digit census block group GEOID
qk_isochrone_walk_10m – 10-min Walk Isochrone, identified by zoom-19 quadkey identifier
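Census GEOIDs are hierarchical: a 12-digit block group GEOID is made up of a 2-digit state code, 3-digit county code, 6-digit tract code, and 1-digit block group code. This means the county and census_tract identifiers can be read directly off a cbg identifier. The sketch below shows this; the function name `parents_from_cbg` is hypothetical, and the other boundary types (metro, locality, zipcode) do not nest this way and need a separate lookup.

```python
def parents_from_cbg(cbg_geoid):
    """Split a 12-digit census block group GEOID into its parent identifiers."""
    if len(cbg_geoid) != 12 or not cbg_geoid.isdigit():
        raise ValueError("expected a 12-digit census block group GEOID")
    return {
        "county": cbg_geoid[:5],         # state (2 digits) + county (3 digits)
        "census_tract": cbg_geoid[:11],  # county FIPS + tract (6 digits)
        "cbg": cbg_geoid,                # tract GEOID + block group (1 digit)
    }

ids = parents_from_cbg("060750201001")
```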
The most fine-grained boundary type we currently offer is the 10-min walk isochrone, which encompasses the area walkable within 10 minutes of a zoom-19 quadkey (a map tile with side length ~75m). Because features are aggregated at this fine-grained level, users with addresses or geographic coordinates can add hyper-local features to their models.
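For users starting from geographic coordinates, the following sketch shows one way to compute a zoom-19 quadkey. It assumes the quadkeys follow the widely used Bing Maps tile system convention (Web Mercator tiles, one base-4 digit per zoom level); this page does not state which convention Iggy uses, so treat the function names and the convention itself as assumptions.

```python
import math

EARTH_CIRCUMFERENCE_M = 40075016.686  # Web Mercator equatorial circumference
MAX_LAT = 85.05112878                 # latitude limit of the Web Mercator projection

def tile_to_quadkey(x, y, zoom):
    """Interleave the bits of tile x/y into a base-4 quadkey string."""
    digits = []
    for i in range(zoom, 0, -1):
        mask = 1 << (i - 1)
        digit = 0
        if x & mask:
            digit += 1
        if y & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)

def latlon_to_quadkey(lat, lon, zoom=19):
    """Find the tile containing (lat, lon) and return its quadkey."""
    lat = max(min(lat, MAX_LAT), -MAX_LAT)
    n = 1 << zoom  # number of tiles along each axis at this zoom
    x = int((lon + 180.0) / 360.0 * n)
    sin_lat = math.sin(math.radians(lat))
    y = int((0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)) * n)
    # Clamp so that points on the far edge still fall in the last tile.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return tile_to_quadkey(x, y, zoom)

def tile_side_m(lat, zoom=19):
    """Approximate ground width of a tile at the given latitude, in meters."""
    return EARTH_CIRCUMFERENCE_M * math.cos(math.radians(lat)) / (1 << zoom)

qk = latlon_to_quadkey(37.7749, -122.4194)  # a 19-character quadkey
```

Under this convention a zoom-19 tile is roughly 76 m wide at the equator and narrower at higher latitudes, which is in line with the ~75m figure above.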
A data source describes the underlying geographic data that is aggregated within a boundary. Each data source has rows that represent points, lines, or polygons with geographic coordinates.
Many different types of data can be construed as geographic, such as local businesses, demographics, and topography. Our demo dataset incorporates features computed from the following data sources:
Points of Interest (poi)
- Points of interest are businesses and services with a physical presence, including restaurants, manufacturing sites, and community centers.
- poi features are aggregated from an underlying dataset of points, each representing a distinct point of interest and categorized based on the Iggy Ontology.
American Community Survey (acs)
- The U.S. Census ACS data includes information about demographics, household composition, employment, commute patterns, and housing. Iggy currently relies on ACS data collected over the 5-year period 2014-2019. The primary advantage of using multi-year estimates is the increased statistical reliability for less populated areas and small population subgroups.
- Only census-designated boundaries (including cbg) incorporate features from acs, as these are the levels at which ACS data is reported and provided.
- Iggy produces features that summarize the coastline, rivers, and lakes within a boundary.
- water features are aggregated from an underlying dataset that represents coastline as lines, and rivers and lakes as polygons.
- We also provide features calculated based on national, state, and local parks within a boundary.
- Our underlying park data represents each park as a polygon.
Each data source also has one or more attributes describing each row that can be used to filter aggregations and derive more interesting features:
poi data attributes indicate the POI category, and whether it is a brand/chain:
- Ontology Top-level Category Attributes (see Iggy Ontology)
- Ontology Sub-level Category Attributes (see Iggy Ontology)
- is_brandname indicates whether the POI is a brand or chain (e.g. McDonald’s, Dollar Store, Pep Boys)
acs data attributes indicate a particular Census summary statistic about the relevant boundary (e.g. cbg). They cover a variety of types of information:
- Includes attributes related to age (e.g. median_age), gender (e.g. pop_sex_female_age_5_to_9), race/ethnicity (e.g. pop_race_asian), and birthplace/citizenship.
- Includes attributes surrounding household composition (e.g. households_cohabiting_couple), education (e.g. pop_adult_education_less_than_high_school), and veteran status.
- Includes attributes indicating income (e.g. pop_below_100_pct_poverty_level), employment status (e.g. pct_in_labor_force_status_civilian_employed), and employment industry.
- Includes attributes indicating (pre-2020) commute habits, including method (e.g. pop_commutes_by_public_transport_rail), time (e.g. pop_commute_departure_0630_to_0659), and duration.
- Includes attributes dealing with housing unit type (e.g. housing_units_boat_rv_van), age (e.g. housing_units_built_1939_or_earlier), ownership status (e.g. housing_units_renter_occupied), size (e.g. housing_units_10_to_19_in_structure), and value.
water data attributes indicate the type of water body.
- Type of water body
protected_area data includes parks, conservation areas, and other protected areas as designated by USGS.
protected_area data attributes indicate the type of protected area:
- Type of protected area
Note that a protected area may have a value of 1 for more than one attribute. For example, a state park would have
The full set of underlying data sources and attributes is detailed in the Iggy Data Dictionary.
We use the Iggy Ontology to categorize places in the poi data source. The ontology consists of two levels, detailed below with examples:
- Drinking establishments where alcoholic beverages are served
- Examples: Missfits Tavern, Barcelona Bar, The American Legion Post 30
- Child day care services
- Examples: KinderCare, Lakeview Headstart, Trinity Lutheran Church And Preschool
- Convenience stores, drug stores, and pharmacies
- Examples: Mini Mart, Walmart Pharmacy, Kwik Trip
- Barber shops, beauty salons, nail salons, diet centers, and other personal care retail services
- Examples: Nails by Betty, Family Salon, Mat Su Tattoo & Body Piercing
- Services pertaining to death care including cemeteries, crematoriums, funeral homes, and funeral services
- Examples: Park Cemetery, Baird Funeral Home, Old Chapel Burial Ground
- Dry cleaning and laundry services
- Examples: Big Springs Laundry, Champs Cleaners, Sam's Custom Cleaners
- Gas and electric vehicle fueling stations and other petroleum products wholesalers
- Examples: Amoco, Red Bank Municipal Court Charging Station, Speedy Cafe
- Casinos and other gambling locations
- Examples: Debbie's Slots Lounge, Texas Poker Supply, Papa Ray's Sports Bar, Legendary Waters Resort & Casino
- Locations for amusement and recreation, including arcades, amusement parks, equestrian, and bowling alleys
- Examples: West End Bowling & Arcade, ESCAPE Alaska, Silver Wind Stables
- Locations for golf courses and country clubs
- Examples: Belvedere Golf Club, Indian Hills Country Club, Creekside Mini Golf
- Grocery stores and supermarkets
- Examples: Pathmark, Country Grocery Store, Nick's Supermarket
- Gyms and fitness centers
- Examples: Orangetheory Fitness, Jazzercise, Yoga Nest Venice
- Services for managing hazardous waste, including septic tank related services, hazardous waste treatment, and disposal
- Examples: Carlisle Trash Collection, Heartland Dredging, Dixie Dumpsters
- Marinas and yacht clubs
- Examples: Sugartree Marina, Vermilion Yacht Club, Johnny's Marina & RV Park
- Businesses that provide sports instruction including martial arts
- Examples: Excel Taekwondo Academy, React Elite Cheer and Tumble, Goldfish Swim School
- Recycling collection and processing centers, and auto or metal salvage
- Examples: Green Recovery Recycling, Fulton Auto Salvage, Montgomery County Solid Waste District
- Full-service, limited-service, and self-service restaurants including cafeterias, buffets, and snack bars
- Examples: China King, Trent City Pizzeria, Heavenly Hot Dogs, Toby's Supper Club
- Businesses that provide transportation for land- or water-based sightseeing
- Examples: Beasley's Fishing Charters, Old Town Charters, Private Yacht Charters Florida
- Ski resorts, ski lifts, and ski parks
- Examples: Snow Creek Ski Area, Tri Town Ski Village, Springhill Winter Park, Mogul Buster Ski & Snowboard School
- Stores providing specialized food, including bakeries, confectionary, fish and seafood markets, produce markets, meat markets, and other specialty food
- Examples: Sage Baking Company, Dorothy's Candies, Snow Creek Meat Processing, Seafood America, Fiesta Empanada
- Specialized retail including book stores and florists
- Examples: Swan's Fine Books, Lenora's Flowers and Gifts, Barnes & Noble
- Sports clubs and spectator sports venues, including racetracks
- Examples: Saratoga Woods Swim Club, Charles Watson Stadium, Atlanta Dragway
- General waste disposal services, not solely focused on hazardous waste or recycling
- Examples: Junk King Reno, Ron's Tree Removal, Bobcat Disposal
- Designated historical sites
- Examples: Iowa State Capitol, Swing Around Rosie Mural, Deborah Sampson Monument
- Museums and historical foundations
- Examples: Brigham City Museum, Chester Gould Dick Tracy Museum, Pasadena Fire Museum
- Zoos and botanical gardens
- Examples: Penguin Encounter, Wildlife Safari, Bear Canyon Ranch and Petting Zoo, The Estate at Florentine Gardens, BVA Compass Roof Garden
- Venues for performing arts, sports, and similar events
- Examples: Nat Bailey Stadium, Sadler Ranch, Austin City Limits Studio, Fillmore Auditorium
- Places of worship and religiously-affiliated charitable organizations
- Examples: Maranatha Romanian Baptist Church, Holy Transfiguration Monastery, The Soul Factory Inc, St Andrew's Church
- Race tracks and other spectator sports
- Examples: Atlanta Motor Speedway, Verizon Wireless Center, K1 Speed, Super Rink National Sports Center
- Public and private elementary, middle, and high schools
- Examples: Springdale High School, Pacelli Catholic High School, San Marino Montessori
- Colleges, junior colleges, universities, and professional schools
- Examples: Mesalands Community College, Temple University, Concorde Career Institute Tampa
- Specialty schools including music schools, cooking schools, art schools, language schools, and technical or trade schools
- Examples: French For the Future, Kutenai Art Therapy Institute, Gemini School of Visual Arts & Communication
- Fire Departments
- Examples: Bossier City Fire Department Station 3, Manilla Volunteer Fire Department
- Correctional institutions where people are incarcerated
- Examples: Kirkland Correctional Institution, Oceana County Jail, Wicomico County Detention Center
- Police stations and other justice, public order, and safety activities
- Examples: Federal Bureau of Investigation, Carroll County Sheriff, Flash Point Investigations
- Dentists, prosthodontists, and orthodontists
- Examples: Optimal Dental, Henson Orthodontics
- General medical and surgical hospitals, and general physicians
- Examples: Hampstead Primary Care, Trumann Medical Clinic, Marian Medical Center
- Offices of mental health practitioners and physicians, and outpatient substance abuse centers
- Examples: The Counseling Palette, Lifecare Family Services, Cindy Goldsmith LCSW
- Other specialized health practitioners
- Examples: Eye Health Services, Reflection Ridge Chiropractic, Healthcore Physical Therapy, Lakeridge Recovery Long Beach
- Urgent care centers
- Examples: Baptist Health Urgent Care, Get Well Urgent Care, PhysicianOne Urgent Care
- equipment, and other types
- Examples: Premier Aircraft, Kelly's Appliances, FirstBuild, Gas Electric Parts
- Electric power generation facilities, including hydroelectric, solar, and wind
- Examples: St Charles Solar, Hamakua Energy Plant, FirstEnergy Springdale Power Station
- Environmental, conservation, and wildlife preservation organizations
- Examples: Bluegrass Doberman Rescue, Landmarks Preservation Society of Southeast, Lake Tahoe Wildlife Care
- Other social advocacy organizations
- Examples: American Legion, Fraternal Order of Eagles
- Youth, individual, and family support services
- Examples: Big Brothers Big Sisters, SequelCare, YMCA, Boys & Girls Club of Newport Beach
- Parking lots and garages
- Examples: Zion Park, Memorial Plaza Garage, Galaxy Valet Services
- Commuter and freight rail stops, terminals, and railroad administration
- Examples: MTA New York City Transit Astoria Blvd, Corona West Metrolink Station, SEPTA Kingsessing Av & 65th St
- Bus stops, terminals, and administration
- Examples: New Jersey Transit SUNSET RD AT BENTLY LN, Alameda Contra Costa Transit District 14th St Filbert St, Nantucket Regional Transit Authority End of Madaket Road
- Inland water passenger transportation including ferries
- Examples: Protection Island Ferry, MBTA Commuter Boat, Carnival Cruise Lines
- Airports supporting fixed-wing aircraft
- Examples: Lewis and Clark Airstrip, Nettle Creek Landings, Chicago O'Hare International Airport
- Helipads and heliports
- Examples: Charlevoix Area Hospital, High Alpine, La National Guard
- Air fields supporting dirigibles, hot air balloons, and ultralights
- Examples: Flying Machines Airstrip, Ron's Ultralight Fld, Portage Lake Muni
- Mixed mode and other urban transit stops and administration
- Examples: Sunset Empire Transportation District Thousand Trail, Tri Rail Sheridan Street Station, Baltimore Water Taxi Canton
Aggregations and Normalizations
Given a boundary (like a zip code) and a data source (like POIs), Iggy produces features by running an aggregation over the data intersecting the boundary. Aggregations range from simple (e.g. counts of items intersecting a boundary) to more complex spatial functions (e.g. the area in sq km of the intersection between a boundary and a data source like lakes).
In addition to aggregations, Iggy also provides features that have additional normalization calculated on top of the aggregation, like dividing by the boundary population or area.
The following is a list of the various aggregations and normalizations that are used to produce Iggy features.
[none] – Features with no aggregation are generated by taking the raw value from the boundary itself, or from a boundary-linked data source like acs.
count – Count of distinct rows from the underlying data source that intersect a boundary. If the count feature is associated with a data attribute, then the count indicates the number of distinct rows having that particular attribute. For example, the feature poi_is_education_count indicates the number of distinct rows from the poi dataset having the attribute is_education=True.
intersects – A boolean feature indicating whether the boundary intersects any row within the underlying data source.
intersecting_area_in_sqkm – A float feature indicating the total area (in sq km) of the intersection between a boundary and any row in the underlying polygon data source. This can only be computed for data sources whose rows are polygons, like park and water.
intersecting_length_in_sqkm – A float feature indicating the total length (in km) of the intersection between a boundary and any row in the underlying line data source. This can only be computed for data sources whose rows are lines, like water where is_coastline=True.
per_sqkm – Divides the aggregated feature value by the boundary area, in sq km.
per_capita – Divides the aggregated feature value by the boundary population.
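The count and intersects aggregations and the two normalizations can be sketched in a few lines. This is only an illustration under simplifying assumptions: the rows are taken to be already restricted to those that intersect the boundary (the spatial step is skipped), each row is assumed to be distinct, and the sample rows, boundary values, and helper names are all made up.

```python
def count_feature(rows, attribute=None):
    """count: rows intersecting the boundary, optionally only those with attribute=True."""
    return sum(1 for row in rows if attribute is None or row.get(attribute))

def intersects_feature(rows):
    """intersects: True if the boundary intersects at least one row."""
    return len(rows) > 0

def per_capita(value, population):
    """Divide by boundary population; undefined for unpopulated boundaries."""
    return value / population if population else None

def per_sqkm(value, area_sqkm):
    """Divide by boundary area in sq km."""
    return value / area_sqkm if area_sqkm else None

# Hypothetical poi rows that intersect one boundary, and that boundary's stats.
rows_in_boundary = [
    {"id": 1, "is_education": True, "is_museum": False},
    {"id": 2, "is_education": False, "is_museum": True},
    {"id": 3, "is_education": True, "is_museum": False},
]
boundary = {"population": 2000, "area_sqkm": 4.0}

poi_count = count_feature(rows_in_boundary)
poi_is_education_count = count_feature(rows_in_boundary, "is_education")
education_per_capita = per_capita(poi_is_education_count, boundary["population"])
education_per_sqkm = per_sqkm(poi_is_education_count, boundary["area_sqkm"])
```

Returning `None` for a zero population or area is one possible choice for this sketch; how the actual features handle empty boundaries is not specified here.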
The Data Dictionary provides a complete listing of the available Iggy features at each boundary level.
In general, feature names combine the data source, attribute, aggregation, and normalization, in that order.
For example, the feature poi_is_museum_count_per_capita is calculated for a particular boundary by taking the data source poi, filtering for rows where is_museum=True, applying the aggregation count within the boundary, and finally applying the per_capita normalization to divide the count by the boundary population.
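The naming pattern in the example above can be sketched as a small helper that joins whichever components are present with underscores. The `feature_name` function is hypothetical and is not part of any Iggy library. As noted just below, some real feature names deviate from this pattern, so the Data Dictionary remains the authoritative list.

```python
def feature_name(data_source, attribute=None, aggregation=None, normalization=None):
    """Join the non-empty components in order: source, attribute, aggregation, normalization."""
    parts = [data_source, attribute, aggregation, normalization]
    return "_".join(part for part in parts if part)

museum_feature = feature_name("poi", "is_museum", "count", "per_capita")
education_feature = feature_name("poi", "is_education", "count")
```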
Some feature names deviate slightly from this convention in order to make them more interpretable. For example, the feature lake_pct_area_intersecting_boundary is an easier way of expressing the feature generated from the lake data source where attribute is_lake=True, applying the intersecting_area_in_sqkm aggregation and the per_sqkm normalization. The Data Dictionary is searchable by data source, attribute, aggregation, and normalization as well as feature name.