NRCS Soil Exploration

Tutorial: Exploring Soil Data from the NRCS Web Soil Survey (WSS)

This notebook demonstrates how to download, extract, and visualize soil data from the Natural Resources Conservation Service (NRCS) Web Soil Survey (WSS). We will focus on the STATSGO2 dataset for South Dakota.

The Web Soil Survey (WSS) is a valuable resource for accessing soil data and information compiled by the National Cooperative Soil Survey.

Here are the links to the data we will be using:

Link to Data: https://websoilsurvey.sc.egov.usda.gov/DSD/Download/Cache/STATSGO2/wss_gsmsoil_SD_[2016-10-13].zip
Description of Data: https://www.nrcs.usda.gov/sites/default/files/2022-08/SSURGO-Metadata-Table-Column-Descriptions-Report.pdf

The data is provided as a zip file containing spatial data (shapefiles) and tabular data (text files).

Installing dependencies

We need to install the Python libraries used in this notebook: pandas for data manipulation, geopandas (with fiona) for reading geospatial data, matplotlib and seaborn for plotting, and requests for downloading the archive.

# Uncomment the following line to install the pinned dependencies
#!pip install pandas==2.2.2 geopandas==0.14.3 fiona==1.9.6 matplotlib==3.8.4 requests==2.32.3 seaborn==0.13.2

Requirement already satisfied: pandas==2.2.2 in /usr/local/lib/python3.11/dist-packages (2.2.2)
Requirement already satisfied: geopandas==0.14.3 in /usr/local/lib/python3.11/dist-packages (0.14.3)
Requirement already satisfied: fiona==1.9.6 in /usr/local/lib/python3.11/dist-packages (1.9.6)
Requirement already satisfied: matplotlib==3.8.4 in /usr/local/lib/python3.11/dist-packages (3.8.4)
Requirement already satisfied: requests==2.32.3 in /usr/local/lib/python3.11/dist-packages (2.32.3)
Requirement already satisfied: seaborn==0.13.2 in /usr/local/lib/python3.11/dist-packages (0.13.2)
Requirement already satisfied: numpy>=1.23.2 in /usr/local/lib/python3.11/dist-packages (from pandas==2.2.2) (2.0.2)
Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.11/dist-packages (from pandas==2.2.2) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.11/dist-packages (from pandas==2.2.2) (2025.2)
Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.11/dist-packages (from pandas==2.2.2) (2025.2)
Requirement already satisfied: packaging in /usr/local/lib/python3.11/dist-packages (from geopandas==0.14.3) (24.2)
Requirement already satisfied: pyproj>=3.3.0 in /usr/local/lib/python3.11/dist-packages (from geopandas==0.14.3) (3.7.1)
Requirement already satisfied: shapely>=1.8.0 in /usr/local/lib/python3.11/dist-packages (from geopandas==0.14.3) (2.1.1)
Requirement already satisfied: attrs>=19.2.0 in /usr/local/lib/python3.11/dist-packages (from fiona==1.9.6) (25.3.0)
Requirement already satisfied: certifi in /usr/local/lib/python3.11/dist-packages (from fiona==1.9.6) (2025.7.9)
Requirement already satisfied: click~=8.0 in /usr/local/lib/python3.11/dist-packages (from fiona==1.9.6) (8.2.1)
Requirement already satisfied: click-plugins>=1.0 in /usr/local/lib/python3.11/dist-packages (from fiona==1.9.6) (1.1.1.2)
Requirement already satisfied: cligj>=0.5 in /usr/local/lib/python3.11/dist-packages (from fiona==1.9.6) (0.7.2)
Requirement already satisfied: six in /usr/local/lib/python3.11/dist-packages (from fiona==1.9.6) (1.17.0)
Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.11/dist-packages (from matplotlib==3.8.4) (1.3.2)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.11/dist-packages (from matplotlib==3.8.4) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.11/dist-packages (from matplotlib==3.8.4) (4.58.5)
Requirement already satisfied: kiwisolver>=1.3.1 in /usr/local/lib/python3.11/dist-packages (from matplotlib==3.8.4) (1.4.8)
Requirement already satisfied: pillow>=8 in /usr/local/lib/python3.11/dist-packages (from matplotlib==3.8.4) (11.2.1)
Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.11/dist-packages (from matplotlib==3.8.4) (3.2.3)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.11/dist-packages (from requests==2.32.3) (3.4.2)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.11/dist-packages (from requests==2.32.3) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.11/dist-packages (from requests==2.32.3) (2.4.0)

Import packages

Now, let's import the necessary libraries into our notebook.

import os
import zipfile
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import requests
import numpy as np
import matplotlib.patches as mpatches
import seaborn as sns
from IPython.display import display
import matplotlib as mpl

Read in Data from WSS

We will now download the data from the Web Soil Survey.

Set the URL of the zip file, the local path where the download will be saved, and the directory to extract into.

url = "https://websoilsurvey.sc.egov.usda.gov/DSD/Download/Cache/STATSGO2/wss_gsmsoil_SD_[2016-10-13].zip"
zip_path = "wss_gsmsoil_SD.zip"
extract_dir = "soil_data_sd"

Display the paths to confirm they are set correctly.

print(f"Downloading from: {url}")
print(f"Local zip file: {zip_path}")
print(f"Extract directory: {extract_dir}")

Downloading from: https://websoilsurvey.sc.egov.usda.gov/DSD/Download/Cache/STATSGO2/wss_gsmsoil_SD_[2016-10-13].zip
Local zip file: wss_gsmsoil_SD.zip
Extract directory: soil_data_sd

Download the zip file from the specified URL.

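print("Downloading zip file...")

# Stream the response so the archive is written to disk in chunks
response = requests.get(url, stream=True, timeout=60)
response.raise_for_status()  # Stop on an HTTP error instead of saving an error page

with open(zip_path, 'wb') as f:
    for chunk in response.iter_content(chunk_size=8192):
        f.write(chunk)

print(f"Downloaded: {zip_path}")

Downloading zip file...
Downloaded: wss_gsmsoil_SD.zip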
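Re-running the download cell fetches the archive again every time. A minimal sketch of a cache check that could guard the download is shown below. `needs_download` is a hypothetical helper name, not part of the original notebook, and it only checks that the local file exists and is a readable zip archive.

```python
import os
import zipfile


def needs_download(path):
    """Return True when `path` is missing or is not a valid zip archive."""
    if not os.path.exists(path):
        return True
    # is_zipfile() looks for the zip end-of-central-directory record, so a
    # saved HTML error page or a cut-off download is usually reported as invalid.
    return not zipfile.is_zipfile(path)


zip_path = "wss_gsmsoil_SD.zip"  # same local path as in the cell above
if needs_download(zip_path):
    print(f"{zip_path} is missing or invalid; run the download cell")
else:
    print(f"{zip_path} is already present; the download can be skipped")
```

The check is deliberately cheap: it does not compare the local file against the server copy, so a newer export on the server would not be detected.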

Extract the contents of the downloaded zip file to the specified directory.

print(f"\nExtracting to: {extract_dir}")

with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(extract_dir)
print("Extraction complete")

Extracting to: soil_data_sd
Extraction complete

List the files within the extracted directory to see the dataset structure.

print("\nFiles in extracted directory:")
print("-" * 40)
for root, dirs, files in os.walk(extract_dir):
    for file in files:
        file_path = os.path.join(root, file)
        rel_path = os.path.relpath(file_path, extract_dir)
        print(f"  {rel_path}")

Files in extracted directory:
----------------------------------------
  wss_gsmsoil_SD_[2016-10-13]/soil_metadata_us.txt
  wss_gsmsoil_SD_[2016-10-13]/readme.txt
  wss_gsmsoil_SD_[2016-10-13]/soildb_US_2003.mdb
  wss_gsmsoil_SD_[2016-10-13]/soil_metadata_us.xml
  wss_gsmsoil_SD_[2016-10-13]/spatial/gsmsoilmu_a_sd.dbf
  wss_gsmsoil_SD_[2016-10-13]/spatial/version.txt
  wss_gsmsoil_SD_[2016-10-13]/spatial/gsmsoilmu_a_sd.shp
  wss_gsmsoil_SD_[2016-10-13]/spatial/gsmsoilmu_a_sd.prj
  wss_gsmsoil_SD_[2016-10-13]/spatial/gsmsoilmu_a_sd.shx
  wss_gsmsoil_SD_[2016-10-13]/tabular/chstrgrp.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/mstab.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/sdvattribute.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/cfprod.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/cpmat.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/ctxfmoth.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/msrsdet.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/csmoist.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/chfrags.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/cerosnac.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/cstemp.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/sainterp.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/csmorhpp.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/version.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/chaashto.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/muareao.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/csfrags.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/sdvfolder.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/cpwndbrk.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/ltext.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/csmorgc.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/crstrcts.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/chydcrit.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/chtexmod.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/ceplants.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/mucrpyd.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/sacatlog.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/mstabcol.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/chtextur.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/chunifie.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/ctxmoicl.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/chdsuffx.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/chtexgrp.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/sdvfolderattribute.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/ctext.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/cinterp.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/chconsis.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/distlmd.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/cecoclas.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/ccancov.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/comp.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/chstr.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/muaggatt.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/msidxmas.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/csmormr.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/distimd.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/lareao.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/csmorss.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/msidxdet.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/cfprodo.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/legend.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/msrsmas.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/ctreestm.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/ccrpyd.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/cgeomord.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/msdomdet.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/cpmatgrp.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/chpores.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/chorizon.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/msdommas.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/cmonth.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/distmd.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/sdvalgorithm.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/ctxfmmin.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/chtext.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/mutext.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/mapunit.txt
  wss_gsmsoil_SD_[2016-10-13]/tabular/cdfeat.txt
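The listing shows that each tabular file is named after the table it holds (for example, mapunit.txt and muaggatt.txt). As a small sketch, the tabular files can be indexed by table name so that later cells can look up a path without retyping the bracketed folder name. `index_tabular_files` is a hypothetical helper; it assumes the tables sit in a folder named tabular, as in the listing above.

```python
from pathlib import Path


def index_tabular_files(extract_dir):
    """Map table name (file stem) to path for each .txt file in a 'tabular' folder."""
    root = Path(extract_dir)
    if not root.is_dir():
        return {}
    tables = {}
    for path in root.rglob("*.txt"):
        # Skip readme.txt, the metadata files, and spatial/version.txt
        if path.parent.name == "tabular":
            tables[path.stem] = path
    return tables


tables = index_tabular_files("soil_data_sd")
print(f"Indexed {len(tables)} tabular files")
if "muaggatt" in tables:
    print(tables["muaggatt"])
```

Note that tabular/version.txt is indexed as well, although it is not a data table.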

For background on the files and the data, we can read the readme file included in the export.

# Uncomment the following code to print the readme
# readme_file_path = os.path.join(extract_dir, "wss_gsmsoil_SD_[2016-10-13]", "readme.txt")

# with open(readme_file_path, 'r') as f:
#     readme_content = f.read()
# print(readme_content)

********************************************************************************
****  Index
********************************************************************************

Export Contents
Export Types
      Area of Interest (AOI)
      SSURGO
      STATSGO2
Unzipping Your Export
Importing the Tabular Data into a SSURGO Template Database
      Why Import the Tabular Data into a SSURGO Template Database?
      Microsoft Access Version Considerations and Security Related Issues
            Trusted Locations
            Macro Settings
      Importing Tabular Data
Spatial Data
      Spatial Data Format and Coordinate System
      Utilizing Soil Spatial Data
Terminology
      Area of Interest (AOI)
      Soil Survey Area
      SSURGO Template Database
      SSURGO
      STATSGO2
Obtaining Help

********************************************************************************
****  Export Contents
********************************************************************************

This export includes the following U.S. General Soil Map data:

Spatial extent:         South Dakota
SSA symbol:             US
SSA name:               United States
SSA version:            3
SSA version est.:       10/13/2016 12:28:22 PM
Spatial format:         ESRI Shapefile
Coordinate system:      Geographic Coordinate System (WGS84)

This export also includes the following MS Access SSURGO template database:

Template DB name:       soildb_US_2003.mdb
Template DB version:    36
Template DB state:      US
MS Access version:      Access 2003


********************************************************************************
****  Export Types
********************************************************************************

Three types of data exports are available.  See the "Terminology" section for 
descriptions of "Area of Interest," "SSURGO," and "STATSGO2."

    ****************************************************************************
    ****  Area of Interest (AOI)
    ****************************************************************************

    The data in an Area of Interest export include the following for a 
    user-defined area of interest:

          Soil Tabular Data
          Map Unit Polygons (where available)
          Map Unit Lines (where available)
          Map Unit Points (where available)
          Special Feature Lines (where available)
          Special Feature Points (where available)
          Special Feature Descriptions (where available)
          Soil Thematic Map Data (where available)

    An AOI export can be downloaded from the "Your AOI (SSURGO)" section under 
    the "Download Soils Data" tab in the Web Soil Survey.

    The name of the export zip file will be in the form 
    wss_aoi_YYYY-MM-DD_HH-MM-SS.zip, e.g. wss_aoi_2012-09-24_12-59-37.zip.

    Note that the data for an Area of Interest is always SSURGO data.  
    Currently, the Web Soil Survey does not include an option to create an area 
    of interest for STATSGO2 data.

    ****************************************************************************
    ****  SSURGO
    ****************************************************************************

    The data in a SSURGO export include the following for a soil survey area:

          Soil Tabular Data
          Soil Survey Area Boundary Polygon
          Map Unit Polygons (where available)
          Map Unit Lines (where available)
          Map Unit Points (where available)
          Special Feature Lines (where available)
          Special Feature Points (where available)
          Special Feature Descriptions (where available)

    A SSURGO export can be downloaded from the "Soil Survey Area (SSURGO)" 
    section under the "Download Soils Data" tab in the Web Soil Survey.

    The export zip file will be named in the form soil_**###.zip, where ** is 
    a two character state or territory Federal Information Processing Standard 
    (FIPS) code, in uppercase, and ### is a three digit, zero-filled integer, 
    e.g. soil_NE079.zip.

    ****************************************************************************
    ****  STATSGO2
    ****************************************************************************

    The data included in a STATSGO2 export include the following:

          Soil Tabular Data
          Map Unit Polygons

    A STATSGO2 export can be downloaded from the "U.S. General Soil Map 
    (STATSGO2)" section under the "Download Soils Data" tab in the Web Soil 
    Survey.

    If data for the entire STATSGO2 coverage is downloaded, the export zip file 
    will be named gmsoil_us.zip.

    If data for only a single state is downloaded, the export zip file will be 
    named in the form gmsoil_**.zip, where ** is a two character state FIPS 
    code in lowercase, e.g. gmsoil_ne.zip.

********************************************************************************
****  Unzipping Your Export
********************************************************************************

See the "Terminology" section for descriptions of a "Soil Survey Area" and a 
"SSURGO template database."

Each data export (see "Export Types" section) is provided in a single zip file.  
The file unzips to a set of directories and files.  The following example is 
typical (a copy of soil_metadata_*.txt and soil_metadata_*.xml will be present 
for each SSURGO soil survey area included in an export):

      spatial (a directory)
      tabular (a directory)
      thematic (a directory, only present in AOI exports)
      readme.txt (an instance of this document)
      soil_metadata_*.txt
      soil_metadata_*.xml
      soildb_*.mdb (a conditionally present SSURGO template database)

The spatial data files for your export, if any, will reside in a directory 
named "spatial."

The tabular data files for your export will reside in a directory named 
"tabular."

The thematic map data files for your export, if any, will reside in a directory 
named "thematic."

The readme.txt file is an instance of the file you are currently reading.  
Except for the "Export Contents" section, this file is identical for all 
exports.

The soil_metadata_*.txt file contains Federal Geographic Data Committee (FGDC) 
metadata (http://www.fgdc.gov/metadata) for each soil survey area, state, or 
country for which data was included in your export.  The file is in text 
(ASCII) format.  The "*" will be replaced by a soil survey area symbol 
(e.g. "ne079"), "us," or a two character U.S. state FIPS code in lowercase.  
Your export may include more than one of these files.

The soil_metadata_*.xml file contains Federal Geographic Data Committee (FGDC) 
metadata (http://www.fgdc.gov/metadata) for each soil survey area, state, or 
country for which data was included in your export.  The file is in XML format. 
The "*" will be replaced by either a soil survey area symbol (e.g. "ne079"), 
"us," or a two character U.S. state FIPS code in lowercase.  Your export may 
include more than one of these files.

The soildb_*.mdb file is an instance of a SSURGO template database.  The "*" 
will be replaced by either "US" or a two character state or territory FIPS code 
in uppercase.  This file will be present unless you specifically requested an 
export that doesn't include a SSURGO template database.

********************************************************************************
****  Importing the Tabular Data into a SSURGO Template Database
********************************************************************************

See the "Terminology" section for a description of "SSURGO template database."

    ****************************************************************************
    ****  Why Import the Tabular Data into a SSURGO Template Database?
    ****************************************************************************

    The tabular data is exported as a set of files in "ASCII delimited" format. 
    These ASCII delimited files do not include column headers.  Typically, it 
    is not feasible to work with the tabular data in this format.  Instead, you 
    should import the data from these files into the accompanying SSURGO 
    template database.

    Importing the data into a SSURGO template database establishes the proper 
    relationships between the various soil survey data entities.  It also 
    provides access to a number of prewritten reports that display related data 
    in a meaningful way and gives you the option to create your own queries and 
    reports.  Creating queries and reports requires additional knowledge by the 
    user.

    ****************************************************************************
    ****  Microsoft Access Version Considerations and Security Related Issues
    ****************************************************************************

    All SSURGO template databases are in Microsoft Access 2002/2003 format.  
    Although this doesn't prevent you from opening them in Access 2007 or 
    Access 2010, the default security settings for Access 2007 or 2010 may 
    initially prevent the macros in the template database from working.  If you 
    get a security warning when you open a SSURGO template database, e.g. a 
    warning that the database is read-only, you may need to change your 
    Microsoft Access security settings.

    To check and/or adjust your Microsoft Access security settings in Access 
    2007 or 2010, start Access, click the Office Button at the top left of the 
    Access window, and then click the button labeled "Access Options" at the 
    bottom of the form.  From the "Access Options" dialog, select 
    "Trust Center" from the options to the left, and then select the button 
    labeled "Trust Center Settings" to the right.

    After selecting "Trusted Center Settings," you can address a security issue 
    two different ways.  From the left side of the Trust Center dialog, select 
    "Trusted Locations" or "Macro Settings."

        ************************************************************************
        ****  Trusted Locations
        ************************************************************************

        You can move the SSURGO template database to an existing trusted 
        location (if a trusted location has already been created), or you can 
        add a new trusted location and move the SSURGO template database to 
        that new trusted location.

        ************************************************************************
        ****  Macro Settings
        ************************************************************************

        Selecting "Enable All Macros" will allow the macros in the SSURGO 
        template database to run, but not without hazard.  Note the associated 
        warning: "not recommended; potentially dangerous code can run."  The 
        SSURGO template database does not contain hazardous code, but other 
        databases might.

        If you have trouble using the SSURGO template database, see the 
        "Obtaining Help" section for information on how to contact the Soils 
        Hotline.

    ****************************************************************************
    ****  Importing Tabular Data
    ****************************************************************************

    When you open a SSURGO template database, the Import Form should display 
    automatically if there are no Microsoft Access security related issues.  
    To import the soil tabular data into the SSURGO template database, enter 
    the location of the "tabular" directory into the blank in the Import Form.  
    Use the fully qualified pathname to the "tabular" directory that you 
    unzipped from your export file.

    For example, if your export file was named wss_aoi_2012-09-24_12-59-37.zip 
    and you unzipped the file to C:\soildata\, the fully qualified pathname 
    would be C:\soildata\wss_aoi_2012-09-24_12-59-37\tabular.

    The pathname between C:\soildata\ and \tabular varies by export type.  It 
    also varies:

          for Area of Interest exports by export date and time,
          for SSURGO exports by the selected soil survey area, and
          for STATSGO2 exports by your selection of data for the entire U.S. or
          for a single state.

    After entering the fully qualified pathname, click the "OK" button.  The 
    import process will start.  The duration of the import process depends on 
    the amount of data being imported.  Most imports take less than 5 minutes, 
    and many take less than 1 minute.  The import for STATSGO2 data for the 
    entire United States takes longer.

    Once the import process completes, the Soil Reports Form should display.

********************************************************************************
****  Spatial Data
********************************************************************************

    ****************************************************************************
    ****  Spatial Data Format and Coordinate System
    ****************************************************************************

    All spatial data is provided in ESRI Shapefile format in WGS84 geographic 
    coordinates.

    ****************************************************************************
    ****  Utilizing Soil Spatial Data
    ****************************************************************************

    Utilizing soil spatial data without having access to Geographic Information 
    System (GIS) software is effectively impossible.  Even if you have access 
    to GIS software, relating the soil spatial data to the corresponding soil 
    tabular data can be complicated.

    For people who have access to supported versions of ESRI's ArcGIS software, 
    we provide a Windows client application that is capable of creating soil 
    thematic maps using ArcMap and the Windows client application.  The name of 
    the application is "Soil Data Viewer."  For additional information see 
    http://www.nrcs.usda.gov/wps/portal/nrcs/detailfull/soils/home/?cid=nrcs142p2_053620.

    An AOI export may contain thematic map data from Web Soil Survey. Each 
    thematic map (soil property or interpretation) that was created for the AOI 
    generates a set of files in the "thematic" directory of the export. An 
    experienced GIS user can join a ratings file from the export with the
    mapunits in the spatial data to reproduce the colored thematic map.

********************************************************************************
****  Terminology
********************************************************************************

    ****************************************************************************
    ****  Area of Interest (AOI)
    ****************************************************************************

    In the Web Soil Survey, you can create an ad hoc "area of interest" by 
    using the navigation map and its associated tools.  You can pan and zoom to 
    a desired geographic location and then use the AOI drawing tools to 
    manually select an "area of interest."  An "area of interest" must be a 
    single polygon and the maximum area of that polygon (measured in acres) is 
    limited.

    ****************************************************************************
    ****  Soil Survey Area
    ****************************************************************************

    The SSURGO soil data for the U.S. and its territories are broken up into 
    over 3,000 soil survey areas.  A soil survey area commonly coincides with a 
    single county but may coincide with all or part of multiple counties and 
    may span more than one state.

    A soil survey area is identified by a "survey area symbol."  The symbol is 
    a two character state or territory FIPS code combined with a zero-filled, 
    three digit number.  For example, "NE079" is the survey area symbol for 
    Hall County, Nebraska.

    Although the STATSGO2 soil data is not partitioned into soil survey areas, 
    STATSGO2 soil data can be downloaded for a particular state or territory.

    ****************************************************************************
    ****  SSURGO Template Database
    ****************************************************************************

    A SSURGO template database is a Microsoft Access database in which the 
    tables and columns conform to the current SSURGO standard.  Exported soil 
    tabular data can be imported into a SSURGO template database.

    A SSURGO template database includes a number of prewritten reports that 
    display related data in a meaningful way.  You also have the option of 
    creating your own queries and reports in the database.  Creating queries 
    and reports requires additional knowledge.

    In addition to the national SSURGO template database, many state-specific 
    SSURGO template databases are available.  They typically include additional 
    state-specific reports.

    Whenever you export data from the Web Soil Survey, the most appropriate 
    SSURGO template database is automatically included.

    ****************************************************************************
    ****  SSURGO
    ****************************************************************************

    The SSURGO standard encompasses both tabular and spatial data.  SSURGO 
    spatial data duplicates the original soil survey maps.  This level of 
    mapping is designed for use by landowners and townships and for 
    county-based natural resource planning and management.  The original 
    mapping scales generally ranged from 1:12,000 to 1:63,360.  The original 
    maps from soil survey manuscripts were recompiled to scales of 1:12,000 or 
    1:24,000 for digitizing into the SSURGO format.  SSURGO is the most 
    detailed level of soil mapping published by the National Cooperative Soil 
    Survey.

    ****************************************************************************
    ****  STATSGO2
    ****************************************************************************

    The U.S. General Soil Map consists of general soil association units.  It 
    was developed by the National Cooperative Soil Survey and supersedes the 
    State Soil Geographic (STATSGO) dataset published in 1994.  STATSGO2 was 
    released in July 2006 and differs from the original STATSGO in that 
    individual state legends were merged into a single national legend, 
    line-join issues at state boundaries were resolved, and some attribute 
    updates and area updates were made.  STATSGO2 consists of a broad-based 
    inventory of soils and nonsoil areas that occur in a repeatable pattern on 
    the landscape and that can be cartographically shown at the scale used for 
    mapping (1:250,000 in the continental U.S., Hawaii, Puerto Rico, and the 
    Virgin Islands and 1:1,000,000 in Alaska).

    The same tabular data model is used by both SSURGO and STATSGO2.  STATSGO2, 
    however, does not include soil interpretations.  The "cointerp" table in 
    STATSGO2 will therefore always be empty.

********************************************************************************
****  Obtaining Help
********************************************************************************

To learn about the capabilities of a SSURGO template database, open the 
database, select the Microsoft Access "Reports" tab, and then double click the 
report titled "How to Understand and Use this Database."

If you require additional assistance, or have any questions whatsoever, please 
contact the Soils Hotline (soilshotline@lin.usda.gov).

Find shapefiles in the extracted directory

We are interested in the spatial data, which is typically stored in shapefiles (.shp). Let's find the shapefiles within the extracted data.

You can run the code below to find the shapefiles, or you can look through the file listing above for names ending in .shp.

shapefiles = []
for root, dirs, files in os.walk(extract_dir):
    for file in files:
        if file.endswith('.shp'):
            shapefiles.append(os.path.join(root, file))
print(f"Found {len(shapefiles)} shapefiles:")
for shp in shapefiles:
    print(f"  - {os.path.relpath(shp, extract_dir)}")

Found 1 shapefiles:
  - wss_gsmsoil_SD_[2016-10-13]/spatial/gsmsoilmu_a_sd.shp
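A shapefile is really a set of sibling files that share a base name, which is why the earlier listing shows gsmsoilmu_a_sd.dbf, gsmsoilmu_a_sd.shx, and gsmsoilmu_a_sd.prj next to the .shp file. The sketch below checks that those companions are present before loading. `missing_sidecars` is a hypothetical helper, and the extension list reflects the common shapefile convention rather than anything specific to this export.

```python
from pathlib import Path


def missing_sidecars(shp_path, extensions=(".shx", ".dbf", ".prj")):
    """Return the companion extensions that are absent next to a .shp file."""
    shp = Path(shp_path)
    return [ext for ext in extensions if not shp.with_suffix(ext).exists()]


shp = (Path("soil_data_sd") / "wss_gsmsoil_SD_[2016-10-13]"
       / "spatial" / "gsmsoilmu_a_sd.shp")
print(f"Missing companion files: {missing_sidecars(shp)}")
```

An empty list means the geometry index (.shx), attribute table (.dbf), and projection file (.prj) were all found.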

Load the soil shapefile

The main soil data is usually contained in a shapefile with a name indicating the spatial extent (e.g., gsmsoilmu_a_sd.shp for South Dakota). We will load this shapefile into a GeoDataFrame using geopandas.

main_shp = os.path.join(extract_dir, "wss_gsmsoil_SD_[2016-10-13]", "spatial", "gsmsoilmu_a_sd.shp")
gdf = gpd.read_file(main_shp)
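The readme states that the spatial data is delivered in WGS84 geographic coordinates, so polygon areas computed directly would be in squared degrees. One way to get areas in square kilometers is to reproject to an equal-area CRS first, sketched below. The toy polygon and the choice of EPSG:5070 (NAD83 / Conus Albers) are assumptions for illustration; with the real data, `gdf` from the cell above would take the place of `toy`.

```python
import geopandas as gpd
from shapely.geometry import box

# Toy stand-in for the soil GeoDataFrame: one 0.1 x 0.1 degree cell in South Dakota
toy = gpd.GeoDataFrame(
    {"MUSYM": ["demo"]},
    geometry=[box(-100.1, 44.0, -100.0, 44.1)],
    crs="EPSG:4326",
)

# Reproject to an equal-area CRS so that .area returns square meters
toy_albers = toy.to_crs(epsg=5070)
toy["area_km2"] = toy_albers.geometry.area / 1e6
print(toy[["MUSYM", "area_km2"]])
```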

Examine the Data

Let's take a closer look at the data we loaded into the GeoDataFrame.

Display the first few rows of the GeoDataFrame to get a quick preview of the data.

pd.set_option('display.max_columns', None)  # Show all columns
display(gdf.head(5))  # Show only the first 5 rows

	AREASYMBOL	SPATIALVER	MUSYM	MUKEY	geometry
0	US	3	s6871	672359	POLYGON ((-96.75075 43.50416, -96.75365 43.504...
1	US	3	s6793	672281	POLYGON ((-103.49977 43.32849, -103.50432 43.3...
2	US	3	s6792	672280	POLYGON ((-102.95403 44.33908, -102.95738 44.3...
3	US	3	s6792	672280	POLYGON ((-102.69247 44.36311, -102.69413 44.3...
4	US	3	s6861	672349	POLYGON ((-97.37897 42.86556, -97.38019 42.868...

The data description at https://www.nrcs.usda.gov/sites/default/files/2022-08/SSURGO-Metadata-Table-Column-Descriptions-Report.pdf explains what each column means.
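The preview already hints that a map unit can be drawn as more than one polygon: rows 2 and 3 share MUKEY 672280. A small sketch of how to count polygons per map unit follows. It uses a hand-copied sample of the five preview rows rather than the full GeoDataFrame, and the name sample is illustrative.

```python
import pandas as pd

# Hand-copied from the preview above; geometry is omitted for brevity.
sample = pd.DataFrame({
    "MUSYM": ["s6871", "s6793", "s6792", "s6792", "s6861"],
    "MUKEY": ["672359", "672281", "672280", "672280", "672349"],
})

# Number of polygons (rows) that belong to each map unit key
polygons_per_unit = sample.groupby("MUKEY").size().sort_values(ascending=False)
print(polygons_per_unit)

# True when at least one map unit appears in more than one row
has_repeats = sample["MUKEY"].duplicated().any()
print(has_repeats)
```

On the full dataset the same idea would be gdf.groupby("MUKEY").size().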

Visualize Soil Data

Let's create a basic map of the soil map units (MUSYM) in South Dakota.

# Create the map
soil_column = "MUSYM"
fig, ax = plt.subplots(1, 1, figsize=(16, 12))

# Get unique soil types and create color map
unique_soils = gdf[soil_column].unique()
colors = plt.cm.Set3(np.linspace(0, 1, len(unique_soils)))
color_dict = dict(zip(unique_soils, colors))

# Plot soil polygons
gdf.plot(ax=ax, legend=False,
            color=[color_dict.get(x, 'gray') for x in gdf[soil_column]],
            edgecolor='black', linewidth=0.3)

# Customize the map
ax.set_title('STATSGO2 Soil Map Units - South Dakota',
            fontsize=18, fontweight='bold', pad=20)
ax.set_xlabel('Longitude', fontsize=12)
ax.set_ylabel('Latitude', fontsize=12)
ax.grid(True, alpha=0.3)

# Add legend (top 25 soil types)
legend_elements = [mpatches.Patch(color=color_dict[soil], label=soil)
                    for soil in sorted(unique_soils)[:25]]
ax.legend(handles=legend_elements, loc='upper right',
            bbox_to_anchor=(1.15, 1), fontsize=8, title=f'{soil_column}')

plt.tight_layout()
plt.show()

[Figure: STATSGO2 Soil Map Units - South Dakota]
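Set3 is a qualitative palette with 12 colors, so when there are more than 12 map unit symbols, sampling it with np.linspace assigns the same color to several symbols. One possible workaround, sketched here with only the standard library, is to generate evenly spaced hues so that each symbol gets its own color. The helper name distinct_colors and the saturation and value settings are assumptions for illustration, not part of the original notebook.

```python
import colorsys

def distinct_colors(labels, saturation=0.55, value=0.9):
    """Map each unique label to an RGB tuple with an evenly spaced hue."""
    unique = sorted(set(labels))
    n = len(unique)
    return {
        label: colorsys.hsv_to_rgb(i / n, saturation, value)
        for i, label in enumerate(unique)
    }

# Example with a few map unit symbols from the preview above
color_dict = distinct_colors(["s6871", "s6793", "s6792", "s6792", "s6861"])
```

The resulting dictionary can stand in for color_dict in the plotting code, because matplotlib accepts RGB tuples as colors. With many map units, neighboring hues will still look similar, so this separates colors but does not guarantee that adjacent polygons are easy to tell apart.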

Loading and Examining Tabular Data

In addition to the spatial data, the WSS provides tabular data with detailed soil properties. We'll load the muaggatt.txt file, which contains aggregated attribute data for the map units.

Since this file does not have a header row, we need to provide the column names manually. These names are based on the data description link provided earlier in the notebook.

# Define the column names for muaggatt.txt (the file has no header row)
columns = [
    "musym", "muname", "mustatus", "slopegraddcp", "slopegradwta", "brockdepmin",
    "wtdepannmin", "wtdepaprjunmin", "flodfreqdcd", "flodfreqmax", "pondfreqprs",
    "aws025wta", "aws050wta", "aws0100wta", "aws0150wta", "drclassdcd", "drclasswettest",
    "hydgrpdcd", "iccdcd", "iccdcdpct", "niccdcd", "niccdcdpct", "engdwobdcd",
    "engdwbdcd", "engdwbll", "engdwbml", "engstafdcd", "engstafll", "engstafml",
    "engsldcd", "engsldcp", "englrsdcd", "engcmssdcd", "engcmssmp", "urbrecptdcd",
    "urbrecptwta", "forpehrtdcp", "hydclprs", "awmmfpwwta", "mukey"
]

# Load the muaggatt.txt file (update the path if needed)
muaggatt_path = r"soil_data_sd/wss_gsmsoil_SD_[2016-10-13]/tabular/muaggatt.txt"
df = pd.read_csv(muaggatt_path, sep="|", engine="python", header=None, names=columns, dtype=str)
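To see how header=None and names work together, here is a minimal sketch that reads a four-column, pipe-delimited snippet from memory instead of from muaggatt.txt. The sample values are borrowed from the preview of the file, but the reduced column list and the in-memory text are assumptions for illustration. The sketch also checks that the number of names matches the number of fields, which is worth doing when the names are typed by hand.

```python
import io
import pandas as pd

# Made-up excerpt: pipe-delimited, no header row, empty field for a missing value
raw = (
    "s8369|Water (s8369)||657964\n"
    "s4003|Pring-Assinniboine-Archin (s4003)|16.89|664040\n"
)
names = ["musym", "muname", "aws0150wta", "mukey"]

# Count the fields in the first line to confirm the name list has the right length
n_fields = len(raw.splitlines()[0].split("|"))
assert n_fields == len(names), f"expected {len(names)} fields, found {n_fields}"

# dtype=str keeps every value as text, so keys such as mukey are not altered
sample = pd.read_csv(io.StringIO(raw), sep="|", header=None, names=names, dtype=str)
print(sample)
```

Reading everything as text is a cautious default for key columns; measurement columns still need an explicit numeric conversion before they are used in calculations.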

Now that the data is loaded, let's inspect the first few rows and check the column names to ensure everything loaded correctly.

# Show the first 5 rows of the DataFrame with all columns
print("First 5 rows of df with all columns:")
pd.set_option('display.max_columns', None)  #Show all columns
display(df.head(5))  # Show only the first 5 rows

First 5 rows of df with all columns:

	musym	muname	mustatus	slopegraddcp	slopegradwta	brockdepmin	wtdepannmin	wtdepaprjunmin	flodfreqdcd	flodfreqmax	aws025wta	aws050wta	aws0100wta	aws0150wta	drclassdcd	drclasswettest	hydgrpdcd	iccdcd	iccdcdpct	niccdcd	niccdcdpct	engdwobdcd	engdwbdcd	engdwbll	engdwbml	engstafdcd	engstafll	engstafml	engsldcd	engsldcp	englrsdcd	engcmssdcd	engcmssmp	urbrecptdcd	urbrecptwta	forpehrtdcp	hydclprs	awmmfpwwta	mukey
0	s8369	Water (s8369)	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	100	NaN	100	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	657964
1	s4003	Pring-Assinniboine-Archin (s4003)	NaN	5	7.5	NaN	NaN	NaN	NaN	NaN	3.83	7.21	12.48	16.89	Well drained	NaN	B	NaN	46	4	41	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	664040
2	s4360	Neldore-Marvan-Bascovy (s4360)	NaN	4	7.4	38	NaN	NaN	NaN	NaN	3.76	7.01	10.61	13.28	Well drained	Well drained	D	NaN	60	6	42	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	664397
3	s4653	Winler-Orella-Epsie (s4653)	NaN	17	18.4	38	NaN	NaN	NaN	NaN	2.64	4.66	5.33	5.69	Well drained	Well drained	D	NaN	98	6	54	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	664690
4	s4721	Peever-Overly-Nutley-Fargo-Dovray (s4721)	NaN	2	1.7	NaN	153	153	NaN	NaN	4.43	8.44	16.15	23.39	Well drained	Moderately well drained	C	NaN	100	2	82	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	36	NaN	665014

We can see from the output that the muaggatt.txt file has been loaded into a pandas DataFrame and the columns are named according to the list we provided. The mukey column is the unique identifier for each map unit and will be used to link this tabular data to the spatial data.

Before merging the tabular data with the spatial GeoDataFrame, let's standardize the column names in both DataFrames by converting them to lowercase and removing leading/trailing whitespace. This helps prevent issues during the merge operation.

Then we merge the muaggatt DataFrame (df) into the spatial GeoDataFrame (gdf) on the shared mukey column. A left merge keeps every spatial feature and adds the matching soil attributes.

After merging, we will display the first few rows of the merged GeoDataFrame to see the added columns.

df.columns = df.columns.str.lower().str.strip()
gdf.columns = gdf.columns.str.lower().str.strip()
print("muaggatt columns after standardizing:", df.columns.tolist())
print("shapefile columns after standardizing:", gdf.columns.tolist())

gdf = gdf.merge(df, on="mukey", how="left")

muaggatt columns after standardizing: ['musym', 'muname', 'mustatus', 'slopegraddcp', 'slopegradwta', 'brockdepmin', 'wtdepannmin', 'wtdepaprjunmin', 'flodfreqdcd', 'flodfreqmax', 'pondfreqprs', 'aws025wta', 'aws050wta', 'aws0100wta', 'aws0150wta', 'drclassdcd', 'drclasswettest', 'hydgrpdcd', 'iccdcd', 'iccdcdpct', 'niccdcd', 'niccdcdpct', 'engdwobdcd', 'engdwbdcd', 'engdwbll', 'engdwbml', 'engstafdcd', 'engstafll', 'engstafml', 'engsldcd', 'engsldcp', 'englrsdcd', 'engcmssdcd', 'engcmssmp', 'urbrecptdcd', 'urbrecptwta', 'forpehrtdcp', 'hydclprs', 'awmmfpwwta', 'mukey']
shapefile columns after standardizing: ['areasymbol', 'spatialver', 'musym', 'mukey', 'geometry']
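Both tables carry a musym column, so the merge suffixes them as musym_x and musym_y. One way to avoid the duplicate, and to check the join at the same time, is sketched below on two tiny hand-made frames. The names spatial and attrs are illustrative stand-ins for gdf and df, and the last key in spatial is made up to show an unmatched row. The sketch drops musym from the attribute table before merging, uses validate to assert that each mukey appears only once in the attribute table, and uses indicator to flag polygons that found no match.

```python
import pandas as pd

spatial = pd.DataFrame({
    "musym": ["s6871", "s6792", "s6792", "s9999"],
    "mukey": ["672359", "672280", "672280", "000000"],  # last key is made up
})
attrs = pd.DataFrame({
    "musym": ["s6871", "s6792"],
    "muname": ["Graceville-Dempster (s6871)", "Satanta-Pierre-Nunn (s6792)"],
    "mukey": ["672359", "672280"],
})

merged = spatial.merge(
    attrs.drop(columns="musym"),   # avoid musym_x / musym_y
    on="mukey",
    how="left",
    validate="many_to_one",        # raises MergeError if attrs repeats a mukey
    indicator=True,                # adds a _merge column: 'both' or 'left_only'
)

unmatched = merged[merged["_merge"] == "left_only"]
print(merged)
print(f"{len(unmatched)} polygon(s) without attributes")
```

The same keyword arguments should carry over to gdf.merge(...), since GeoDataFrame.merge builds on the pandas merge.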

pd.set_option('display.max_columns', None)  #Show all columns
display(gdf.head(5))  # Show only the first 5 rows
print("Columns after merging:", gdf.columns.tolist())

	areasymbol	spatialver	musym_x	mukey	geometry	musym_y	muname	mustatus	slopegraddcp	slopegradwta	brockdepmin	wtdepannmin	wtdepaprjunmin	flodfreqdcd	flodfreqmax	aws025wta	aws050wta	aws0100wta	aws0150wta	drclassdcd	drclasswettest	hydgrpdcd	iccdcd	iccdcdpct	niccdcd	niccdcdpct	engdwobdcd	engdwbdcd	engdwbll	engdwbml	engstafdcd	engstafll	engstafml	engsldcd	engsldcp	englrsdcd	engcmssdcd	engcmssmp	urbrecptdcd	urbrecptwta	forpehrtdcp	hydclprs	awmmfpwwta
0	US	3	s6871	672359	POLYGON ((-96.75075 43.50416, -96.75365 43.504...	s6871	Graceville-Dempster (s6871)	NaN	1	1.5	NaN	NaN	NaN	NaN	NaN	5.21	10.22	19.29	26.29	Well drained	Well drained	B	NaN	96	2	50	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	8	NaN
1	US	3	s6793	672281	POLYGON ((-103.49977 43.32849, -103.50432 43.3...	s6793	Tilford-Nevee (s6793)	NaN	5	14	NaN	NaN	NaN	NaN	NaN	4.6	8.67	16.17	23.26	Well drained	Well drained	B	NaN	100	3	36	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	1	NaN
2	US	3	s6792	672280	POLYGON ((-102.95403 44.33908, -102.95738 44.3...	s6792	Satanta-Pierre-Nunn (s6792)	NaN	4	6.8	NaN	NaN	NaN	NaN	NaN	4.16	7.94	14.91	21.24	Well drained	Well drained	C	NaN	81	3	61	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN
3	US	3	s6792	672280	POLYGON ((-102.69247 44.36311, -102.69413 44.3...	s6792	Satanta-Pierre-Nunn (s6792)	NaN	4	6.8	NaN	NaN	NaN	NaN	NaN	4.16	7.94	14.91	21.24	Well drained	Well drained	C	NaN	81	3	61	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN
4	US	3	s6861	672349	POLYGON ((-97.37897 42.86556, -97.38019 42.868...	s6861	Wentworth-Ethan-Egan (s6861)	NaN	4	3.6	NaN	NaN	NaN	NaN	NaN	5.01	9.74	19.04	28.13	Well drained	Well drained	B	NaN	92	2	36	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	13	NaN

Columns after merging: ['areasymbol', 'spatialver', 'musym_x', 'mukey', 'geometry', 'musym_y', 'muname', 'mustatus', 'slopegraddcp', 'slopegradwta', 'brockdepmin', 'wtdepannmin', 'wtdepaprjunmin', 'flodfreqdcd', 'flodfreqmax', 'pondfreqprs', 'aws025wta', 'aws050wta', 'aws0100wta', 'aws0150wta', 'drclassdcd', 'drclasswettest', 'hydgrpdcd', 'iccdcd', 'iccdcdpct', 'niccdcd', 'niccdcdpct', 'engdwobdcd', 'engdwbdcd', 'engdwbll', 'engdwbml', 'engstafdcd', 'engstafll', 'engstafml', 'engsldcd', 'engsldcp', 'englrsdcd', 'engcmssdcd', 'engcmssmp', 'urbrecptdcd', 'urbrecptwta', 'forpehrtdcp', 'hydclprs', 'awmmfpwwta']

As you can see in the output above, the GeoDataFrame now includes the attribute columns from the muaggatt.txt file, appended to the original spatial data. We can now use these attributes for mapping and analysis.
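As a small example of such analysis, the sketch below averages available water storage by hydrologic soil group. It uses a hand-copied sample of the merged rows shown above rather than the full GeoDataFrame (sample is an illustrative name), and it converts aws0150wta to numbers first because the table was read as text.

```python
import pandas as pd

sample = pd.DataFrame({
    "mukey": ["672359", "672281", "672280", "672280", "672349"],
    "hydgrpdcd": ["B", "B", "C", "C", "B"],
    "aws0150wta": ["26.29", "23.26", "21.24", "21.24", "28.13"],  # strings, as loaded
})

sample["aws0150wta"] = pd.to_numeric(sample["aws0150wta"], errors="coerce")

# Keep one row per map unit so units drawn as several polygons are not counted twice
per_unit = sample.drop_duplicates(subset="mukey")

# Unweighted mean per hydrologic soil group (polygon area is ignored here)
mean_aws = per_unit.groupby("hydgrpdcd")["aws0150wta"].mean().round(2)
print(mean_aws)
```

This is an unweighted average over map units. An area-weighted summary would need polygon areas computed in a projected coordinate system, which is outside the scope of this sketch.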

Visualizing Soil Attributes

With the merged data, we can now create maps that visualize specific soil attributes across South Dakota. Let's create a map showing the available water storage at 0-150 cm depth (aws0150wta). This attribute is important for understanding soil moisture availability for plants.

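soil_column = 'aws0150wta'
# muaggatt.txt was read with dtype=str, so convert to numbers first;
# otherwise geopandas treats the column as categorical text when plotting
gdf[soil_column] = pd.to_numeric(gdf[soil_column], errors='coerce')
gdf_overlay = gdf[~gdf[soil_column].isna()].copy()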

# Bin the data into 25 ranges (optional, for discrete bins)
num_bins = 25
gdf_overlay['binned'] = pd.cut(
    gdf_overlay[soil_column].astype(float),
    bins=num_bins,
    include_lowest=True
)

# Set up colormap and normalization
cmap = plt.cm.viridis
norm = mpl.colors.Normalize(
    vmin=gdf_overlay[soil_column].astype(float).min(),
    vmax=gdf_overlay[soil_column].astype(float).max()
)

# Plot base map
fig, ax = plt.subplots(1, 1, figsize=(12, 8))
gdf.boundary.plot(ax=ax, color='black', linewidth=0.5)

# Plot overlay (use the original column for a smooth colorbar)
gdf_overlay.plot(
    column=soil_column,
    ax=ax,
    legend=False,  # We'll add a custom colorbar
    cmap=cmap,
    edgecolor='pink'
)

# Create ScalarMappable and add colorbar
sm = mpl.cm.ScalarMappable(cmap=cmap, norm=norm)
sm.set_array([])  # Public equivalent of sm._A = []; only needed on older matplotlib versions

cbar = fig.colorbar(sm, ax=ax, fraction=0.03, pad=0.04)
cbar.set_label(soil_column)

ax.set_title(f'South Dakota Map with {soil_column} Overlay')
ax.set_axis_off()
plt.show()

[Figure: South Dakota Map with aws0150wta Overlay]
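Because muaggatt.txt was read with dtype=str, values such as aws0150wta are text until they are converted. The sketch below, on a few made-up values, shows why this matters for maps and statistics: text is compared character by character, so "9.5" ranks above "23.26", whereas the numeric ordering is the expected one.

```python
import pandas as pd

values = pd.Series(["23.26", "9.5", "16.89", None], name="aws0150wta")

# As text: compared character by character, so "9.5" is the largest
text_max = values.dropna().max()

# As numbers: missing entries become NaN and are skipped by max()
numeric = pd.to_numeric(values, errors="coerce")
numeric_max = numeric.max()

print(text_max, numeric_max)
```

The same concern applies to color scales: a continuous colormap is only meaningful when the mapped column is numeric.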

Last update: 2025-09-24