Archive of Massachusetts ENvironmental Data

MA MS4 Municipal Stormwater Annual Reports

Data source

Municipal Separate Storm Sewer Systems (MS4s) are networks of pipes, ditches, and drains — owned by cities and towns — that collect stormwater runoff and discharge it directly to rivers, ponds, and coastal waters without treatment. Unlike combined sewers, MS4s are designed to carry only stormwater, but they remain a major pathway for pollutants including phosphorus, bacteria, metals, and road salt into Massachusetts waterways.

Under Section 402 of the Clean Water Act, operators of MS4s must obtain NPDES stormwater permits and implement a Stormwater Management Program (SWMP) covering six Minimum Control Measures (MCMs):

| # | MCM | What permittees must do |
|---|-----|-------------------------|
| 1 | Public Education & Outreach | Distribute educational materials on stormwater impacts |
| 2 | Public Participation | Involve the public in SWMP development and implementation |
| 3 | Illicit Discharge Detection & Elimination (IDDE) | Map outfalls, screen for non-stormwater flows, eliminate illicit connections |
| 4 | Construction Site Runoff Control | Inspect active construction sites and enforce erosion controls |
| 5 | Post-Construction Stormwater Management | Require and inspect stormwater BMPs for new development |
| 6 | Pollution Prevention / Good Housekeeping | Inspect and maintain municipal facilities and catch basin infrastructure |

Approximately 316 Massachusetts municipalities and institutions operate under EPA Region 1’s Massachusetts Small MS4 General Permit. Each permittee submits an annual report to EPA documenting its SWMP activities. The reports are publicly available on the EPA Region 1 MA MS4 community page.

As of 24 April 2026, AMEND has archived and indexed 1,787 annual report PDFs covering the current permit cycle (Permit Years 1–7, FY2019–FY2025).

AI extraction methodology

MS4 annual reports are semi-structured government forms submitted as PDFs, with no machine-readable structured data source. AMEND uses the Google Gemini 2.5 Flash AI model with forced function calling to extract a standardized schema from each report.
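
To illustrate what a schema enforced through function calling can look like, the sketch below declares one function in the JSON-Schema-style format that function-calling APIs generally accept, then checks a returned argument dict with the standard library only. The field names mcm3_count_type, tmdl_municipality_specific, and mcm6_notes and the high/medium/low confidence values come from this page. The function name record_ms4_report, the other field names, the allowed count-type values, and the validator itself are assumptions for illustration, not AMEND's actual schema.

```python
# Hypothetical function declaration; NOT AMEND's actual schema.
RECORD_MS4_REPORT = {
    "name": "record_ms4_report",  # assumed name
    "description": "Record standardized fields extracted from one MS4 annual report.",
    "parameters": {
        "type": "object",
        "properties": {
            "municipality": {"type": "string"},          # assumed field
            "report_year": {"type": "integer"},          # assumed field
            "mcm3_illicit_found": {"type": "integer"},   # assumed field
            # Allowed values below are assumed, not documented.
            "mcm3_count_type": {"type": "string", "enum": ["cumulative", "period"]},
            "tmdl_municipality_specific": {"type": "boolean"},
            "mcm6_notes": {"type": "string"},
            "confidence": {"type": "string", "enum": ["high", "medium", "low"]},
        },
        "required": ["municipality", "confidence"],
    },
}

_PY_TYPES = {"string": str, "integer": int, "boolean": bool}


def validate_args(args: dict, declaration: dict = RECORD_MS4_REPORT) -> list:
    """Return a list of problems found in a function-call argument dict."""
    params = declaration["parameters"]
    problems = []
    for name in params["required"]:
        if name not in args:
            problems.append(f"missing required field: {name}")
    for name, value in args.items():
        spec = params["properties"].get(name)
        if spec is None:
            problems.append(f"unexpected field: {name}")
            continue
        expected = _PY_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it for integer fields.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            problems.append(f"{name}: expected {spec['type']}")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"{name}: {value!r} not in {spec['enum']}")
    return problems
```

Validating the returned arguments locally is a cheap second check: a forced function call constrains the shape of the response, but a malformed or partial call can still come back.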

Pipeline:

  1. Index scraping — EPA’s HTML listing page is scraped weekly for new report PDFs (same approach as the NPDES permits dataset).
  2. PDF archive — Each PDF is downloaded and archived on the AMEND backend for permanent public access.
  3. Portfolio detection — Some municipalities submit PDF portfolios (embedded-file containers) that cannot be directly read. These are detected and extracted for processing. Approximately 25–40% of recent-year reports (FY2024–FY2025) appear to use this format.
  4. Structured extraction — Each readable PDF is uploaded to the Gemini Files API and queried with a schema enforced via function calling. AMEND records source page references for every section, enabling manual verification of what data was extracted from which PDF page.
  5. Confidence rating — The AI model assigns high, medium, or low confidence based on completeness and document quality.
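
The index-scraping step (step 1) can be sketched with the standard library alone. This is a minimal illustration, not AMEND's code: the class and function names are hypothetical, the listing URL is left as a parameter, and the filename pattern (permittee name, then "ma", then "arYY") is an assumption inferred from filenames such as brockton-ma-ar20.pdf and west_boylston_ma_ar25.pdf.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkParser(HTMLParser):
    """Collect absolute URLs of every <a href> that ends in .pdf."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.pdf_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".pdf"):
            self.pdf_urls.append(urljoin(self.base_url, href))


# Assumed naming pattern: <permittee>[-_]ma[-_]arYY.pdf
_NAME_RE = re.compile(r"^(?P<slug>.+?)[-_]ma[-_]ar(?P<yy>\d{2})\.pdf$", re.IGNORECASE)


def parse_report_filename(filename: str):
    """Return (permittee_slug, four_digit_year), or None if the name does not match."""
    m = _NAME_RE.match(filename)
    if m is None:
        return None
    return m.group("slug").lower(), 2000 + int(m.group("yy"))


def find_new_reports(listing_html: str, base_url: str, already_archived: set) -> list:
    """Return PDF URLs on the listing page that are not yet in the archive."""
    parser = PdfLinkParser(base_url)
    parser.feed(listing_html)
    return [url for url in parser.pdf_urls if url not in already_archived]
```

A weekly job would fetch the listing page, call find_new_reports with the set of URLs already archived, and hand any new URLs to the download step. Filenames that do not match the assumed pattern return None and would need manual review.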

Known limitations:

  • Cumulative vs. period counts: The permit requires some counts (illicit discharges found/eliminated) to be cumulative since permit start; others are period-only. The mcm3_count_type field records which interpretation applies.
  • TMDL scope: Some municipalities list only the TMDLs applicable to their specific waterbodies (tmdl_municipality_specific = True); others reproduce the general permit’s full statewide TMDL list (False). Only municipality-specific entries are analytically meaningful for compliance tracking.
  • MCM6 catch basins vs. facilities: Many municipalities report catch basin inspection counts under MCM6; the mcm6_notes field clarifies what the count refers to.
  • Non-traditional MS4s: Universities and state agencies (permit prefix MAR042) operate under the same general permit but with different physical infrastructure. Their reports are included but MCM counts may not be comparable to municipal permittees.
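
The cumulative-versus-period distinction matters when comparing years. The sketch below shows one way an analyst might convert cumulative illicit-discharge counts to per-year counts by differencing consecutive reports. The record keys municipality, year, and mcm3_illicit_found and the mcm3_count_type values "cumulative" and "period" are assumptions; only the mcm3_count_type field name comes from this page.

```python
def period_illicit_counts(reports: list) -> dict:
    """Map (municipality, year) -> illicit discharges found in that year alone.

    Cumulative counts are differenced against the previous cumulative report
    for the same municipality; period counts pass through unchanged.
    """
    result = {}
    previous = {}  # municipality -> last cumulative count seen
    for r in sorted(reports, key=lambda r: (r["municipality"], r["year"])):
        key = (r["municipality"], r["year"])
        count = r["mcm3_illicit_found"]
        if count is None:
            result[key] = None
        elif r["mcm3_count_type"] == "cumulative":
            prior = previous.get(r["municipality"], 0)
            # A falling "cumulative" total signals a reporting inconsistency.
            result[key] = count - prior if count >= prior else None
            previous[r["municipality"]] = count
        else:
            result[key] = count
    return result
```

This treats the first cumulative report seen for a municipality as if it started from zero, so if earlier years are missing from the archive, that first value is a multi-year total rather than a single-year count.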

Data currency

This dataset is indexed from the EPA Region 1 MA MS4 community page; the index was last updated on 24 April 2026. AMEND checks the page weekly and automatically incorporates new reports when EPA posts them.

Download

  • MS4 report index — one row per discovered PDF, with EPA URL, GCS archive URL, municipality, and year
  • MS4 extracted data — one row per extracted report, with all MCM fields, TMDL waterbodies (JSON), and traceability fields
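
A minimal loading sketch for the extracted-data file follows. It assumes the file is a CSV, that years may be serialized as floats (for example 2025.0) with missing values written as nan, and that the TMDL waterbodies column holds a JSON array. The column names year and tmdl_waterbodies are hypothetical; check the header row of the actual download.

```python
import csv
import io
import json


def to_int_or_none(text):
    """Convert strings such as '2025.0' to 2025; blanks and 'nan' become None."""
    text = (text or "").strip()
    if text == "" or text.lower() == "nan":
        return None
    return int(float(text))


def load_extracted(csv_text: str) -> list:
    """Parse CSV text into dicts with a cleaned year and a decoded TMDL list."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["year"] = to_int_or_none(row.get("year"))
        raw = (row.get("tmdl_waterbodies") or "").strip()
        # Treat blank or 'nan' cells as an empty list rather than failing.
        row["tmdl_waterbodies"] = json.loads(raw) if raw and raw.lower() != "nan" else []
        rows.append(row)
    return rows
```

To read from disk, pass the file contents, for example load_extracted(open(path, newline="", encoding="utf-8").read()), where path points at the downloaded file.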

Extraction failures

Eleven reports could not be extracted successfully and are excluded from the dataset. Failures are logged automatically each time the pipeline runs.

| File | Municipality | Year | Reason |
|------|--------------|------|--------|
| brockton-ma-ar20.pdf | City of Brockton | 2020 | Multiple MCM sections report 0 activities/inspections due to staffing issues and delays in adopting stormwater ordinance. Specifically, MCM1 (Public Education) reports 0 messages, contradicting text a |
| chatham_ma_ar22.pdf | Town of Chatham |  | MCM1 Public Education activities count not found.; MCM2 Public Participation activities count not found.; MCM3 Illicit discharges found/eliminated data not fully found.; MCM3 outfalls_total, outfalls_ |
| nahant_ma_ar25.pdf |  |  | Extraction failed: Gemini returned empty content (finish_reason=FinishReason.MALFORMED_FUNCTION_CALL). Possible safety block or recitation filter. |
| norton-ma-ar19.pdf |  |  | Extraction failed: Gemini returned empty content (finish_reason=FinishReason.MALFORMED_FUNCTION_CALL). Possible safety block or recitation filter. |
| revere_ma_ar24.pdf |  |  | Extraction failed: Gemini returned empty content (finish_reason=FinishReason.MALFORMED_FUNCTION_CALL). Possible safety block or recitation filter. |
| saugus_ma_ar21.pdf |  |  | Extraction failed: Gemini returned empty content (finish_reason=FinishReason.MALFORMED_FUNCTION_CALL). Possible safety block or recitation filter. |
| wellesley-ma-ar19.pdf |  |  | Extraction failed: Gemini returned empty content (finish_reason=FinishReason.MALFORMED_FUNCTION_CALL). Possible safety block or recitation filter. |
| west_boylston_ma_ar25.pdf |  |  | Extraction failed: Gemini returned empty content (finish_reason=FinishReason.MALFORMED_FUNCTION_CALL). Possible safety block or recitation filter. |
| yarmouth_ma_ar22.pdf |  |  | Extraction failed: Gemini returned empty content (finish_reason=FinishReason.MALFORMED_FUNCTION_CALL). Possible safety block or recitation filter. |
| dcr_ma_ar25.pdf |  |  | Extraction failed: Gemini returned empty content (finish_reason=FinishReason.MALFORMED_FUNCTION_CALL). Possible safety block or recitation filter. |
| ma_army_national_guard_ma_ar21.pdf | Camp Edwards | 2021 | The SWMP was not finalized at the time of the Year 3 Annual Report and the revocation process for the permit has been delayed due to challenges of COVID-19 and a backlog at the US Army Corps of Engine |

Municipality count breakdown

The 316 municipalities in the report index differ from the 276 used in the analysis. Each filter stage is documented below.

| Stage | Reports | Municipalities | Notes |
|-------|---------|----------------|-------|
| Report index (scraped) | 1,787 | 316 | Raw scraped EPA listing; un-normalized municipality names; all permittee types |
| Extracted CSV | 1,787 | 440 | Unique raw municipality strings from AI extraction; higher count due to name variants (“Town of X”, abbreviations) |
| DB after normalization | 1,787 | 309 | After stripping “Town of”/“City of” prefix and uppercasing municipality names |
| After MAR042 filter | 1,634 | 270 | Removes 39 institutional permittees: UMass campuses, military installations (Hanscom AFB, Camp Edwards, Fort Devens, USCG Cape Cod), DCAMM, and community colleges |
| After low-confidence filter | 1,624 | 276 | Removes 11 reports (3 municipalities) with extraction confidence = low; these are listed in the Extraction failures section above |

The 276-municipality figure is what the analysis uses. The 40 excluded institutional permittees (permit prefix MAR042) are included in the raw download files but excluded from all municipal comparisons and dashboard charts.
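
The filter stages can be reproduced from the raw download along the following lines. This is a sketch under assumptions: the record keys municipality, permit_number, and confidence are hypothetical names, and the normalization implements only the rule stated above (strip a leading “Town of” or “City of”, then uppercase), so it will not merge other name variants such as abbreviations.

```python
import re

_PREFIX_RE = re.compile(r"^(town|city)\s+of\s+", re.IGNORECASE)


def normalize_name(raw: str) -> str:
    """Strip a leading 'Town of' / 'City of' and uppercase the remainder."""
    return _PREFIX_RE.sub("", raw.strip()).upper()


def stage_counts(records: list) -> list:
    """Return (stage, report_count, municipality_count) for each filter stage."""

    def summarize(stage, rows):
        names = {normalize_name(r["municipality"]) for r in rows}
        return stage, len(rows), len(names)

    # Institutional permittees are identified by the MAR042 permit prefix.
    municipal = [r for r in records if not r["permit_number"].startswith("MAR042")]
    confident = [r for r in municipal if r["confidence"] != "low"]
    return [
        summarize("normalized", records),
        summarize("after MAR042 filter", municipal),
        summarize("after low-confidence filter", confident),
    ]
```

Running stage_counts on the extracted-data download and comparing the result with the table is a quick way to confirm which records each stage removes.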

Sample extracted data

The table below shows ten extracted records from the dataset. The Source PDF column links to the original EPA PDF; the GCS Archive column links to the AMEND archive copy.

| Municipality | Year | Permit # | MCM1 Activities | MCM2 Activities | MCM3 Outfalls | MCM3 Illicit Found | MCM4 Sites | MCM5 Sites | MCM6 Facilities | Confidence | Source PDF | GCS Archive |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Town of Hatfield | 2026 | MAG580015 | 1 |  |  | 0 | 0 |  | 112 | high | PDF | GCS |
| Town of Longmeadow | 2025 | MAR041013 | 9 | 4 |  | 0 | 3 |  | 8 | medium | PDF | GCS |
| Town of Lincoln | 2025 | MAR041043 | 8 | 12 | 39 | 0 | 40 |  | 443 | high | PDF | GCS |
| Town of Lexington, MA | 2025 | MAR041042 | 10 | 12 | 314 | 0 | 112 | 0 | 4538 | high | PDF | GCS |
| City of Lawrence | 2025 | MAR041201 | 9 | 1 |  | 5 | 4 | 0 | 1200 | high | PDF | GCS |
| Lanesborough | 2025 | MAR041012 | 5 | 3 |  | 0 | 0 |  | 182 | high | PDF | GCS |
| Kingston | 2025 | MAR041041 | 17 | 4 | 83 | 1 | 22 | 0 | 508 | high | PDF | GCS |
| Town of Ipswich | 2025 | MAR041199 | 23 | 1 |  | 0 | 0 |  | 200 | high | PDF | GCS |
| Town of Hudson | 2025 | MAR041198 | 7 | 1 |  | 1 | 106 |  | 2847 | medium | PDF | GCS |
| Town of Hopkinton | 2025 | MAR041124 | 11 | 7 |  | 1 | 144 | 0 | 2899 | high | PDF | GCS |