Predicting and Mapping Urban Fire Incidents in San Francisco

A Machine Learning Approach to Building Fire Classification

Date: April 9, 2025

Executive Summary

This report presents an analysis of building fires in San Francisco using machine learning techniques. By analyzing data from the San Francisco Fire Department's incident reports, we developed a classification model that predicts building fire categories with 97.51% accuracy on a held-out test set. The analysis also includes geospatial visualizations that reveal clear patterns in the distribution of building fires across the city.

The project focused exclusively on actual building fires, excluding false alarms and non-building fire incidents. We identified 17 distinct building fire categories, with cooking fires being the most common type. Our Random Forest classification model successfully learned patterns from temporal and spatial features to predict these fire categories.

The insights from this analysis can help fire departments allocate resources more effectively, target prevention efforts in high-risk areas, and develop strategies tailored to specific types of building fires. The high accuracy of our model demonstrates the potential of machine learning in enhancing fire safety and emergency response planning.

Introduction

Urban fire incidents pose significant risks to public safety, property, and infrastructure in densely populated cities like San Francisco. Understanding the patterns and factors associated with building fires can help fire departments develop more effective prevention strategies and optimize resource allocation.

This project aims to leverage machine learning techniques to analyze building fire incidents in San Francisco, with the following objectives:

Identify patterns in building fire incidents across time and space
Develop a classification model to predict building fire categories
Create geospatial visualizations to understand the distribution of building fires
Generate insights and recommendations for fire prevention and resource allocation

By focusing exclusively on actual building fires (rather than false alarms or other incident types), this analysis provides targeted insights into the most critical fire safety challenges facing San Francisco.

Data Acquisition and Preparation

Data Source

The data for this analysis was obtained from the San Francisco Fire Department's incident reports, available through the SF Open Data Portal. The dataset contains detailed information about emergency calls responded to by the San Francisco Fire Department, including:

Incident details (date, time, location)
Response information (response time, units dispatched)
Incident classification (primary situation, action taken)
Geographic coordinates

Filtering to Building Fires

From the original dataset containing over 700,000 fire department responses, we filtered to include only building fires, excluding false alarms and non-building fire incidents. We identified 17 distinct building fire categories, including:

Building fire
Cooking fire, confined to container
Chimney or flue fire, confined to chimney or flue
Fuel burner/boiler malfunction, fire confined
Commercial compactor fire, confined to rubbish
And other building-related fire categories

This filtering process resulted in a dataset of 31,365 building fire incidents, representing approximately 8% of actual fires and 4.5% of all incidents in the original dataset.
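The filtering step can be sketched with pandas as follows. This is a minimal illustration, not the code used for the report: the column name `primary_situation`, the helper `filter_building_fires`, and the sample rows are all hypothetical, and only the five category labels named above are included rather than the full set of 17.

```python
import pandas as pd

# Hypothetical subset of the 17 building fire labels (only those named in the report).
BUILDING_FIRE_CATEGORIES = {
    "Building fire",
    "Cooking fire, confined to container",
    "Chimney or flue fire, confined to chimney or flue",
    "Fuel burner/boiler malfunction, fire confined",
    "Commercial compactor fire, confined to rubbish",
}


def filter_building_fires(incidents: pd.DataFrame,
                          column: str = "primary_situation") -> pd.DataFrame:
    """Keep only rows whose situation label is a building fire category."""
    mask = incidents[column].isin(BUILDING_FIRE_CATEGORIES)
    return incidents.loc[mask].reset_index(drop=True)


# Made-up rows for illustration only; these are not real incidents.
sample = pd.DataFrame({
    "primary_situation": [
        "Cooking fire, confined to container",
        "False alarm or false call, other",
        "Building fire",
        "Passenger vehicle fire",
    ]
})

building_fires = filter_building_fires(sample)
share_of_all = len(building_fires) / len(sample)
print(building_fires)
print(f"Share of all incidents: {share_of_all:.1%}")
```

Matching on an explicit set of labels keeps the inclusion rule auditable: any category that is added or removed changes one visible list rather than a pattern match.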

Exploratory Data Analysis

Temporal Patterns

Our analysis revealed distinct temporal patterns in building fire incidents:

Time of Day: Building fires show clear patterns throughout the day, with peaks during meal preparation times. The highest frequency occurs in the evening hours (5-8 PM), likely corresponding to dinner preparation activities.

Day of Week: Building fires are relatively evenly distributed across days of the week, with a slight increase on weekends.

Monthly Patterns: There is some seasonal variation in building fire incidents, with higher frequencies during winter months (December-February) and lower frequencies in spring and summer.

Yearly Trends: The data shows relatively stable patterns across years, with no dramatic increases or decreases in building fire incidents.

Building Fire Categories

The distribution of building fire categories revealed that cooking fires are by far the most common type of building fire in San Francisco:

Cooking fire, confined to container: 21,952 incidents (70.0%)
Building fire: 8,271 incidents (26.4%)
Chimney or flue fire, confined to chimney or flue: 289 incidents (0.9%)
Other categories: 853 incidents (2.7%)

This distribution highlights the importance of kitchen safety in fire prevention efforts, as cooking-related fires account for more than two-thirds of all building fires in San Francisco.
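The percentages above follow directly from the counts. The short check below recomputes them; the counts are taken from this report, and the variable names are illustrative.

```python
# Category counts as reported above.
counts = {
    "Cooking fire, confined to container": 21_952,
    "Building fire": 8_271,
    "Chimney or flue fire, confined to chimney or flue": 289,
    "Other categories": 853,
}

total = sum(counts.values())

# Share of each category as a percentage, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

for name, pct in shares.items():
    print(f"{name}: {counts[name]:,} incidents ({pct}%)")
print(f"Total: {total:,}")
```

The counts sum to the 31,365 building fire incidents obtained after filtering, which is a useful consistency check on the category breakdown.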

Data Preprocessing

Handling Missing Values

The dataset contained missing values in several columns. We implemented the following strategy to address this issue:

Dropped columns with more than 50% missing values
For remaining numeric columns with missing values, imputed with the median
For categorical columns with missing values, imputed with the mode
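The three rules above could be implemented along these lines. This is a sketch under assumptions: the helper name `impute_missing`, the column names, and the sample values are hypothetical and are not taken from the project code.

```python
import pandas as pd


def impute_missing(df: pd.DataFrame, max_missing: float = 0.5) -> pd.DataFrame:
    """Drop sparse columns, then fill remaining gaps with the median or mode."""
    # Keep only columns whose fraction of missing values is at most the threshold.
    keep = df.columns[df.isna().mean() <= max_missing]
    out = df[keep].copy()

    for col in out.columns:
        if not out[col].isna().any():
            continue
        if pd.api.types.is_numeric_dtype(out[col]):
            # Numeric column: impute with the median.
            out[col] = out[col].fillna(out[col].median())
        else:
            # Categorical column: impute with the most frequent value.
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out


# Made-up values for illustration only.
sample = pd.DataFrame({
    "response_minutes": [4.0, None, 6.0, 8.0],
    "battalion": ["B01", "B02", None, "B02"],
    "mostly_empty": [None, None, None, 1.0],
})

clean = impute_missing(sample)
print(clean)
```

In this example `mostly_empty` is dropped because 75% of its values are missing, the numeric gap is filled with the median of the observed values, and the categorical gap is filled with the most frequent label.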

Feature Engineering

We created several new features to enhance the predictive power of our model:

Temporal Features: Extracted year, month, day, day of week, and hour from the incident date and time
Spatial Features: Extracted latitude and longitude coordinates from the location data
Target Variable: Created a grouped target variable based on the primary situation field
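The temporal and spatial features described above might be derived as in the sketch below. The column names `alarm_dttm` and `point`, the `POINT (longitude latitude)` text format, and the helper `add_features` are assumptions made for illustration; the actual field names in the source data may differ.

```python
import pandas as pd

# Assumed text format for the location field: "POINT (<longitude> <latitude>)".
POINT_PATTERN = (
    r"POINT \((?P<longitude>-?\d+(?:\.\d+)?) (?P<latitude>-?\d+(?:\.\d+)?)\)"
)


def add_features(df: pd.DataFrame,
                 timestamp_col: str = "alarm_dttm",
                 point_col: str = "point") -> pd.DataFrame:
    """Add year, month, day, day_of_week, hour, longitude and latitude columns."""
    out = df.copy()

    # Temporal features from the incident timestamp.
    ts = pd.to_datetime(out[timestamp_col])
    out["year"] = ts.dt.year
    out["month"] = ts.dt.month
    out["day"] = ts.dt.day
    out["day_of_week"] = ts.dt.dayofweek  # Monday = 0, Sunday = 6
    out["hour"] = ts.dt.hour

    # Spatial features parsed from the point text.
    coords = out[point_col].str.extract(POINT_PATTERN)
    out["longitude"] = coords["longitude"].astype(float)
    out["latitude"] = coords["latitude"].astype(float)
    return out


# Made-up row for illustration only.
sample = pd.DataFrame({
    "alarm_dttm": ["2024-12-25 18:30:00"],
    "point": ["POINT (-122.4194 37.7749)"],
})

features = add_features(sample)
print(features.iloc[0])
```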

Encoding Categorical Variables

We encoded the categorical target variable (building fire category) using label encoding, creating a mapping between numeric codes and the original categories for interpretability.
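A minimal sketch of this encoding step, assuming scikit-learn's `LabelEncoder` was the tool used (the report does not name it). The sample labels are a small illustrative subset.

```python
from sklearn.preprocessing import LabelEncoder

# Illustrative target labels; the real target has 17 categories.
labels = [
    "Cooking fire, confined to container",
    "Building fire",
    "Cooking fire, confined to container",
    "Chimney or flue fire, confined to chimney or flue",
]

encoder = LabelEncoder()
codes = encoder.fit_transform(labels)

# LabelEncoder assigns codes in sorted label order; keep the mapping for
# translating predictions back into readable categories.
code_to_label = {code: str(label) for code, label in enumerate(encoder.classes_)}

print(codes.tolist())
print(code_to_label)
```

Keeping `code_to_label` alongside the model makes it possible to report predictions by category name rather than by numeric code.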

Classification Model Development

Model Selection

After evaluating several classification algorithms, we selected Random Forest as our final model due to its:

Good performance on imbalanced datasets when combined with class weighting
Ability to handle non-linear relationships
Built-in feature importance estimates
Relative robustness to overfitting, since predictions are averaged over many trees

Handling Class Imbalance

The dataset exhibited significant class imbalance, with cooking fires being much more common than other categories. To address this, we:

Verified that all classes had at least 5 instances (sufficient for model training)
Applied class weights during model training to give higher importance to underrepresented classes
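To make the weighting concrete: scikit-learn's "balanced" mode sets each class weight to n_samples / (n_classes * class_count). The sketch below applies that formula to the four groups reported in the exploratory analysis. Treating "Other categories" as a single class is a simplification for illustration, since the model itself was trained on 17 categories.

```python
# Counts for the four reported groups (the real target has 17 classes).
counts = {
    "Cooking fire, confined to container": 21_952,
    "Building fire": 8_271,
    "Chimney or flue fire, confined to chimney or flue": 289,
    "Other categories": 853,
}

n_samples = sum(counts.values())
n_classes = len(counts)

# Balanced weight: n_samples / (n_classes * class_count).
weights = {name: n_samples / (n_classes * n) for name, n in counts.items()}

for name, w in weights.items():
    print(f"{name}: weight {w:.2f}")
```

Rare classes receive weights well above 1 and the dominant class a weight below 1, so each class contributes roughly equally to the training objective despite the imbalance.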

Model Training

We split the data into training (75%) and testing (25%) sets using a stratified split, so that each set preserved the overall distribution of classes. The Random Forest model was trained with the following parameters:

Number of estimators: 100
Maximum depth: 10
Minimum samples split: 5
Minimum samples leaf: 2
Class weight: balanced
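The split and the parameters listed above map onto scikit-learn roughly as follows. This is a sketch on synthetic data, not the project's training script: the feature columns, the random labels, and the `random_state` values are assumptions added so the example runs on its own.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: hour, day of week, latitude, longitude.
rng = np.random.default_rng(0)
n = 400
X = np.column_stack([
    rng.integers(0, 24, n),
    rng.integers(0, 7, n),
    rng.uniform(37.70, 37.82, n),
    rng.uniform(-122.52, -122.36, n),
])
# Imbalanced random labels standing in for fire categories.
y = rng.choice([0, 1, 2], size=n, p=[0.70, 0.25, 0.05])

# 75/25 split; stratify=y preserves class proportions in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Hyperparameters as listed in the report; random_state is an added assumption.
model = RandomForestClassifier(
    n_estimators=100,
    max_depth=10,
    min_samples_split=5,
    min_samples_leaf=2,
    class_weight="balanced",
    random_state=42,
)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
print(predictions[:10])
```

Because the labels here are random, the predictions carry no meaning; the example only shows how the split and the parameters fit together.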

Model Evaluation

Overall Performance

The Random Forest classification model achieved the following performance on the held-out test set:

Accuracy: 97.51%
Macro-average F1 Score: 0.92
Weighted-average F1 Score: 0.97

These metrics indicate that the model can reliably predict building fire categories from the temporal and spatial features, together with response time, which also appears among the most predictive inputs.
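The difference between the macro-average and weighted-average F1 scores is easiest to see on a small example. The labels below are invented for illustration and are unrelated to the report's test set; the metric functions are scikit-learn's.

```python
from sklearn.metrics import accuracy_score, f1_score

# Invented labels: class 0 is common, classes 1 and 2 are rare.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2]
y_pred = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]

accuracy = accuracy_score(y_true, y_pred)

# Macro: unweighted mean of per-class F1, so rare classes count as much as common ones.
macro_f1 = f1_score(y_true, y_pred, average="macro")

# Weighted: per-class F1 averaged by class frequency, so common classes dominate.
weighted_f1 = f1_score(y_true, y_pred, average="weighted")

print(f"accuracy={accuracy:.3f} macro_f1={macro_f1:.3f} weighted_f1={weighted_f1:.3f}")
```

Here the single error falls on a rare class, so the macro average drops further than the weighted average. The same mechanism explains why a macro F1 of 0.92 can sit below a weighted F1 of 0.97 when rarer categories are predicted less accurately.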

Feature Importance

The analysis of feature importance revealed that the most predictive features for building fire classification were:

Hour of day: This aligns with our exploratory analysis showing distinct patterns in fire incidents throughout the day
Latitude and longitude: Spatial location is strongly associated with fire types
Day of week: Different days show different patterns of fire incidents
Response time: The time it takes for fire units to arrive correlates with fire types
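A ranking like the one above can be read from a fitted Random Forest through its `feature_importances_` attribute. The sketch below uses synthetic data in which the label depends only on the hour, so the ordering it produces is a property of the toy data, not a reproduction of the report's results; feature names and seeds are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

feature_names = ["hour", "day_of_week", "latitude", "longitude"]

# Synthetic stand-in data.
rng = np.random.default_rng(1)
n = 500
X = np.column_stack([
    rng.integers(0, 24, n),
    rng.integers(0, 7, n),
    rng.uniform(37.70, 37.82, n),
    rng.uniform(-122.52, -122.36, n),
])
# Toy label: "evening" incidents (hours 17 to 20) versus everything else.
y = ((X[:, 0] >= 17) & (X[:, 0] <= 20)).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances sum to 1 across features.
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

Impurity-based importances can favor continuous or high-cardinality features, so a permutation-based check on held-out data is a reasonable complement when the ranking informs decisions.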

Performance by Class

The model performed exceptionally well on the most common categories (cooking fires and general building fires), with F1 scores above 0.95. Performance on less common categories was still strong but slightly lower, with F1 scores ranging from 0.85 to 0.92.

Geospatial Analysis

Spatial Distribution of Building Fires

Our geospatial analysis revealed that building fires are not evenly distributed across San Francisco. The heat maps show clear hot spots in several neighborhoods, including:

The Tenderloin
Mission District
South of Market (SoMa)
Bayview-Hunters Point

These areas have higher densities of building fires compared to other parts of the city, suggesting they should be prioritized for fire prevention efforts.
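The idea behind a density heat map can be sketched by binning incident coordinates into a grid and finding the busiest cell. This example uses NumPy's `histogram2d` on synthetic points rather than the mapping tool used for the report's figures; the bounding box, grid size, and cluster location are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)

# Rough bounding box for the city (illustrative values).
LAT_RANGE = (37.70, 37.82)
LON_RANGE = (-122.52, -122.36)

# Synthetic points: one dense cluster plus uniform background noise.
cluster = rng.normal(loc=(37.79, -122.416), scale=0.003, size=(200, 2))
background = np.column_stack([
    rng.uniform(*LAT_RANGE, 100),
    rng.uniform(*LON_RANGE, 100),
])
points = np.vstack([cluster, background])  # columns: latitude, longitude

# Count incidents in a 10 x 10 grid over the bounding box.
counts, lat_edges, lon_edges = np.histogram2d(
    points[:, 0], points[:, 1], bins=10, range=[LAT_RANGE, LON_RANGE]
)

# Locate the cell with the highest incident count.
row, col = np.unravel_index(np.argmax(counts), counts.shape)
hottest = (int(row), int(col))
print(f"Hottest cell: lat {lat_edges[row]:.3f} to {lat_edges[row + 1]:.3f}, "
      f"lon {lon_edges[col]:.3f} to {lon_edges[col + 1]:.3f}, "
      f"{int(counts[row, col])} incidents")
```

The same grid of counts can be passed to a mapping library for display, or normalized by building or population counts to compare risk rather than raw volume.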

Spatial Patterns by Fire Type

Different types of building fires show distinct spatial patterns:

Cooking fires are concentrated in residential areas with high population density
General building fires show more dispersion across the city but with concentrations in older neighborhoods
Chimney fires are more common in neighborhoods with older housing stock

These patterns suggest that prevention strategies should be tailored to the specific risk profiles of different neighborhoods.

Key Findings and Insights

Cooking fires dominate: About 70% of building fires in San Francisco are cooking fires confined to containers, highlighting the importance of kitchen safety education.
Temporal patterns are significant: Building fires show clear patterns by time of day, with peaks during meal preparation times, especially dinner hours.
Spatial clustering is evident: Building fires are not randomly distributed across the city but show clear hotspots in specific neighborhoods.
Predictive modeling is effective: Our Random Forest model achieved 97.51% accuracy in classifying building fire types based on temporal and spatial features.
Hour of day is the most predictive feature: The time of day is strongly associated with the type of building fire, reflecting daily activity patterns.

Recommendations

Based on our analysis, we recommend the following strategies for fire prevention and resource allocation:

For Fire Departments

Targeted Prevention Campaigns: Focus cooking safety education campaigns in neighborhoods with high densities of cooking fires.
Temporal Resource Allocation: Adjust staffing levels based on the temporal patterns identified, with increased resources during peak fire hours (5-8 PM).
Spatial Resource Allocation: Position resources strategically in high-risk neighborhoods identified in the heat maps.
Predictive Deployment: Use the classification model to predict likely fire types in different areas and times, enabling more effective resource deployment.

For Building Owners and Residents

Kitchen Safety: Implement and promote kitchen safety practices, especially during peak cooking hours.
Building Maintenance: Ensure regular maintenance of heating systems and chimneys, particularly in older buildings.
Fire Prevention Education: Participate in community fire prevention education programs, especially in high-risk neighborhoods.

For Policy Makers

Building Codes: Review and update building codes based on the patterns of fire incidents identified in this analysis.
Targeted Inspections: Implement more frequent fire safety inspections in high-risk neighborhoods.
Community Programs: Fund community-based fire prevention programs tailored to the specific needs of different neighborhoods.

Conclusion

This analysis demonstrates the power of machine learning in understanding and predicting building fire incidents in San Francisco. By focusing exclusively on actual building fires, we were able to identify clear patterns and develop a highly accurate classification model.

The insights from this analysis can help fire departments allocate resources more effectively, target prevention efforts in high-risk areas, and develop strategies tailored to specific types of building fires. The high accuracy of our model (97.51%) shows that temporal and spatial features can reliably predict building fire categories.

Future work could expand on this analysis by incorporating additional data sources, such as building age, construction type, and socioeconomic factors, to further enhance the predictive power of the model. Additionally, developing a real-time prediction system could help fire departments respond even more effectively to building fire incidents.

By leveraging data science and machine learning techniques, we can contribute to making San Francisco a safer city with more effective fire prevention and response strategies.

References

San Francisco Fire Department Incident Reports, SF Open Data Portal. Retrieved from https://data.sfgov.org/Public-Safety/Fire-Incidents/wr8u-xric
National Fire Protection Association. (2021). NFPA reports - Fires in the U.S. Retrieved from https://www.nfpa.org/News-and-Research/Data-research-and-tools/US-Fire-Problem
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.
Pedregosa, F., et al. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12, 2825-2830.
Folium: Python Data. Leaflet.js Maps. (2022). Retrieved from https://python-visualization.github.io/folium/