Abstract

I’ve always pondered, "How can we leverage historical data to make AI more useful?" That question stuck with me for a long time. It wasn’t until taking Stanford’s TECH16 course that things finally clicked. I came to terms with how much I didn’t know about the large language model (LLM) space and, just as importantly, what makes these models genuinely useful.

In reality, we don’t use LLMs for what they know; we use them for what they can do. Their strength lies in their skills: reasoning, summarizing, synthesizing, and generating structure from ambiguity.

This post explores the paradigm shift introduced by Retrieval-Augmented Generation (RAG). Rather than relying on static, pre-trained data, RAG allows us to ground models in our own trusted sources (technical manuals, internal documents, regulatory standards) through the use of vector databases and semantic search. It bridges the gap between general AI capabilities and domain-specific expertise.

To see this in action, I built a small-scale RAG pipeline using a NIST fire test report. While simple, the experiment showed how even a single report can become a dynamic knowledge interface, capable of delivering precise, context-aware responses. It confirmed something powerful: AI becomes trustworthy not through the data it was trained on, but through the data you connect it to.

Understanding RAG

The diagram above, introduced during Stanford’s TECH16 course, captures the essence of Retrieval-Augmented Generation (RAG). Rather than relying solely on what a language model was trained on, RAG enables AI to draw from curated, up-to-date sources, making responses far more relevant and reliable.

Here’s how it works:

  1. User Query: The user asks a question via the app interface.
  2. Retrieval Query: The app sends a retrieval query to a vector database or external data source.
  3. Retrieved Chunks: Relevant content is returned based on semantic similarity.
  4. LLM Input: The app forwards both the original query and the retrieved chunks to the large language model (LLM).
  5. Generated Answer: The LLM uses the provided context to generate a domain-specific, accurate response.
  6. Response Delivered: The app returns the answer to the user.

This architecture is what enables LLMs to become trustworthy tools, not because of what they were trained on, but because of how they can be connected to your own authoritative knowledge sources.
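
To make the flow concrete, here is a minimal Python sketch of those six steps using LlamaIndex, the same library used in the walkthrough later in this post. The helper function, prompt wording, and top_k value are illustrative assumptions; the index itself is built exactly as shown in the notebook below.

from llama_index.core import Settings, VectorStoreIndex

def answer_with_rag(index: VectorStoreIndex, user_query: str, top_k: int = 3) -> str:
    # Steps 1-2: take the user's question and issue it as a retrieval query
    retriever = index.as_retriever(similarity_top_k=top_k)

    # Step 3: fetch the chunks most semantically similar to the question
    retrieved = retriever.retrieve(user_query)
    context = "\n\n".join(node.node.get_content() for node in retrieved)

    # Steps 4-5: hand both the question and the retrieved context to the LLM
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {user_query}"
    )
    answer = Settings.llm.complete(prompt)

    # Step 6: return the generated answer to the app / user
    return answer.text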

Understanding the NIST Technical Note 2102: Full-Scale Furniture Flammability Tests

Before diving into how we used Retrieval-Augmented Generation (RAG) to extract insights from this document, it’s important to understand the context and purpose of the report itself. NIST Technical Note 2102, titled “Full-Scale Room Burn Pattern and Fire Gas Analysis Tests Using Real Furnishings”, documents a series of controlled fire experiments conducted by the National Institute of Standards and Technology (NIST).

Background and Objective

The report presents data from full-scale room burn experiments designed to replicate realistic fire conditions using contemporary furnishings and configurations. These tests aim to support fire investigators and researchers in understanding fire development, burn patterns, and toxic byproduct generation in realistic room-scale fire scenarios.

This study moves beyond earlier limited-scale tests by incorporating upholstered furniture, synthetic materials, and multiple ignition points to simulate modern residential and storage-type fire environments.

Overview of Test Series

Six full-scale fire tests were conducted. Each test differed by ignition method, fuel package, ventilation, and room geometry. The data collected includes:

  • Heat release rates (HRR)
  • Temperature profiles and burn duration
  • Fire gas composition (CO, CO₂, O₂)
  • Visual and photographic burn pattern documentation

Focus Test: Test 27 – Four Boxes, Plastic Commodity

Test 27 involved a plastic commodity fire scenario that simulated a typical warehouse or storage configuration. Specifically, four cardboard boxes were filled with plastic cups and stacked in a 2x2 grid. The ignition source was applied at the base of one of the vertical stacks, simulating a plausible accidental ignition scenario in a storage facility.

The test setup aimed to investigate how plastic commodities contribute to rapid heat release, the formation of toxic byproducts, and observable burn patterns. Heat release rate data, toxic gas profiles, and visual flame development were all captured and analyzed in the study.

Why This Test Matters

Plastic commodities such as those used in Test 27 represent a significant fire risk due to their high fuel content and tendency to burn intensely. Insights from this test are useful for:

  • Improving sprinkler and fire suppression design in warehouses
  • Validating fire modeling tools like FDS
  • Training investigators in identifying commodity-specific burn patterns

Next Step: Turning PDFs into Conversations

Given the technical density of the report, manually analyzing and comparing test results like those in Test 27 is time-consuming. To streamline this process, we used Retrieval-Augmented Generation (RAG) powered by LlamaIndex and OpenAI to convert the document into a searchable and conversational tool. This allows us to ask questions like:

  • "What was the peak heat release rate of Test 27?"
  • "What were the byproduct yields?"
  • "Can you summarize the key findings in simpler terms?"

What’s exciting is that this entire pipeline was built in fewer than five lines of code. The barrier to entry is incredibly low: if you can upload a PDF and run a few cells in Colab, you can start building your own AI-powered tools. No complex infrastructure or advanced ML knowledge required. This makes RAG not just powerful, but accessible.
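
Stripped to its essentials, the pipeline really is that small. Here is a minimal sketch of the core lines (the filename is a placeholder for whatever PDF you upload; the full, annotated version follows in the next section):

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader(input_files=["report.pdf"]).load_data()  # placeholder filename
index = VectorStoreIndex.from_documents(documents)
print(index.as_query_engine().query("What was the peak heat release rate of Test 27?"))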

Access my code here: https://colab.research.google.com/drive/1cqMuuo1aBgWqPKTbpXQd2M_-VTFeYAn3?usp=sharing

Conversational PDF QA with OpenAI and LlamaIndex

I'm always looking for ways to bring cutting-edge AI into real-world fire protection workflows. This guide shows how anyone, even without a deep tech background, can build a conversational PDF assistant in just a few lines of code.

Using OpenAI and LlamaIndex, I walk through how to upload a technical PDF, extract insights with semantic search, and ask follow-up questions that feel natural and contextual. While this version is siloed to a single report, the pipeline is designed to eventually scale across multiple documents and databases, making it a powerful foundation for future tools.
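
As a hint of what that scaling could look like: pointing SimpleDirectoryReader at a folder instead of a single file indexes every report it finds. The ./reports path below is an assumption about where your PDFs live; everything else is the same LlamaIndex API used later in this guide.

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Index every PDF found in a local folder (hypothetical ./reports directory)
documents = SimpleDirectoryReader("./reports", required_exts=[".pdf"]).load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()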

Why This Matters

Fire protection engineers frequently work with lengthy technical documents: test reports, modeling studies, standards, and manufacturer data sheets. Extracting meaningful answers from these sources is often manual and slow.

This notebook turns your PDF into a conversational interface. You can ask:

  • What was the max heat release rate in Test 27?
  • What were the byproduct yields?
  • Explain that more simply.

Toolchain Overview

  • OpenAI (GPT-3.5-Turbo) – the language model for answering questions
  • LlamaIndex – converts PDFs into searchable, chunked content with embeddings
  • ChatMemoryBuffer – maintains memory so follow-up questions have context
  • Google Colab – free Jupyter environment to run everything interactively

Step-by-Step Breakdown

1. Install Dependencies

!pip install llama-index llama-index-llms-openai pypdf

2. Configure OpenAI

import os
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings

os.environ["OPENAI_API_KEY"] = "sk-..."  # Replace with your API key

Settings.llm = OpenAI(
    model="gpt-3.5-turbo",
    api_key=os.environ["OPENAI_API_KEY"]
)
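
If you would rather not paste the key directly into the notebook, Colab's Secrets panel is an alternative. This sketch assumes you have already stored a secret named OPENAI_API_KEY there:

import os
from google.colab import userdata

# Pull the key from Colab's Secrets panel instead of hardcoding it
os.environ["OPENAI_API_KEY"] = userdata.get("OPENAI_API_KEY")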

3. Upload the PDF

from google.colab import files
uploaded = files.upload()
filename = next(iter(uploaded))

4. Load and Index the Document

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Parse the PDF, embed its chunks, and build a searchable index
documents = SimpleDirectoryReader(input_files=[filename]).load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

5. Ask Questions

response = query_engine.query("Summarize Test 27.")
print(response)
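
It is also worth checking which chunks the answer was grounded in. LlamaIndex attaches the retrieved passages to the response object, so a short loop (nothing here is specific to this report) shows the text and similarity score behind each answer:

# Inspect the chunks the query engine retrieved to ground its answer
for source in response.source_nodes:
    print("similarity score:", source.score)
    print(source.node.get_content()[:300])  # first 300 characters of the chunk
    print("---")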

6. Add Memory

from llama_index.core.memory import ChatMemoryBuffer

memory = ChatMemoryBuffer.from_defaults(token_limit=2000)  # keep roughly the last 2,000 tokens of conversation
chat_engine = index.as_chat_engine(chat_mode="context", memory=memory)

7. Ask Follow-ups with Context

response = chat_engine.chat("Summarize the key findings in Test 27")
print(response)

response = chat_engine.chat("Explain that in simpler terms.")
print(response)
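
One small housekeeping note: when you want to start a fresh line of questioning, the conversation memory can be cleared rather than rebuilding the engine:

chat_engine.reset()  # clears the stored chat history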

Use Cases for Fire Protection Engineers

  • Extract HRR, ignition, and yield data from NIST reports
  • Summarize modeling assumptions from large design documents
  • Validate code compliance across sections of NFPA and IBC
  • Interact with test results without flipping through 100+ pages

Conclusion

To the curious engineer or student: I challenge you to explore this framework. Learn how to ask the right questions, and stay open to the creative potential of this technology. While enterprises are still figuring out how to scale these systems, there is tremendous value in applying them locally: to supercharge learning, enhance decision-making, and rethink how we engage with technical knowledge. The technology will continue to evolve, but there’s no better time to begin harnessing it.