Abstract
I’ve always pondered, "How can we leverage historical data to make AI more useful?" That question stuck with me for a long time. It wasn’t until taking Stanford’s TECH16 course that things finally clicked. I came to terms with how much I didn’t know about the large language model (LLM) space, and, just as importantly, what makes these models genuinely useful.
In reality, we don’t use LLMs for what they know; we use them for what they can do. Their strength lies in their skills: reasoning, summarizing, synthesizing, and generating structure from ambiguity.
This post explores the paradigm shift introduced by Retrieval-Augmented Generation (RAG). Rather than relying on static, pre-trained data, RAG allows us to ground models in our own trusted sources (technical manuals, internal documents, regulatory standards) through the use of vector databases and semantic search. It bridges the gap between general AI capabilities and domain-specific expertise.
To see this in action, I built a small-scale RAG pipeline using a NIST fire test report. While simple, the experiment showed how even a single report can become a dynamic knowledge interface, capable of delivering precise, context-aware responses. It confirmed something powerful: AI becomes trustworthy not through the data it was trained on, but through the data you connect it to.
Understanding RAG
The diagram above, introduced during Stanford’s TECH16 course, captures the essence of Retrieval-Augmented Generation (RAG). Rather than relying solely on what a language model was trained on, RAG enables AI to draw from curated, up-to-date sources, making responses far more relevant and reliable.
Here’s how it works:
- User Query: The user asks a question via the app interface.
- Retrieval Query: The app sends a retrieval query to a vector database or external data source.
- Retrieved Chunks: Relevant content is returned based on semantic similarity.
- LLM Input: The app forwards both the original query and the retrieved chunks to the large language model (LLM).
- Generated Answer: The LLM uses the provided context to generate a domain-specific, accurate response.
- Response Delivered: The app returns the answer to the user.
This architecture is what enables LLMs to become trustworthy tools, not because of what they were trained on, but because of how they can be connected to your own authoritative knowledge sources.
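To make that flow concrete, here is a minimal conceptual sketch of the loop in Python. The `embed`, `vector_db`, and `llm` objects are hypothetical placeholders for whatever embedding model, vector store, and LLM client you use; the LlamaIndex pipeline later in this post handles all of these steps for you.
def answer_with_rag(user_query: str, vector_db, llm, embed, top_k: int = 3) -> str:
    # 1-2. Turn the user's question into a retrieval query (an embedding vector)
    query_vector = embed(user_query)
    # 3. Pull back the most semantically similar chunks from the vector database
    retrieved_chunks = vector_db.search(query_vector, top_k=top_k)
    # 4. Forward both the original question and the retrieved context to the LLM
    context = "\n\n".join(retrieved_chunks)
    prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {user_query}"
    # 5-6. The LLM generates a grounded answer, which the app returns to the user
    return llm.complete(prompt)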
Understanding the NIST Technical Note 2102: Full-Scale Furniture Flammability Tests
Before diving into how we used Retrieval-Augmented Generation (RAG) to extract insights from this document, it’s important to understand the context and purpose of the report itself. NIST Technical Note 2102, titled “Full-Scale Room Burn Pattern and Fire Gas Analysis Tests Using Real Furnishings”, documents a series of controlled fire experiments conducted by the National Institute of Standards and Technology (NIST).
Background and Objective
The report presents data from full-scale room burn experiments designed to replicate realistic fire conditions using contemporary furnishings and configurations. These tests aim to support fire investigators and researchers in understanding fire development, burn patterns, and toxic byproduct generation in realistic room-scale fire scenarios.
This study moves beyond earlier limited-scale tests by incorporating upholstered furniture, synthetic materials, and multiple ignition points to simulate modern residential and storage-type fire environments.
Overview of Test Series
Six full-scale fire tests were conducted. Each test differed by ignition method, fuel package, ventilation, and room geometry. The data collected includes:
- Heat release rates (HRR)
- Temperature profiles and burn duration
- Fire gas composition (CO, CO₂, O₂)
- Visual and photographic burn pattern documentation
Focus Test: Test 27 – Four Boxes, Plastic Commodity
Test 27 involved a plastic commodity fire scenario that simulated a typical warehouse or storage configuration. Specifically, four cardboard boxes were filled with plastic cups and stacked in a 2x2 grid. The ignition source was applied at the base of one of the vertical stacks, simulating a plausible accidental ignition scenario in a storage facility.
The test setup aimed to investigate how plastic commodities contribute to rapid heat release, the formation of toxic byproducts, and observable burn patterns. Heat release rate data, toxic gas profiles, and visual flame development were all captured and analyzed in the study.
Why This Test Matters
Plastic commodities such as those used in Test 27 represent a significant fire risk due to their high fuel content and tendency to burn intensely. Insights from this test are useful for:
- Improving sprinkler and fire suppression design in warehouses
- Validating fire modeling tools like FDS
- Training investigators in identifying commodity-specific burn patterns
Next Step: Turning PDFs into Conversations
Given the technical density of the report, manually analyzing and comparing test results like those in Test 27 is time-consuming. To streamline this process, we used Retrieval-Augmented Generation (RAG) powered by LlamaIndex and OpenAI to convert the document into a searchable and conversational tool. This allows us to ask questions like:
- "What was the peak heat release rate of Test 27?"
- "What were the byproduct yields?"
- "Can you summarize the key findings in simpler terms?"
What’s exciting is that this entire pipeline was built in fewer than five lines of code. The barrier to entry is incredibly low: if you can upload a PDF and run a few cells in Colab, you can start building your own AI-powered tools. No need for complex infrastructure or advanced ML knowledge. This makes RAG not just powerful, but accessible.
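To illustrate, here is the core of that pipeline condensed into a handful of lines. It uses the same calls as the step-by-step walkthrough below; the filename is just a placeholder for whichever PDF you upload, and it assumes your OpenAI key is already configured.
# Assumes OPENAI_API_KEY is already set, as shown in the walkthrough below
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader(input_files=["nist_tn_2102.pdf"]).load_data()  # placeholder filename
index = VectorStoreIndex.from_documents(documents)
print(index.as_query_engine().query("What was the peak heat release rate in Test 27?"))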
Access my code here: https://colab.research.google.com/drive/1cqMuuo1aBgWqPKTbpXQd2M_-VTFeYAn3?usp=sharing
Conversational PDF QA with OpenAI and LlamaIndex
I'm always looking for ways to bring cutting-edge AI into real-world fire protection workflows. This guide shows how anyone, even without a deep tech background, can build a conversational PDF assistant in just a few lines of code.
Using OpenAI and LlamaIndex, I walk through how to upload a technical PDF, extract insights with semantic search, and ask follow-up questions that feel natural and contextual. While this version is siloed to a single report, the pipeline is designed to eventually scale across multiple documents and databases, making it a powerful foundation for future tools.
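As a rough sketch of that scaling path, pointing the same loader at a folder of reports (here an assumed local "reports/" directory) is enough to index several documents at once:
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Index every PDF in a local folder instead of a single uploaded file
documents = SimpleDirectoryReader(input_dir="reports", required_exts=[".pdf"]).load_data()
index = VectorStoreIndex.from_documents(documents)
chat_engine = index.as_chat_engine(chat_mode="context")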
Why This Matters
Fire protection engineers frequently work with lengthy technical documents: test reports, modeling studies, standards, and manufacturer data sheets. Extracting meaningful answers from these sources is often manual and slow.
This notebook turns your PDF into a conversational interface. You can ask:
- What was the max heat release rate in Test 27?
- What were the byproduct yields?
- Explain that more simply.
Toolchain Overview
- OpenAI (GPT-3.5-Turbo) – the language model for answering questions
- LlamaIndex – converts PDFs into searchable, chunked content with embeddings
- ChatMemoryBuffer – maintains memory so follow-up questions have context
- Google Colab – free Jupyter environment to run everything interactively
Step-by-Step Breakdown
1. Install Dependencies
!pip install llama-index llama-index-llms-openai pypdf
2. Configure OpenAI
import os
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings

os.environ["OPENAI_API_KEY"] = "sk-..."  # Replace with your API key

# Register the model globally so every LlamaIndex query uses it
Settings.llm = OpenAI(
    model="gpt-3.5-turbo",
    api_key=os.environ["OPENAI_API_KEY"]
)
3. Upload the PDF
from google.colab import files

# Prompt for a file upload in Colab and grab the uploaded filename
uploaded = files.upload()
filename = next(iter(uploaded))
4. Load and Index the Document
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Parse the PDF into text chunks, embed them, and build a searchable vector index
documents = SimpleDirectoryReader(input_files=[filename]).load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
5. Ask Questions
response = query_engine.query("Summarize Test 27.")
print(response)
6. Add Memory
from llama_index.core.memory import ChatMemoryBuffer

# Keep roughly the last 2000 tokens of conversation so follow-up questions have context
memory = ChatMemoryBuffer.from_defaults(token_limit=2000)
chat_engine = index.as_chat_engine(chat_mode="context", memory=memory)
7. Ask Follow-ups with Context
response = chat_engine.chat("Summarize the key findings in Test 27")
print(response)
response = chat_engine.chat("Explain that in simpler terms.")
print(response)
Use Cases for Fire Protection Engineers
- Extract HRR, ignition, and yield data from NIST reports
- Summarize modeling assumptions from large design documents
- Validate code compliance across sections of NFPA and IBC
- Interact with test results without flipping through 100+ pages
Conclusion
To the curious engineer or student: I challenge you to explore this framework. Learn how to ask the right questions, and stay open to the creative potential of this technology. While enterprises are still figuring out how to scale these systems, there’s tremendous value in applying them locally: to supercharge learning, enhance decision-making, and rethink how we engage with technical knowledge. The technology will continue to evolve, but there’s no better time to begin harnessing it.