
Build Data Driven Apps With Streamlit That Users Love To Use

In this blog post, we explore how to turn data and Python into interactive apps that your team can use immediately. Whether you are a data scientist, engineer, or technical leader, Streamlit makes it simple to go from prototype to a reliable internal tool.

What Streamlit is and why it matters

Streamlit is a Python framework that lets you build data-driven web apps with minimal boilerplate. You write a regular script, add a few UI widgets, and Streamlit handles the layout, state, and live updates. No JavaScript, no complex front-end builds—just Python.

Streamlit shines for analytics apps, model demos, and internal dashboards where speed of delivery and close proximity to data/ML code matter. Instead of handing over a static report, you can let users filter, experiment, and decide faster.

The technology behind Streamlit

Under the hood, Streamlit runs your Python script as a reactive app. Every user interaction (like moving a slider or choosing a filter) triggers a lightweight re-run of the script. Widgets store values and st.session_state preserves state across runs. The front-end is a modern React app that communicates with the Python backend over a WebSocket. Heavy work lives in Python, close to your data and libraries such as pandas, NumPy, scikit-learn, and SQLAlchemy.

Two caching layers keep apps fast and cost-efficient:

  • st.cache_data memoizes the results of data computations (e.g., SQL queries, pandas transforms) with automatic invalidation on code or input changes.
  • st.cache_resource persists expensive resources (e.g., database engines, model objects) across reruns and sessions.

This reactive model removes most of the plumbing typical in web frameworks, letting you focus on the data logic while still delivering a polished UI.

When to use Streamlit

  • Internal analytics tools and self-serve insights
  • ML model prototypes, demos, and monitoring dashboards
  • Data exploration interfaces for operations teams
  • Event-driven or batch data reviews, not high-traffic consumer apps

If you need fine-grained routing, complex multi-tenant auth, or extreme scale, consider pairing Streamlit with a gateway and standard auth provider—or moving to a full web framework once requirements outgrow the prototype.

Set up your environment

  1. Install Python 3.9+ and create a virtual environment.
  2. pip install streamlit pandas sqlalchemy psycopg2-binary altair (adjust drivers for your database).
  3. Run streamlit hello to verify the installation.

Build a minimal data app

The example below loads sales data from a database, caches queries, and provides interactive filters, KPIs, and charts.

Key patterns for robust apps

Secrets and configuration

  • Use st.secrets for credentials and config; never hardcode secrets in code or repos.
  • Parameterize SQL queries to prevent injection; never interpolate user input into query strings.

State and navigation

  • Use st.session_state for user-specific state (filters, selections, temporary data).
  • Structure multi-page apps by placing scripts under a pages/ directory. Streamlit builds the navigation automatically.

Performance

  • Cache queries and expensive transforms with st.cache_data and set sensible ttl values.
  • Cache connections and models with st.cache_resource to avoid reinitialization.
  • Prune data early with WHERE clauses, column selection, and aggregation in SQL.
  • Defer heavy work until needed using st.expander and conditional blocks.

UI and UX

  • Use the sidebar for global filters and the main area for content.
  • Group related content with st.tabs and st.columns.
  • Provide clear empty states and helpful error messages using st.warning and st.error.
  • Add downloads with st.download_button for users who need to export data.

Connecting to real data sources

  • Relational databases: PostgreSQL, MySQL, SQL Server via SQLAlchemy drivers.
  • Data warehouses: Use official drivers (e.g., Snowflake, BigQuery) and cache results.
  • Object storage: Read from S3, GCS, or Azure Blob with signed URLs and pandas readers.
  • APIs: Use requests/httpx. Cache responses to reduce rate limits and latency.

Testing and quality

  • Extract data transformations into pure functions and unit test them with pytest.
  • Mock database calls in tests; integration tests can run against a small seeded database.
  • Lint with ruff or flake8 and format with black to keep scripts clean and readable.
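The extract-and-test pattern in practice: a pure aggregation function and a pytest-style test for it (names and data are illustrative).

```python
import pandas as pd

# App code: a pure transformation with no Streamlit or database calls.
def revenue_by_region(df: pd.DataFrame) -> pd.Series:
    """Total sales per region, largest first."""
    return df.groupby("region")["amount"].sum().sort_values(ascending=False)

# test_transforms.py — collected and run by pytest
def test_revenue_by_region():
    df = pd.DataFrame(
        {"region": ["EMEA", "AMER", "EMEA"], "amount": [100.0, 50.0, 25.0]}
    )
    result = revenue_by_region(df)
    assert result["EMEA"] == 125.0
    assert list(result.index) == ["EMEA", "AMER"]
```

Because the function never touches a widget or a connection, the test needs no mocking and runs in milliseconds.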

Deployment options

Streamlit Community Cloud

  • Connect a Git repo, set secrets, and deploy—ideal for prototypes and small teams.

Containers and cloud runtimes

  • Package with Docker and run on AWS ECS/Fargate, EKS, Azure App Service/Container Apps, or GCP Cloud Run.
  • Expose port 8501, set health checks, and enable autoscaling based on CPU/memory.
  • Use sticky sessions if your load balancer supports it to reduce session churn; otherwise keep session state minimal.

Security

  • Terminate TLS at a reverse proxy (e.g., Nginx, ALB) and restrict access with an identity provider (Azure AD, Okta, Auth0) using the proxy’s auth features.
  • Scope database credentials to least privilege; rotate secrets and use managed secret stores.

Observability and operations

  • Log structured events for load times, cache hits, and query durations.
  • Emit metrics to your monitoring stack and set alerts for error rates and latency.
  • Profile hotspots locally with cProfile or py-spy and optimize queries or caching.
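A lightweight way to get structured timing logs is a decorator around your loaders; this sketch uses only the standard library, and the event names are illustrative.

```python
import json
import logging
import time
from functools import wraps

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("app")

def timed(event: str):
    """Decorator emitting one structured log line per call with its duration."""
    def decorate(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            elapsed_ms = (time.perf_counter() - start) * 1000
            logger.info(json.dumps({"event": event, "ms": round(elapsed_ms, 1)}))
            return result
        return wrapper
    return decorate

@timed("load_sales")
def load_sales() -> list[float]:
    # Stand-in for a real query; wrap your cached loaders the same way.
    return [120.0, 80.5]
```

JSON log lines like these are easy to ship to a log aggregator and turn into latency and error-rate dashboards.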

Advanced features worth exploring

  • Real-time feel: Periodically refresh sections with st.empty() and loops, or use shorter cache TTLs for fresh data.
  • Editable data: st.data_editor to collect user edits, with validation before persisting to the database.
  • Custom components: If needed, integrate specialized JS visualizations via Streamlit Components without rewriting your app.
  • Large models: Cache model objects with st.cache_resource and batch requests where possible.

Wrap up

Streamlit compresses the app-building lifecycle for data teams. With a reactive Python-first approach, strong caching, and a rich widget set, you can move from a notebook to an interactive, reliable tool in days, not months. Start with a focused use case, ship a slice, and iterate with your users. That is how you build data-driven apps that people actually love to use.


Discover more from CPI Consulting
