In this blog post, Build Data-Driven Apps With Streamlit That Users Love to Use, we will explore how to turn data and Python into interactive apps that your team can use immediately. Whether you are a data scientist, engineer, or technical leader, Streamlit makes it simple to go from prototype to a reliable internal tool.
What Streamlit is and why it matters
Streamlit is a Python framework that lets you build data-driven web apps with minimal boilerplate. You write a regular script, add a few UI widgets, and Streamlit handles the layout, state, and live updates. No JavaScript, no complex front-end builds—just Python.
Streamlit shines for analytics apps, model demos, and internal dashboards where speed of delivery and close proximity to data/ML code matter. Instead of handing over a static report, you can let users filter, experiment, and decide faster.
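To see how little code a working app takes, here is a minimal, complete example (the `app.py` file name is an assumption for the run command, `streamlit run app.py`):

```python
import streamlit as st

st.title("Hello, Streamlit")

# Widgets are plain function calls; their return value is the current input
name = st.text_input("Your name", value="world")
st.write(f"Hello, {name}!")
```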
The technology behind Streamlit
Under the hood, Streamlit runs your Python script as a reactive app. Every user interaction (like moving a slider or choosing a filter) triggers a lightweight rerun of the script. Widgets store values, and `st.session_state` preserves state across runs. The front end is a modern React app that communicates with the Python backend over a WebSocket. Heavy work lives in Python, close to your data and libraries such as pandas, NumPy, scikit-learn, and SQLAlchemy.
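To make the rerun model concrete, here is a minimal sketch: every button click re-executes the whole script, and the counter survives only because it lives in `st.session_state`.

```python
import streamlit as st

# The entire script reruns on every interaction; module-level variables reset,
# but st.session_state persists for the lifetime of the browser session.
if "clicks" not in st.session_state:
    st.session_state.clicks = 0

if st.button("Click me"):
    st.session_state.clicks += 1

st.write(f"Clicked {st.session_state.clicks} times across reruns")
```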
Two caching layers keep apps fast and cost-efficient:
- `st.cache_data` memoizes the results of data computations (e.g., SQL queries, pandas transforms), with automatic invalidation when code or inputs change.
- `st.cache_resource` persists expensive resources (e.g., database engines, model objects) across reruns and sessions.
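In isolation, the two decorators look like this minimal sketch (the SQLite URL and `sales` table are placeholders; swap in your own connection string and query):

```python
import pandas as pd
import streamlit as st
from sqlalchemy import create_engine, text

@st.cache_resource  # one engine object, shared across sessions and reruns
def engine():
    return create_engine("sqlite:///example.db")  # placeholder URL

@st.cache_data(ttl=600)  # results keyed by arguments; expire after 10 minutes
def daily_totals(day: str) -> pd.DataFrame:
    sql = text("SELECT product, SUM(revenue) AS revenue FROM sales WHERE day = :d GROUP BY product")
    return pd.read_sql(sql, engine(), params={"d": day})
```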
This reactive model removes most of the plumbing typical in web frameworks, letting you focus on the data logic while still delivering a polished UI.
When to use Streamlit
- Internal analytics tools and self-serve insights
- ML model prototypes, demos, and monitoring dashboards
- Data exploration interfaces for operations teams
- Event-driven or batch data reviews, not high-traffic consumer apps
If you need fine-grained routing, complex multi-tenant auth, or extreme scale, consider pairing Streamlit with a gateway and standard auth provider—or moving to a full web framework once requirements outgrow the prototype.
Set up your environment
- Install Python 3.9+ and create a virtual environment.
- Install dependencies with `pip install streamlit pandas sqlalchemy psycopg2-binary altair` (adjust drivers for your database).
- Run `streamlit hello` to verify the installation.
Build a minimal data app
The example below loads sales data from a database, caches queries, and provides interactive filters, KPIs, and charts.
```python
import os

import altair as alt
import pandas as pd
import streamlit as st
from sqlalchemy import create_engine, text

st.set_page_config(page_title="Sales Explorer", page_icon="📈", layout="wide")
st.title("📈 Sales Explorer")

# 1) Configure secrets and connections
DB_URL = st.secrets.get("DB_URL", os.getenv("DB_URL", "postgresql+psycopg2://user:pass@host:5432/db"))

@st.cache_resource
def get_engine():
    # One engine shared across reruns and sessions
    return create_engine(DB_URL, pool_pre_ping=True)

@st.cache_data(ttl=600)
def load_data(start_date, end_date, region):
    # Parameterized SQL to avoid injection
    query = text(
        """
        SELECT order_date, region, product, revenue, quantity
        FROM sales
        WHERE order_date BETWEEN :start AND :end
          AND (:region = 'All' OR region = :region)
        """
    )
    with get_engine().connect() as conn:
        df = pd.read_sql(query, conn, params={"start": start_date, "end": end_date, "region": region})
    return df

# 2) Sidebar filters
st.sidebar.header("Filters")
start_date = st.sidebar.date_input("Start date")
end_date = st.sidebar.date_input("End date")
region = st.sidebar.selectbox("Region", ["All", "APAC", "EMEA", "NA", "LATAM"], index=0)

if start_date > end_date:
    st.sidebar.error("Start date must be before end date")
    st.stop()

# 3) Load data with caching
with st.spinner("Loading data..."):
    df = load_data(str(start_date), str(end_date), region)

if df.empty:
    st.warning("No data for the selected filters.")
    st.stop()

# 4) KPIs
c1, c2, c3 = st.columns(3)
with c1:
    st.metric("Total Revenue", f"${df['revenue'].sum():,.0f}")
with c2:
    st.metric("Total Quantity", f"{df['quantity'].sum():,.0f}")
with c3:
    st.metric("Avg Order Value", f"${df['revenue'].sum() / max(len(df), 1):,.2f}")

# 5) Chart
chart = (
    alt.Chart(df)
    .mark_bar()
    .encode(
        x=alt.X("product:N", sort="-y"),
        y=alt.Y("sum(revenue):Q", title="Revenue"),
        color="region:N",
        tooltip=["product", "region", alt.Tooltip("sum(revenue):Q", title="Revenue", format=",")],
    )
    .properties(height=400)
)
st.subheader("Revenue by product")
st.altair_chart(chart, use_container_width=True)

# 6) Details table with download
st.subheader("Order details")
st.dataframe(df, use_container_width=True)
st.download_button(
    "Download CSV",
    data=df.to_csv(index=False).encode("utf-8"),
    file_name="sales.csv",
    mime="text/csv",
)

# 7) Session state example
if "favorites" not in st.session_state:
    st.session_state.favorites = set()

product_pick = st.selectbox("Add a product to favorites", sorted(df["product"].unique()))
if st.button("Add to favorites"):
    st.session_state.favorites.add(product_pick)
    st.success(f"Added {product_pick}")

if st.session_state.favorites:
    st.info("Favorites: " + ", ".join(sorted(st.session_state.favorites)))
```
Key patterns for robust apps
Secrets and configuration
- Use `st.secrets` for credentials and config; never hardcode secrets in code or repos (a minimal setup is sketched below).
- Parameterize SQL to prevent injection, as shown above.
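A minimal sketch of the pattern, assuming a `.streamlit/secrets.toml` file that is kept out of version control:

```python
import os
import streamlit as st

# Assuming .streamlit/secrets.toml (not committed) contains:
#   DB_URL = "postgresql+psycopg2://user:pass@host:5432/db"
# st.secrets exposes it like a dict, with an env-var fallback for local dev:
db_url = st.secrets.get("DB_URL", os.getenv("DB_URL", ""))
```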
State and navigation
- Use `st.session_state` for user-specific state (filters, selections, temporary data).
- Structure multi-page apps by placing scripts under a `pages/` directory; Streamlit builds the navigation automatically (see the layout sketch below).
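A sketch of the file layout (the page file names are illustrative; numeric prefixes control sidebar order):

```
app.py                 # entry point: streamlit run app.py
pages/
  1_Explorer.py        # appears as "Explorer" in the sidebar
  2_Favorites.py       # appears as "Favorites"
```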
Performance
- Cache queries and expensive transforms with `st.cache_data` and set sensible `ttl` values.
- Cache connections and models with `st.cache_resource` to avoid reinitialization.
- Prune data early with WHERE clauses, column selection, and aggregation in SQL.
- Defer heavy work until needed using `st.expander` and conditional blocks, as in the sketch after this list.
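A sketch combining a TTL cache with deferred rendering (the two-second sleep stands in for an expensive query):

```python
import time
import pandas as pd
import streamlit as st

@st.cache_data(ttl=300)  # reuse the result for 5 minutes, then recompute
def slow_aggregate(region: str) -> pd.DataFrame:
    time.sleep(2)  # stand-in for an expensive SQL query or transform
    return pd.DataFrame({"region": [region], "revenue": [1234.0]})

st.metric("Revenue", f"${slow_aggregate('EMEA')['revenue'].sum():,.0f}")

# The expander hides the detail view; the checkbox ensures the heavy
# rendering only runs when the user actually asks for it.
with st.expander("Detail table"):
    if st.checkbox("Load details"):
        st.dataframe(slow_aggregate("EMEA"), use_container_width=True)
```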
UI and UX
- Use the sidebar for global filters and the main area for content.
- Group related content with `st.tabs` and `st.columns` (a short sketch follows this list).
- Provide clear empty states and helpful error messages using `st.warning` and `st.error`.
- Add downloads with `st.download_button` for users who need to export data.
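For example, tabs and columns compose naturally (the metric values here are placeholders):

```python
import streamlit as st

tab_overview, tab_detail = st.tabs(["Overview", "Detail"])

with tab_overview:
    left, right = st.columns(2)
    left.metric("Revenue", "$1.2M")  # placeholder values
    right.metric("Orders", "4,310")

with tab_detail:
    # A clear empty state beats a blank screen
    st.warning("No rows match the current filters.")
```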
Connecting to real data sources
- Relational databases: PostgreSQL, MySQL, SQL Server via SQLAlchemy drivers.
- Data warehouses: Use official drivers (e.g., Snowflake, BigQuery) and cache results.
- Object storage: Read from S3, GCS, or Azure Blob with signed URLs and `pandas` readers.
- APIs: Use requests/httpx and cache responses to reduce rate limits and latency (see the sketch below).
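For APIs, a cached fetch might look like this sketch (the endpoint URL is a placeholder):

```python
import httpx
import streamlit as st

@st.cache_data(ttl=120)  # cache responses to stay under rate limits
def fetch_orders(status: str) -> list:
    resp = httpx.get(
        "https://api.example.com/orders",  # placeholder endpoint
        params={"status": status},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()
```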
Testing and quality
- Extract data transformations into pure functions and unit test them with pytest (see the sketch after this list).
- Mock database calls in tests; integration tests can run against a small seeded database.
- Lint with ruff or flake8 and format with black to keep scripts clean and readable.
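For instance, a transform pulled out of the app becomes trivially testable (the function and file names here are illustrative):

```python
# test_transforms.py
import pandas as pd

def add_avg_order_value(df: pd.DataFrame) -> pd.DataFrame:
    # Pure function: no Streamlit, no database, easy to unit test
    out = df.copy()
    out["aov"] = out["revenue"] / out["quantity"].clip(lower=1)
    return out

def test_add_avg_order_value():
    df = pd.DataFrame({"revenue": [100.0], "quantity": [4]})
    assert add_avg_order_value(df)["aov"].iloc[0] == 25.0
```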
Deployment options
Streamlit Community Cloud
- Connect a Git repo, set secrets, and deploy—ideal for prototypes and small teams.
Containers and cloud runtimes
- Package with Docker and run on AWS ECS/Fargate, EKS, Azure App Service/Container Apps, or GCP Cloud Run (a minimal Dockerfile sketch follows this list).
- Expose port 8501, set health checks, and enable autoscaling based on CPU/memory.
- Use sticky sessions if your load balancer supports it to reduce session churn; otherwise keep session state minimal.
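A minimal Dockerfile sketch, assuming an `app.py` entry point and a `requirements.txt`; adjust the base image and dependency tooling to your project:

```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8501
# Recent Streamlit versions serve a health endpoint at /_stcore/health
HEALTHCHECK CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8501/_stcore/health')"
CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
```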
Security
- Terminate TLS at a reverse proxy (e.g., Nginx, ALB) and restrict access with an identity provider (Azure AD, Okta, Auth0) using the proxy’s auth features.
- Scope database credentials to least privilege; rotate secrets and use managed secret stores.
Observability and operations
- Log structured events for load times, cache hits, and query durations.
- Emit metrics to your monitoring stack and set alerts for error rates and latency.
- Profile hotspots locally with `cProfile` or `py-spy` and optimize queries or caching. A structured timing helper is sketched below.
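A small helper for structured timing events might look like this sketch (the `timed` helper and event names are illustrative):

```python
import json
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("app")

@contextmanager
def timed(event: str):
    # Emit one structured JSON line per operation, with its duration
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = round(time.perf_counter() - start, 3)
        logger.info(json.dumps({"event": event, "duration_s": elapsed}))

# Usage inside the app:
# with timed("load_data"):
#     df = load_data(str(start_date), str(end_date), region)
```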
Advanced features worth exploring
- Real-time feel: Periodically refresh sections with `st.empty()` and loops (sketched after this list), or use shorter cache TTLs for fresh data.
- Editable data: Use `st.data_editor` to collect user edits, with validation before persisting to the database.
- Custom components: If needed, integrate specialized JS visualizations via Streamlit Components without rewriting your app.
- Large models: Cache model objects with `st.cache_resource` and batch requests where possible.
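A sketch of the `st.empty()` refresh pattern (a fixed ten-tick loop here; a production app would bound or schedule this differently):

```python
import time
import streamlit as st

placeholder = st.empty()  # reserve a slot whose content can be rewritten

for tick in range(10):
    # Each write replaces the placeholder's previous content in place
    placeholder.metric("Ticks", tick)
    time.sleep(1)
```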
Wrap up
Streamlit compresses the app-building lifecycle for data teams. With a reactive Python-first approach, strong caching, and a rich widget set, you can move from notebook to an interactive, reliable tool in days—not months. Start with a focused use case, ship a slice, and iterate with your users. That is how you build data-driven apps that people actually love to use.