Methodology

How estimates work

StackStats turns public Substack signals into transparent gross MRR ranges, confidence labels, and growth history. It is built for learning from what appears to work, not for pretending that public data reveals private subscriber counts.

Public sources first: Only unauthenticated public Substack data is used in V1.

Ranges, not false precision: Revenue is shown as a low/base/high gross MRR estimate.

Daily history: Snapshots preserve what the model believed on each collection date.

Estimate Ranges

StackStats estimates gross monthly recurring revenue from public paid-audience buckets and visible subscription prices.

The range is intentionally wide when the public signal is vague. A publication that exposes only a bucket such as "thousands of paid subscribers" should not be presented as if we know the exact paid subscriber count.

Low estimate: conservative paid-subscriber and price interpretation.
Base estimate: the finance-v1 midpoint used for ranking.
High estimate: optimistic interpretation of the same public signals.
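
The three estimates above can be sketched as follows. This is a minimal illustration, not the finance-v1 model: the bucket bounds in ASSUMED_BUCKET_BOUNDS, the function name estimate_range, and the choice of a plain arithmetic midpoint for the base estimate are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MrrRange:
    low: float
    base: float
    high: float


# Hypothetical bucket bounds. The real mapping from public wording to
# subscriber counts is not specified in this document.
ASSUMED_BUCKET_BOUNDS = {
    "hundreds": (100, 999),
    "thousands": (1_000, 9_999),
    "tens of thousands": (10_000, 99_999),
}


def estimate_range(bucket: str, low_price: float, high_price: float) -> MrrRange:
    """Return a low/base/high gross MRR range for a paid-audience bucket."""
    min_subs, max_subs = ASSUMED_BUCKET_BOUNDS[bucket]
    low = min_subs * low_price      # conservative count and price
    high = max_subs * high_price    # optimistic count and price
    base = (low + high) / 2         # assumed: simple midpoint of the two
    return MrrRange(low=low, base=base, high=high)
```

With a "thousands" bucket and effective monthly prices between 5 and 8, the high estimate is roughly sixteen times the low one, which is the kind of width the text above describes for vague signals.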

Gross MRR

Gross MRR means estimated subscription revenue before platform fees, payment processing, taxes, refunds, coupons, churn, or comped subscriptions.

The model uses regular public web plans first. Annual plans are converted into monthly revenue by dividing by 12, then combined with an explicit monthly/yearly mix assumption.

No discount or coupon adjustment.
No Substack platform fee or payment fee assumption.
No tax, refund, churn, or comped subscriber assumption.
No net revenue claim.
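
The annual-to-monthly conversion and the monthly/yearly mix can be sketched as a blended price. The function names and the annual_share parameter are hypothetical, and the actual mix assumption used by the model is not stated here.

```python
def blended_monthly_price(
    monthly_price: float, annual_price: float, annual_share: float
) -> float:
    """Blend the monthly plan with the annual plan expressed per month.

    annual_share is the assumed fraction of paid subscribers on the
    annual plan (0.0 to 1.0). It is an explicit input, not a measured value.
    """
    if not 0.0 <= annual_share <= 1.0:
        raise ValueError("annual_share must be between 0 and 1")
    annual_as_monthly = annual_price / 12  # annual plan divided by 12
    return (1 - annual_share) * monthly_price + annual_share * annual_as_monthly


def gross_mrr(
    paid_subscribers: int,
    monthly_price: float,
    annual_price: float,
    annual_share: float,
) -> float:
    """Gross MRR: no fees, taxes, refunds, coupons, churn, or comps applied."""
    return paid_subscribers * blended_monthly_price(
        monthly_price, annual_price, annual_share
    )
```

For example, a 10 per month plan and a 96 per year plan with half of subscribers assumed to be annual gives a blended price of 9 per month, so 1,000 paid subscribers would be estimated at 9,000 gross MRR.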

Confidence

Confidence describes the quality of the observable public inputs, not how successful a publication is.

A high-confidence estimate has a public paid-audience bucket, visible regular pricing, a free subscriber signal, recent source records, and enough post history to understand cadence.

High: paid audience, pricing, subscriber signal, recency, and post history are all present.
Medium: paid audience and at least one public price are present, but some supporting signals are missing.
Low: paid audience is missing, pricing is ambiguous, sources are stale, or the publication appears unusual.
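
One way to read the three labels above as a rule is sketched below. The field names are hypothetical, and the final fallback (a paid bucket with no visible price) is an assumption, since the rules above do not cover that case explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PublicSignals:
    has_paid_bucket: bool
    has_visible_price: bool
    has_free_subscriber_signal: bool
    sources_are_recent: bool
    has_post_history: bool
    pricing_is_ambiguous: bool = False
    looks_unusual: bool = False


def confidence_label(s: PublicSignals) -> str:
    """Label input quality, not publication success."""
    # Any condition from the "Low" rule wins first.
    if (
        not s.has_paid_bucket
        or s.pricing_is_ambiguous
        or not s.sources_are_recent
        or s.looks_unusual
    ):
        return "low"
    # Paid bucket and recency already hold here, so check the remaining three.
    if s.has_visible_price and s.has_free_subscriber_signal and s.has_post_history:
        return "high"
    if s.has_visible_price:
        return "medium"
    # Assumed fallback: no visible price at all is treated as low.
    return "low"
```

Checking the low conditions first means a stale or unusual publication cannot be labeled high even when every supporting signal is present.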

Source Records And Signals

Every external response is stored as a source record before normalized fields are derived from it. This keeps provenance available when parsers improve or the model changes.

Normalized signals include identity, public leaderboard rank, paid-audience wording, visible tier prices, public post metadata, and recommendation edges.

Raw source records preserve where a signal came from.
Normalized tables keep the product fast and queryable.
Daily snapshots freeze estimates, ranks, and graph-ready metrics for historical charts.
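
The three layers above can be sketched with plain data classes. All type names, field names, and the hash-based record id are hypothetical and chosen for illustration. The point is only that each normalized signal keeps a pointer back to the raw record it came from, and that a snapshot for a given day is written once.

```python
import hashlib
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class SourceRecord:
    record_id: str
    url: str
    fetched_at: datetime
    body: str  # raw response, stored before any parsing


@dataclass(frozen=True)
class NormalizedSignal:
    publication: str
    name: str
    value: str
    source_record_id: str  # provenance: which raw record produced this


def store_source_record(url: str, body: str, fetched_at: datetime) -> SourceRecord:
    """Wrap a raw response; the id is an assumed content-derived key."""
    key = f"{url}|{fetched_at.isoformat()}|{body}".encode("utf-8")
    return SourceRecord(hashlib.sha256(key).hexdigest()[:16], url, fetched_at, body)


def derive_signal(
    record: SourceRecord, publication: str, name: str, value: str
) -> NormalizedSignal:
    """Create a normalized field that still points at its source record."""
    return NormalizedSignal(publication, name, value, record.record_id)


def freeze_snapshot(
    snapshots: dict[tuple[str, date], float],
    publication: str,
    day: date,
    base_mrr: float,
) -> float:
    """Record the estimate for a collection date; never overwrite that day."""
    return snapshots.setdefault((publication, day), base_mrr)
```

Because the raw body is kept, an improved parser can re-derive signals from old records without refetching, while earlier snapshots continue to show what the model believed at the time.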

Connected Publications

Exact paid subscriber counts and exact tier distribution are not public data. StackStats should not imply that V1 knows those numbers.

A future Connected Publication flow can let a creator authorize access to accurate private metrics for their own publication. Those verified numbers should be displayed separately from public estimates.

Public estimate: inferred from public buckets and prices.
Verified revenue: future creator-authorized data.
Exact paid-tier distribution requires creator permission.

Graph Signals

Recommendation and public relationship signals are useful for discovery, adjacency, and growth insight. They are not complete subscriber lists.

V1 stores graph edges in Postgres and uses them to show adjacent publications. A graph database can wait until traversal becomes central to the product.

Recommendations can suggest who a publication wants to be associated with.
Visible public subscription/read edges, if collected later, will still be incomplete and privacy-biased.
Graph signals should support inspiration and discovery, not exact revenue claims.
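
Storing edges in a relational table and looking up adjacent publications can be sketched as below. To keep the example self-contained it uses Python's built-in sqlite3 module as a stand-in for Postgres. The table name, column names, and function names are hypothetical, and the actual V1 schema is not described here.

```python
import sqlite3


def create_edge_table(conn: sqlite3.Connection) -> None:
    """One row per directed recommendation edge."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS recommendation_edges (
            source TEXT NOT NULL,
            target TEXT NOT NULL,
            PRIMARY KEY (source, target)
        )
        """
    )


def add_edge(conn: sqlite3.Connection, source: str, target: str) -> None:
    # INSERT OR IGNORE is SQLite syntax; Postgres would use
    # INSERT ... ON CONFLICT DO NOTHING instead.
    conn.execute(
        "INSERT OR IGNORE INTO recommendation_edges (source, target) VALUES (?, ?)",
        (source, target),
    )


def adjacent_publications(conn: sqlite3.Connection, publication: str) -> list[str]:
    """Publications one hop away, in either direction, sorted by name."""
    rows = conn.execute(
        """
        SELECT target AS publication FROM recommendation_edges WHERE source = ?
        UNION
        SELECT source AS publication FROM recommendation_edges WHERE target = ?
        ORDER BY publication
        """,
        (publication, publication),
    ).fetchall()
    return [row[0] for row in rows]
```

A one-hop lookup like this needs only two indexed scans, which is consistent with deferring a graph database until multi-hop traversal becomes central.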