HLS for Live: Scalable but Latency-Bounded
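In standard HLS live streaming, an encoder (such as FFmpeg or OBS with RTMP output) pushes video to an ingest server, which packages it into 4–8 second segments and keeps updating a .m3u8 manifest that lists the newest ones. Viewers download the manifest, fetch the segments it lists, and play them in order. Because segments and manifests are ordinary files served over HTTP, a CDN can cache them, and the architecture scales to millions of concurrent viewers without per-viewer state on the origin. The cost is latency: a segment cannot be published until it has been fully encoded, and players typically buffer a few segments before starting playback, so standard HLS runs 15–30 seconds behind the live edge end to end. Low-Latency HLS reduces this to 2–4 seconds, which is acceptable for most broadcasts.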
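To make the manifest and the latency arithmetic concrete, here is a minimal Python sketch. It renders a sliding-window live playlist and gives a rough latency estimate. The seg<N>.ts naming scheme, the function names, the three-segment player buffer, and the 2-second allowance for encoding, packaging, and CDN propagation are all assumptions chosen for illustration, not values from any particular packager or player.

```python
import math
from typing import Sequence


def render_live_playlist(first_sequence: int, durations: Sequence[float]) -> str:
    """Render a sliding-window live media playlist.

    Segment file names (seg<N>.ts) are a hypothetical naming scheme.
    """
    # The target duration is an integer upper bound on segment length;
    # rounding the longest segment up is a safe choice.
    target = math.ceil(max(durations))
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{target}",
        # Sequence number of the first segment still listed in the window.
        f"#EXT-X-MEDIA-SEQUENCE:{first_sequence}",
    ]
    for offset, duration in enumerate(durations):
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(f"seg{first_sequence + offset}.ts")
    # No EXT-X-ENDLIST tag: its absence tells players the stream is still live.
    return "\n".join(lines) + "\n"


def estimate_latency(segment_seconds: float,
                     segments_behind_live: int = 3,
                     pipeline_seconds: float = 2.0) -> float:
    """Rough end-to-end latency in seconds.

    Assumes the player starts `segments_behind_live` segments behind the
    live edge and that encode, package, and CDN delays add a fixed
    `pipeline_seconds` (both are illustrative assumptions).
    """
    return segment_seconds * segments_behind_live + pipeline_seconds


if __name__ == "__main__":
    print(render_live_playlist(120, [6.0, 6.0, 5.5]))
    for seconds in (4.0, 6.0, 8.0):
        print(f"{seconds:.0f}s segments: about {estimate_latency(seconds):.0f}s behind live")
```

Under these assumptions, 6-second segments put the viewer about 20 seconds behind live, which sits inside the 15–30 second range quoted above. The estimate also shows why shortening segments is the main lever for reducing latency in standard HLS.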
WebRTC: Real-Time but Costly to Scale
WebRTC was designed for peer-to-peer communication: video calls, not broadcasts. Its UDP-based transport achieves sub-second latency, which matters for interactive scenarios such as live auctions, remote collaboration, gaming, and emergency communication. The scaling problem is fundamental. In a broadcast scenario, the media server (a selective forwarding unit, or SFU) must maintain a connection to every viewer and forward the video to each one individually, whereas CDN-based HLS serves the same cached files to everyone. Server load therefore grows with the audience, and broadcasting over WebRTC to 10,000 concurrent viewers requires significantly more infrastructure than HLS at the same scale.
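The difference in fan-out can be sketched with back-of-the-envelope arithmetic. The Python below is only an illustration: the 3 Mbps stream bitrate, the 8,000 Mbps usable capacity per SFU node, and the 99% CDN cache hit ratio are hypothetical figures, and real deployments are also limited by CPU, connection counts, and packet rates, which this model ignores.

```python
import math


def sfu_egress_mbps(viewers: int, bitrate_mbps: float) -> float:
    """An SFU sends each viewer an individual copy, so egress grows linearly."""
    return viewers * bitrate_mbps


def sfu_nodes_needed(viewers: int, bitrate_mbps: float,
                     node_capacity_mbps: float) -> int:
    """Number of SFU nodes needed if bandwidth were the only limit."""
    return math.ceil(sfu_egress_mbps(viewers, bitrate_mbps) / node_capacity_mbps)


def hls_origin_egress_mbps(viewers: int, bitrate_mbps: float,
                           cache_hit_ratio: float) -> float:
    """Traffic that reaches the HLS origin.

    Viewers still receive viewers * bitrate in total, but CDN edges serve
    cache hits; only the misses are fetched from the origin.
    """
    return viewers * bitrate_mbps * (1.0 - cache_hit_ratio)


if __name__ == "__main__":
    viewers, bitrate = 10_000, 3.0  # hypothetical audience size and bitrate
    print("SFU egress (Mbps):", sfu_egress_mbps(viewers, bitrate))
    print("SFU nodes needed:", sfu_nodes_needed(viewers, bitrate, 8_000.0))
    print("HLS origin egress (Mbps):",
          round(hls_origin_egress_mbps(viewers, bitrate, 0.99), 1))
```

With these assumed numbers, the SFU tier has to push 30,000 Mbps itself and needs four nodes on bandwidth alone, while the HLS origin sees roughly 300 Mbps because the CDN absorbs the rest.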
Hybrid Architectures
The most sophisticated live platforms use both. WebRTC handles the broadcaster's ingest (a low-latency upload to the server) and any interactive elements, such as the host's camera in a Q&A. HLS handles viewer egress, serving the packaged stream at massive scale from a CDN. This hybrid gives viewers a reasonable 2–8 second latency with CDN-level scalability, while keeping the interactive layer truly real-time. The low end of that range assumes Low-Latency HLS on the egress side.
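One way to picture the split is as a routing decision made when a participant joins. The sketch below is a simplified illustration, not a description of any specific platform: the role names, the Participant type, and the choose_transport function are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable

# Hypothetical role names: these participants send or receive media interactively.
INTERACTIVE_ROLES = {"broadcaster", "guest"}


@dataclass(frozen=True)
class Participant:
    name: str
    role: str  # "broadcaster", "guest", or "viewer"


def choose_transport(participant: Participant) -> str:
    """Pick the delivery path for one participant in the hybrid layout."""
    if participant.role in INTERACTIVE_ROLES:
        # Ingest and interactive elements stay on WebRTC for real-time latency.
        return "webrtc"
    # Passive viewers watch the packaged stream from the CDN.
    return "hls"


def summarize(participants: Iterable[Participant]) -> Counter:
    """Count how many participants end up on each transport."""
    return Counter(choose_transport(p) for p in participants)


if __name__ == "__main__":
    session = [Participant("host", "broadcaster"), Participant("caller", "guest")]
    session += [Participant(f"viewer{i}", "viewer") for i in range(1000)]
    print(summarize(session))
```

The point of the design is that the expensive per-connection WebRTC path is reserved for the handful of participants who need it, so its cost stays roughly constant while the audience on the HLS path grows.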