How I created the reading mode feature in a Flutter app.

When I first set out to add a distraction-free reading mode to my Flutter app, I imagined it would be a straightforward task.

A side-by-side bar chart: “JS/WebView + Readability” vs. “Dart parser” vs. “WASM” in terms of (a) bundle size, (b) extraction time.

But as I dug into existing solutions (injecting Readability.js into a WebView, parsing HTML with Dart libraries, even exploring WebAssembly for external libraries), each option revealed trade-offs that didn’t quite fit my needs.

In the end, I decided to build my own reading-mode engine from scratch. Here’s the story of why I took that path and what I learned along the way.

For context, the web content my app currently extracts from websites comes through their RSS feeds.

Option 1: Let’s Just Inject “Readability.js”

Injecting Readability.js into a WebView seemed like the fastest path. With the flutter_inappwebview package, the core of it looked like this:

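import 'package:flutter/services.dart' show rootBundle;
import 'package:flutter/widgets.dart';
import 'package:flutter_inappwebview/flutter_inappwebview.dart';

Widget buildReaderWebView(String articleUrl) {
  return InAppWebView(
    // 1. Load the original page in a WebView
    initialUrlRequest: URLRequest(url: WebUri(articleUrl)),
    onLoadStop: (controller, url) async {
      // 2. Inject Readability.js (bundled as an asset and registered in pubspec.yaml)
      final jsLib = await rootBundle.loadString('assets/js/readability.js');
      await controller.evaluateJavascript(source: jsLib);
      // 3. Run the extraction and swap in the cleaned article
      await controller.evaluateJavascript(source: '''
        (function() {
          const article = new Readability(document).parse();
          // parse() returns null when it can't find an article
          if (article) {
            document.body.innerHTML = article.content;
          }
        })();
      ''');
    },
  );
}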

How it felt at first

  • Download a copy of readability.js (about 500 KB minified).
  • Drop it in assets/js/.
  • In pubspec.yaml, register it.
  • The WebView loaded the page and—pretty quickly—showed a neat, stripped-down article.

On a few big news sites (Medium, The Verge, Hacker News), I saw exactly what I wanted: big, readable text; captions under images; and no sidebar nonsense.

Cracks in the “Just Use Readability” Approach

1. Asset Size Surprise

  • When I built my debug APK, the compressed bundle looked okay (maybe +1 MB).
  • But on the device’s file explorer, readability.js was sitting there as a 23 MB file. Why?
  • The uncompressed version of third-party libraries (with comments, mappings) can be huge.
  • The Play Store compresses APKs, so the download size shrank, but on-device extraction blew it up.
Download size vs. on-device asset size comparison for readability.js.
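If you want to check that gap for your own assets, comparing a file’s raw byte count with its gzip size gives a rough idea. This is only a sketch: `compareSizes` is my own helper name, and gzip is an approximation of APK packaging (both use DEFLATE compression, but the packaging rules differ).

```dart
import 'dart:io';

/// Returns the raw size of [bytes] and its size after gzip (DEFLATE)
/// compression: a rough stand-in for on-device vs. download size.
({int raw, int compressed}) compareSizes(List<int> bytes) =>
    (raw: bytes.length, compressed: gzip.encode(bytes).length);
```

For example, on a desktop Dart VM you could call `compareSizes(File('assets/js/readability.js').readAsBytesSync())` and print both fields.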

2. WebView Performance Lag

  • On high-end phones, cleaning up via JS took around 300 ms after the page loaded.
  • But on mid-range or low-end devices, I saw a noticeable “flash”: first the original article (with ads) appeared, then it blinked out and the cleaned version rendered.
  • As a user, that flicker felt jarring—especially when images reflowed under the new CSS.

3. Fallback and Edge Cases

  • Lots of sites use dynamic imports, shadow DOM, or infinite-scroll article loaders.
  • If a site delayed loading its main content, running Readability(document).parse() too early gave me an empty or partial result.
  • Every site had slightly different naming conventions for “.article-body” or “.post-content,” so I ended up chasing one-offs:
// Site-specific hack example
if (window.location.hostname.includes('mycustomblog.com')) {
  // Wait a fixed 500 ms for dynamic content (the brittle part)
  setTimeout(() => {
    const article = new Readability(document).parse();
    if (article) { // parse() returns null when no article is found
      document.body.innerHTML = article.content;
    }
  }, 500);
}

That worked for some sites, but it was brittle (lots of magic timeouts, unpredictable). I didn’t want a maintenance nightmare of “hack for site X, hack for site Y” every time a layout changed.
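The core flaw is that a fixed 500 ms delay is a guess. A slightly less brittle pattern is to poll a cheap readiness check and give up after an overall timeout. Here’s a minimal Dart sketch; `waitUntil` is a hypothetical helper, not part of any package:

```dart
/// Hypothetical helper: polls [check] every [interval] until it returns
/// true or [timeout] elapses. Returns true if the condition was met.
Future<bool> waitUntil(
  Future<bool> Function() check, {
  Duration interval = const Duration(milliseconds: 100),
  Duration timeout = const Duration(seconds: 3),
}) async {
  final stopwatch = Stopwatch()..start();
  while (stopwatch.elapsed < timeout) {
    if (await check()) return true;
    await Future<void>.delayed(interval);
  }
  return false;
}
```

In the WebView setup above, `check` could wrap a `controller.evaluateJavascript(...)` call on a small expression (say, whether the page has more than a few `<p>` elements). It still doesn’t address shadow DOM or infinite-scroll loaders.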


Option 2: Exploring Pure Dart Parsing

I thought, “Okay—what if I skip WebView + big JS and just fetch the HTML in Dart? Parse with the html package, find the main <article> or <div class="content">, strip out the rest, then show the clean HTML.”

1. Basic Dart-Only Fetch & Parse

In theory:

import 'package:html/parser.dart' show parse;
import 'package:http/http.dart' as http;

Future<String> fetchAndCleanArticle(String url) async {
  final response = await http.get(Uri.parse(url));
  final document = parse(response.body); // from package:html
  // 1. Remove <header>, <footer>, <nav> nodes
  document.querySelectorAll('header, footer, nav').forEach((el) => el.remove());
  // 2. Remove ads and other obvious clutter
  document.querySelectorAll('.ad, .sidebar, .promo').forEach((el) => el.remove());
  // 3. Choose the main content block (first match wins)
  final mainBlock =
      document.querySelector('article') ??
      document.querySelector('main') ??
      document.querySelector('.content');

  // Fall back to the whole <body> if no candidate matched
  return mainBlock?.outerHtml ?? document.body?.outerHtml ?? '';
}
I added this code to a simple demo app and measured:
  • Parsing 300 KB of HTML: ~80 ms on my Pixel 3.
  • Removing nodes: another ~20 ms.
  • Serializing outerHtml: ~30 ms.
  • Total: ~130 ms. Not bad. But on a Xiaomi Redmi Note 7 (low-end), it spiked to ~250 ms, which felt sluggish when I loaded multiple articles in a session.
A line graph: “HTML size vs. parse time on mid-range vs. low-end device.”
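For measurements like these, a small `Stopwatch` wrapper is enough. This sketch (the `medianMicros` name is mine) runs a step several times and keeps the median, which is less sensitive to one slow run than a single timing. In a Flutter app, take such numbers in profile or release mode, since debug builds are much slower.

```dart
/// Runs [step] [runs] times and returns the median duration in
/// microseconds (the upper middle value when [runs] is even).
int medianMicros(void Function() step, {int runs = 5}) {
  if (runs < 1) throw ArgumentError.value(runs, 'runs', 'must be >= 1');
  final samples = <int>[];
  for (var i = 0; i < runs; i++) {
    final stopwatch = Stopwatch()..start();
    step();
    stopwatch.stop();
    samples.add(stopwatch.elapsedMicroseconds);
  }
  samples.sort();
  return samples[samples.length ~/ 2];
}
```

For the parse step above, that would be something like `medianMicros(() => parse(html))`, with `parse` from package:html.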

2. Content-Scoring Complexity

  • Without Readability’s scoring, I needed a fallback that “picks something reasonable.”
  • I tried a simple rule: if an <article> tag exists, use it; otherwise, search for a <div> with “.post” or “.article-body”.
  • But some blogs wrap text in multiple <div> layers; sometimes the <article> tag contained only 100 words plus a sidebar promo, while the real text sat in a nested <div class="text-wrapper"> inside that same <article>.
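One way to handle that nested-wrapper case is to start from the chosen block and keep stepping into any child that holds most of its text. Here’s a sketch over a deliberately tiny, hypothetical `Node` model (not package:html’s classes), with an assumed 80% threshold:

```dart
/// Tiny stand-in for a DOM element: its own text plus child nodes.
class Node {
  final String ownText;
  final List<Node> children;
  const Node(this.ownText, [this.children = const []]);

  /// Length of this node's text including all descendants.
  int get textLength =>
      ownText.length + children.fold<int>(0, (sum, c) => sum + c.textLength);
}

/// Starting at [start], repeatedly steps into the first child holding at
/// least [share] of the current node's text; stops when no child dominates.
Node narrowToDominantChild(Node start, {double share = 0.8}) {
  var current = start;
  while (true) {
    final total = current.textLength;
    Node? dominant;
    for (final child in current.children) {
      if (total > 0 && child.textLength >= share * total) {
        dominant = child;
        break;
      }
    }
    if (dominant == null) return current;
    current = dominant;
  }
}
```

The threshold matters: too low and the walk can descend into a single long paragraph instead of the wrapper around all of them.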

I considered porting Readability’s scoring into Dart—assigning scores based on class names, link density, and text length. But rewriting those heuristics felt like rebuilding Readability from scratch.

And even if I succeeded, it would still parse the entire DOM tree, which on slower devices felt borderline.
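To give a sense of what porting even part of that involves, here’s a heavily simplified, Readability-inspired score over a hypothetical `Block` summary of each candidate. The weights (±25 for class/id hints, one point per comma, up to 3 points for length, and a final multiplier of one minus link density) echo Readability’s approach, but treat them as assumptions rather than its exact rules:

```dart
/// Hypothetical summary of a candidate block: its visible text, the text
/// inside its <a> tags, and its class/id attributes joined together.
class Block {
  final String text;
  final String linkText;
  final String classAndId;
  const Block(this.text, this.linkText, this.classAndId);
}

final _positive = RegExp(r'article|body|content|entry|post|text', caseSensitive: false);
final _negative = RegExp(r'comment|footer|promo|related|share|sidebar|sponsor', caseSensitive: false);

/// Simplified score: prose-like, well-labelled blocks score high; blocks
/// made mostly of links are scaled toward zero.
double scoreBlock(Block b) {
  final textLength = b.text.trim().length;
  if (textLength == 0) return 0;
  var score = 1.0;
  score += ','.allMatches(b.text).length; // commas suggest real sentences
  score += (textLength / 100).floor().clamp(0, 3); // up to 3 points for length
  if (_positive.hasMatch(b.classAndId)) score += 25;
  if (_negative.hasMatch(b.classAndId)) score -= 25;
  final linkDensity = b.linkText.length / textLength;
  return score * (1.0 - linkDensity.clamp(0.0, 1.0));
}
```

Even this toy version needs per-block text and link extraction, which is exactly the full-DOM walk I was worried about on slower devices.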

So I started looking for faster, more efficient libraries outside the Dart ecosystem.


Option 3: WebAssembly (WASM) Idea: Promising, but Overkill

I read about compiling JS libraries or Rust-based parsers into WASM for speed. In particular:

What I liked about WASM

  • Near-native execution speed.
  • The binary format is compact and typically smaller than raw JS (no comments, no debug mappings).

Roadblocks

1. Flutter Integration

  • On mobile, Flutter doesn’t have a built-in “run WASM” API. Some people embed a minimal JavaScriptCore or V8 engine for WASM, but that adds complexity.
  • I found tutorials on iOS: use WKWebView to load a .wasm file in JS, but then pass results back to Dart via evaluateJavascript(…). Back to a WebView!

2. Build & Debug Complexity

  • Compiling readability-rs to .wasm was straightforward with wasm-pack, but hooking it into Flutter meant writing glue code, setting up memory buffers, and mapping exports.
  • Small changes (like adjusting the “link density” threshold in Rust) meant going back to a Rust toolchain, recompiling the WASM, and then rebuilding the Flutter app. It slowed down iteration drastically.

3. Edge Cases Remained

  • Even a fast WASM parser needs logic for dynamic content (e.g., single-page apps that fetch article text via AJAX). Now I’d need Dart + JS + WASM code to coordinate: (1) Fetch initial HTML, (2) wait for JS to load content, (3) serialize final HTML, (4) pass to WASM, and (5) extract main content. It became unwieldy.
Flowchart showing “Dart → WebView → JS → WASM → Dart” and all the overhead in those arrows.

Finally: Writing My Own “Good Enough” Dart Extractor

By this point, I’d spent weeks chasing various paths. All felt too big or too brittle. Then I realized something.

Most of my users read articles from a handful of well-structured news sites and blogs. I didn’t need to cover every fringe case. If I nailed the top 80% of cases, it would be a big win. My users just needed cleaner content.

So I wrote a Dart function that improves on Option 2. It:

  1. Fetches the HTML using http.get.
  2. Quickly removes all obvious noise (<header>, <footer>, <nav>, .sidebar, .ad, etc.).
  3. Looks for the largest block of text by word count among <article>, <main>, and a few common <div class="…"> selectors.
  4. Wraps the result in a minimal CSS block so images go full-width and text is readable.
Even this first version of the parser worked well.

1. Code Example

This is a pared-down version of what I ended up with—about 50 lines of Dart:

import 'package:html/dom.dart';
import 'package:html/parser.dart' as htmlParser;
import 'package:http/http.dart' as http;

class SimpleReaderMode {
  // Step 1: Remove noise
  static void _removeNoise(Document doc) {
    final selectorsToRemove = [
      'header', 'footer', 'nav', 'aside',
      '.sidebar', '.ad', '.promo', '.popup',
      // Substring matches are aggressive: they also hit classes like
      // "read-more" or "thread-title", so watch for lost content.
      '[class*="ad-"]', '[id*="ad-"]',
    ];
    for (final sel in selectorsToRemove) {
      doc.querySelectorAll(sel).forEach((el) => el.remove());
    }
  }
  // Count whitespace-separated words in an element's text
  static int _wordCount(Element el) {
    final text = el.text.trim();
    return text.isEmpty ? 0 : text.split(RegExp(r'\s+')).length;
  }
  // Step 2: Find candidate blocks and pick the "largest" by word count
  static Element? _findMainBlock(Document doc) {
    final candidates = <Element?>[
      doc.querySelector('article'),
      doc.querySelector('main'),
      doc.querySelector('.content'),
      doc.querySelector('.post'),
      doc.querySelector('.article-body'),
    ];
    // Filter out missing or tiny blocks (more than 50 words required)
    final valid = candidates
        .whereType<Element>()
        .where((el) => _wordCount(el) > 50)
        .toList();
    if (valid.isEmpty) return null;
    // Pick the block with the most words
    valid.sort((a, b) => _wordCount(b).compareTo(_wordCount(a)));
    return valid.first;
  }
  // Step 3: Wrap with minimal CSS; the viewport tag keeps the page at device width
  static String _wrapWithCss(String innerHtml) {
    return '''
      <html>
        <head>
          <meta charset="utf-8">
          <meta name="viewport" content="width=device-width, initial-scale=1">
          <style>
            body { margin: 0; padding: 20px; font-family: Arial, sans-serif; font-size: 18px; line-height: 1.6; }
            img { max-width: 100%; height: auto; display: block; margin: 10px auto; }
            h1, h2, h3 { margin-top: 1.2em; margin-bottom: 0.6em; font-weight: bold; }
          </style>
        </head>
        <body>
          $innerHtml
        </body>
      </html>
    ''';
  }
  // Public API: fetch, parse, extract, wrap
  static Future<String> extract(String url) async {
    final uri = Uri.parse(url);
    final response = await http.get(uri);
    if (response.statusCode != 200) {
      // Don't "extract" an error page as if it were an article
      throw http.ClientException('HTTP ${response.statusCode}', uri);
    }
    final document = htmlParser.parse(response.body);

    _removeNoise(document);
    final mainBlock = _findMainBlock(document);
    if (mainBlock != null) {
      return _wrapWithCss(mainBlock.outerHtml);
    } else {
      // Fallback: return raw <body> if we couldn't find a main block
      return _wrapWithCss(document.body?.outerHtml ?? '');
    }
  }
}
  • _removeNoise: Kills obvious ads, sidebars, headers/footers—elements I never want in my reading mode.
  • _findMainBlock: Picks the biggest chunk of text (by word count) among a few common tags. If no candidate qualifies, it returns null and extract falls back to the raw <body>.
  • _wrapWithCss: Injects simple CSS so text is legible and images don’t overflow.

2. Performance and Size

  • Code Size: This Dart logic is under 100 lines, plus the html package (~150 KB).
  • Parsing Speed:
  • On a mid-range device, it consistently took ~100 ms to fetch + parse + extract.
  • On lower-end devices, ~180 ms total—still acceptable for a “mode switch.”
  • Resulting HTML: Usually 30–50 KB (stripped down), which loads instantly in a minimal WebView or in a flutter_html widget.
This is just a glimpse of the first version of the algorithm.
Many more improvements are still needed, such as handling CORS issues, the crawling limitations of a simple HTTP fetch, and Cloudflare authentication limitations.
I’ll cover these in parts 2 and 3.

Why This “Simple Dart Parser” Became My Go-To

1. Small Footprint

  • No 23 MB JS asset.
  • No extra WASM loader code.
  • Just package:html (~150 KB) and my 50 lines of Dart.

2. Speed

  • Consistently under 200 ms end-to-end, even on budget phones.
  • No WebView flashing or double-render lag.

3. Full Control in Dart

  • I can tweak _removeNoise selectors instantly without leaving my Dart IDE.
  • If a specific site changes its structure, I add a custom rule in Dart. (No JS build-step or WASM recompile.)
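A per-site rule can be as small as a map keyed by host. This is a hypothetical shape: the `SiteRule` class and `ruleFor` lookup are illustrative names, and the `mycustomblog.com` entry just reuses the example host from earlier. `_removeNoise` and `_findMainBlock` could consult it alongside their generic selectors.

```dart
/// Hypothetical per-site override: extra noise selectors to strip and a
/// preferred main-content selector.
class SiteRule {
  final List<String> extraNoise;
  final String? mainSelector;
  const SiteRule({this.extraNoise = const [], this.mainSelector});
}

// Illustrative entry only; real rules depend on each site's markup.
const siteRules = <String, SiteRule>{
  'mycustomblog.com': SiteRule(
    extraNoise: ['.newsletter-signup'],
    mainSelector: '.text-wrapper',
  ),
};

/// Finds the rule for [url], matching the exact host or any parent domain
/// (so "www.mycustomblog.com" uses the "mycustomblog.com" rule).
SiteRule? ruleFor(String url) {
  var host = Uri.parse(url).host.toLowerCase();
  while (host.contains('.')) {
    final rule = siteRules[host];
    if (rule != null) return rule;
    host = host.substring(host.indexOf('.') + 1);
  }
  return null;
}
```

Keeping rules as data rather than scattered `if` checks makes a layout change a one-line edit.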

4. Graceful Fallback

  • If _findMainBlock returns null (rare), I still show a readable, stripped version of <body>.
  • Users still see something — no blank screen or broken JS errors.

5. Maintenance

  • My code sits right next to the rest of my Flutter app.
  • I’m not juggling multiple toolchains or asset formats.

Final Thoughts

Writing my own parser was surprisingly straightforward once I accepted that it didn’t need to cover every obscure edge case.

It handles 80% of sites beautifully, runs fast on most devices, and lives entirely in Dart—so I never wrestle with asset bloat, build chains, or cross-language debugging.

The few times a site trips it up (e.g., a paywall or an AJAX-only article), I still fall back to a basic WebView so the user can read something.

If you want to build a similar feature, here’s a quick checklist:

  1. Start Simple: Strip out <header>, <footer>, .sidebar, .ad, .promo.
  2. Pick Main Content: Look for <article>, <main>, etc. Choose the largest by word count.
  3. Wrap with minimal CSS: Ensure fonts and images scale nicely.
  4. Measure Performance: Test on both high-end and budget devices.
  5. Add Fallbacks: If “find main content” fails, show raw <body> or fallback to a WebView.
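The fallback step can be made explicit in the types. This sketch assumes a hypothetical extractor that returns `null` when it finds no main block (unlike the `extract` method above, which always returns HTML), and the 500-character minimum is an arbitrary threshold:

```dart
/// Outcome of the reader pipeline: cleaned HTML, or a signal to show the
/// original page in a WebView instead.
sealed class ReaderResult {}

class CleanArticle extends ReaderResult {
  final String html;
  CleanArticle(this.html);
}

class WebViewFallback extends ReaderResult {
  final Uri url;
  final String reason;
  WebViewFallback(this.url, this.reason);
}

/// Runs [extract] and falls back to the original page when it throws,
/// returns null, or yields suspiciously little HTML.
Future<ReaderResult> readOrFallback(
  Uri url,
  Future<String?> Function(Uri url) extract, {
  int minLength = 500,
}) async {
  try {
    final html = await extract(url);
    if (html == null || html.length < minLength) {
      return WebViewFallback(url, 'no main content found');
    }
    return CleanArticle(html);
  } catch (e) {
    return WebViewFallback(url, 'extraction failed: $e');
  }
}

/// Example consumer: an exhaustive switch over the sealed result.
String describe(ReaderResult result) => switch (result) {
      CleanArticle(:final html) => 'reader mode (${html.length} chars)',
      WebViewFallback(:final reason) => 'WebView fallback: $reason',
    };
```

Because `ReaderResult` is sealed, the compiler flags any UI code that forgets to handle the fallback case.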

That’s the story of how I went from “just inject Readability” to “write a small Dart parser” and never looked back—because it gave me speed, control, and a tiny bundle.


Feel free to suggest improvements or share your thoughts on my approach.

And if you liked this tutorial, please like, comment, and subscribe. It's free, and it will motivate me to add more useful tutorials.