Headline Numbers

Held-out F1 (0.9963 across 30 manually verified modern mega-cap filings) is higher than weak-label train F1 (0.9184 across 4,168 filings). This does not mean the model generalizes better than it trained. It reflects distribution shift: the held-out filings are structurally regular, whereas the train split spans messy SGML, 10-K/A amendments, and 20-F filings from foreign issuers. Reviewers should evaluate on both kinds. The mixed-era 4,168-filing number is the realistic full-market average; the 30-filing held-out number is a best-case ceiling.

All three splits use rules-first parsing plus Layer-4 LLM rescue (multi-round + title-dictionary). Train F1 is dragged down by legacy 1995-2008 filings, which span the sgml_1995_2000 and html_2001_2008 eras; see §02.

Breakdown by era (176 filings):

| Era | N | Precision | Recall | F1 |
|---|---|---|---|---|
| early_ixbrl_2009_2019 | 74 | 1.000 | 0.899 | 0.9464 |
| html_2001_2008 | 54 | 1.000 | 0.855 | 0.9173 |
| sgml_1995_2000 | 42 | 0.984 | 0.917 | 0.9394 |
| modern_ixbrl_2020_plus | 6 | 0.992 | 0.932 | 0.9609 |
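
The pooled row that the industry, year, and size tables report (N = 176, P 0.996, R 0.891, F1 0.9363) matches, to the reported precision, the filing-weighted mean of these four era rows. That is why the mixed-era average moves with the era mix. A quick check in Python; the weighting scheme is inferred from the numbers, not taken from the evaluation code:

```python
# Era rows from the table above: (N, precision, recall, F1).
ERA_ROWS = {
    "early_ixbrl_2009_2019": (74, 1.000, 0.899, 0.9464),
    "html_2001_2008": (54, 1.000, 0.855, 0.9173),
    "sgml_1995_2000": (42, 0.984, 0.917, 0.9394),
    "modern_ixbrl_2020_plus": (6, 0.992, 0.932, 0.9609),
}


def filing_weighted(rows):
    """Weight each era's P, R and F1 by its filing count; return (N, P, R, F1)."""
    total = sum(n for n, *_ in rows.values())
    pooled = [
        sum(n * metrics[i] for n, *metrics in rows.values()) / total
        for i in range(3)
    ]
    return (total, *pooled)


n, p, r, f1 = filing_weighted(ERA_ROWS)
print(f"N={n} P={p:.3f} R={r:.3f} F1={f1:.4f}")  # N=176 P=0.996 R=0.891 F1=0.9363
```

Shifting weight toward the html_2001_2008 rows pulls the pooled F1 down, and shifting it toward modern filings pushes it up. This is the same effect that separates the train and held-out headline numbers.
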
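
Industry, filing-year, and size-bucket labels were not populated for these 176 filings, so each of the next three breakdowns collapses to a single pooled row.

| Industry | N | Precision | Recall | F1 |
|---|---|---|---|---|
| unknown | 176 | 0.996 | 0.891 | 0.9363 |

| Years | N | Precision | Recall | F1 |
|---|---|---|---|---|
| unknown | 176 | 0.996 | 0.891 | 0.9363 |

| Size bucket | N | Precision | Recall | F1 |
|---|---|---|---|---|
| unknown | 176 | 0.996 | 0.891 | 0.9363 |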

Caveat: the size bucket is a pragmatic proxy, not literal market cap. We use SGML envelope_bytes (the full filing-package size recorded by the L1 acquirer) because envelope size tracks revenue, exhibit count, and narrative complexity: bigger filers ship denser disclosures. Quartile-derived thresholds (n=299): p25 = 350 KB, median = 1.7 MB, p75 = 9.4 MB, max = 123 MB. Apple FY23 is 22 MB, above p75, so it lands in the mega bucket. We do NOT compute true market cap as SEC's `EntityCommonStockSharesOutstanding` × stock price, because that requires a per-CIK XBRL lookup plus a price API. envelope_bytes captures most of the size signal at essentially zero extra cost.
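
The bucketing is a plain quartile cut. This is a minimal sketch under three assumptions: units are decimal (1 KB = 1,000 bytes), a size exactly equal to a threshold falls into the higher bucket, and the three buckets below mega carry hypothetical labels, since only "mega" is named above:

```python
import statistics

# Thresholds quoted in the caveat above (n=299). Assumes decimal units: 1 KB = 1,000 bytes.
P25 = 350_000
MEDIAN = 1_700_000
P75 = 9_400_000

# Only "mega" is named in the text; "small", "mid" and "large" are hypothetical labels.
BUCKETS = ("small", "mid", "large", "mega")


def quartile_thresholds(envelope_sizes):
    """Return the (p25, median, p75) cut points for a list of envelope sizes."""
    return tuple(statistics.quantiles(envelope_sizes, n=4))


def size_bucket(envelope_bytes, cuts=(P25, MEDIAN, P75)):
    """Map an envelope size to a quartile bucket; a value equal to a cut goes up."""
    for label, cut in zip(BUCKETS, cuts):
        if envelope_bytes < cut:
            return label
    return BUCKETS[-1]


print(size_bucket(22_000_000))  # Apple FY23 envelope, 22 MB → mega
```

Recomputing the cuts with `quartile_thresholds` on a new corpus keeps the buckets balanced at roughly a quarter of filings each. Hard-coding them keeps bucket membership stable across runs.
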
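
Held-out set (30 manually verified filings). Every filing's gold standard has 23 items, matching the current Form 10-K item list (Items 1, 1A, 1B, 1C, 2 to 7, 7A, 8, 9, 9A, 9B, 9C, 10 to 16). The Items column counts what the parser extracted, so Ford's 24 means one spurious item and each 22 means one missed item.

| Filing | Items extracted | P | R | F1 | IBR | Era |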
|---|---|---|---|---|---|---|
| Apple (2025-10-31) | 23 | 1.000 | 1.000 | 1.0000 | 5/5 | modern_html |
| Amazon (2026-02-06) | 23 | 1.000 | 1.000 | 1.0000 | 5/5 | modern_html |
| Microsoft (2025-07-30) | 23 | 1.000 | 1.000 | 1.0000 | 5/5 | modern_html |
| Alphabet (2026-02-05) | 23 | 1.000 | 1.000 | 1.0000 | 5/5 | modern_html |
| Meta (2026-01-29) | 23 | 1.000 | 1.000 | 1.0000 | 5/5 | modern_html |
| Tesla (2026-01-29) | 23 | 1.000 | 1.000 | 1.0000 | 5/5 | modern_html |
| NVIDIA (2026-02-25) | 23 | 1.000 | 1.000 | 1.0000 | 5/5 | modern_html |
| IBM (2026-02-24) | 23 | 1.000 | 1.000 | 1.0000 | 5/5 | modern_html |
| Intel (2026-02-17) | 23 | 1.000 | 1.000 | 1.0000 | 5/5 | modern_html |
| AMD (2026-02-19) | 23 | 1.000 | 1.000 | 1.0000 | 5/5 | modern_html |
| GE (2026-01-29) | 23 | 1.000 | 1.000 | 1.0000 | 5/5 | modern_html |
| GM (2025-10-10) | 23 | 1.000 | 1.000 | 1.0000 | 5/5 | modern_html |
| Ford (2026-02-11) | 24 | 0.958 | 1.000 | 0.9787 | 5/5 | modern_html |
| JPMorgan (2026-02-13) | 22 | 1.000 | 0.957 | 0.9778 | 5/5 | modern_html |
| Bank of America (2026-02-25) | 23 | 1.000 | 1.000 | 1.0000 | 5/5 | modern_html |
| Wells Fargo (2026-02-24) | 23 | 1.000 | 1.000 | 1.0000 | 5/5 | modern_html |
| Coca-Cola (2026-02-20) | 23 | 1.000 | 1.000 | 1.0000 | 5/5 | modern_html |
| PepsiCo (2026-02-03) | 22 | 1.000 | 0.957 | 0.9778 | 5/5 | modern_html |
| Procter & Gamble (2025-08-04) | 23 | 1.000 | 1.000 | 1.0000 | 5/5 | modern_html |
| Johnson & Johnson (2026-02-11) | 23 | 1.000 | 1.000 | 1.0000 | 5/5 | modern_html |
| Costco (2025-10-08) | 23 | 1.000 | 1.000 | 1.0000 | 5/5 | modern_html |
| Lockheed Martin (2026-01-29) | 23 | 1.000 | 1.000 | 1.0000 | 5/5 | modern_html |
| Charles Schwab (2026-02-12) | 23 | 1.000 | 1.000 | 1.0000 | 5/5 | modern_html |
| Sysco (2025-08-22) | 23 | 1.000 | 1.000 | 1.0000 | 5/5 | modern_html |
| Berkshire Hathaway (2026-03-02) | 22 | 1.000 | 0.957 | 0.9778 | 5/5 | modern_html |
| ExxonMobil (2026-02-18) | 22 | 1.000 | 0.957 | 0.9778 | 5/5 | modern_html |
| Chevron (2026-02-24) | 23 | 1.000 | 1.000 | 1.0000 | 5/5 | modern_html |
| AT&T (2025-02-12) | 23 | 1.000 | 1.000 | 1.0000 | 5/5 | modern_html |
| Verizon (2026-02-17) | 23 | 1.000 | 1.000 | 1.0000 | 5/5 | modern_html |
| Lubrizol (2026-02-10) | 23 | 1.000 | 1.000 | 1.0000 | 4/5 | modern_html |
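
The held-out headline of 0.9963 is the unweighted (macro) mean of the 30 per-filing F1 scores. Every filing has 23 gold items (a recall of 0.957 is 22/23), so the whole table can be rebuilt from item counts. Below is a short Python check. The correct-item counts are inferred from the P and R columns, not read from the evaluation logs:

```python
GOLD_ITEMS = 23  # Items 1 to 16 on the current Form 10-K, including 1A-1C, 7A and 9A-9C


def prf(extracted, correct, gold=GOLD_ITEMS):
    """Per-filing precision, recall and F1 from item counts."""
    precision = correct / extracted
    recall = correct / gold
    f1 = 2 * correct / (extracted + gold)  # algebraically equal to 2PR / (P + R)
    return precision, recall, f1


# (extracted, correct) per filing; correct counts are inferred from the P/R columns above.
rows = [(23, 23)] * 25   # the 25 perfect filings
rows += [(24, 23)]       # Ford: one spurious extra item
rows += [(22, 22)] * 4   # JPMorgan, PepsiCo, Berkshire Hathaway, ExxonMobil: one missed item

f1s = [prf(e, c)[2] for e, c in rows]
macro_f1 = sum(f1s) / len(f1s)
print(f"{len(f1s)} filings, macro F1 = {macro_f1:.4f}")  # 30 filings, macro F1 = 0.9963
```

With a ceiling this close to 1.0, one missed or spurious item moves a filing's F1 by about 0.02. The held-out set is therefore better at catching regressions than at ranking near-perfect parsers.
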

| Kind | N | Matched | Match rate | Healed used |
|---|---|---|---|---|
| extract | 46 | 37 | 80.4% | 15 |
| navigate-extract | 18 | 15 | 83.3% | 4 |
| broken-selector | 27 | 26 | 96.3% | 23 |
| count-something | 1 | 1 | 100.0% | 1 |
| silent-failure-trap | 8 | 6 | 75.0% | 0 |
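
The table has no overall row. Pooling the counts and averaging the per-kind rates give different answers, because count-something (N = 1) weighs as much as extract (N = 46) in an unweighted mean. A quick Python check using only the counts in the table:

```python
# (N, matched) per kind, copied from the table above.
KINDS = {
    "extract": (46, 37),
    "navigate-extract": (18, 15),
    "broken-selector": (27, 26),
    "count-something": (1, 1),
    "silent-failure-trap": (8, 6),
}

total = sum(n for n, _ in KINDS.values())
matched = sum(m for _, m in KINDS.values())
micro = matched / total                                      # every task counts once
macro = sum(m / n for n, m in KINDS.values()) / len(KINDS)   # every kind counts once

print(f"pooled: {matched}/{total} = {micro:.1%}; mean of kind rates = {macro:.1%}")
# pooled: 85/100 = 85.0%; mean of kind rates = 87.0%
```

Reporting the pooled rate alongside the per-kind rows keeps a single-task kind from inflating the headline.
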