Extract title and description from a generated article HTML file.
Reads the predictable template structure produced by the aggregator
article generator. Falls back to empty strings when the file cannot
be read. HTML entities from the template are decoded to produce
plain text.
Title resolution order:
1. <head><title> value with the trailing " — EU Parliament Monitor"
   (or legacy " | EU Parliament Monitor") site suffix stripped.
   The editorial-highlights resolver and the SEO backport script
   write their output here, so using it as the primary source
   surfaces the strongest headline on index cards and sitemaps.
2. First body <h1>: fallback for files whose <title> was never
   refreshed.
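The title resolution order above could be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the names resolveTitle, decodeEntities and SITE_SUFFIX are hypothetical, the entity list is deliberately small, and falling back to <h1> when the stripped <title> is missing or empty is an assumption about what "never refreshed" means in practice.

```typescript
// Hypothetical names throughout; only the two site suffixes come from the docs.
// Matches a trailing " — EU Parliament Monitor" or legacy " | EU Parliament Monitor".
const SITE_SUFFIX = /\s*(?:—|\|)\s*EU Parliament Monitor\s*$/;

// Decode a small, assumed set of entities. &amp; is decoded last so that
// a literal "&amp;quot;" is not decoded twice.
function decodeEntities(text: string): string {
  return text
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&");
}

function resolveTitle(html: string): string {
  // 1. Prefer <head><title>, with the site suffix stripped.
  const headTitle = html.match(/<title[^>]*>([\s\S]*?)<\/title>/i);
  if (headTitle) {
    const stripped = decodeEntities(headTitle[1] ?? "")
      .replace(SITE_SUFFIX, "")
      .trim();
    if (stripped) return stripped;
  }
  // 2. Fall back to the first <h1>, dropping any inline markup inside it.
  const h1 = html.match(/<h1[^>]*>([\s\S]*?)<\/h1>/i);
  return h1 ? decodeEntities((h1[1] ?? "").replace(/<[^>]+>/g, "")).trim() : "";
}
```

With this sketch, a page whose <title> is "Plenary recap | EU Parliament Monitor" resolves to "Plenary recap", and a page with no <title> at all resolves to the text of its first <h1>.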
NOTE: The meta description regex relies on the template's use of
escapeHTML(), which converts " to &quot;. Because descriptions are
always stored with double-quote delimiters and inner quotes are
HTML-encoded, the [^"]+ pattern safely captures the full value.
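To illustrate the note above, here is a hedged sketch of how a [^"]+ capture behaves on an escaped description. The function name extractDescription, the exact regex, and the assumption that the name attribute precedes content in the template are all illustrative; only the [^"]+ pattern and the " to &quot; conversion come from the documentation.

```typescript
// Hypothetical helper; the real regex and attribute order may differ.
function extractDescription(html: string): string {
  // [^"]+ stops at the first literal double quote, which is safe only
  // because the template has already turned inner quotes into &quot;.
  const match = html.match(/<meta\s+name="description"\s+content="([^"]+)"/i);
  if (!match) return "";
  // Decode back to plain text; &amp; goes last to avoid double-decoding.
  return (match[1] ?? "")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&");
}

// Example input as the template would emit it (inner quotes escaped).
const sample =
  '<meta name="description" content="MEPs call it a &quot;historic&quot; vote">';
const description = extractDescription(sample);
```

If the template did not escape inner quotes, the same pattern would stop at the first inner quote and return only part of the description, which is why the note ties the regex to escapeHTML().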
Parameters
filepath: string
Path to the article HTML file
Returns { title: string; description: string }
Object with title (from head-title, else first body h1) and
description (from meta description)