Proyecto WWW maintenance tool (sections index generator)

Description: Python tool to automatically generate bilingual index pages for Proyecto WWW from plain text files listing sections and articles.

Introduction

This Python maintenance tool automatically generates the bilingual index pages of Proyecto WWW from plain text files that list the published sections and articles.

Index input files

The script uses two plain text files as data sources: secciones-es.txt for the Spanish version and secciones-en.txt for the English version.

Each file is organized by sections; every section starts with its name on a separate line, followed by pairs of lines containing first the article title and then its corresponding URL.

At least one blank line separates consecutive sections. The script treats the first non-empty line after a blank line as a section name, so a blank line left inside a section would cause the next title to be read as a new section.
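To make the layout concrete, here is a minimal sketch that holds a small sample in this format and splits it into section blocks on blank lines. The sample titles, the example.com URLs, and the helper name split_blocks are assumptions for illustration; the script itself detects sections line by line rather than with a regular expression.

```python
import re

# Hypothetical sample content laid out as described above:
# section name, then title/URL pairs, then blank line(s) before the next section.
SAMPLE = """\
Arte
Primer artículo
https://example.com/arte/primer-articulo
Segundo artículo
https://example.com/arte/segundo-articulo


Ciencia
Tercer artículo
https://example.com/ciencia/tercer-articulo
"""


def split_blocks(text):
    """Split the text into section blocks separated by one or more blank lines."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return [block.splitlines() for block in blocks]


blocks = split_blocks(SAMPLE)
for lines in blocks:
    # First line is the section name; the rest are title/URL pairs.
    print(lines[0], "->", (len(lines) - 1) // 2, "article(s)")
```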

General script workflow

The script reads both text files, interprets the structure of sections, titles, and URLs, and builds an ordered in-memory representation of the Proyecto WWW content.

From that representation, it generates two HTML files (output-es.html and output-en.html) that include the section tabs, the article lists, and the JSON-LD structured data blocks aimed at search engines.

Implementation details

File reading and internal structure (read_sections)

The read_sections function processes each file line by line, detects each section name, and groups titles and URLs into a dictionary in which each key is a section name and each value is the list of (title, URL) pairs for that section's articles.
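As an illustration of that structure, the following sketch shows what the dictionary might look like for a file with two sections. The section contents and example.com URLs are placeholders, not real Proyecto WWW articles.

```python
# Hypothetical result of parsing a file with two sections
# (titles and URLs are placeholders for illustration only).
sections = {
    "Arte": [
        ("Primer artículo", "https://example.com/arte/primer-articulo"),
        ("Segundo artículo", "https://example.com/arte/segundo-articulo"),
    ],
    "Ciencia": [
        ("Tercer artículo", "https://example.com/ciencia/tercer-articulo"),
    ],
}

# Dictionaries keep insertion order (Python 3.7+), so sections come out
# in the same order in which they appear in the text file.
summary = [(category, len(items)) for category, items in sections.items()]
for category, count in summary:
    print(f"{category}: {count} article(s)")
```

The order matters because the first section in the dictionary becomes the tab that is open by default in the generated page.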

The function accepts plain text URLs as well as URLs written in Markdown format, which are cleaned to extract the actual link before adding it to the internal structure.
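The cleaning step can be sketched in isolation as follows. The helper name clean_url is an assumption (the script performs this inline inside read_sections), and this version removes only the single closing parenthesis of the Markdown link, which is a small deviation from the script's rstrip(')') call.

```python
def clean_url(candidate):
    """Return the bare URL from a plain or Markdown-style link, or None.

    Hypothetical helper: the original script does this inline.
    """
    candidate = candidate.strip()
    if candidate.startswith("[") and "](" in candidate:
        # Markdown form: [label](target) -> keep only the target
        target = candidate.split("](", 1)[1]
        # Drop just the closing parenthesis of the Markdown link
        return target[:-1] if target.endswith(")") else target
    if candidate.startswith("http"):
        return candidate
    # Anything else is treated as "not a URL"
    return None


print(clean_url("[https://example.com/a](https://example.com/a)"))
print(clean_url("https://example.com/b"))
```

Removing one parenthesis instead of all trailing ones keeps URLs that legitimately end in ")" intact.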

JSON-LD generation (generate_json_ld)

The generate_json_ld function builds, for each section, a structured data block using the BreadcrumbList type from schema.org, including article titles and URLs.

These blocks are inserted into the final HTML inside <script type="application/ld+json"> tags, improving how search engines understand the page content.
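For reference, this is a minimal sketch of how one such block could be produced and wrapped for a single section. The function name breadcrumb_script and the sample article are assumptions; the script splits this work between generate_json_ld and write_html_and_jsonld.

```python
import json


def breadcrumb_script(category, items):
    """Return a <script> tag holding a BreadcrumbList for one section."""
    elements = [
        {
            "@type": "ListItem",
            "position": position,
            "item": {"@id": url, "name": name},
        }
        for position, (name, url) in enumerate(items, start=1)
    ]
    data = {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "name": category,
        "itemListElement": elements,
    }
    # ensure_ascii=False keeps accented characters readable in the output
    body = json.dumps(data, ensure_ascii=False, indent=4)
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Placeholder article, not a real Proyecto WWW entry
tag = breadcrumb_script(
    "Arte",
    [("Primer artículo", "https://example.com/arte/primer-articulo")],
)
print(tag)
```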

HTML and tab construction (write_html_and_jsonld)

The write_html_and_jsonld function assembles the full HTML output: a short descriptive HTML comment, the JSON-LD blocks, an introductory block, and one button per section; the buttons act as navigation tabs.

For every section, it generates a container with its title and an ordered list of links to the articles; it also adapts visible texts according to the language and appends the Proyecto WWW featured articles section.
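The language-dependent label and the tab button can be sketched on their own as shown here. The lookup table VISIBLE_NAMES and the helpers visible_name and tab_button are assumptions made for this example; the script uses if/elif branches inside write_html_and_jsonld, while the class names, attributes, and the seccionAbrirPagina call are taken from it.

```python
# Hypothetical lookup table; the script itself uses if/elif branches.
VISIBLE_NAMES = {
    ("es", "Arte"): "Arte Visual",
    ("en", "Art"): "Visual Art",
}


def visible_name(category, language):
    """Return the label shown to readers for a section."""
    return VISIBLE_NAMES.get((language, category), category)


def tab_button(category, language, is_active):
    """Build the <button> that opens one section's container."""
    section_id = category.lower()
    expanded = "true" if is_active else "false"
    # Only the first (active) tab carries the default-open id
    default_id = ' id="abrirPorDefecto"' if is_active else ""
    return (
        f'<button class="etiqueta_enlace" aria-controls="{section_id}" '
        f'aria-expanded="{expanded}" tabindex="0" '
        f"onclick=\"seccionAbrirPagina('{section_id}', this)\"{default_id}>"
        f"{visible_name(category, language)}</button>"
    )


print(tab_button("Arte", "es", is_active=True))
```

Keeping the label rule in one place avoids repeating it for the button and for the section heading.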

Practical use in Proyecto WWW

To update the indexes, you only need to edit secciones-es.txt and secciones-en.txt, adding or modifying sections, titles, and URLs as needed.

After saving the changes, run the Python script; it will generate the output-es.html and output-en.html files, whose content is then copied into the corresponding Spanish and English pages of Proyecto WWW.
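Because the script leaves out any title that is not followed by a URL, a quick check before running it can help catch incomplete entries. The following is only a sketch under that assumption; the function name titles_without_url and the sample text are hypothetical and not part of the original tool.

```python
def titles_without_url(text):
    """Return (section, title) pairs whose next line is not a URL."""
    missing = []
    section = None
    previous_blank = True  # the first non-empty line names a section
    lines = [line.strip() for line in text.splitlines()]
    i = 0
    while i < len(lines):
        line = lines[i]
        if not line:
            previous_blank = True
            i += 1
            continue
        if previous_blank:
            # First non-empty line after a blank line: section name
            section = line
            previous_blank = False
            i += 1
            continue
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        if next_line.startswith(("http", "[")):
            i += 2  # title and URL form a complete pair
        else:
            missing.append((section, line))
            i += 1
    return missing


# Placeholder content: "Segundo artículo" has no URL line after it
sample = (
    "Arte\n"
    "Primer artículo\n"
    "https://example.com/arte/primer-articulo\n"
    "Segundo artículo\n"
    "\n"
    "Ciencia\n"
    "Tercer artículo\n"
    "https://example.com/ciencia/tercer-articulo\n"
)
print(titles_without_url(sample))
```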

Complete source code

Below is the complete source code of the script, ready to be copied, adapted, or reused in future maintenance tasks for Proyecto WWW.


import json

def read_sections(file_path):
    """
    Read a file in which categories are separated by blank lines.
    Each category starts with its name on one line, followed by pairs of lines:
    article name and URL.
    Returns a dictionary {category: [(name, url), ...], ...}
    """
    sections = {}
    current_category = None

    with open(file_path, 'r', encoding='utf-8') as f:
        lines = [line.rstrip('\n') for line in f]

    i = 0
    n = len(lines)
    while i < n:
        line = lines[i].strip()

        # Skip empty lines
        if not line:
            i += 1
            continue

        # Detect a category: a line that follows a blank line, or the first non-empty line
        if current_category is None or (i > 0 and lines[i-1].strip() == ''):
            current_category = line
            if current_category not in sections:
                sections[current_category] = []
            i += 1
            continue

        # Otherwise we are inside a category and this line is an article name
        name = line
        url = None

        # Look at the next line to see whether it is a URL
        if i + 1 < n:
            url_candidate = lines[i + 1].strip()
            # Accept URLs with or without Markdown formatting
            if url_candidate.startswith('http') or url_candidate.startswith('['):
                # Markdown example: [https://...](https://...)
                if url_candidate.startswith('[') and '](' in url_candidate:
                    url_inside = url_candidate.split('](')[1].rstrip(')')
                    url = url_inside
                else:
                    url = url_candidate.strip('[]')
                i += 2
            else:
                i += 1
        else:
            i += 1

        if current_category is None:
            raise ValueError("Article without a defined category")

        # Titles that are not followed by a URL are left out of the index
        if url:
            sections[current_category].append((name, url))

    return sections


def generate_json_ld(category, items):
    """Generate the JSON-LD structure for a given category."""
    item_list = []
    position = 1
    for name, url in items:
        if url:
            item_list.append({
                "@type": "ListItem",
                "position": position,
                "item": {
                    "@id": url,
                    "name": name
                }
            })
            position += 1

    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "name": category,
        "itemListElement": item_list
    }


def write_html_and_jsonld(sections, language='es'):
    """
    Generate the complete HTML (comment + JSON-LD + content)
    from the sections dictionary.

    language:
      'es' → visible text in Spanish (output-es.html)
      'en' → visible text in English (output-en.html)
    """
    html = ""

    # Comment with a short description for search engines / Blogger
    if language == 'es':
        html += "<!-- Secciones del Proyecto WWW, con enlaces a los artículos publicados en español. -->\n\n"
    else:
        html += "<!-- Sections of Proyecto WWW, with links to articles published in English. -->\n\n"

    # One JSON-LD block per category
    for category, items in sections.items():
        json_ld = generate_json_ld(category, items)
        json_ld_str = json.dumps(json_ld, ensure_ascii=False, indent=4)
        html += f'<script type="application/ld+json">\n{json_ld_str}\n</script>\n\n'

    # Opening block, depending on the language
    if language == 'es':
        html += (
            '<div>\n'
            '\t<p>Secciones establecidas para el contenido del Proyecto WWW.</p><br>\n'
            '\t<a id="resumen" class="subrayadoSolidoAnaranjado01 letra_capital">Resumen del contenido</a>:<br>\n'
            '\t<p class="FuenteNoventaPorCiento" style="padding: 3%;">\n'
            '\t\tCada una de las partes o divisiones establecidas para el contenido del Proyecto WWW\n'
            '\t</p>\n'
            '</div>\n\n'
        )
    else:
        html += (
            '<div>\n'
            '\t<p>Sections established for the content of Proyecto WWW.</p><br>\n'
            '\t<a id="resumen" class="subrayadoSolidoAnaranjado01 letra_capital">Content summary</a>:<br>\n'
            '\t<p class="FuenteNoventaPorCiento" style="padding: 3%;">\n'
            '\t\tEach of the parts or divisions established for the content of Proyecto WWW\n'
            '\t</p>\n'
            '</div>\n\n'
        )

    html += '<div class="contenedor_full">\n'

    # Buttons (tabs), one per category
    for i, category in enumerate(sections.keys()):
        is_active = (i == 0)
        aria_expanded = 'true' if is_active else 'false'
        id_attr = ' id="abrirPorDefecto"' if is_active else ''

        # Visible name depends on the language (Arte → Arte Visual, Art → Visual Art)
        if language == 'es' and category == 'Arte':
            visible_name = 'Arte Visual'
        elif language == 'en' and category == 'Art':
            visible_name = 'Visual Art'
        else:
            visible_name = category

        html += (
            f'<button class="etiqueta_enlace" aria-controls="{category.lower()}" '
            f'aria-expanded="{aria_expanded}" tabindex="0" '
            f'onclick="seccionAbrirPagina(\'{category.lower()}\', this)"{id_attr}>{visible_name}</button>\n'
        )

    # Content of each category
    for i, (category, items) in enumerate(sections.items()):
        is_active = (i == 0)
        aria_hidden = 'false' if is_active else 'true'
        html += (
            f'<div id="{category.lower()}" class="etiqueta_contenido" aria-hidden="{aria_hidden}">\n'
        )

        # Same rule for the section title
        if language == 'es' and category == 'Arte':
            visible_name = 'Arte Visual'
        elif language == 'en' and category == 'Art':
            visible_name = 'Visual Art'
        else:
            visible_name = category

        html += f'\t<h3 class="separador">{visible_name}</h3>\n\t<ol style="font-size: 1.05882em;">\n'
        for name, url in items:
            if url:
                safe_name = name.replace("'", "&#39;").replace('"', "&quot;")
                html += f"\t\t<li><a href='{url}'>{safe_name}</a>.</li>\n"
        html += '\t</ol>\n</div>\n'

    html += '</div><!-- contenedor_full -->\n'

    # Destacados / Featured section
    if language == 'es':
        html += (
            '<div id="destacados">\n'
            '\t<h3>Destacados</h3>\n'
            '\t<ol>\n'
            "\t\t<li>\n"
            "\t\t\t<a href='https://www.proyectowww.com.ar/search/label/DESTACADOS'>\n"
            "\t\t\t\tListado de los artículos más notorios o relevantes, de todas las secciones\n"
            "\t\t\t</a>.\n"
            "\t\t</li>\n"
            '\t</ol>\n'
            '</div>\n'
        )
    else:
        html += (
            '<div id="destacados">\n'
            '\t<h3>Featured</h3>\n'
            '\t<ol>\n'
            "\t\t<li>\n"
            "\t\t\t<a href='https://www.proyectowww.com.ar/search/label/DESTACADOS'>\n"
            "\t\t\t\tList of the most notable or relevant articles from all sections\n"
            "\t\t\t</a>.\n"
            "\t\t</li>\n"
            '\t</ol>\n'
            '</div>\n'
        )

    return html


def main():
    # Spanish version: read secciones-es.txt → output-es.html
    sections_es = read_sections('secciones-es.txt')
    html_es = write_html_and_jsonld(sections_es, language='es')
    with open('output-es.html', 'w', encoding='utf-8') as f:
        f.write(html_es)

    # English version: read secciones-en.txt → output-en.html
    sections_en = read_sections('secciones-en.txt')
    html_en = write_html_and_jsonld(sections_en, language='en')
    with open('output-en.html', 'w', encoding='utf-8') as f:
        f.write(html_en)


if __name__ == "__main__":
    main()