Proyecto WWW maintenance tool (sections index generator)
Description: Python tool to automatically generate bilingual index pages for Proyecto WWW from plain text files listing sections and articles.
Introduction
This Python maintenance tool automatically generates the bilingual (Spanish and English) index pages of Proyecto WWW from plain-text files that list the published sections and articles.
Index input files
The script uses two plain text files as data sources: secciones-es.txt for the Spanish version and secciones-en.txt for the English version.
Each file is organized by section. A section starts with its name on a line of its own, followed by pairs of lines: the article title, then its URL on the next line.
Sections are separated by at least one blank line. The script treats the first non-blank line of the file, and any non-blank line that follows a blank line, as the name of a new section. For that reason, blank lines must not appear inside a section.
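The sketch below illustrates this layout. It uses a hypothetical fragment of secciones-es.txt (the section names, titles, and example.com URLs are invented) and a simplified parser that splits the text into blank-line-separated blocks. It assumes plain URLs and well-formed title/URL pairs. The script's own read_sections, listed in full at the end, also handles Markdown links and skips titles that have no URL.

```python
import re

# Hypothetical content of secciones-es.txt (invented titles and example.com URLs).
SAMPLE = """Arte
Título del primer artículo
https://www.example.com/arte-1.html
Título del segundo artículo
https://www.example.com/arte-2.html

Ciencia
Un artículo de ciencia
https://www.example.com/ciencia-1.html
"""


def parse_blocks(text):
    """Simplified parser: each blank-line-separated block is one section.

    The first line of a block is the section name; the remaining lines are
    read as (title, URL) pairs. Plain URLs only, unlike read_sections.
    """
    sections = {}
    for block in re.split(r"\n\s*\n", text):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if not lines:
            continue
        name, rest = lines[0], lines[1:]
        # Pair even-indexed lines (titles) with odd-indexed lines (URLs).
        pairs = list(zip(rest[0::2], rest[1::2]))
        sections.setdefault(name, []).extend(pairs)
    return sections
```

Here parse_blocks(SAMPLE) returns a dictionary with the keys Arte and Ciencia. The first holds two (title, URL) tuples and the second holds one, which is the same shape that read_sections produces.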
General script workflow
The script reads both text files and parses their sections, titles, and URLs. It then builds an ordered in-memory representation of the Proyecto WWW content: a dictionary that keeps the sections in the order in which they appear in the file.
From that representation, it generates two HTML files (output-es.html and output-en.html). These include the section tabs, the article lists, and the JSON-LD structured-data blocks intended for search engines.
Implementation details
File reading and internal structure (read_sections)
The read_sections function processes each file line by line and detects each section name. It groups the articles into a dictionary in which each key is a section name and each value is a list of (title, URL) tuples. A title that is not followed by a URL line is skipped.
The function accepts URLs written as plain text as well as in Markdown link format, such as [https://...](https://...). For Markdown links, it extracts the link target before storing it in the internal structure.
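The URL-cleaning rule can be isolated in a small helper. clean_url is a hypothetical name used only for this illustration. It reproduces the checks that read_sections performs inline on the line that follows each title.

```python
def clean_url(candidate):
    """Return the link target from a plain or Markdown-formatted URL line.

    Returns None when the line does not look like a URL, in which case
    read_sections would not store the preceding title.
    """
    candidate = candidate.strip()
    if not (candidate.startswith("http") or candidate.startswith("[")):
        return None
    if candidate.startswith("[") and "](" in candidate:
        # Markdown link: keep the target between "](" and the closing ")".
        return candidate.split("](", 1)[1].rstrip(")")
    # Plain URL, possibly wrapped in square brackets.
    return candidate.strip("[]")
```

Because the target is trimmed with rstrip(")"), a URL that itself ends in a closing parenthesis would lose it. The script shares this limitation.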
JSON-LD generation (generate_json_ld)
For each section, the generate_json_ld function builds a schema.org structured-data block of type BreadcrumbList. In this block, every article is a ListItem that carries its position, title, and URL.
These blocks are inserted into the final HTML inside <script type="application/ld+json"> tags to help search engines interpret the page content.
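As an illustration of one such block, the following sketch condenses generate_json_ld and the script-tag wrapping into a single hypothetical helper, json_ld_script. Unlike generate_json_ld, it assumes that every item has a URL, which holds for the lists returned by read_sections.

```python
import json


def json_ld_script(category, items):
    """Build one BreadcrumbList block for a section and wrap it in a JSON-LD script tag."""
    elements = [
        {
            "@type": "ListItem",
            "position": position,  # 1-based, in file order
            "item": {"@id": url, "name": name},
        }
        for position, (name, url) in enumerate(items, start=1)
    ]
    data = {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "name": category,
        "itemListElement": elements,
    }
    # ensure_ascii=False keeps accented titles readable in the output.
    body = json.dumps(data, ensure_ascii=False, indent=4)
    return f'<script type="application/ld+json">\n{body}\n</script>\n'
```

For example, json_ld_script("Arte", [("Título A", "https://www.example.com/a.html")]) returns a script tag whose JSON names the section Arte and lists that article at position 1.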
HTML and tab construction (write_html_and_jsonld)
The write_html_and_jsonld function generates the full HTML structure. It adds the short descriptive comment for search engines and creates one button per section. These buttons act as navigation tabs, and the first one is active by default.
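The tab markup can be sketched on its own as follows. The class names, the abrirPorDefecto id, and the onclick call to seccionAbrirPagina are copied from the script. seccionAbrirPagina is assumed to be a JavaScript function defined elsewhere on the Proyecto WWW pages. The helper names visible_name and tab_buttons are hypothetical.

```python
def visible_name(category, language):
    """Label rule from the script: Arte/Art are shown as Arte Visual/Visual Art."""
    if language == "es" and category == "Arte":
        return "Arte Visual"
    if language == "en" and category == "Art":
        return "Visual Art"
    return category


def tab_buttons(categories, language="es"):
    """Render one tab button per section; only the first one is active by default."""
    html = ""
    for i, category in enumerate(categories):
        is_active = i == 0
        expanded = "true" if is_active else "false"
        key = category.lower()  # must match the id of the section's content container
        id_attr = ' id="abrirPorDefecto"' if is_active else ""
        html += (
            f'<button class="etiqueta_enlace" aria-controls="{key}" '
            f'aria-expanded="{expanded}" tabindex="0" '
            f"onclick=\"seccionAbrirPagina('{key}', this)\"{id_attr}>"
            f"{visible_name(category, language)}</button>\n"
        )
    return html
```

For example, tab_buttons(["Arte", "Ciencia"]) yields two buttons. Only the first carries aria-expanded="true" and id="abrirPorDefecto", and its label reads Arte Visual.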
For every section, it generates a container with its title and an ordered list of links to the articles. It also adapts the visible texts to the language: for example, the section Arte is labeled Arte Visual, and Art is labeled Visual Art. Finally, it appends the Proyecto WWW featured articles section.
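A single section container can likewise be rendered on its own. This hypothetical section_block helper follows the script's markup with one difference. Instead of the script's manual replacement of quote characters, it uses the standard library's html.escape, which also escapes &, < and > in titles and URLs.

```python
import html


def section_block(category, items, active=False, label=None):
    """Render one section: container div, heading, and ordered list of article links."""
    hidden = "false" if active else "true"
    heading = html.escape(label or category)
    out = f'<div id="{category.lower()}" class="etiqueta_contenido" aria-hidden="{hidden}">\n'
    out += f'\t<h3 class="separador">{heading}</h3>\n\t<ol style="font-size: 1.05882em;">\n'
    for name, url in items:
        # html.escape (quote=True by default) turns &, <, >, " and ' into entities,
        # so neither the title nor the single-quoted href can break the markup.
        out += f"\t\t<li><a href='{html.escape(url)}'>{html.escape(name)}</a>.</li>\n"
    out += "\t</ol>\n</div>\n"
    return out
```

The label parameter stands in for the language-dependent visible name. For example, the Spanish page would pass label="Arte Visual" for the Arte section.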
Practical use in Proyecto WWW
To update the indexes, you only need to edit secciones-es.txt and secciones-en.txt, adding or modifying sections, titles, and URLs as needed.
After saving the changes, run the Python script; it will generate the output-es.html and output-en.html files, whose content is then copied into the corresponding Spanish and English pages of Proyecto WWW.
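Before copying the generated markup into the Spanish and English pages, a quick check can confirm that every JSON-LD block is valid JSON. check_output is a hypothetical helper, not part of the script. It assumes the exact <script type="application/ld+json"> tag that write_html_and_jsonld emits.

```python
import json
import re

# Matches the body of each JSON-LD block written by the generator.
JSON_LD_RE = re.compile(
    r'<script type="application/ld\+json">\s*(.*?)\s*</script>', re.DOTALL
)


def check_output(html_text):
    """Parse every JSON-LD block in a generated page and return the section names.

    Raises json.JSONDecodeError if any block is not valid JSON.
    """
    names = []
    for block in JSON_LD_RE.findall(html_text):
        data = json.loads(block)
        names.append(data.get("name"))
    return names
```

For example, you can read output-es.html with encoding="utf-8" and pass its text to check_output. It should return the section names in file order, and a malformed block raises json.JSONDecodeError.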
Complete source code
Below is the complete source code of the script, ready to be copied, adapted, or reused in future maintenance tasks for Proyecto WWW.
import json


def read_sections(file_path):
    """
    Read a file whose sections (categories) are separated by blank lines.
    Each section starts with its name on one line, followed by pairs of lines:
    the article title and its URL.
    Return a dictionary {category: [(title, url), ...], ...}
    """
    sections = {}
    current_category = None
    with open(file_path, 'r', encoding='utf-8') as f:
        lines = [line.rstrip('\n') for line in f]
    i = 0
    n = len(lines)
    while i < n:
        line = lines[i].strip()
        # Skip blank lines
        if not line:
            i += 1
            continue
        # Detect a section name: first non-blank line of the file, or a line right after a blank line
        if current_category is None or (i > 0 and lines[i-1].strip() == ''):
            current_category = line
            if current_category not in sections:
                sections[current_category] = []
            i += 1
            continue
        # Inside a section: this line is an article title
        name = line
        url = None
        # Look at the next line to see whether it is a URL
        if i + 1 < n:
            url_candidate = lines[i + 1].strip()
            # Accept URLs with or without Markdown formatting
            if url_candidate.startswith('http') or url_candidate.startswith('['):
                # Markdown example: [https://...](https://...)
                if url_candidate.startswith('[') and '](' in url_candidate:
                    url_inside = url_candidate.split('](')[1].rstrip(')')
                    url = url_inside
                else:
                    url = url_candidate.strip('[]')
                i += 2
            else:
                i += 1
        else:
            i += 1
        if current_category is None:
            raise ValueError("Article without a defined section")
        # Titles without a URL are not stored
        if url:
            sections[current_category].append((name, url))
    return sections


def generate_json_ld(category, items):
    """Build the JSON-LD structure (a schema.org BreadcrumbList) for one section."""
    item_list = []
    position = 1
    for name, url in items:
        if url:
            item_list.append({
                "@type": "ListItem",
                "position": position,
                "item": {
                    "@id": url,
                    "name": name
                }
            })
            # Positions stay consecutive because only items with a URL are counted
            position += 1
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "name": category,
        "itemListElement": item_list
    }


def write_html_and_jsonld(sections, language='es'):
    """
    Build the complete HTML (comment + JSON-LD + content)
    from the sections dictionary.
    language:
        'es' → visible texts in Spanish (output-es.html)
        'en' → visible texts in English (output-en.html)
    """
    html = ""
    # Comment with a short description for search engines / Blogger
    if language == 'es':
        html += "<!-- Secciones del Proyecto WWW, con enlaces a los artículos publicados en español. -->\n\n"
    else:
        html += "<!-- Sections of Proyecto WWW, with links to articles published in English. -->\n\n"
    # One JSON-LD block per section
    for category, items in sections.items():
        json_ld = generate_json_ld(category, items)
        json_ld_str = json.dumps(json_ld, ensure_ascii=False, indent=4)
        html += f'<script type="application/ld+json">\n{json_ld_str}\n</script>\n\n'
    # Introductory block, depending on the language
    if language == 'es':
        html += (
            '<div>\n'
            '\t<p>Secciones establecidas para el contenido del Proyecto WWW.</p><br>\n'
            '\t<a id="resumen" class="subrayadoSolidoAnaranjado01 letra_capital">Resumen del contenido</a>:<br>\n'
            '\t<p class="FuenteNoventaPorCiento" style="padding: 3%;">\n'
            '\t\tCada una de las partes o divisiones establecidas para el contenido del Proyecto WWW\n'
            '\t</p>\n'
            '</div>\n\n'
        )
    else:
        html += (
            '<div>\n'
            '\t<p>Sections established for the content of Proyecto WWW.</p><br>\n'
            '\t<a id="resumen" class="subrayadoSolidoAnaranjado01 letra_capital">Content summary</a>:<br>\n'
            '\t<p class="FuenteNoventaPorCiento" style="padding: 3%;">\n'
            '\t\tEach of the parts or divisions established for the content of Proyecto WWW\n'
            '\t</p>\n'
            '</div>\n\n'
        )
    html += '<div class="contenedor_full">\n'
    # Buttons (tabs), one per section
    for i, category in enumerate(sections.keys()):
        is_active = (i == 0)
        aria_expanded = 'true' if is_active else 'false'
        id_attr = ' id="abrirPorDefecto"' if is_active else ''
        # Visible name depends on the language (Arte → Arte Visual, Art → Visual Art)
        if language == 'es' and category == 'Arte':
            visible_name = 'Arte Visual'
        elif language == 'en' and category == 'Art':
            visible_name = 'Visual Art'
        else:
            visible_name = category
        html += (
            f'<button class="etiqueta_enlace" aria-controls="{category.lower()}" '
            f'aria-expanded="{aria_expanded}" tabindex="0" '
            f'onclick="seccionAbrirPagina(\'{category.lower()}\', this)"{id_attr}>{visible_name}</button>\n'
        )
    # Content of each section
    for i, (category, items) in enumerate(sections.items()):
        is_active = (i == 0)
        aria_hidden = 'false' if is_active else 'true'
        html += (
            f'<div id="{category.lower()}" class="etiqueta_contenido" aria-hidden="{aria_hidden}">\n'
        )
        # Same rule for the section heading
        if language == 'es' and category == 'Arte':
            visible_name = 'Arte Visual'
        elif language == 'en' and category == 'Art':
            visible_name = 'Visual Art'
        else:
            visible_name = category
        html += f'\t<h3 class="separador">{visible_name}</h3>\n\t<ol style="font-size: 1.05882em;">\n'
        for name, url in items:
            if url:
                # Replace quote characters in the title with HTML entities
                safe_name = name.replace("'", "&#39;").replace('"', "&quot;")
                html += f"\t\t<li><a href='{url}'>{safe_name}</a>.</li>\n"
        html += '\t</ol>\n</div>\n'
    html += '</div><!-- contenedor_full -->\n'
    # Featured section (Destacados / Featured)
    if language == 'es':
        html += (
            '<div id="destacados">\n'
            '\t<h3>Destacados</h3>\n'
            '\t<ol>\n'
            "\t\t<li>\n"
            "\t\t\t<a href='https://www.proyectowww.com.ar/search/label/DESTACADOS'>\n"
            "\t\t\t\tListado de los artículos más notorios o relevantes, de todas las secciones\n"
            "\t\t\t</a>.\n"
            "\t\t</li>\n"
            '\t</ol>\n'
            '</div>\n'
        )
    else:
        html += (
            '<div id="destacados">\n'
            '\t<h3>Featured</h3>\n'
            '\t<ol>\n'
            "\t\t<li>\n"
            "\t\t\t<a href='https://www.proyectowww.com.ar/search/label/DESTACADOS'>\n"
            "\t\t\t\tList of the most notable or relevant articles from all sections\n"
            "\t\t\t</a>.\n"
            "\t\t</li>\n"
            '\t</ol>\n'
            '</div>\n'
        )
    return html


def main():
    # Spanish version: read secciones-es.txt → output-es.html
    sections_es = read_sections('secciones-es.txt')
    html_es = write_html_and_jsonld(sections_es, language='es')
    with open('output-es.html', 'w', encoding='utf-8') as f:
        f.write(html_es)
    # English version: read secciones-en.txt → output-en.html
    sections_en = read_sections('secciones-en.txt')
    html_en = write_html_and_jsonld(sections_en, language='en')
    with open('output-en.html', 'w', encoding='utf-8') as f:
        f.write(html_en)


if __name__ == "__main__":
    main()