Implements XML upload for validation and conversion to PDF #20

pitangainnovare · 2025-08-18T15:10:41Z

What does this PR do?

Creates the xml_manager app, which manages the upload and processing of XML files. It receives a file, validates its structure and content, and triggers the conversion process (PDF).
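
As an illustration of the first validation step described above, here is a minimal sketch of a well-formedness check using only the standard library. The function name is_well_formed is hypothetical; the PR itself lists packtools as a dependency for validation, and this sketch does not reproduce its behavior.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        # Raised for unclosed or mismatched tags and other syntax errors.
        return False
    return True
```

A check like this only confirms that the file parses; validating content against a schema or style rules would be a separate step.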

Where should the review start?

In xml_manager/views.py, which contains the main logic for receiving files and triggering processing.

How can this be tested manually?

Start the project.
Open the XML upload URL.
Upload a valid XML file.
Check the listing in the template and confirm that the validation and conversion tasks were triggered.

Any context you would like to provide?

This is the first version of the module and serves as a base for future work. Some parts are still pending (the PDF model and task).

Screenshots

N/A.

Which tickets are relevant?

Not linked to any issue so far.

References

Utility code reused from another repository.


robertatakenaka · 2025-08-20T13:02:05Z

core/utils/utils.py

@@ -0,0 +1,94 @@
+import logging


@pitangainnovare the folder is already named utils; take requester.py from scms-upload for the request functions
https://github.com/scieloorg/scms-upload/tree/main/core/utils

robertatakenaka · 2025-08-20T13:05:14Z

xml_manager/models.py

+        null=True,
+        verbose_name=_("Exceptions File")
+    )
+    pdf_file = models.FileField(


Create a model that includes the language. Also, each XML must have zero or more PDFs.

robertatakenaka · 2025-08-20T13:05:41Z

xml_manager/models.py

+        null=True,
+        verbose_name=_("PDF File")
+    )
+    html_file = models.FileField(


Create a model that includes the language. Also, each XML must have zero or more HTML files.

robertatakenaka · 2025-08-20T13:06:56Z

xml_manager/tasks.py

+
+@celery_app.task(bind=True, name=_('Process XML Document'), timelimit=-1)
+def task_process_xml_document(self, xml_id, user_id=None, username=None):
+    user = _get_user(self.request, username=username, user_id=user_id)


@pitangainnovare This line can go after the try/except block. And since you are defining user here, then ...

robertatakenaka · 2025-08-20T13:07:39Z

xml_manager/tasks.py

+        return False
+
+    logging.info(f'Processing XML file {xml_document.xml_file.name}.')
+    task_validate_xml_file.delay(xml_id, user_id=user_id, username=username)


@pitangainnovare then use user.id here and you will not need username=username

In the other projects (usage, upload, core) we adopted this approach of passing either user_id or username. This makes it possible to call things through a periodic_task in the interface by passing username, and in other ways by passing user_id.
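
The "either user_id or username" convention can be sketched in plain Python as follows. This is an illustration under assumptions: the User class, the USERS list and resolve_user are hypothetical stand-ins, not the project's _get_user, which would query the real user model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class User:
    id: int
    username: str


# Stand-in for the user table; a real project would query its user model.
USERS = [User(1, "admin"), User(2, "editor")]


def resolve_user(user_id: Optional[int] = None, username: Optional[str] = None) -> User:
    """Look up a user by id when given, otherwise by username."""
    if user_id is None and username is None:
        raise ValueError("provide user_id or username")
    for user in USERS:
        if user_id is not None:
            if user.id == user_id:
                return user
        elif user.username == username:
            return user
    raise LookupError("user not found")
```

Once the user is resolved at the start of a task, downstream calls can pass only user.id, which is the simplification the comment asks for.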

robertatakenaka · 2025-08-20T13:11:44Z

xml_manager/wagtail_hooks.py

+    ]
+
+
+class XMLDocumentSnippetViewSet(SnippetViewSet):


@pitangainnovare put the View in views.py

This is a Wagtail class: SnippetViewSet (the one that replaces "WagtailAdminView" when we upgrade Wagtail to newer versions).

robertatakenaka · 2025-08-20T13:12:26Z

xml_manager/wagtail_hooks.py

+
+
+@hooks.register('register_admin_urls')
+def register_admin_urls():


@pitangainnovare should this be in urls?

Functions decorated with @hooks.register need to stay in wagtail_hooks.py, because Wagtail loads that file automatically at startup.

"Hooks" are a way to extend Wagtail.

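To show why registration depends on the module being imported, here is a minimal plain-Python sketch of a decorator-based hook registry. It is not Wagtail's implementation; register, get_hooks and the returned URL string are hypothetical placeholders.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# Maps a hook name to the functions registered under it.
_hooks: DefaultDict[str, List[Callable]] = defaultdict(list)


def register(hook_name: str) -> Callable:
    """Decorator that records a function under hook_name when the module loads."""
    def decorator(fn: Callable) -> Callable:
        _hooks[hook_name].append(fn)
        return fn
    return decorator


def get_hooks(hook_name: str) -> List[Callable]:
    """Return the functions registered under hook_name, in registration order."""
    return list(_hooks[hook_name])


@register("register_admin_urls")
def register_admin_urls():
    # A real hook would return URL patterns; a placeholder string stands in here.
    return ["placeholder-url/"]
```

The decorator runs only when the defining module is imported, so a hook placed in a file the framework never loads would never be registered.
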
robertatakenaka

@pitangainnovare the change that is really necessary is a model design that allows one XML to be associated with several PDFs and HTML files
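
The requested relationship (one XML, zero or more language-specific PDFs and HTML files) can be sketched with plain dataclasses. The class and field names here are hypothetical; in Django this would typically be expressed with separate models holding a ForeignKey to the XML model rather than file fields on the XML model itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RenderedFile:
    language: str  # language code of the rendered output, e.g. "pt" or "en"
    path: str


@dataclass
class XMLDocument:
    xml_path: str
    # Each list may be empty: an XML has zero or more outputs of each kind.
    pdfs: List[RenderedFile] = field(default_factory=list)
    htmls: List[RenderedFile] = field(default_factory=list)


doc = XMLDocument("article.xml")
doc.pdfs.append(RenderedFile("pt", "article_pt.pdf"))
doc.pdfs.append(RenderedFile("en", "article_en.pdf"))
```

Keeping the language on the rendered file rather than on the XML is what allows several outputs per document.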

Copilot

Pull Request Overview

This PR introduces a new Django app called xml_manager for handling XML file processing and conversion. The app provides functionality to upload, validate, and convert XML files to PDF and HTML formats using asynchronous task processing.

Creation of complete XML manager app with models for XML documents and their generated outputs
Implementation of async task processing for validation and file conversion workflows
Integration with Wagtail admin interface for file management and processing controls

Reviewed Changes

Copilot reviewed 28 out of 32 changed files in this pull request and generated 8 comments.


File	Description
xml_manager/models.py	Defines core models for XML documents and their PDF/HTML derivatives
xml_manager/tasks.py	Implements Celery tasks for XML validation and conversion processes
xml_manager/utils.py	Contains utility functions for XML validation and PDF/HTML generation
xml_manager/wagtail_hooks.py	Configures Wagtail admin interface and custom table columns
xml_manager/views.py	Provides view for triggering XML processing tasks
tracker/models.py	Adds XMLDocumentEvent model and refactors existing event tracking
requirements/base.txt	Adds packtools and langdetect dependencies
compose/*/django/Dockerfile	Installs LibreOffice and fonts for document conversion
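
The Dockerfile row above mentions LibreOffice for document conversion. A minimal sketch of how a headless DOCX-to-PDF command line could be assembled follows. The helper names build_pdf_command and expected_pdf_path are hypothetical and not taken from this PR; --headless, --convert-to and --outdir are standard LibreOffice command-line options.

```python
from pathlib import Path
from typing import List


def build_pdf_command(libreoffice_binary: str, docx_path: str, output_dir: str) -> List[str]:
    """Build the argument list for a headless DOCX-to-PDF conversion."""
    return [
        libreoffice_binary,
        "--headless",
        "--convert-to", "pdf",
        "--outdir", output_dir,
        docx_path,
    ]


def expected_pdf_path(docx_path: str, output_dir: str) -> Path:
    """LibreOffice names the output after the input file, with a .pdf suffix."""
    return Path(output_dir) / (Path(docx_path).stem + ".pdf")
```

The list returned by build_pdf_command can be passed to subprocess.run; building it as a list avoids shell quoting issues with file names that contain spaces.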


Copilot · 2025-08-28T00:08:21Z

xml_manager/tasks.py

+            message=str(e),
+            save=True,
+        )
+        return False


The task_generate_html_file function is missing a return statement for the successful execution path. After line 192 (html_instance.save()), the function should return True to indicate successful completion.
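
A small sketch of the behavior behind this comment: a Python function that falls off the end returns None, so a caller checking for True or False cannot tell success from a missing return. The function names are hypothetical and only mirror the shape of the task.

```python
def run_step_without_return(step):
    """Returns False on failure, but falls through (None) on success."""
    try:
        step()
    except Exception:
        return False


def run_step(step) -> bool:
    """Returns False on failure and an explicit True on success."""
    try:
        step()
    except Exception:
        return False
    return True
```

Since None is falsy, code such as `if not result:` would treat the successful path of the first version as a failure.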

Copilot · 2025-08-28T00:08:21Z

xml_manager/utils.py

+        os.makedirs(output_root_dir)
+
+    # ToDo: Implement HTML generation logic here
+    return


The generate_html_for_xml_document function returns None but the calling code in tasks.py expects it to return path_html and lang values. This will cause an unpacking error when the task tries to assign the return values.

Suggested change

-    return
+    return None, None
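
The unpacking problem can be reproduced in isolation. In this sketch the two stub functions are hypothetical stand-ins for the unfinished generator; only the return statements matter.

```python
def generate_placeholder():
    """Stub that mirrors a bare `return`: the caller receives None."""
    return


def generate_placeholder_pair():
    """Stub that returns a pair so callers can unpack it."""
    return None, None


# Unpacking None into two names raises TypeError.
try:
    path_html, lang = generate_placeholder()
except TypeError as exc:
    error_message = str(exc)

# Unpacking a two-element tuple works, even when both values are None.
path_html, lang = generate_placeholder_pair()
```

Returning a pair keeps the caller from crashing, although the caller still has to handle the case where both values are None.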

Copilot · 2025-08-28T00:08:22Z

xml_manager/tasks.py

+            message=str(e),
+            save=True,
+        )
+    except exceptions.XML_File_HTML_Generation_Error as e:


The exception class XML_File_HTML_Generation_Error is referenced but not defined in the xml_manager.exceptions module. This will raise an AttributeError when an exception reaches this except clause.
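
A minimal sketch of how Python behaves in this situation: the expression in an except clause is evaluated only when an exception is being matched against it, and a missing attribute on a module-like object surfaces as AttributeError. The exceptions namespace and the names ValidationError and MissingError are hypothetical stand-ins.

```python
import types

# Stand-in for a module that defines only one exception class.
exceptions = types.SimpleNamespace(
    ValidationError=type("ValidationError", (Exception,), {})
)


def run(raise_error: bool) -> str:
    try:
        if raise_error:
            raise RuntimeError("conversion failed")
        return "ok"
    except exceptions.ValidationError:
        return "validation error"
    except exceptions.MissingError:  # attribute does not exist
        return "never reached"
```

Calling run(False) succeeds because the except expressions are never evaluated, which is why this kind of bug can stay hidden until the failure path is exercised.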

Copilot · 2025-08-28T00:08:22Z

xml_manager/tasks.py

+
+    user = _get_user(self.request, username=username, user_id=user_id)
+
+    params = {'libreoffice_binary': libreoffice_binary,}


[nitpick] There's an unnecessary trailing comma after the dictionary value. Consider removing it for cleaner code formatting.

Suggested change

-    params = {'libreoffice_binary': libreoffice_binary,}
+    params = {'libreoffice_binary': libreoffice_binary}

Copilot · 2025-08-28T00:08:22Z

xml_manager/wagtail_hooks.py

+
+    list_display = (
+        "xml_document",
+        LinkColumn("pdf_file", "PDF file"),


The LinkColumn constructor is missing the 'label' parameter name. It should be LinkColumn("pdf_file", label="PDF file") to match the pattern used in other list_display entries.

Suggested change

-        LinkColumn("pdf_file", "PDF file"),
+        LinkColumn("pdf_file", label="PDF file"),

Copilot · 2025-08-28T00:08:22Z

xml_manager/wagtail_hooks.py

+    list_display = (
+        "xml_document",
+        LinkColumn("pdf_file", "PDF file"),
+        LinkColumn("docx_file", "DOCX file"),


The LinkColumn constructor is missing the 'label' parameter name. It should be LinkColumn("docx_file", label="DOCX file") to match the pattern used in other list_display entries.

Suggested change

-        LinkColumn("docx_file", "DOCX file"),
+        LinkColumn("docx_file", label="DOCX file"),

Copilot · 2025-08-28T00:08:23Z

xml_manager/wagtail_hooks.py

+
+    list_display = (
+        "xml_document",
+        LinkColumn("html_file", "HTML file"),


The LinkColumn constructor is missing the 'label' parameter name. It should be LinkColumn("html_file", label="HTML file") to match the pattern used in other list_display entries.

Suggested change

-        LinkColumn("html_file", "HTML file"),
+        LinkColumn("html_file", label="HTML file"),

Copilot · 2025-08-28T00:08:23Z

xml_manager/utils.py

+    if not os.path.exists(output_root_dir):
+        os.makedirs(output_root_dir)
+
+    # ToDo: Implement HTML generation logic here


[nitpick] Use standard spelling 'TODO' instead of 'ToDo' for consistency with common development conventions.

Suggested change

-    # ToDo: Implement HTML generation logic here
+    # TODO: Implement HTML generation logic here

pitangainnovare added 10 commits August 18, 2025 11:57

create xml_manager app

b86f8be

Add URLs to upload and process XMLs

2777043

create view to receive XML and trigger processing

2e328bf

add admin and apps modules

1538b3a

add utils.py from another repository

df202bd

add template to process and list documents (customized)

ced61ce

create model to manage XML files (one-to-one) - to be fixed

4e65b8b

create task to process XML - calls the validate, PDF and HTML processes

8f3c5bd

create task to validate XML

c94f4d4

create tasks to generate PDF and HTML (there are ToDos) - to be completed

50a5dd0

pitangainnovare requested review from Copilot and robertatakenaka August 18, 2025 15:10

This comment was marked as outdated.


Update xml_manager/views.py

321ef80

Remove annotation that disabled CSRF protection Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

robertatakenaka reviewed Aug 20, 2025


robertatakenaka requested changes Aug 20, 2025


pitangainnovare added 8 commits August 24, 2025 21:50

remove name annotation from task definitions

1d88857

define _get_user (similar to upload) and change where it is called

8d7749a

delete utils.py

9134e57

add requester.py

c4009b2

add xml_manager to the app list in settings.py

c794bad

add packtools and langdetect dependencies

7be9c34

Create XMLDocumentPDF and XMLDocumentHTML models and change XMLDocument

318ab05

organize apps menu

76c87fc

pitangainnovare added 19 commits August 25, 2025 15:20

Create choices related to events that occur while processing XML

d7359b7

delete unused exceptions and create others

e781091

refactor tracker to hold events associated with document_xml and general events

a4e79ab

improve the way XML is processed - remove unnecessary views/templates

e031845

refactor models to have xmldocument, xmldocument_pdf and xmldocument_html

16299be

improve wagtail hooks to display the documentxml* models

5b48f5d

improve tasks and implement XML-to-PDF processing

8404fa5

xml_manager migrations

8c59d47

add default docx layout

ae6f657

add instructions for installing libreoffice

e9446e4

Create more choices to label errors during the PDF generation flow

27c071b

create docx field in XMLDocument_PDF to hold the intermediate DOCX

e68ebd7

Use LinkColumn to allow downloads in data listings

5d137f5

Create exceptions to handle the PDF conversion pipeline

8e93c7b

create utility with functions to validate XML and generate PDF

d86cf3c

fix imports in tasks

4b4202a

refactor tasks to use the new utility and handle errors in detail

764b54c

generate migrations

66bbe00

update packtools dependency to 4.11.22

6ac1c76

pitangainnovare requested review from Copilot and robertatakenaka August 28, 2025 00:06

Copilot AI reviewed Aug 28, 2025


robertatakenaka merged commit b65657a into scieloorg:main Sep 1, 2025
2 of 3 checks passed


