1. Key Features:

  • File Handling and Document Editing: The code handles file uploads, edits Word documents (.doc and .docx), and generates PDFs from them. Users can upload a file, fill in its placeholders with their own values, and export the result as a PDF.

  • Conversion and Format Handling: .doc files are converted to .docx with LibreOffice when needed, because python-docx can only open .docx files. LibreOffice is also used to convert Word documents to PDF. This generally preserves the document's formatting, although complex layouts are not guaranteed to convert exactly.

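  • CSRF Protection: A CSRF protection object (csrf), most likely an instance of Flask-WTF's CSRFProtect, is imported from another module to guard against cross-site request forgery. Its setup is not shown here. Note that the /edit-doc route in the code excerpt is decorated with @csrf.exempt, which turns CSRF checking off for that endpoint.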

  • Table and Placeholder Management: The code supports inserting tables into Word documents before specific keywords and replacing placeholders in both text and table cells.

  • Retry Mechanism: Some operations, such as file conversion and value replacement, are retried several times before the code reports a failure. This makes them more tolerant of transient errors.

  • Environment Variable Configuration: It uses an environment variable (BACK_END_URL) to construct URLs for file paths dynamically.
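
The LibreOffice conversions mentioned above (.doc to .docx, and Word to PDF) are usually run through LibreOffice's headless command-line mode. The following is a minimal sketch, not the project's code. build_convert_command and convert_with_libreoffice are hypothetical helpers, and the sketch assumes the soffice executable is on PATH.

```python
import subprocess
from pathlib import Path


def build_convert_command(src: str, target_ext: str, outdir: str) -> list[str]:
    """Build a headless LibreOffice command that converts src to target_ext.

    Hypothetical helper: assumes the LibreOffice binary is named "soffice".
    target_ext is a plain extension such as "docx" or "pdf".
    """
    return [
        "soffice",
        "--headless",                 # run without opening a GUI
        "--convert-to", target_ext,   # output format, e.g. "docx" or "pdf"
        "--outdir", outdir,           # directory for the converted file
        src,
    ]


def convert_with_libreoffice(src: str, target_ext: str, outdir: str,
                             timeout: int = 120) -> Path:
    """Run the conversion and return the path of the converted file.

    Raises CalledProcessError on a non-zero exit, TimeoutExpired if
    LibreOffice hangs, and FileNotFoundError if no output was written.
    """
    cmd = build_convert_command(src, target_ext, outdir)
    subprocess.run(cmd, check=True, capture_output=True, timeout=timeout)
    # LibreOffice keeps the input's base name and swaps the extension.
    output = Path(outdir) / f"{Path(src).stem}.{target_ext}"
    if not output.exists():
        raise FileNotFoundError(f"LibreOffice did not produce {output}")
    return output
```

The existence check after the run is deliberate. In some setups soffice can exit without writing a file (a commonly reported case is another LibreOffice instance already using the same user profile), so a zero exit status alone is not proof of success. For example, convert_with_libreoffice("uploads/tor.doc", "docx", "uploads") would be expected to write uploads/tor.docx.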


2. Key Functionalities:

  • process_edit_document(): The core function for editing Word documents and the handler for the POST /edit-doc route. It reads the request data, replaces placeholders with the provided values, and saves the modified document to a designated folder.

    • Placeholder Replacement: It iterates over the placeholders in the document and replaces them with the corresponding values from the request.

    • Table Insertion: If the request sets is_table, it extracts the table data and the keyword that marks the table's position, then inserts the table before that keyword.

  • format_dates(): This function formats date values in the document from a standard format (YYYY-MM-DD or YYYY-MM-DD HH:MM:SS) to a more readable format (DD Month YYYY).

  • retry_on_failure(): This helper function retries certain operations (like document processing or conversion) multiple times before failing.

  • upload_file(): Handles file uploads. It saves the uploaded file to a specific folder and returns the URL of the uploaded file.

  • PDF Generation Routes: Several routes (/generate-pdf-TOR, /generate-pdf-EOI, /generate-pdf-ITQ, etc.) generate PDFs from Word documents whose file paths are stored in the database. LibreOffice converts the .docx files to PDF.

  • Error Handling: The functions catch errors and log failures with detailed error messages. The simplified code excerpt at the end of this document leaves this out.
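
format_dates() and retry_on_failure() are only described in prose above. The sketch below shows one plausible shape for each, using only the standard library. format_date_value is a hypothetical per-value helper, since the real format_dates() takes the whole values_to_change payload and its structure isn't shown. The decorator form of retry_on_failure, its attempt count, and its delay are also assumptions.

```python
import time
from datetime import datetime
from functools import wraps

# Input layouts named in the description: "YYYY-MM-DD HH:MM:SS" and "YYYY-MM-DD".
_INPUT_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def format_date_value(value):
    """Hypothetical helper: turn an ISO-style date string into 'DD Month YYYY'.

    Non-strings and strings matching neither format are returned unchanged,
    so ordinary placeholder values pass through untouched.
    """
    if not isinstance(value, str):
        return value
    for fmt in _INPUT_FORMATS:
        try:
            parsed = datetime.strptime(value.strip(), fmt)
        except ValueError:
            continue  # try the next accepted layout
        return parsed.strftime("%d %B %Y")  # %B = full month name (locale-dependent)
    return value


def retry_on_failure(attempts=3, delay=1.0, exceptions=(Exception,)):
    """Decorator sketch: call the wrapped function up to `attempts` times.

    Sleeps `delay` seconds between attempts and re-raises the last error.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts:
                        raise  # out of attempts: surface the last error
                    time.sleep(delay)
        return wrapper
    return decorator
```

strftime("%B") uses the current locale's month names, so the output changes under a non-English locale. Catching Exception matches the "retry before failing" description. Passing a narrower tuple, such as (subprocess.CalledProcessError,), avoids retrying errors that will never succeed, such as a missing input file.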


3. Data Fetching and Step Handling:

  • Data Fetching:

    • The data required to modify the document is fetched from the POST request in the process_edit_document() route. This includes file_path, values_to_change, file_name, and is_table.

    • Data is also fetched from the MongoDB database using the get_tender_collection() method in the /generate-pdf-* routes, where it looks up documents based on the serial-id parameter.

    • For the PDF generation routes, the file paths of the documents are stored in the database (tender['gov_attachment']), and the relevant document is fetched based on the serial-id query parameter.

  • Step Handling:

    • Document Processing: The process begins with receiving data (like file path and values to change). Then, the document is opened, values are replaced, and tables (if specified) are inserted. Finally, the document is saved and a URL is returned to the frontend.

    • File Conversion: If the file is a .doc file, it is converted to .docx using LibreOffice. This step is required because the python-docx library can only open .docx files.

    • PDF Generation: For generating PDFs, the system looks for the document using the serial-id in the database, converts the Word document to PDF using LibreOffice, and returns the path or file to the frontend.

  • Flow of Information:

    1. Document Editing Flow:

      • The client sends a request with document details.
      • The server fetches the file, processes placeholders, inserts tables (if needed), and returns the modified document URL.
    2. PDF Generation Flow:

      • The client provides a serial-id.
      • The server fetches the document from the database, converts it to PDF, and sends the generated PDF file or URL back.
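
The lookup half of the PDF generation flow can be sketched as two small helpers. Everything here is an assumption layered on the description above: the MongoDB field name serial_id, the idea that gov_attachment holds either one path or a list of paths, and the convention that the generated PDF sits next to the .docx under the same relative path. find_attachment_path and pdf_url_for are hypothetical names. The collection argument only needs a find_one() method, which a pymongo Collection provides.

```python
import os
from pathlib import PurePosixPath
from urllib.parse import urljoin


def find_attachment_path(collection, serial_id):
    """Look up a tender by serial id and return its stored document path.

    Assumes a "serial_id" field and that "gov_attachment" is either a single
    path or a list of paths (the first one is used).
    """
    tender = collection.find_one({"serial_id": serial_id})
    if tender is None:
        raise LookupError(f"No tender found for serial id {serial_id!r}")
    attachment = tender.get("gov_attachment")
    if isinstance(attachment, list):
        attachment = attachment[0] if attachment else None
    if not attachment:
        raise LookupError(f"Tender {serial_id!r} has no gov_attachment")
    return attachment


def pdf_url_for(docx_path, base_url=None):
    """Build the public URL of the PDF assumed to sit next to docx_path.

    The base URL is read from the BACK_END_URL environment variable
    unless it is passed explicitly.
    """
    base = base_url or os.environ.get("BACK_END_URL", "")
    if not base:
        raise RuntimeError("BACK_END_URL is not set")
    # Swap the extension and drop any leading slash so urljoin appends
    # the path to the base instead of replacing the base's path.
    pdf_name = PurePosixPath(docx_path).with_suffix(".pdf").as_posix().lstrip("/")
    return urljoin(base.rstrip("/") + "/", pdf_name)
```

In a Flask route, serial_id would typically come from request.args.get("serial-id"). The PDF itself would be produced by a LibreOffice conversion before pdf_url_for() builds the link returned to the client.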

4. Code Reference and Explanation:

Here is a simplified excerpt of the route that handles document editing. It saves to a fixed placeholder path and returns a status message instead of the document URL:

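from flask import jsonify, request
from docx import Document  # python-docx

# document_bp (a Flask Blueprint), csrf, format_dates, extract_table and
# replace_values_in_document are defined elsewhere in the project.

@document_bp.route('/edit-doc', methods=['POST'])
@csrf.exempt  # CSRF checks are disabled for this route
def process_edit_document():
    data = request.get_json(silent=True) or {}  # Request body as JSON ({} if missing or invalid)
    file_path = data.get('file_path')  # Local path of the Word document
    values_to_change = data.get('values_to_change')  # Data for placeholder replacement
    is_table = data.get('is_table')  # Whether a table must be inserted

    if not file_path or values_to_change is None:
        return jsonify({"status": False, "message": "file_path and values_to_change are required"}), 400

    # Reformat date values (YYYY-MM-DD -> DD Month YYYY) before replacement
    values_to_change = format_dates(values_to_change)

    # If a table is requested, separate the table data from the plain placeholders
    if is_table:
        table_keyword, table = extract_table(values_to_change)
        values_to_change = [item for item in values_to_change if item]  # Drop empty entries
        # Inserting `table` before `table_keyword` is handled by code not shown in this excerpt

    # Open the Word document and replace the placeholders
    doc = Document(file_path)
    replace_values_in_document(doc, values_to_change)

    # Save the modified document ("path_to_save" is a placeholder folder in this excerpt)
    doc.save("path_to_save/modified_document.docx")
    return jsonify({"status": True, "message": "Document processed successfully"}), 200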
  • Key Operations:
    • It reads the document's file path and the placeholder values from the JSON request body.
    • It formats date values and, when is_table is set, separates the table data from the plain placeholders.
    • It opens the document, replaces the placeholders, saves the result, and returns a JSON status response.
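
The excerpt delegates the actual substitution to replace_values_in_document(), which is not shown. The following is a minimal sketch of what such a helper could look like. It assumes the values have already been reduced to a dict from placeholder text to replacement value. replace_placeholders, _replace_in_paragraph, and the {{...}} placeholder syntax used in the example are hypothetical. The attribute names it relies on (Document.paragraphs, Document.tables, table.rows, row.cells, cell.paragraphs, Paragraph.text) are python-docx's, but the function accepts any objects shaped the same way.

```python
def _replace_in_paragraph(paragraph, replacements):
    """Replace every placeholder in one paragraph; return True if it changed."""
    text = paragraph.text
    new_text = text
    for placeholder, value in replacements.items():
        new_text = new_text.replace(placeholder, str(value))
    if new_text != text:
        # In python-docx, assigning .text rewrites the paragraph as a single
        # run: the paragraph style survives, but run-level formatting is lost.
        paragraph.text = new_text
        return True
    return False


def replace_placeholders(doc, replacements):
    """Replace placeholders in body paragraphs and in every table cell.

    Returns the number of paragraphs that were changed. Tables nested
    inside cells are not visited by this sketch.
    """
    changed = 0
    for paragraph in doc.paragraphs:
        changed += _replace_in_paragraph(paragraph, replacements)
    for table in doc.tables:
        for row in table.rows:
            for cell in row.cells:
                for paragraph in cell.paragraphs:
                    changed += _replace_in_paragraph(paragraph, replacements)
    return changed
```

Working on paragraph.text rather than on individual runs means a placeholder that Word split across several runs is still found, at the cost of collapsing that paragraph's runs into one. With python-docx installed, usage would be doc = Document(file_path), then replace_placeholders(doc, {"{{project_name}}": "Bridge Repair"}), then doc.save(...).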