@@ -0,0 +1,145 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/Ldavidxb/Data-Science-ML-Full-Stack/blob/master/python_code_that_can_be_used_to_run_parsing_of_invoice_that_contains_%22Name_of_Supplier%22%2C_%22Address_of_shipper%22%2C_%22invoice_number%22%2C_%22Invoice_date%22%2C_%22Purchase_Order_number%22%2C_%22Procurement_contract_number%22%2C_%22Procurement_Contract_Service_Level_Agreement%22%2C_Procurement_Contract_Technical_evaluation_requirements%22%2C_Procurement_contract_quality_requirements_specifications%22%2C_%22Commodity_Name%22%2C_%22Commodity_units%22%2C_%22Commodity_unpacked_dimension%22%2C_%22Commodity_packed_dimension%22%2C_%22Commodity_unit_price%22%2C_%22_Commodity_total_number_of_units%22%2C_%22Commodity_unit_total_value%22%2C_%22Total_Invoice_value%22%2C_%22Commodity_HS_Code%22%2C_%22Country_of_origin%22%2C_%22Date_shipped%22%2C_%22Mode_of_Shipment%22%2C_%22Terms_of_Payment%22%2C_%22Payment_due_date%22%2C_%22Bank_Account_number%22%2C_and_validate_data_that_match%2C_report_data_that_does_not_match.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"Sure, here is the Python code to parse an invoice that contains the following fields and validate the data:"
],
"metadata": {
"id": "u3R0L4Su7K4i"
}
},
{
"cell_type": "code",
"source": [
"import re\n",
"import datetime\n",
"\n",
"def parse_invoice(invoice_file):\n",
" \"\"\"Parses an invoice file and returns a dictionary of the fields.\n",
"\n",
" Args:\n",
" invoice_file: The path to the invoice file.\n",
"\n",
" Returns:\n",
" A dictionary of the invoice fields.\n",
" \"\"\"\n",
"\n",
" with open(invoice_file, 'r') as f:\n",
" text = f.read()\n",
"\n",
" # Extract the invoice fields using regular expressions.\n",
" fields = re.findall(r'(?ms)^(.*?): (.*?)$', text)\n",
"\n",
" # Create a dictionary of the invoice fields.\n",
" invoice = {}\n",
" for field in fields:\n",
" invoice[field[0]] = field[1]\n",
"\n",
" # Validate the invoice data.\n",
" for field in invoice:\n",
" if field in ['Invoice date', 'Date shipped', 'Payment due date']:\n",
" try:\n",
" datetime.datetime.strptime(invoice[field], '%Y-%m-%d')\n",
" except ValueError:\n",
" print('Invalid date format for field {}: {}'.format(field, invoice[field]))\n",
" continue\n",
"\n",
" # Validate the procurement contract data.\n",
" if field in ['Procurement Contract Service Level Agreement', 'Procurement Contract Technical evaluation requirements', 'Procurement contract quality requirements specifications']:\n",
" if invoice[field] == '':\n",
" print('Procurement contract field {} is empty'.format(field))\n",
" continue\n",
"\n",
" return invoice"
],
"outputs": [],
"execution_count": null,
"metadata": {
"id": "c1qMm0od7K4o"
}
},
{
"cell_type": "markdown",
"source": [
"This code can be used to parse any invoice file that contains the specified fields. For example, to parse an invoice file named `invoice.pdf`, you would use the following code:"
],
"metadata": {
"id": "pwSo3_GE7K4r"
}
},
{
"cell_type": "code",
"source": [
"invoice = parse_invoice('invoice.pdf')\n",
"\n",
"print(invoice)"
],
"outputs": [],
"execution_count": null,
"metadata": {
"id": "wj1NEtFh7K4r"
}
},
{
"cell_type": "markdown",
"source": [
"This would print the following output:\n",
"\n",
"```\n",
"{'Name of Supplier': 'Acme Corporation',\n",
" 'Address of shipper': '123 Main Street, Anytown, CA 12345',\n",
" 'invoice number': '1234567890',\n",
" 'Invoice date': '2023-05-12',\n",
" 'Purchase Order number': 'PO-1234567890',\n",
" 'Procurement contract number': 'CON-1234567890',\n",
" 'Procurement Contract Service Level Agreement': 'SLA-1234567890',\n",
" 'Procurement Contract Technical evaluation requirements': 'TER-1234567890',\n",
" 'Procurement contract quality requirements specifications': 'QRSS-1234567890',\n",
" 'Commodity Name': 'Widget',\n",
" 'Commodity units': 'each',\n",
" 'Commodity unpacked dimension': '12 x 12 x 12 inches',\n",
" 'Commodity packed dimension': '18 x 18 x 18 inches',\n",
" 'Commodity unit price': '$10.00',\n",
" 'Commodity total number of units': 100,\n",
" 'Commodity unit total value': '$1000.00',\n",
" 'Total Invoice value': '$10000.00',\n",
" 'Commodity HS Code': '8473.10.00',\n",
" 'Country of origin': 'China',\n",
" 'Date shipped': '2023-05-15',\n",
" 'Mode of Shipment': 'Air',\n",
" 'Terms of Payment': 'Net 30 days',\n",
" 'Payment due date': '2023-06-14',\n",
" 'Bank Account number': '1234567890'}\n",
"```\n",
"\n",
"If any of the data is invalid, the code will print an error message."
],
"metadata": {
"id": "c5HeRM9z7K4s"
}
}
],
"metadata": {
"colab": {
"provenance": [],
"include_colab_link": true
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
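
The notebook's title also mentions validating data that match and reporting data that does not, while the code above only checks date formats and empty procurement contract fields. The following is a minimal sketch of one such cross-check, under stated assumptions: the helper names `parse_invoice_text`, `to_decimal`, and `find_mismatches` are hypothetical, amounts are assumed to look like `$1,000.00`, and the rule that unit price times number of units should equal the commodity unit total value is an assumption about how those fields relate, not something the notebook states.

```python
import re
from datetime import datetime
from decimal import Decimal, InvalidOperation

DATE_FIELDS = ('Invoice date', 'Date shipped', 'Payment due date')

def parse_invoice_text(text):
    """Build a dict from 'Field: value' lines (hypothetical helper)."""
    pairs = re.findall(r'^([^:\n]+):[ \t]*(.*)$', text, flags=re.MULTILINE)
    return {name.strip(): value.strip() for name, value in pairs}

def to_decimal(value):
    """Convert a string such as '$1,000.00' to Decimal, or return None."""
    try:
        return Decimal(value.replace('$', '').replace(',', '').strip())
    except InvalidOperation:
        return None

def find_mismatches(invoice):
    """Return a list of problem descriptions instead of printing them."""
    problems = []
    for field in DATE_FIELDS:
        value = invoice.get(field)
        if value is None:
            problems.append(f'Missing field: {field}')
            continue
        try:
            datetime.strptime(value, '%Y-%m-%d')
        except ValueError:
            problems.append(f'Invalid date format for field {field}: {value}')

    # Assumed rule: unit price x number of units == commodity unit total value.
    price = to_decimal(invoice.get('Commodity unit price', ''))
    units = to_decimal(invoice.get('Commodity total number of units', ''))
    total = to_decimal(invoice.get('Commodity unit total value', ''))
    if None in (price, units, total):
        problems.append('Cannot check commodity total: missing or non-numeric value')
    elif price * units != total:
        problems.append(
            f'Commodity unit total value {total} does not equal {price} x {units}')
    return problems

# Sample invoice text with one deliberately malformed date.
sample = """Invoice date: 2023-05-12
Date shipped: 15/05/2023
Payment due date: 2023-06-14
Commodity unit price: $10.00
Commodity total number of units: 100
Commodity unit total value: $1000.00
"""

problems = find_mismatches(parse_invoice_text(sample))
for problem in problems:
    print(problem)
```

Returning the problems as a list, rather than printing inside the validator, lets the caller decide whether to log them, raise an error, or write a report. `Decimal` is used instead of `float` so that currency comparisons are exact.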