🔭📊 spreadsheet-intelligence

License: Apache 2.0

🤔 What is Spreadsheet Intelligence?

Spreadsheet Intelligence parses the XML inside Excel files to extract their contents and to improve the performance of LLM-based retrieval-augmented generation (RAG) over Excel files.

Currently, it supports converting system configuration diagrams built from autoshapes in Excel. Our paper (arXiv:2502.04389v1) reports it as a way to overcome the limitations of vision-language models (VLMs) in diagram interpretation.

⚡️ Quick Install

With pip (coming soon):

pip install spreadsheet_intelligence

🚀 Quick Start

Extract autoshape information from Excel and convert it to JSON format.

.. code-block:: python

    from spreadsheet_intelligence.core.excel_autoshape_loader import ExcelAutoshapeLoader

    # Load the autoshapes from the workbook, then export them as JSON.
    loader = ExcelAutoshapeLoader(file_path="path/to/your/excel/file.xlsx")
    loader.load()
    autoshape_info_json = loader.export2json()
    print(autoshape_info_json)

Output example:

{
    "connectors": [
        {
            "type": "straightConnector1",
            "arrowType": "bidirectional",
            "color": "#000000",
            "startX": "8.47",
            "startY": "8.77",
            "endX": "18.30",
            "endY": "8.77"
        },
        {
            "type": "bentConnector3",
            "arrowType": "unidirectional",
            "color": "#000000",
            "startX": "14.75",
            "startY": "4.74",
            "StartArrowHeadDirection": "left",
            "endX": "21.59",
            "endY": "6.00",
            "EndArrowHeadDirection": "right"
        }
    ],
    "shapes": [
        {
            "shapeType": "round_rect",
            "fillColor": "#156082",
            "borderColor": "#0E2841",
            "left": "1.41",
            "top": "5.52",
            "right": "39.13",
            "bottom": "23.40",
            "text": null
        }
    ]
}
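
As an illustration of how this output might be consumed downstream, the sketch below checks which shape bounding boxes contain each connector endpoint. It is not part of the package: the helper names `shapes_containing` and `locate_connector_ends` are hypothetical, and it assumes that `export2json()` returns JSON text shaped like the example above, with coordinates exported as numeric strings and `top` not greater than `bottom`.

```python
import json

# Trimmed copy of the output example; in practice this text would come from
# loader.export2json() (assumed here to return a JSON string).
sample_json = """
{
  "connectors": [
    {"type": "straightConnector1", "arrowType": "bidirectional",
     "startX": "8.47", "startY": "8.77", "endX": "18.30", "endY": "8.77"},
    {"type": "bentConnector3", "arrowType": "unidirectional",
     "startX": "14.75", "startY": "4.74", "endX": "21.59", "endY": "6.00"}
  ],
  "shapes": [
    {"shapeType": "round_rect", "left": "1.41", "top": "5.52",
     "right": "39.13", "bottom": "23.40", "text": null}
  ]
}
"""


def shapes_containing(x, y, shapes):
    """Return indices of shapes whose bounding box contains the point (x, y)."""
    hits = []
    for index, shape in enumerate(shapes):
        # Coordinates are exported as strings, so convert before comparing.
        left, right = float(shape["left"]), float(shape["right"])
        top, bottom = float(shape["top"]), float(shape["bottom"])
        if left <= x <= right and top <= y <= bottom:
            hits.append(index)
    return hits


def locate_connector_ends(data):
    """For each connector, list the shapes containing its start and end points."""
    located = []
    for connector in data["connectors"]:
        start = (float(connector["startX"]), float(connector["startY"]))
        end = (float(connector["endX"]), float(connector["endY"]))
        located.append({
            "type": connector["type"],
            "start_in": shapes_containing(*start, data["shapes"]),
            "end_in": shapes_containing(*end, data["shapes"]),
        })
    return located


connector_locations = locate_connector_ends(json.loads(sample_json))
print(connector_locations)
```

A bounding-box test like this is only a rough heuristic; connectors that end exactly on a shape border, or that sit inside nested shapes, may need a tolerance or a smallest-enclosing-shape rule.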

🗂️ Project Structure

This package consists mainly of the following five subpackages:

spreadsheet_intelligence/
├── core/
│   ├── excel_autoshape_loader.py
├── models/
│   ├── converted/
│   ├── raw/
├── parsers/
├── converters/
├── formatters/
├── ...

🔥 Processing Flow

An Excel file, loaded as XML, is processed in three steps:

  1. Parsers read the XML and store it, nearly unmodified, in Raw models.

  2. Converters turn the XML representation into a structure that is easy for humans and LLMs to understand, and store it in Converted models.

  3. Formatters turn the Converted models into JSON data that can be used directly in LLM prompts.

The ExcelAutoshapeLoader class in the core package wraps this whole flow, so in most cases calling it is enough.
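
The three stages can be sketched in miniature as follows. This is a simplified illustration, not the package's actual code: the XML fragment, the `RawShape` and `ConvertedShape` classes, and the `parse`, `convert`, and `format_json` functions are all hypothetical stand-ins. It also assumes the converted coordinates are centimeters; the only outside fact relied on is that DrawingML stores lengths in EMUs, with 360000 EMUs per centimeter.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass

EMU_PER_CM = 360000  # DrawingML lengths are in EMUs (914400 per inch)

# Hypothetical, heavily simplified stand-in for a drawing XML fragment.
DRAWING_XML = """
<drawing>
  <shape kind="roundRect" x="360000" y="720000" cx="1080000" cy="360000"/>
</drawing>
"""


@dataclass
class RawShape:
    """Hypothetical Raw model: values kept as found in the XML (EMUs)."""
    kind: str
    x: int
    y: int
    cx: int
    cy: int


@dataclass
class ConvertedShape:
    """Hypothetical Converted model: a bounding box in centimeters."""
    shapeType: str
    left: float
    top: float
    right: float
    bottom: float


def parse(xml_text):
    """Step 1: read the XML nearly as-is into Raw models."""
    root = ET.fromstring(xml_text.strip())
    return [
        RawShape(
            kind=node.get("kind"),
            x=int(node.get("x")),
            y=int(node.get("y")),
            cx=int(node.get("cx")),
            cy=int(node.get("cy")),
        )
        for node in root.iter("shape")
    ]


def convert(raw):
    """Step 2: turn offset and extent in EMUs into a box in centimeters."""
    return ConvertedShape(
        shapeType=raw.kind,
        left=round(raw.x / EMU_PER_CM, 2),
        top=round(raw.y / EMU_PER_CM, 2),
        right=round((raw.x + raw.cx) / EMU_PER_CM, 2),
        bottom=round((raw.y + raw.cy) / EMU_PER_CM, 2),
    )


def format_json(shapes):
    """Step 3: serialize Converted models to JSON text for a prompt."""
    return json.dumps({"shapes": [asdict(shape) for shape in shapes]}, indent=2)


pipeline_output = format_json([convert(raw) for raw in parse(DRAWING_XML)])
print(pipeline_output)
```

Keeping the Raw and Converted models separate means a change in how values are read from XML does not force a change in how they are presented to the LLM, which appears to be the motivation for the staged design.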

🔧 Customizability

The package can be extended mainly in the following ways:

  • Extend the data retrieved from XML -> Extend by inheriting from parsers

  • Extend the data conversion methods -> Extend by inheriting from converters

  • Extend the output data format -> Extend by inheriting from formatters
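
The inheritance pattern behind these extension points might look like the sketch below, shown for the formatter case. The package's real base classes and method names are not documented here, so `BaseFormatter`, `format_shape`, and `export` are hypothetical stand-ins; only the idea of overriding one step in a subclass is taken from the text above.

```python
import json


class BaseFormatter:
    """Hypothetical stand-in for a formatter base class in the package."""

    def format_shape(self, shape):
        # Default behavior: pass the converted fields through unchanged.
        return dict(shape)

    def export(self, shapes):
        return json.dumps({"shapes": [self.format_shape(s) for s in shapes]})


class SizeFormatter(BaseFormatter):
    """Extends the output format with each shape's width and height."""

    def format_shape(self, shape):
        formatted = super().format_shape(shape)
        formatted["width"] = round(float(shape["right"]) - float(shape["left"]), 2)
        formatted["height"] = round(float(shape["bottom"]) - float(shape["top"]), 2)
        return formatted


example_shapes = [
    {"shapeType": "round_rect", "left": "1.00", "top": "2.00",
     "right": "4.00", "bottom": "3.50"}
]
extended_output = SizeFormatter().export(example_shapes)
print(extended_output)
```

The same approach would apply to parsers and converters: subclass the relevant class, override the method that handles the element or field of interest, and leave the rest of the flow untouched.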