spreadsheet-intelligence
What is Spreadsheet Intelligence?
Spreadsheet Intelligence parses the XML inside Excel files to load their contents and improve the performance of LLM-based retrieval-augmented generation (RAG) over Excel files.
Currently, it supports converting system configuration diagrams made of autoshapes in Excel. As reported in our paper, it helps overcome the limitations of VLMs in diagram interpretation.
Quick Install
With pip (coming soon):
pip install spreadsheet_intelligence
Quick Start
Extract autoshape information from an Excel file and convert it to JSON format.

.. code-block:: python

    from spreadsheet_intelligence.core.excel_autoshape_loader import ExcelAutoshapeLoader

    # Point the loader at the workbook and parse its autoshapes.
    loader = ExcelAutoshapeLoader(file_path="path/to/your/excel/file.xlsx")
    loader.load()

    # Export the parsed autoshape information as JSON and print it.
    autoshape_info_json = loader.export2json()
    print(autoshape_info_json)

Output example:
{
"connectors": [
{
"type": "straightConnector1",
"arrowType": "bidirectional",
"color": "#000000",
"startX": "8.47",
"startY": "8.77",
"endX": "18.30",
"endY": "8.77"
},
{
"type": "bentConnector3",
"arrowType": "unidirectional",
"color": "#000000",
"startX": "14.75",
"startY": "4.74",
"StartArrowHeadDirection": "left",
"endX": "21.59",
"endY": "6.00",
"EndArrowHeadDirection": "right"
}
],
"shapes": [
{
"shapeType": "round_rect",
"fillColor": "#156082",
"borderColor": "#0E2841",
"left": "1.41",
"top": "5.52",
"right": "39.13",
"bottom": "23.40",
"text": null
}
]
}
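Note that the coordinates in the output are serialized as strings, so they need converting before any numeric use. The following is a minimal sketch of consuming the output, assuming the JSON layout shown in the example above and assuming that y values grow downward (``top`` is smaller than ``bottom``, as in the sample). The helpers ``to_box`` and ``contains`` are hypothetical names introduced here for illustration; they are not part of the package.

```python
import json

# A trimmed copy of the example output above (one connector, one shape).
sample = """
{
  "connectors": [
    {"type": "straightConnector1", "arrowType": "bidirectional",
     "startX": "8.47", "startY": "8.77", "endX": "18.30", "endY": "8.77"}
  ],
  "shapes": [
    {"shapeType": "round_rect", "left": "1.41", "top": "5.52",
     "right": "39.13", "bottom": "23.40", "text": null}
  ]
}
"""


def to_box(shape):
    """Hypothetical helper: read a shape's bounding box as floats."""
    return tuple(float(shape[k]) for k in ("left", "top", "right", "bottom"))


def contains(box, x, y):
    """Return True if the point (x, y) lies inside the bounding box."""
    left, top, right, bottom = box
    return left <= x <= right and top <= y <= bottom


data = json.loads(sample)
box = to_box(data["shapes"][0])

# Report whether each connector endpoint falls inside the shape.
for conn in data["connectors"]:
    start = (float(conn["startX"]), float(conn["startY"]))
    end = (float(conn["endX"]), float(conn["endY"]))
    print(conn["type"], contains(box, *start), contains(box, *end))
```

A check like this is one way to relate connectors to the shapes they touch before passing the data to an LLM.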
Project Structure
This package is mainly composed of the following five subpackages:
spreadsheet_intelligence/
├── core/
│   └── excel_autoshape_loader.py
├── models/
│   ├── converted/
│   └── raw/
├── parsers/
├── converters/
├── formatters/
└── ...
Processing Flow
The Excel file, loaded as XML, is processed in the following flow:

1. It is parsed by ``parsers`` in a nearly raw state and stored in ``Raw`` models.
2. It is converted by ``converters`` from the XML representation to a structure that is easy for humans (and LLMs) to understand, and stored in ``Converted`` models.
3. It is converted by ``formatters`` from the ``Converted`` models to JSON data that can be used directly in LLM prompts.

In most cases, the ``ExcelAutoshapeLoader`` class in the ``core`` package wraps this flow, so the whole pipeline can be run through it.
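
The parse, convert, and format stages can be illustrated with a toy pipeline. This is only a sketch under stated assumptions: ``RawShape``, ``ConvertedShape``, ``parse``, ``convert``, and ``format_json`` are hypothetical names, the XML snippet is a simplified stand-in rather than the real Excel drawing schema, and converting to centimetres is an assumption made for illustration. The one fixed fact used is that DrawingML lengths are expressed in EMUs, with 360000 EMU per centimetre.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass

EMU_PER_CM = 360000  # DrawingML lengths are in EMUs; 360000 EMU = 1 cm

# Toy XML for illustration only, not the real drawing schema.
TOY_XML = '<shape kind="roundRect" x="507600" y="1987200" cx="1800000" cy="720000"/>'


@dataclass
class RawShape:
    """Hypothetical 'Raw' model: values kept close to the XML."""
    kind: str
    x: int
    y: int
    cx: int
    cy: int


@dataclass
class ConvertedShape:
    """Hypothetical 'Converted' model: a readable bounding box."""
    shapeType: str
    left: float
    top: float
    right: float
    bottom: float


def emu_to_cm(emu):
    return round(emu / EMU_PER_CM, 2)


def parse(xml_text):
    """Stage 1: XML -> Raw model."""
    el = ET.fromstring(xml_text)
    return RawShape(el.get("kind"), *(int(el.get(k)) for k in ("x", "y", "cx", "cy")))


def convert(raw):
    """Stage 2: Raw model -> Converted model (offset + extent -> box in cm)."""
    return ConvertedShape(
        raw.kind,
        emu_to_cm(raw.x),
        emu_to_cm(raw.y),
        emu_to_cm(raw.x + raw.cx),
        emu_to_cm(raw.y + raw.cy),
    )


def format_json(shape):
    """Stage 3: Converted model -> JSON text."""
    return json.dumps({"shapes": [asdict(shape)]}, indent=2)


raw_shape = parse(TOY_XML)
converted_shape = convert(raw_shape)
print(format_json(converted_shape))
```

Keeping the raw and converted models separate means the unit and geometry logic lives in one stage, and the output format can change without touching the parser.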
Customizability
The package can be extended mainly in the following ways:

- Extend the data retrieved from XML: extend by inheriting from ``parsers``.
- Extend the data conversion methods: extend by inheriting from ``converters``.
- Extend the output data format: extend by inheriting from ``formatters``.
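
As an illustration of extension by inheritance, the sketch below subclasses a formatter to add derived fields to the output. The actual base-class names and method signatures in ``formatters`` are not given here, so ``BaseShapeFormatter`` is a hypothetical stand-in defined inside the example, and ``SizedShapeFormatter`` is likewise an invented name; check the package source for the real classes before adapting this.

```python
import json


class BaseShapeFormatter:
    """Hypothetical stand-in for a formatter base class (not the package's API)."""

    def format_shape(self, shape):
        return {"shapeType": shape["shapeType"], "text": shape["text"]}

    def format(self, shapes):
        return json.dumps({"shapes": [self.format_shape(s) for s in shapes]})


class SizedShapeFormatter(BaseShapeFormatter):
    """Extends the output format with each shape's width and height."""

    def format_shape(self, shape):
        out = super().format_shape(shape)
        out["width"] = round(shape["right"] - shape["left"], 2)
        out["height"] = round(shape["bottom"] - shape["top"], 2)
        return out


# Sample values taken from the output example, already converted to floats.
shapes = [{"shapeType": "round_rect", "left": 1.41, "top": 5.52,
           "right": 39.13, "bottom": 23.40, "text": None}]

result = SizedShapeFormatter().format(shapes)
print(result)
```

Overriding a single per-shape method keeps the subclass small: the inherited ``format`` still handles the iteration and JSON serialization.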