加载器教程

许多现有系统和平台都支持从 CSV 文件加载数据。许多用户更喜欢使用电子表格软件和多表文件格式（包括 XLSX）。SheetJS 库通过将复杂的工作簿转换为简单的 CSV 数据来帮助弥补差距。

¥Many existing systems and platforms include support for loading data from CSV files. Many users prefer to work in spreadsheet software and multi-sheet file formats including XLSX. SheetJS libraries help bridge the gap by translating complex workbooks to simple CSV data.

此示例的目标是将电子表格数据加载到向量存储中，并使用大型语言模型根据英语输入生成查询。现有工具支持 CSV，但不支持真正的电子表格。

¥The goal of this example is to load spreadsheet data into a vector store and use a large language model to generate queries based on English language input. The existing tooling supports CSV but does not support real spreadsheets.

在 "SheetJS 转换" 中，我们将使用 SheetJS 库为 LangChainJS CSV 加载器生成 CSV 文件。这些转换可以在预处理步骤中运行，而不会破坏现有的 CSV 工作流程。

¥In "SheetJS Conversion", we will use SheetJS libraries to generate CSV files for the LangChainJS CSV loader. These conversions can be run in a preprocessing step without disrupting existing CSV workflows.

在 "SheetJS 加载器" 中，我们将在自定义 LoadOfSheet 数据加载器中使用 SheetJS 库直接生成文档和元数据。

¥In "SheetJS Loader", we will use SheetJS libraries in a custom LoadOfSheet data loader to directly generate documents and metadata.

"SheetJS 加载器演示" 是一个完整的演示，它使用 SheetJS Loader 根据 XLS 工作簿中的数据回答问题。

¥"SheetJS Loader Demo" is a complete demo that uses the SheetJS Loader to answer questions based on data from a XLS workbook.

测试部署

此演示在以下配置中进行了测试：

¥This demo was tested in the following configurations:

平台	架构	日期
NVIDIA RTX 5090（32 GB 显存）+ Ryzen Z1 Extreme（24 GB 显存）	`win11-x64`	2025-06-17
NVIDIA RTX 4090（24 GB VRAM）+ Ryzen Z1 Extreme（16 GB RAM）	`win11-x64`	2025-04-17
NVIDIA RTX 4090（24 GB VRAM）+ Ryzen Z1 Extreme（16 GB RAM）	`linux-x64`	2025-01-28
AMD RX 7900 XTX（24 GB VRAM）+ Ryzen Z1 Extreme（16 GB RAM）	`win11-x64`	2025-06-20
AMD RX 7900 XTX（24 GB VRAM）+ Ryzen Z1 Extreme（16 GB RAM）	`linux-x64`	2025-01-29
Intel Arc B580 (12 GB VRAM) + Ryzen Z1 Extreme (24 GB RAM)	`win11-x64`	2025-06-20
Intel Arc B580 (12 GB VRAM) + Ryzen Z1 Extreme (16 GB RAM)	`linux-x64`	2025-02-08
Apple M4 Max 16 核 CPU + 40 核 GPU（48 GB 统一内存）	`darwin-arm`	2025-03-06
Apple M3 Ultra 28 核 CPU + 60 核 GPU（96 GB 统一内存）	`darwin-arm`	2025-06-24
Apple M2 Max 12 核 CPU + 30 核 GPU（32 GB 统一内存）	`darwin-arm`	2025-03-25

SheetJS 用户已在其他配置中验证了此演示：

¥SheetJS users have verified this demo in other configurations:

Other tested configurations (click to show)

Platform	Architecture	Demo
NVIDIA L40 (48 GB VRAM) + i9-13900K (32 GB RAM)	`linux-x64`	LangChainJS
NVIDIA RTX 4080 SUPER (16 GB VRAM) + Ryzen Z1 Extreme (16 GB RAM)	`win11-x64`	LangChainJS
NVIDIA RTX 4070 Ti SUPER (16 GB VRAM) + Ryzen Z1 Extreme (16 GB RAM)	`win11-x64`	LangChainJS
NVIDIA RTX 4070 Ti (12 GB VRAM) + Ryzen 7 5800x (64 GB RAM)	`win11-x64`	LangChainJS
NVIDIA RTX 4060 (8 GB VRAM) + Ryzen 7 5700g (32 GB RAM)	`win11-x64`	LangChainJS
NVIDIA RTX 3090 (24 GB VRAM) + Ryzen 9 3900XT (128 GB RAM)	`win11-x64`	LangChainJS
NVIDIA RTX 3080 (12 GB VRAM) + Ryzen 7 5800X (32 GB RAM)	`win11-x64`	LangChainJS
NVIDIA RTX 3070 (8 GB VRAM) + Ryzen Z1 Extreme (16 GB RAM)	`win11-x64`	LangChainJS
NVIDIA RTX 3060 (12 GB VRAM) + i5-11400 (32 GB RAM)	`win10-x64`	LangChainJS
NVIDIA RTX 2080 (12 GB VRAM) + i7-9700K (16 GB RAM)	`win10-x64`	LangChainJS
NVIDIA RTX 2070 (8 GB VRAM) + Ryzen 7 3700x (80 GB RAM)	`linux-x64`	LangChainJS
NVIDIA RTX 2060 (6 GB VRAM) + Ryzen 5 3600 (32 GB RAM)	`win10-x64`	LangChainJS
NVIDIA GTX 1080 (8 GB VRAM) + Ryzen 7 5800x (64 GB RAM)	`win10-x64`	LangChainJS
NVIDIA GTX 1070 (8 GB VRAM) + Ryzen 7 7700x (32 GB RAM)	`win11-x64`	LangChainJS
AMD RX 6800 XT (16 GB VRAM) + Ryzen Z1 Extreme (16 GB RAM)	`win11-x64`	LangChainJS
Apple M4 10-Core CPU + 10-Core GPU (24 GB unified memory)	`darwin-arm`	LangChainJS

特别感谢以下用户使用多种配置进行测试：

¥Special thanks to the following users for testing with multiple configurations:

CSV 加载器

¥CSV Loader

此解释已针对 LangChainJS 0.2 进行了验证。

¥This explanation was verified against LangChainJS 0.2.

文档加载器从数据源生成数据对象（"documents"）和相关元数据。

¥Document loaders generate data objects ("documents") and associated metadata from data sources.

LangChainJS 提供了一个 CSVLoader[^1] 组件，用于从文件加载 CSV 数据：

¥LangChainJS offers a CSVLoader[^1] component for loading CSV data from a file:

Generating Documents from a CSV file
import { CSVLoader } from "@langchain/community/document_loaders/fs/csv";

const loader = new CSVLoader("pres.csv");
const docs = await loader.load();

console.log(docs);

CSV 加载器使用第一行确定列标题并为每个数据行生成一个文档。例如，以下 CSV 保存总统数据：

¥The CSV loader uses the first row to determine column headers and generates one document per data row. For example, the following CSV holds Presidential data:

Name,Index
Bill Clinton,42
GeorgeW Bush,43
Barack Obama,44
Donald Trump,45
Joseph Biden,46

每个数据行都转换为一个文档，其内容是属性和值的列表。例如，第三个数据行如下所示：

¥Each data row is translated to a document whose content is a list of attributes and values. For example, the third data row is shown below:

CSV Row	Document Content
`Name,Index Barack Obama,44`	`Name: Barack Obama Index: 44`

LangChainJS CSV 加载器将在文档中包含源元数据：

¥The LangChainJS CSV loader will include source metadata in the document:

Document generated by the CSV loader
Document {
  pageContent: 'Name: Barack Obama\nIndex: 44',
  metadata: { source: 'pres.csv', line: 3 }
}

SheetJS 转换

¥SheetJS Conversion

SheetJS NodeJS 模块可以导入使用 LangChainJS 和其他 JavaScript 库的 NodeJS 脚本中。

¥The SheetJS NodeJS module can be imported in NodeJS scripts that use LangChainJS and other JavaScript libraries.

一个简单的预处理步骤可以将工作簿转换为可以由现有 CSV 工具处理的 CSV 文件：

¥A simple pre-processing step can convert workbooks to CSV files that can be processed by the existing CSV tooling:

从文件系统解析文件

¥Parsing files from the filesystem

SheetJS readFile 方法 [^2] 可以读取工作簿文件。该方法接受路径并返回符合 SheetJS 数据模型 [^3] 的工作簿对象。

¥The SheetJS readFile method[^2] can read workbook files. The method accepts a path and returns a workbook object that conforms to the SheetJS data model[^3].

/* Load SheetJS Libraries */
import { readFile, set_fs } from 'xlsx';

/* Load 'fs' for readFile support */
import * as fs from 'fs';
set_fs(fs);

/* Parse `pres.xlsx` */
const wb = readFile("pres.xlsx");

检查 SheetJS 工作簿和工作表对象

¥Inspecting SheetJS workbook and worksheet objects

Workbook 对象表示多工作表工作簿文件。它们存储单个工作表对象和其他元数据。

¥Workbook objects represent multi-sheet workbook files. They store individual worksheet objects and other metadata.

与此讨论相关，工作簿对象使用以下键 [^7]：

¥Relevant to this discussion, the workbook object uses the following keys[^7]:

SheetNames 是一个工作表名称数组

¥SheetNames is an array of worksheet names
Sheets 是一个对象，其键是工作表名称，其值是工作表对象。

¥Sheets is an object whose keys are sheet names and whose values are sheet objects.

SheetNames[0] 是第一个工作表名称，因此以下代码片段将从工作簿中提取第一个工作表：

¥SheetNames[0] is the first worksheet name, so the following snippet will pull the first worksheet from the workbook:

const first_ws = wb.Sheets[wb.SheetNames[0]];

将 SheetJS 工作表导出到 CSV

¥Exporting SheetJS worksheets to CSV

可以使用 SheetJS sheet_to_csv[^4] 方法将工作簿中的每个工作表写入 CSV 文本。该方法接受 SheetJS 工作表对象并返回字符串。

¥Each worksheet in the workbook can be written to CSV text using the SheetJS sheet_to_csv[^4] method. The method accepts a SheetJS worksheet object and returns a string.

const csv = utils.sheet_to_csv(first_ws);

完成脚本

¥Complete Script

例如，以下 NodeJS 脚本读取 pres.xlsx 并显示第一个工作表中的 CSV 行：

¥For example, the following NodeJS script reads pres.xlsx and displays CSV rows from the first worksheet:

Print CSV data from the first worksheet
/* Load SheetJS Libraries */
import { readFile, set_fs, utils } from 'xlsx';

/* Load 'fs' for readFile support */
import * as fs from 'fs';
set_fs(fs);

/* Parse `pres.xlsx` */
const wb = readFile("pres.xlsx");

/* Print CSV rows from first worksheet */
const first_ws = wb.Sheets[wb.SheetNames[0]];
const csv = utils.sheet_to_csv(first_ws);
console.log(csv);

类似的工作流程

¥Similar Workflows

许多演示涵盖了精神上相似的工作流程：

¥A number of demos cover spiritually similar workflows:

Stata、MATLAB 和 Maple 支持 XLSX 数据导入。SheetJS 集成从用户提供的电子表格生成干净的 XLSX 工作簿。

¥Stata, MATLAB and Maple support XLSX data import. The SheetJS integrations generate clean XLSX workbooks from user-supplied spreadsheets.
TensorFlow.js、熊猫和 Mathematica 支持 CSV 数据导入。SheetJS 集成生成干净的 CSV 并使用内置的 CSV 处理器。

¥TensorFlow.js, Pandas and Mathematica support CSV data import. The SheetJS integrations generate clean CSVs and use built-in CSV processors.
"命令行工具" 演示涵盖了制作用于文件转换的独立命令行工具的技术。

¥The "Command-Line Tools" demo covers techniques for making standalone command-line tools for file conversion.

单一工作表

¥Single Worksheet

对于单个工作表，SheetJS 预处理步骤可以将 CSV 行写入文件，CSVLoader 可以加载新写入的文件。

¥For a single worksheet, a SheetJS pre-processing step can write the CSV rows to file and the CSVLoader can load the newly written file.

Code example (click to hide)

Pulling data from the first worksheet of a workbook
import { CSVLoader } from "@langchain/community/document_loaders/fs/csv";
import { readFile, set_fs, utils } from 'xlsx';

/* Load 'fs' for readFile support */
import * as fs from 'fs';
set_fs(fs);

/* Parse `pres.xlsx`` */
const wb = readFile("pres.xlsx");

/* Generate CSV and write to `pres.xlsx.csv` */
const first_ws = wb.Sheets[wb.SheetNames[0]];
const csv = utils.sheet_to_csv(first_ws);
fs.writeFileSync("pres.xlsx.csv", csv);

/* Create documents with CSVLoader */
const loader = new CSVLoader("pres.xlsx.csv");
const docs = await loader.load();

console.log(docs);
// ...

工作簿

¥Workbook

工作簿是工作表的集合。每个工作表都可以导出到单独的 CSV。如果 CSV 写入子文件夹，DirectoryLoader[^5] 可以一步处理文件。

¥A workbook is a collection of worksheets. Each worksheet can be exported to a separate CSV. If the CSVs are written to a subfolder, a DirectoryLoader[^5] can process the files in one step.

Code example (click to hide)

在此示例中，脚本创建了一个名为 csv 的子文件夹。将处理工作簿中的每个工作表，并将生成的 CSV 存储到编号文件中。第一个工作表将存储到 csv/0.csv。

¥In this example, the script creates a subfolder named csv. Each worksheet in the workbook will be processed and the generated CSV will be stored to numbered files. The first worksheet will be stored to csv/0.csv.

Pulling data from the each worksheet of a workbook
import { CSVLoader } from "@langchain/community/document_loaders/fs/csv";
import { DirectoryLoader } from "langchain/document_loaders/fs/directory";
import { readFile, set_fs, utils } from 'xlsx';

/* Load 'fs' for readFile support */
import * as fs from 'fs';
set_fs(fs);

/* Parse `pres.xlsx`` */
const wb = readFile("pres.xlsx");

/* Create a folder `csv` */
try { fs.mkdirSync("csv"); } catch(e) {}

/* Generate CSV data for each worksheet */
wb.SheetNames.forEach((name, idx) => {
  const ws = wb.Sheets[name];
  const csv = utils.sheet_to_csv(ws);
  fs.writeFileSync(`csv/${idx}.csv`, csv);
});

/* Create documents with DirectoryLoader */
const loader = new DirectoryLoader("csv", {
  ".csv": (path) => new CSVLoader(path)
});
const docs = await loader.load();

console.log(docs);
// ...

SheetJS 加载器

¥SheetJS Loader

LangChainJS CSVLoader 不添加任何文档元数据，也不生成任何属性。自定义加载器可以绕过 CSV 工具中的限制，并可能包含没有 CSV 等效项的元数据。

¥The LangChainJS CSVLoader does not add any Document metadata and does not generate any attributes. A custom loader can work around limitations in the CSV tooling and potentially include metadata that has no CSV equivalent.

演示 LoadOfSheet 加载器将在所有工作表中为每个数据行生成一个文档。它还将尝试构建元数据和属性以用于自查询检索器。

¥The demo LoadOfSheet loader will generate one Document per data row across all worksheets. It will also attempt to build metadata and attributes for use in self-querying retrievers.

Sample usage
/* read and parse `data.xlsb` */
const loader = new LoadOfSheet("./data.xlsb");

/* generate documents */
const docs = await loader.load();

/* synthesized attributes for the SelfQueryRetriever */
const attributes = loader.attributes;

Sample SheetJS Loader (click to show)

This example loader pulls data from each worksheet. It assumes each worksheet includes one header row and a number of data rows.

loadofsheet.mjs
import { Document } from "@langchain/core/documents";
import { BufferLoader } from "langchain/document_loaders/fs/buffer";
import { read, utils } from "xlsx";

/**

 * Document loader that uses SheetJS to load documents.

 *  * Each worksheet is parsed into an array of row objects using the SheetJS

 * `sheet_to_json` method and projected to a `Document`. Metadata includes

 * original sheet name, row data, and row index
 */
export default class LoadOfSheet extends BufferLoader {
  /** @type {import("langchain/chains/query_constructor").AttributeInfo[]}  */
  attributes = [];

  /**

   * Document loader that uses SheetJS to load documents.

   *    * @param {string|Blob} filePathOrBlob Source Data
   */
  constructor(filePathOrBlob) {
    super(filePathOrBlob);
    this.attributes = [];
  }

  /**

   * Parse document

   *    * NOTE: column labels in multiple sheets are not disambiguated!

   *    * @param {Buffer} raw Raw data Buffer

   * @param {Document["metadata"]} metadata Document metadata

   * @returns {Promise<Document[]>} Array of Documents
   */
  async parse(raw, metadata) {
    /** @type {Document[]} */
    const result = [];

    this.attributes = [
      { name: "worksheet", description: "Sheet or Worksheet Name", type: "string" },
      { name: "rowNum", description: "Row index", type: "number" }
    ];

    const wb = read(raw, {type: "buffer", WTF:1});
    for(let name of wb.SheetNames) {
      const fields = {};
      const ws = wb.Sheets[name];
      if(!ws) return;

      const aoo = utils.sheet_to_json(ws);
      aoo.forEach((row, idx) => {
        result.push({
          pageContent: "Row " + (idx + 1) + " has the following content: \n" + Object.entries(row).map(kv => `- ${kv[0]}: ${kv[1]}`).join("\n") + "\n",
          metadata: {
            worksheet: name,
            rowNum: row["__rowNum__"],
            ...metadata,
            ...row
          }
        });
        Object.entries(row).forEach(([k,v]) => { if(v != null) (fields[k] || (fields[k] = {}))[v instanceof Date ? "date" : typeof v] = true } );
      });
      Object.entries(fields).forEach(([k,v]) => this.attributes.push({
        name: k, description: k, type: Object.keys(v).join(" or ")
      }));
    }

    return result;
  }
};

从文本到二进制

¥From Text to Binary

许多库和平台都提供通用的 "text" 加载器，它们假设 UTF-8 编码来处理文件。这会破坏许多电子表格格式，包括 XLSX、XLSB、XLSM 和 XLS。

¥Many libraries and platforms offer generic "text" loaders that process files assuming the UTF-8 encoding. This corrupts many spreadsheet formats including XLSX, XLSB, XLSM and XLS.

此问题影响许多 JavaScript 工具。各种演示涵盖解决方法：

¥This issue affects many JavaScript tools. Various demos cover workarounds:

ViteJS 插件接收工作簿文件的相对路径并可以直接读取文件。

¥ViteJS plugins receive the relative path to the workbook file and can read the file directly.
Webpack 插件支持特殊的 raw 选项，指示打包器传递原始二进制数据。

¥Webpack plugins support a special raw option that instructs the bundler to pass raw binary data.
NuxtJS 解析器和转换器可以从内部标识符推断出工作簿文件的路径。

¥NuxtJS parsers and transformers can deduce the path to the workbook file from internal identifiers.

CSVLoader 扩展了一个特殊的 TextLoader，强制进行 UTF-8 文本解析。

¥The CSVLoader extends a special TextLoader that forces UTF-8 text parsing.

PDF 加载器使用一个单独的 BufferLoader 类，它使用 NodeJS Buffer 对象传递原始数据。

¥There is a separate BufferLoader class, used by the PDF loader, that passes the raw data using NodeJS Buffer objects.

Binary Text

Binary	Text
pdf.ts (structure) `export class PDFLoader extends BufferLoader { // ... public async parse( raw: Buffer, metadata: Document["metadata"] ): Promise<Document[]> { // ... } // ... }`	csv.ts (structure) `export class CSVLoader extends TextLoader { // ... protected async parse( raw: string ): Promise<string[]> { // ... } // ... }`

pdf.ts (structure)
export class PDFLoader extends BufferLoader {
  // ...
  public async parse(
    raw: Buffer,
    metadata: Document["metadata"]
  ): Promise<Document[]> {
    // ...
  }
  // ...
}

csv.ts (structure)
export class CSVLoader extends TextLoader {
  // ...
  protected async parse(
    raw: string

  ): Promise<string[]> {
    // ...
  }
  // ...
}

NodeJS 缓冲区

¥NodeJS Buffers

SheetJS read 方法直接支持 NodeJS Buffer 对象 [^6]：

¥The SheetJS read method supports NodeJS Buffer objects directly[^6]:

Parsing a workbook in a BufferLoader
import { BufferLoader } from "langchain/document_loaders/fs/buffer";
import { read, utils } from "xlsx";

export default class LoadOfSheet extends BufferLoader {
  // ...
  async parse(raw, metadata) {
    const wb = read(raw, {type: "buffer"});
    // At this point, `wb` is a SheetJS workbook object
    // ...
  }
}

read 方法返回 SheetJS 工作簿对象 [^7]。

¥The read method returns a SheetJS workbook object[^7].

生成内容

¥Generating Content

SheetJS sheet_to_json 方法 [^8] 返回一个数据对象数组，其键从工作表的第一行中提取。

¥The SheetJS sheet_to_json method[^8] returns an array of data objects whose keys are drawn from the first row of the worksheet.

Spreadsheet	Array of Objects
	`[ { Name: "Bill Clinton", Index: 42 }, { Name: "GeorgeW Bush", Index: 43 }, { Name: "Barack Obama", Index: 44 }, { Name: "Donald Trump", Index: 45 }, { Name: "Joseph Biden", Index: 46 } ]`

原始 CSVLoader 为每个键值对写入一行。可以通过循环数据行对象的键和值来生成此文本。Object.entries 辅助函数简化了转换：

¥The original CSVLoader wrote one row for each key-value pair. This text can be generated by looping over the keys and values of the data row object. The Object.entries helper function simplifies the conversion:

function make_csvloader_doc_from_row_object(row) {
  return Object.entries(row).map(([k,v]) => `${k}: ${v}`).join("\n");
}

生成文档

¥Generating Documents

加载器必须为工作簿中的每个工作表生成行对象。

¥The loader must generate row objects for each worksheet in the workbook.

在 SheetJS 数据模型中，工作簿对象有两个相关字段：

¥In the SheetJS data model, the workbook object has two relevant fields:

SheetNames 是一个工作表名称数组

¥SheetNames is an array of sheet names
Sheets 是一个对象，其键是工作表名称，其值是工作表对象。

¥Sheets is an object whose keys are sheet names and values are sheet objects.

for..of 循环可以遍历工作表：

¥A for..of loop can iterate across the worksheets:

Looping over a workbook (skeleton)
    const wb = read(raw, {type: "buffer", WTF:1});
    for(let name of wb.SheetNames) {
      const ws = wb.Sheets[name];
      const aoa = utils.sheet_to_json(ws);
      // at this point, `aoa` is an array of objects
    }

此简化的 parse 函数使用上一节中的代码片段：

¥This simplified parse function uses the snippet from the previous section:

BufferLoader parse function (skeleton)
  async parse(raw, metadata) {
    /* array to hold generated documents */
    const result = [];

    /* read workbook */
    const wb = read(raw, {type: "buffer", WTF:1});

    /* loop over worksheets */
    for(let name of wb.SheetNames) {
      const ws = wb.Sheets[name];
      const aoa = utils.sheet_to_json(ws);

      /* loop over data rows */
      aoa.forEach((row, idx) => {
        /* generate a new document and add to the result array */
        result.push({
          pageContent: Object.entries(row).map(([k,v]) => `${k}: ${v}`).join("\n")
        });
      });
    }

    return result;
  }

元数据和属性

¥Metadata and Attributes

强烈建议为自查询检索应用生成额外的元数据和属性。

¥It is strongly recommended to generate additional metadata and attributes for self-query retrieval applications.

Implementation Details (click to show)

Metadata

Metadata is attached to each document object. The following example appends the raw row data to the document metadata:

Document with metadata (snippet)
        /* generate a new document and add to the result array */
        result.push({
          pageContent: Object.entries(row).map(([k,v]) => `${k}: ${v}`).join("\n"),
          metadata: {
            worksheet: name, // name of the worksheet
            rowNum: idx, // data row index
            ...row // raw row data
          }
        });

Attributes

Each attribute object specifies three properties:

name corresponds to the field in the document metadata
description is a description of the field
type is a description of the data type.

While looping through data rows, a simple type check can keep track of the data type for each column:

Tracking column types (sketch)
    for(let name of wb.SheetNames) {
      /* track column types */
      const fields = {};
      // ...

      aoo.forEach((row, idx) => {
        result.push({/* ... */});
        /* Check each property */
        Object.entries(row).forEach(([k,v]) => {
          /* Update fields entry to reflect the new data point */
          if(v != null) (fields[k] || (fields[k] = {}))[v instanceof Date ? "date" : typeof v] = true
        });
      });
      // ...
    }

Attributes can be generated after writing the worksheet data. Storing attributes in a loader property will make it accessible to scripts that use the loader.

Adding Attributes to a Loader (sketch)
export default class LoadOfSheet extends BufferLoader {
  attributes = [];
  // ...

  async parse(raw, metadata) {
    // Add the worksheet name and row index attributes
    this.attributes = [
      { name: "worksheet", description: "Sheet or Worksheet Name", type: "string" },
      { name: "rowNum", description: "Row index", type: "number" }
    ];
    const wb = read(raw, {type: "buffer", WTF:1});
    for(let name of wb.SheetNames) {
      const fields = {};
      // ...
      const aoo = utils.sheet_to_json(ws);
      aoo.forEach((row, idx) => {
        result.push({/* ... */});
        /* Check each property */
        Object.entries(row).forEach(([k,v]) => {
          /* Update fields entry to reflect the new data point */
          if(v != null) (fields[k] || (fields[k] = {}))[v instanceof Date ? "date" : typeof v] = true
        });
      });
      /* Add one attribute per metadata field */
      Object.entries(fields).forEach(([k,v]) => this.attributes.push({
        name: k, description: k,
        /* { number: true, string: true } -> "number or string" */
        type: Object.keys(v).join(" or ")
      }));
    }
    // ...
  }

SheetJS 加载器演示

¥SheetJS Loader Demo

演示针对示例汽车数据集执行查询 "哪些行每加仑行驶里程超过 40 英里？" 并显示结果。

¥The demo performs the query "Which rows have over 40 miles per gallon?" against a sample cars dataset and displays the results.

此演示在 Ollama 中使用 Phi-4[^9] 进行测试。

¥This demo was tested using the Phi-4[^9] in Ollama.

测试模型使用了高达 10GB 的 VRAM。强烈建议在至少具有 12GB VRAM 的 GPU 或具有至少 32GB 统一内存的较新的 Apple Silicon Mac 上运行演示。

¥The tested model used up to 10GB VRAM. It is strongly recommended to run the demo on a GPU with at least 12GB VRAM or a newer Apple Silicon Mac with at least 32GB unified memory.

安装先决条件：

¥Install pre-requisites:

Ollama 应安装在与 NodeJS 相同的平台上。如果在 WSL 中运行 NodeJS，则还应在 WSL 中安装 Ollama。

¥Ollama should be installed on the same platform as NodeJS. If NodeJS is run within WSL, Ollama should also be installed within WSL.

Intel ARC GPU 需要 Intel PyTorch 扩展 (IPEX) 和附带相关 LLM 库 (IPEX-LLM) 的特殊版本的 Ollama。

¥Intel ARC GPUs require the Intel Extension for PyTorch (IPEX) and a special version of Ollama that ships with the associated LLM Library (IPEX-LLM).

ARC Instructions (click to show)

These instructions are based on the official Intel recommendations.

A) If Ollama for Windows was installed, close the program by right-clicking on the tray icon and selecting "Quit Ollama".

B) Install Miniforge3[^10], selecting "Just Me" when prompted.

C) Launch a normal Command Prompt and create a Conda environment:

cd %USERPROFILE%\Documents
mkdir ollama-intel
cd ollama-intel
set PATH=%PATH%;%USERPROFILE%\miniforge3\condabin
conda create -n llm-cpp python=3.11

D) Activate the environment in the session and install dependencies:

conda activate llm-cpp
pip install --pre --upgrade ipex-llm[cpp]

Close the window after the installation.

E) Launch a new Administrator Command Prompt and set up Ollama:

cd %USERPROFILE%\Documents\ollama-intel
set PATH=%PATH%;%USERPROFILE%\miniforge3\condabin
conda activate llm-cpp
init-ollama.bat

Close the window.

F) Launch a normal Command Prompt window and run Ollama:

cd %USERPROFILE%\Documents\ollama-intel
set PATH=%PATH%;%USERPROFILE%\miniforge3\condabin
conda activate llm-cpp
set OLLAMA_NUM_GPU=999
set no_proxy=localhost,127.0.0.1
set ZES_ENABLE_SYSMAN=1
set SYCL_CACHE_PERSISTENT=1
set SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
ollama serve

This window should be kept open throughout the demo.

安装依赖后，启动新的终端会话。

¥After installing dependencies, start a new terminal session.

创建一个新项目：

¥Create a new project:

mkdir sheetjs-loader
cd sheetjs-loader
npm init -y

下载演示脚本：

¥Download the demo scripts:

curl -LO https://xlsx.nodejs.cn/loadofsheet/query.mjs
curl -LO https://xlsx.nodejs.cn/loadofsheet/loadofsheet.mjs

在 PowerShell 中，命令可能会因参数错误而失败：

¥In PowerShell, the command may fail with a parameter error:

Invoke-WebRequest : A parameter cannot be found that matches parameter name 'LO'.

curl.exe 必须直接调用：

¥curl.exe must be invoked directly:

curl.exe -LO https://xlsx.nodejs.cn/loadofsheet/query.mjs
curl.exe -LO https://xlsx.nodejs.cn/loadofsheet/loadofsheet.mjs

安装 SheetJS NodeJS 模块：

¥Install the SheetJS NodeJS module:

npm i --save https://sheet.lol/balls/xlsx-0.20.3.tgz

安装依赖：

¥Install dependencies:

npm i --save @langchain/core@0.3.59 langchain@0.3.28 @langchain/ollama@0.2.2 peggy@3.0.2

在某些测试运行中，出现与依赖和对等依赖版本相关的错误消息。--force 标志将抑制版本不匹配错误：

¥In some test runs, there were error messages relating to dependency and peer dependency versions. The --force flag will suppress version mismatch errors:

npm i --save @langchain/core@0.3.59 langchain@0.3.28 @langchain/ollama@0.2.2 peggy@3.0.2 --force

下载汽车数据集：

¥Download the cars dataset:

curl -LO https://xlsx.nodejs.cn/cd.xls

在 PowerShell 中，命令可能会因参数错误而失败：

¥In PowerShell, the command may fail with a parameter error:

Invoke-WebRequest : A parameter cannot be found that matches parameter name 'LO'.

curl.exe 必须直接调用：

¥curl.exe must be invoked directly:

curl.exe -LO https://xlsx.nodejs.cn/cd.xls

使用 Ollama 安装 phi4:14b 模型：

¥Install the phi4:14b model using Ollama:

ollama pull phi4:14b

Additional steps for Intel GPUs (click to show)

A different embedding model must be used on Intel GPUs:

A) Install the nomic-embed-text:latest model through Ollama:

ollama pull nomic-embed-text:latest

B) Edit query.mjs to use the embedding model:

query.mjs (edit highlighted line)
const llm = new ChatOllama({ baseUrl: "http://127.0.0.1:11434", model });
const embeddings = new OllamaEmbeddings({ baseUrl: "http://127.0.0.1:11434", model: "nomic-embed-text:latest"});

运行演示脚本

¥Run the demo script

node query.mjs

演示执行查询 "哪些行每加仑行驶里程超过 40 英里？"。它将打印以下九个结果：

¥The demo performs the query "Which rows have over 40 miles per gallon?". It will print the following nine results:

Expected output (order of lines may differ)
{ Name: 'volkswagen rabbit custom diesel', MPG: 43.1 }
{ Name: 'vw rabbit c (diesel)', MPG: 44.3 }
{ Name: 'renault lecar deluxe', MPG: 40.9 }
{ Name: 'honda civic 1500 gl', MPG: 44.6 }
{ Name: 'datsun 210', MPG: 40.8 }
{ Name: 'vw pickup', MPG: 44 }
{ Name: 'mazda glc', MPG: 46.6 }
{ Name: 'vw dasher (diesel)', MPG: 43.4 }
{ Name: 'vw rabbit', MPG: 41.5 }

一些使用较旧 GPU 的 SheetJS 用户报告了错误。

¥Some SheetJS users with older GPUs have reported errors.

如果命令失败，请尝试再次运行脚本。

¥If the command fails, please try running the script a second time.

要找到预期结果：

¥To find the expected results:

在 Excel 中打开 cd.xls 电子表格

¥Open the cd.xls spreadsheet in Excel
在功能区中选择主页 > 排序和筛选 > 筛选

¥Select Home > Sort & Filter > Filter in the Ribbon
选择 B 列 (Miles_per_Gallon) 的筛选选项

¥Select the filter option for column B (Miles_per_Gallon)
在弹出窗口中，在 Filter 下拉列表中选择 "大于" 并输入 40

¥In the popup, select "Greater Than" in the Filter dropdown and type 40