加载器教程
许多现有系统和平台都支持从 CSV 文件加载数据。许多用户更喜欢使用电子表格软件和多表文件格式(包括 XLSX)。SheetJS 库通过将复杂的工作簿转换为简单的 CSV 数据来帮助弥补差距。
¥Many existing systems and platforms include support for loading data from CSV files. Many users prefer to work in spreadsheet software and multi-sheet file formats including XLSX. SheetJS libraries help bridge the gap by translating complex workbooks to simple CSV data.
此示例的目标是将电子表格数据加载到向量存储中,并使用大型语言模型根据英语输入生成查询。现有工具支持 CSV,但不支持真正的电子表格。
¥The goal of this example is to load spreadsheet data into a vector store and use a large language model to generate queries based on English language input. The existing tooling supports CSV but does not support real spreadsheets.
在 "SheetJS 转换" 中,我们将使用 SheetJS 库为 LangChain CSV 加载器生成 CSV 文件。这些转换可以在预处理步骤中运行,而不会破坏现有的 CSV 工作流程。
¥In "SheetJS Conversion", we will use SheetJS libraries to generate CSV files for the LangChain CSV loader. These conversions can be run in a preprocessing step without disrupting existing CSV workflows.
在 "SheetJS 加载器" 中,我们将在自定义 LoadOfSheet
数据加载器中使用 SheetJS 库直接生成文档和元数据。
¥In "SheetJS Loader", we will use SheetJS libraries in a
custom LoadOfSheet
data loader to directly generate documents and metadata.
"SheetJS 加载器演示" 是一个完整的演示,它使用 SheetJS Loader 根据 XLS 工作簿中的数据回答问题。
¥"SheetJS Loader Demo" is a complete demo that uses the SheetJS Loader to answer questions based on data from a XLS workbook.
此演示在以下配置中进行了测试:
¥This demo was tested in the following configurations:
日期 | 平台 |
---|---|
2024-08-31 | NVIDIA RTX 4090(24 GB VRAM)+ i9-10910(128 GB RAM) |
2024-08-09 | NVIDIA RTX 4080 SUPER(16 GB VRAM)+ i9-10910(128 GB RAM) |
2024-09-21 | AMD RX 7900 XTX(24 GB VRAM)+ Ryzen Z1 Extreme(16 GB RAM) |
2024-07-15 | Apple M2 Max 12 核 CPU + 30 核 GPU(32 GB 统一内存) |
SheetJS 用户已在其他配置中验证了此演示:
¥SheetJS users have verified this demo in other configurations:
Other tested configurations (click to show)
Demo | Platform |
---|---|
LangChainJS | NVIDIA RTX 4070 Ti (12 GB VRAM) + Ryzen 7 5800x (64 GB RAM) |
LangChainJS | NVIDIA RTX 4060 (8 GB VRAM) + Ryzen 7 5700g (32 GB RAM) |
LangChainJS | NVIDIA RTX 3090 (24 GB VRAM) + Ryzen 9 3900XT (128 GB RAM) |
LangChainJS | NVIDIA RTX 3080 (12 GB VRAM) + Ryzen 7 5800X (32 GB RAM) |
LangChainJS | NVIDIA RTX 3070 (8 GB VRAM) + Ryzen Z1 Extreme (16 GB RAM) |
LangChainJS | NVIDIA RTX 3060 (12 GB VRAM) + i5-11400 (32 GB RAM) |
LangChainJS | NVIDIA RTX 2080 (12 GB VRAM) + i7-9700K (16 GB RAM) |
LangChainJS | NVIDIA RTX 2060 (6 GB VRAM) + Ryzen 5 3600 (32 GB RAM) |
LangChainJS | NVIDIA GTX 1080 (8 GB VRAM) + Ryzen 7 5800x (64 GB RAM) |
LangChainJS | NVIDIA GTX 1070 (8 GB VRAM) + Ryzen 7 7700x (32 GB RAM) |
LangChainJS | AMD RX 6800 XT (16 GB VRAM) + Ryzen Z1 Extreme (16 GB RAM) |
特别感谢:
¥Special thanks to:
CSV 加载器
¥CSV Loader
此说明已针对 LangChain 0.2 进行了验证。
¥This explanation was verified against LangChain 0.2.
文档加载器从数据源生成数据对象("documents")和相关元数据。
¥Document loaders generate data objects ("documents") and associated metadata from data sources.
LangChain 提供 CSVLoader
[^1] 组件,用于从文件加载 CSV 数据:
¥LangChain offers a CSVLoader
[^1] component for loading CSV data from a file:
import { CSVLoader } from "@langchain/community/document_loaders/fs/csv";
const loader = new CSVLoader("pres.csv");
const docs = await loader.load();
console.log(docs);
CSV 加载器使用第一行确定列标题并为每个数据行生成一个文档。例如,以下 CSV 保存总统数据:
¥The CSV loader uses the first row to determine column headers and generates one document per data row. For example, the following CSV holds Presidential data:
Name,Index
Bill Clinton,42
GeorgeW Bush,43
Barack Obama,44
Donald Trump,45
Joseph Biden,46
每个数据行都转换为一个文档,其内容是属性和值的列表。例如,第三个数据行如下所示:
¥Each data row is translated to a document whose content is a list of attributes and values. For example, the third data row is shown below:
CSV Row | Document Content |
---|---|
|
|
LangChain CSV 加载器将在文档中包含源元数据:
¥The LangChain CSV loader will include source metadata in the document:
Document {
pageContent: 'Name: Barack Obama\nIndex: 44',
metadata: { source: 'pres.csv', line: 3 }
}
SheetJS 转换
¥SheetJS Conversion
SheetJS NodeJS 模块 可以在使用 LangChain 和其他 JavaScript 库的 NodeJS 脚本中导入。
¥The SheetJS NodeJS module can be imported in NodeJS scripts that use LangChain and other JavaScript libraries.
一个简单的预处理步骤可以将工作簿转换为可以由现有 CSV 工具处理的 CSV 文件:
¥A simple pre-processing step can convert workbooks to CSV files that can be processed by the existing CSV tooling:
从文件系统解析文件
¥Parsing files from the filesystem
SheetJS readFile
方法 [^2] 可以读取工作簿文件。该方法接受路径并返回符合 SheetJS 数据模型 [^3] 的工作簿对象。
¥The SheetJS readFile
method[^2] can read workbook files. The method accepts a
path and returns a workbook object that conforms to the SheetJS data model[^3].
/* Load SheetJS Libraries */
import { readFile, set_fs } from 'xlsx';
/* Load 'fs' for readFile support */
import * as fs from 'fs';
set_fs(fs);
/* Parse `pres.xlsx` */
const wb = readFile("pres.xlsx");
检查 SheetJS 工作簿和工作表对象
¥Inspecting SheetJS workbook and worksheet objects
Workbook 对象表示多工作表工作簿文件。它们存储单个工作表对象和其他元数据。
¥Workbook objects represent multi-sheet workbook files. They store individual worksheet objects and other metadata.
与此讨论相关,工作簿对象使用以下键 [^7]:
¥Relevant to this discussion, the workbook object uses the following keys[^7]:
-
SheetNames
是一个工作表名称数组¥
SheetNames
is an array of worksheet names -
Sheets
是一个对象,其键是工作表名称,其值是工作表对象。¥
Sheets
is an object whose keys are sheet names and whose values are sheet objects.
SheetNames[0]
是第一个工作表名称,因此以下代码片段将从工作簿中提取第一个工作表:
¥SheetNames[0]
is the first worksheet name, so the following snippet will pull
the first worksheet from the workbook:
const first_ws = wb.Sheets[wb.SheetNames[0]];
将 SheetJS 工作表导出到 CSV
¥Exporting SheetJS worksheets to CSV
可以使用 SheetJS sheet_to_csv
[^4] 方法将工作簿中的每个工作表写入 CSV 文本。该方法接受 SheetJS 工作表对象并返回字符串。
¥Each worksheet in the workbook can be written to CSV text using the SheetJS
sheet_to_csv
[^4] method. The method accepts a SheetJS worksheet object and
returns a string.
const csv = utils.sheet_to_csv(first_ws);
完成脚本
¥Complete Script
例如,以下 NodeJS 脚本读取 pres.xlsx
并显示第一个工作表中的 CSV 行:
¥For example, the following NodeJS script reads pres.xlsx
and displays CSV rows
from the first worksheet:
/* Load SheetJS Libraries */
import { readFile, set_fs, utils } from 'xlsx';
/* Load 'fs' for readFile support */
import * as fs from 'fs';
set_fs(fs);
/* Parse `pres.xlsx` */
const wb = readFile("pres.xlsx");
/* Print CSV rows from first worksheet */
const first_ws = wb.Sheets[wb.SheetNames[0]];
const csv = utils.sheet_to_csv(first_ws);
console.log(csv);
许多演示涵盖了精神上相似的工作流程:
¥A number of demos cover spiritually similar workflows:
-
Stata、MATLAB 和 Maple 支持 XLSX 数据导入。SheetJS 集成从用户提供的电子表格生成干净的 XLSX 工作簿。
¥Stata, MATLAB and Maple support XLSX data import. The SheetJS integrations generate clean XLSX workbooks from user-supplied spreadsheets.
-
TensorFlow.js、熊猫 和 Mathematica 支持 CSV 数据导入。SheetJS 集成生成干净的 CSV 并使用内置的 CSV 处理器。
¥TensorFlow.js, Pandas and Mathematica support CSV data import. The SheetJS integrations generate clean CSVs and use built-in CSV processors.
-
"命令行工具" 演示涵盖了制作用于文件转换的独立命令行工具的技术。
¥The "Command-Line Tools" demo covers techniques for making standalone command-line tools for file conversion.
单一工作表
¥Single Worksheet
对于单个工作表,SheetJS 预处理步骤可以将 CSV 行写入文件,CSVLoader
可以加载新写入的文件。
¥For a single worksheet, a SheetJS pre-processing step can write the CSV rows to
file and the CSVLoader
can load the newly written file.
Code example (click to hide)
import { CSVLoader } from "@langchain/community/document_loaders/fs/csv";
import { readFile, set_fs, utils } from 'xlsx';
/* Load 'fs' for readFile support */
import * as fs from 'fs';
set_fs(fs);
/* Parse `pres.xlsx`` */
const wb = readFile("pres.xlsx");
/* Generate CSV and write to `pres.xlsx.csv` */
const first_ws = wb.Sheets[wb.SheetNames[0]];
const csv = utils.sheet_to_csv(first_ws);
fs.writeFileSync("pres.xlsx.csv", csv);
/* Create documents with CSVLoader */
const loader = new CSVLoader("pres.xlsx.csv");
const docs = await loader.load();
console.log(docs);
// ...
工作簿
¥Workbook
工作簿是工作表的集合。每个工作表都可以导出到单独的 CSV。如果 CSV 写入子文件夹,DirectoryLoader
[^5] 可以一步处理文件。
¥A workbook is a collection of worksheets. Each worksheet can be exported to a
separate CSV. If the CSVs are written to a subfolder, a DirectoryLoader
[^5]
can process the files in one step.
Code example (click to hide)
在此示例中,脚本创建了一个名为 csv
的子文件夹。将处理工作簿中的每个工作表,并将生成的 CSV 存储到编号文件中。第一个工作表将存储到 csv/0.csv
。
¥In this example, the script creates a subfolder named csv
. Each worksheet in
the workbook will be processed and the generated CSV will be stored to numbered
files. The first worksheet will be stored to csv/0.csv
.
import { CSVLoader } from "@langchain/community/document_loaders/fs/csv";
import { DirectoryLoader } from "langchain/document_loaders/fs/directory";
import { readFile, set_fs, utils } from 'xlsx';
/* Load 'fs' for readFile support */
import * as fs from 'fs';
set_fs(fs);
/* Parse `pres.xlsx`` */
const wb = readFile("pres.xlsx");
/* Create a folder `csv` */
try { fs.mkdirSync("csv"); } catch(e) {}
/* Generate CSV data for each worksheet */
wb.SheetNames.forEach((name, idx) => {
const ws = wb.Sheets[name];
const csv = utils.sheet_to_csv(ws);
fs.writeFileSync(`csv/${idx}.csv`, csv);
});
/* Create documents with DirectoryLoader */
const loader = new DirectoryLoader("csv", {
".csv": (path) => new CSVLoader(path)
});
const docs = await loader.load();
console.log(docs);
// ...
SheetJS 加载器
¥SheetJS Loader
LangChain 附带的 CSVLoader
不添加任何文档元数据,也不生成任何属性。自定义加载器可以绕过 CSV 工具中的限制,并可能包含没有 CSV 等效项的元数据。
¥The CSVLoader
that ships with LangChain does not add any Document metadata and
does not generate any attributes. A custom loader can work around limitations in
the CSV tooling and potentially include metadata that has no CSV equivalent.
演示 LoadOfSheet
加载器 将在所有工作表中为每个数据行生成一个文档。它还将尝试构建元数据和属性以用于自查询检索器。
¥The demo LoadOfSheet
loader will
generate one Document per data row across all worksheets. It will also attempt
to build metadata and attributes for use in self-querying retrievers.
/* read and parse `data.xlsb` */
const loader = new LoadOfSheet("./data.xlsb");
/* generate documents */
const docs = await loader.load();
/* synthesized attributes for the SelfQueryRetriever */
const attributes = loader.attributes;
Sample SheetJS Loader (click to show)
This example loader pulls data from each worksheet. It assumes each worksheet includes one header row and a number of data rows.
import { Document } from "@langchain/core/documents";
import { BufferLoader } from "langchain/document_loaders/fs/buffer";
import { read, utils } from "xlsx";
/**
* Document loader that uses SheetJS to load documents.
* * Each worksheet is parsed into an array of row objects using the SheetJS
* `sheet_to_json` method and projected to a `Document`. Metadata includes
* original sheet name, row data, and row index
*/
export default class LoadOfSheet extends BufferLoader {
/** @type {import("langchain/chains/query_constructor").AttributeInfo[]} */
attributes = [];
/**
* Document loader that uses SheetJS to load documents.
* * @param {string|Blob} filePathOrBlob Source Data
*/
constructor(filePathOrBlob) {
super(filePathOrBlob);
this.attributes = [];
}
/**
* Parse document
* * NOTE: column labels in multiple sheets are not disambiguated!
* * @param {Buffer} raw Raw data Buffer
* @param {Document["metadata"]} metadata Document metadata
* @returns {Promise<Document[]>} Array of Documents
*/
async parse(raw, metadata) {
/** @type {Document[]} */
const result = [];
this.attributes = [
{ name: "worksheet", description: "Sheet or Worksheet Name", type: "string" },
{ name: "rowNum", description: "Row index", type: "number" }
];
const wb = read(raw, {type: "buffer", WTF:1});
for(let name of wb.SheetNames) {
const fields = {};
const ws = wb.Sheets[name];
if(!ws) return;
const aoo = utils.sheet_to_json(ws);
aoo.forEach((row, idx) => {
result.push({
pageContent: "Row " + (idx + 1) + " has the following content: \n" + Object.entries(row).map(kv => `- ${kv[0]}: ${kv[1]}`).join("\n") + "\n",
metadata: {
worksheet: name,
rowNum: row["__rowNum__"],
...metadata,
...row
}
});
Object.entries(row).forEach(([k,v]) => { if(v != null) (fields[k] || (fields[k] = {}))[v instanceof Date ? "date" : typeof v] = true } );
});
Object.entries(fields).forEach(([k,v]) => this.attributes.push({
name: k, description: k, type: Object.keys(v).join(" or ")
}));
}
return result;
}
};
从文本到二进制
¥From Text to Binary
许多库和平台提供通用 "text" 加载器,这些加载器假定采用 UTF8 编码来处理文件。这会破坏许多电子表格格式,包括 XLSX、XLSB、XLSM 和 XLS。
¥Many libraries and platforms offer generic "text" loaders that process files assuming the UTF8 encoding. This corrupts many spreadsheet formats including XLSX, XLSB, XLSM and XLS.
此问题影响许多 JavaScript 工具。各种演示涵盖解决方法:
¥This issue affects many JavaScript tools. Various demos cover workarounds:
-
ViteJS 插件 接收工作簿文件的相对路径并可以直接读取文件。
¥ViteJS plugins receive the relative path to the workbook file and can read the file directly.
-
Webpack 插件 支持特殊的
raw
选项,指示打包器传递原始二进制数据。¥Webpack plugins support a special
raw
option that instructs the bundler to pass raw binary data. -
NuxtJS 解析器和转换器 可以从内部标识符推断出工作簿文件的路径。
¥NuxtJS parsers and transformers can deduce the path to the workbook file from internal identifiers.
CSVLoader
扩展了一个特殊的 TextLoader
,强制进行 UTF8 文本解析。
¥The CSVLoader
extends a special TextLoader
that forces UTF8 text parsing.
PDF 加载器使用一个单独的 BufferLoader
类,它使用 NodeJS Buffer
对象传递原始数据。
¥There is a separate BufferLoader
class, used by the PDF loader, that passes
the raw data using NodeJS Buffer
objects.
Binary | Text |
---|---|
pdf.ts (structure)
| csv.ts (structure)
|
NodeJS 缓冲区
¥NodeJS Buffers
SheetJS read
方法直接支持 NodeJS Buffer 对象 [^6]:
¥The SheetJS read
method supports NodeJS Buffer objects directly[^6]:
import { BufferLoader } from "langchain/document_loaders/fs/buffer";
import { read, utils } from "xlsx";
export default class LoadOfSheet extends BufferLoader {
// ...
async parse(raw, metadata) {
const wb = read(raw, {type: "buffer"});
// At this point, `wb` is a SheetJS workbook object
// ...
}
}
read
方法返回 SheetJS 工作簿对象 [^7]。
¥The read
method returns a SheetJS workbook object[^7].
生成内容
¥Generating Content
SheetJS sheet_to_json
方法 [^8] 返回一个数据对象数组,其键从工作表的第一行中提取。
¥The SheetJS sheet_to_json
method[^8] returns an array of data objects whose
keys are drawn from the first row of the worksheet.
Spreadsheet | Array of Objects |
---|---|
|
原始 CSVLoader
为每个键值对写入一行。可以通过循环数据行对象的键和值来生成此文本。Object.entries
辅助函数简化了转换:
¥The original CSVLoader
wrote one row for each key-value pair. This text can be
generated by looping over the keys and values of the data row object. The
Object.entries
helper function simplifies the conversion:
function make_csvloader_doc_from_row_object(row) {
return Object.entries(row).map(([k,v]) => `${k}: ${v}`).join("\n");
}
生成文档
¥Generating Documents
加载器必须为工作簿中的每个工作表生成行对象。
¥The loader must generate row objects for each worksheet in the workbook.
在 SheetJS 数据模型中,工作簿对象有两个相关字段:
¥In the SheetJS data model, the workbook object has two relevant fields:
-
SheetNames
是一个工作表名称数组¥
SheetNames
is an array of sheet names -
Sheets
是一个对象,其键是工作表名称,其值是工作表对象。¥
Sheets
is an object whose keys are sheet names and values are sheet objects.
for..of
循环可以遍历工作表:
¥A for..of
loop can iterate across the worksheets:
const wb = read(raw, {type: "buffer", WTF:1});
for(let name of wb.SheetNames) {
const ws = wb.Sheets[name];
const aoa = utils.sheet_to_json(ws);
// at this point, `aoa` is an array of objects
}
此简化的 parse
函数使用上一节中的代码片段:
¥This simplified parse
function uses the snippet from the previous section:
async parse(raw, metadata) {
/* array to hold generated documents */
const result = [];
/* read workbook */
const wb = read(raw, {type: "buffer", WTF:1});
/* loop over worksheets */
for(let name of wb.SheetNames) {
const ws = wb.Sheets[name];
const aoa = utils.sheet_to_json(ws);
/* loop over data rows */
aoa.forEach((row, idx) => {
/* generate a new document and add to the result array */
result.push({
pageContent: Object.entries(row).map(([k,v]) => `${k}: ${v}`).join("\n")
});
});
}
return result;
}
元数据和属性
¥Metadata and Attributes
强烈建议为自查询检索应用生成额外的元数据和属性。
¥It is strongly recommended to generate additional metadata and attributes for self-query retrieval applications.
Implementation Details (click to show)
Metadata
Metadata is attached to each document object. The following example appends the raw row data to the document metadata:
/* generate a new document and add to the result array */
result.push({
pageContent: Object.entries(row).map(([k,v]) => `${k}: ${v}`).join("\n"),
metadata: {
worksheet: name, // name of the worksheet
rowNum: idx, // data row index
...row // raw row data
}
});
Attributes
Each attribute object specifies three properties:
name
corresponds to the field in the document metadatadescription
is a description of the fieldtype
is a description of the data type.
While looping through data rows, a simple type check can keep track of the data type for each column:
for(let name of wb.SheetNames) {
/* track column types */
const fields = {};
// ...
aoo.forEach((row, idx) => {
result.push({/* ... */});
/* Check each property */
Object.entries(row).forEach(([k,v]) => {
/* Update fields entry to reflect the new data point */
if(v != null) (fields[k] || (fields[k] = {}))[v instanceof Date ? "date" : typeof v] = true
});
});
// ...
}
Attributes can be generated after writing the worksheet data. Storing attributes in a loader property will make it accessible to scripts that use the loader.
export default class LoadOfSheet extends BufferLoader {
attributes = [];
// ...
async parse(raw, metadata) {
// Add the worksheet name and row index attributes
this.attributes = [
{ name: "worksheet", description: "Sheet or Worksheet Name", type: "string" },
{ name: "rowNum", description: "Row index", type: "number" }
];
const wb = read(raw, {type: "buffer", WTF:1});
for(let name of wb.SheetNames) {
const fields = {};
// ...
const aoo = utils.sheet_to_json(ws);
aoo.forEach((row, idx) => {
result.push({/* ... */});
/* Check each property */
Object.entries(row).forEach(([k,v]) => {
/* Update fields entry to reflect the new data point */
if(v != null) (fields[k] || (fields[k] = {}))[v instanceof Date ? "date" : typeof v] = true
});
});
/* Add one attribute per metadata field */
Object.entries(fields).forEach(([k,v]) => this.attributes.push({
name: k, description: k,
/* { number: true, string: true } -> "number or string" */
type: Object.keys(v).join(" or ")
}));
}
// ...
}
SheetJS 加载器演示
¥SheetJS Loader Demo
演示针对 示例汽车数据集 执行查询 "哪些行每加仑行驶里程超过 40 英里?" 并显示结果。
¥The demo performs the query "Which rows have over 40 miles per gallon?" against a sample cars dataset and displays the results.
SheetJS 团队成员已使用 PowerShell 和 Ollama for Windows 在 Windows 10 和 Windows 11 上测试了此演示。
¥SheetJS team members have tested this demo on Windows 10 and Windows 11 using PowerShell and Ollama for Windows.
SheetJS 用户还在 Windows Subsystem for Linux 中测试了此演示。
¥SheetJS users have also tested this demo within Windows Subsystem for Linux.
此演示使用 Ollama 中的 ChatQA-1.5 model[^9] 进行了测试。
¥This demo was tested using the ChatQA-1.5 model[^9] in Ollama.
测试模型使用了高达 9.2GB 的 VRAM。强烈建议在较新的 Apple Silicon Mac 或配备 Nvidia GPU(至少 12GB VRAM)的 PC 上运行演示。SheetJS 用户已在 VRAM 低至 6GB 的系统上测试了此演示。
¥The tested model used up to 9.2GB VRAM. It is strongly recommended to run the demo on a newer Apple Silicon Mac or a PC with an Nvidia GPU with at least 12GB VRAM. SheetJS users have tested the demo on systems with as little as 6GB VRAM.
-
安装先决条件:
¥Install pre-requisites:
Ollama 应安装在与 NodeJS 相同的平台上。如果在 WSL 中运行 NodeJS,则还应在 WSL 中安装 Ollama。
¥Ollama should be installed on the same platform as NodeJS. If NodeJS is run within WSL, Ollama should also be installed within WSL.
安装依赖后,启动新的终端会话。
¥After installing dependencies, start a new terminal session.
-
创建一个新项目:
¥Create a new project:
mkdir sheetjs-loader
cd sheetjs-loader
npm init -y
-
下载演示脚本:
¥Download the demo scripts:
curl -LO https://xlsx.nodejs.cn/loadofsheet/query.mjs
curl -LO https://xlsx.nodejs.cn/loadofsheet/loadofsheet.mjs
在 PowerShell 中,命令可能会因参数错误而失败:
¥In PowerShell, the command may fail with a parameter error:
Invoke-WebRequest : A parameter cannot be found that matches parameter name 'LO'.
curl.exe
必须直接调用:
¥curl.exe
must be invoked directly:
curl.exe -LO https://xlsx.nodejs.cn/loadofsheet/query.mjs
curl.exe -LO https://xlsx.nodejs.cn/loadofsheet/loadofsheet.mjs
-
安装 SheetJS NodeJS 模块:
¥Install the SheetJS NodeJS module:
npm i --save https://sheet.lol/balls/xlsx-0.20.3.tgz
-
安装依赖:
¥Install dependencies:
npm i --save @langchain/community@0.2.18 @langchain/core@0.2.15 langchain@0.2.9 peggy@4.0.3 --force
上次测试此演示时,出现与依赖和对等依赖版本相关的错误消息。--force
标志是必需的。
¥When this demo was last tested, there were error messages relating to dependency
and peer dependency versions. The --force
flag was required.
-
下载 汽车数据集:
¥Download the cars dataset:
curl -LO https://xlsx.nodejs.cn/cd.xls
在 PowerShell 中,命令可能会因参数错误而失败:
¥In PowerShell, the command may fail with a parameter error:
Invoke-WebRequest : A parameter cannot be found that matches parameter name 'LO'.
curl.exe
必须直接调用:
¥curl.exe
must be invoked directly:
curl.exe -LO https://xlsx.nodejs.cn/cd.xls
-
使用 Ollama 安装
llama3-chatqa:8b-v1.5-q8_0
模型:¥Install the
llama3-chatqa:8b-v1.5-q8_0
model using Ollama:
ollama pull llama3-chatqa:8b-v1.5-q8_0
-
运行演示脚本
¥Run the demo script
node query.mjs
演示执行查询 "哪些行每加仑行驶里程超过 40 英里?"。它将打印以下九个结果:
¥The demo performs the query "Which rows have over 40 miles per gallon?". It will print the following nine results:
{ Name: 'volkswagen rabbit custom diesel', MPG: 43.1 }
{ Name: 'vw rabbit c (diesel)', MPG: 44.3 }
{ Name: 'renault lecar deluxe', MPG: 40.9 }
{ Name: 'honda civic 1500 gl', MPG: 44.6 }
{ Name: 'datsun 210', MPG: 40.8 }
{ Name: 'vw pickup', MPG: 44 }
{ Name: 'mazda glc', MPG: 46.6 }
{ Name: 'vw dasher (diesel)', MPG: 43.4 }
{ Name: 'vw rabbit', MPG: 41.5 }
一些使用较旧 GPU 的 SheetJS 用户报告了错误。
¥Some SheetJS users with older GPUs have reported errors.
如果命令失败,请尝试再次运行脚本。
¥If the command fails, please try running the script a second time.
要找到预期结果:
¥To find the expected results:
-
在 Excel 中打开
cd.xls
电子表格¥Open the
cd.xls
spreadsheet in Excel -
在功能区中选择主页 > 排序和筛选 > 筛选
¥Select Home > Sort & Filter > Filter in the Ribbon
-
选择 B 列 (
Miles_per_Gallon
) 的筛选选项¥Select the filter option for column B (
Miles_per_Gallon
) -
在弹出窗口中,在 Filter 下拉列表中选择 "大于" 并输入 40
¥In the popup, select "Greater Than" in the Filter dropdown and type 40
过滤后的结果应与以下屏幕截图匹配:
¥The filtered results should match the following screenshot:
SheetJS 模型 公开 formulae 和其他功能。
¥The SheetJS model exposes formulae and other features.
SheetJS 专业版 构建公开了单元格样式、图片、图表、表格和其他功能。
¥SheetJS Pro builds expose cell styling, images, charts, tables, and other features.
[^1]: 有关 LangChain 文档中的 "如何加载 CSV 数据"
¥See "How to load CSV data" in the LangChain documentation
[^2]: 见 readFile
于 "读取文件"
¥See readFile
in "Reading Files"
[^3]: 见 "SheetJS 数据模型"
¥See "SheetJS Data Model"
[^4]: 见 sheet_to_csv
于 "CSV 和文本"
¥See sheet_to_csv
in "CSV and Text"
[^5]: 有关 LangChain 文档中的 "包含多个文件的文件夹"
¥See "Folders with multiple files" in the LangChain documentation
[^6]: 见 "写入文件" 中的 "支持的输出格式" 类型
¥See "Supported Output Formats" type in "Writing Files"
[^7]: 见 "工作簿对象"
¥See "Workbook Object"
[^8]: 见 sheet_to_json
于 "实用工具"
¥See sheet_to_json
in "Utilities"
[^9]: 有关 ChatQA 论文和其他模型详细信息,请参阅 官方 ChatQA 网站。
¥See the official ChatQA website for the ChatQA paper and other model details.