数学与统计
Excel 中的每个有效数字都可以表示为 "IEEE754 双"[^1]。
¥Each valid number in Excel can be represented as an "IEEE754 double"[^1].
JavaScript 完全支持 IEEE754 双打和单打,是数学和统计分析的优秀语言。它也被证明是一个可行的机器学习平台。
¥With full support for IEEE754 doubles and singles, JavaScript is an excellent language for mathematics and statistical analysis. It has also proven to be a viable platform for machine learning.
演示
¥Demos
各种库的演示包含在单独的页面中:
¥Demos for various libraries are included in separate pages:
类型化数组
¥Typed Arrays
现代 JavaScript 数学和统计库通常使用 Float64Array
或 Float32Array
对象来有效存储数据变量。
¥Modern JavaScript math and statistics libraries typically use Float64Array
or
Float32Array
objects to efficiently store data variables.
Technical details (click to show)
Under the hood, ArrayBuffer
objects represent raw binary data. "Typed arrays"
such as Float64Array
and Float32Array
are objects designed for efficient
interpretation and mutation of ArrayBuffer
data.
ArrayBuffer
objects are roughly analogous to heap-allocated memory. Typed
arrays behave like typed pointers.
JavaScript
const buf = new ArrayBuffer(16);
const dbl = new Float64Array(buf);
dbl[1] = 3.14159;
const u8 = new Uint8Array(buf);
for(let i = 0; i < 8; ++i)
console.log(u8[i+8]);
Equivalent C
void *const buf = malloc(16);
double *const dbl = (double *)buf;
dbl[1] = 3.14159;
uint8_t *const u8 = (uint8_t *)buf;
for(uint8_t i = 0; i < 8; ++i)
printf("%u\n", u8[i+8]);
从表格中读取
¥Reading from Sheets
每个类型化数组类都有一个 from
静态方法,用于将数据转换为类型化数组。Float64Array.from
返回 double
类型数组(每个值 8 字节),Float32Array.from
生成 float
类型数组(4 字节)。
¥Each typed array class has a from
static method for converting data into a
typed array. Float64Array.from
returns a double
typed array (8 bytes per
value) and Float32Array.from
generates a float
typed array (4 bytes).
const column_f32 = Float32Array.from(arr); // 4-byte floats
const column_f64 = Float64Array.from(arr); // 8-byte doubles
数组中的值将被强制为相关的数据类型。不支持的条目将转换为安静的 NaN
值。
¥Values in the array will be coerced to the relevant data type. Unsupported
entries will be converted to quiet NaN
values.
提取工作表数据
¥Extracting Worksheet Data
带有选项 header: 1
的 SheetJS sheet_to_json
[^2] 方法从工作表对象生成数组的数组。结果按行优先顺序排列:
¥The SheetJS sheet_to_json
[^2] method with the option header: 1
generates an
array of arrays from a worksheet object. The result is in row-major order:
const aoa = XLSX.utils.sheet_to_json(worksheet, {header: 1});
分类变量
¥Categorical Variables
二分变量通常表示为电子表格 TRUE
或 FALSE
。SheetJS sheet_to_json
方法会将这些值转换为 true
和 false
。类型化数组方法将分别将值解释为 1
和 0
。
¥Dichotomous variables are commonly represented as spreadsheet TRUE
or FALSE
.
The SheetJS sheet_to_json
method will translate these values to true
and
false
. Typed array methods will interpret values as 1
and 0
respectively.
多分变量必须手动映射到数值。例如,使用 Iris 数据集:
¥Polychotomous variables must be manually mapped to numeric values. For example, using the Iris dataset:
[
["sepal length", "sepal width", "petal length", "petal width", "class"],
[5.1, 3.5, 1.4, 0.2, "Iris-setosa"],
[4.9, 3, 1.4, 0.2, "Iris-setosa"],
]
E 列 (class
) 是一个多分变量,必须手动翻译:
¥Column E (class
) is a polychotomous variable and must be manually translated:
const aoa = XLSX.utils.sheet_to_json(worksheet, {header: 1});
/* index_to_class will be needed to recover the values later */
const index_to_class = [];
/* map from class name to number */
const class_to_index = new Map();
/* loop over the data */
for(let R = 1; R < aoa.length; ++R) {
/* Column E = SheetJS row 4 */
const category = aoa[R][4];
const val = class_to_index.get(category);
if(val == null) {
/* assign a new index */
class_to_index.set(category, index_to_class.length);
aoa[R][4] = index_to_class.length;
index_to_class.push(category);
} else aoa[R][4] = val;
}
Live Demo (click to show)
This example fetches and parses iris.xlsx
.
The first worksheet is processed and the new data and mapping are printed.
function SheetJSPolychotomy() { const [cat, setCat] = React.useState([]); const [aoa, setAoA] = React.useState([]); React.useEffect(() => { (async() => { const ab = await (await fetch("/typedarray/iris.xlsx")).arrayBuffer(); const wb = XLSX.read(ab); const aoa = XLSX.utils.sheet_to_json(wb.Sheets[wb.SheetNames[0]], {header:1}); const index_to_class = []; const class_to_index = new Map(); for(let R = 1; R < aoa.length; ++R) { const category = aoa[R][4]; const val = class_to_index.get(category); if(val == null) { class_to_index.set(category, index_to_class.length); aoa[R][4] = index_to_class.length; index_to_class.push(category); } else aoa[R][4] = val; } /* display every 25 rows, skipping the header row */ setAoA(aoa.filter((_, i) => (i % 25) == 1)); setCat(index_to_class); })(); }, []); return ( <> <b>Mapping</b><br/> <table><thead><tr><th>Index</th><th>Name</th></tr></thead><tbody> {cat.map((name, i) => (<tr><td>{i}</td><td>{name}</td></tr>))} </tbody></table> <b>Sample Data</b><br/> <table><thead><tr>{"ABCDE".split("").map(c => (<th>{c}</th>))}</tr></thead><tbody> {aoa.map(row => (<tr>{row.map(col => (<td>{col}</td>))}</tr>))} </tbody></table> </> ); }
每列一个变量
¥One Variable per Column
存储数据集时,每行代表一个观察值,每列代表一个变量是很常见的:
¥It is common to store datasets where each row represents an observation and each column represents a variable:
var aoa = [
["sepal length", "sepal width", "petal length", "petal width", "class"],
[5.1, 3.5, 1.4, 0.2, "Iris-setosa"],
[4.9, 3, 1.4, 0.2, "Iris-setosa"],
]
数组 map
操作可以从单个列中提取数据。映射后,slice
可以删除标头标签。例如,以下代码片段将列 C ("花瓣长度") 拉入 Float64Array
:
¥An array map
operation can pull data from an individual column. After mapping,
a slice
can remove the header label. For example, the following snippet pulls
column C ("petal length") into a Float64Array
:
const C = XLSX.utils.decode_col("C"); // Column "C" = SheetJS index 2
const petal_length = Float64Array.from(aoa.map(row => row[C]).slice(1));
每行一个变量
¥One Variable per Row
一些数据集存储在表中,其中每行代表一个变量,每列代表一个观察值:
¥Some datasets are stored in tables where each row represents a variable and each column represents an observation:
JavaScript | Spreadsheet |
---|---|
|
在行优先数组中,外部数组的每个条目都是一行。
¥From the row-major array of arrays, each entry of the outer array is a row.
许多工作表都包含标题列。slice
方法可以删除标头。去掉头部后,Float64Array.from
可以生成一个类型化数组。例如,此代码片段将第 3 行 ("花瓣长度") 拉入 Float64Array
:
¥Many sheets include header columns. The slice
method can remove the header.
After removing the header, Float64Array.from
can generate a typed array. For
example, this snippet pulls row 3 ("petal length") into a Float64Array
:
const petal_length = Float64Array.from(aoa[2].slice(1));
写入表格
¥Writing to Sheets
SheetJS aoa_to_sheet
[^3] 方法可以从数组数组生成工作表。同样,sheet_add_aoa
[^4] 可以将数据数组的数组添加到现有工作表对象中。origin
选项 [^5] 控制数据将写入工作表中的位置。
¥The SheetJS aoa_to_sheet
[^3] method can generate a worksheet from an array of
arrays. Similarly, sheet_add_aoa
[^4] can add an array of arrays of data into
an existing worksheet object. The origin
option[^5] controls where data will
be written in the worksheet.
这两种方法都不理解类型化数组,因此数据列必须转换为数组的数组。
¥Neither method understands typed arrays, so data columns must be converted to arrays of arrays.
每行一个变量
¥One Variable per Row
可以使用 Array.from
将单个类型数组转换为纯 JS 数组:
¥A single typed array can be converted to a pure JS array with Array.from
:
const arr = Array.from(row);
可以从数组创建数组的数组:
¥An array of arrays can be created from the array:
const aoa = [
arr // this array is the first element of the array literal
];
aoa_to_sheet
和 sheet_add_aoa
将其视为一行。默认情况下,数据将写入工作表第一行的单元格中。
¥aoa_to_sheet
and sheet_add_aoa
treat this as one row. By default, data will
be written to cells in the first row of the worksheet.
可以使用 unshift
操作将标题添加到数据行,但使用 aoa_to_sheet
构建工作表效率更高:
¥Titles can be added to data rows with an unshift
operation, but it is more
efficient to build up the worksheet with aoa_to_sheet
:
/* sample data */
const data = new Float64Array([54337.95, 3.14159, 2.718281828]);
const title = "Values";
/* convert sample data to array */
const arr = Array.from(data);
/* create worksheet from title (array of arrays) */
const ws = XLSX.utils.aoa_to_sheet([ [ "Values" ] ]);
/* add data starting at B1 */
XLSX.utils.sheet_add_aoa(ws, [ arr ], { origin: "B1" });
Live Demo (click to hide)
在此示例中,导出两个类型化数组。aoa_to_sheet
创建工作表,sheet_add_aoa
将数据添加到工作表中。
¥In this example, two typed arrays are exported. aoa_to_sheet
creates the
worksheet and sheet_add_aoa
will add the data to the sheet.
function SheetJSeriesToRows() { return (<button onClick={() => { /* typed arrays */ const ta1 = new Float64Array([54337.95, 3.14159, 2.718281828]); const ta2 = new Float64Array([281.3308004, 201.8675309, 1900.6492568]); /* create worksheet from first typed array */ const ws = XLSX.utils.aoa_to_sheet([ [ "Values" ] ]); const arr1 = Array.from(ta1); XLSX.utils.sheet_add_aoa(ws, [ arr1 ], { origin: "B1" }); /* add second title to cell A2 */ XLSX.utils.sheet_add_aoa(ws, [["Value2"]], { origin: "A2" }); /* add second typed array starting from cell B2 */ const arr2 = Array.from(ta2); XLSX.utils.sheet_add_aoa(ws, [ arr2 ], { origin: "B2" }); /* export to file */ const wb = XLSX.utils.book_new(ws, "Export"); XLSX.writeFile(wb, "SheetJSeriesToRows.xlsx"); }}><b>Click to export</b></button>); }
每列一个变量
¥One Variable per Column
可以使用 Array.from
将单个类型数组转换为纯 JS 数组。对于列,每个值应单独封装在一个数组中:
¥A single typed array can be converted to a pure JS array with Array.from
. For
columns, each value should be individually wrapped in an array:
JavaScript | Spreadsheet |
---|---|
|
Array.from
采用第二个参数。如果它是一个函数,则将在每个元素上调用该函数,并且将使用该值代替原始值(实际上,映射数据)。要生成数据列,每个元素必须封装在数组字面量中:
¥Array.from
takes a second argument. If it is a function, the function will be
called on each element and the value will be used in place of the original value
(in effect, mapping over the data). To generate a data column, each element must
be wrapped in an array literal:
var arr = Array.from(column, (value) => ([ value ]));
aoa_to_sheet
和 sheet_add_aoa
将此视为每行一列数据的行。默认情况下,数据将写入 "A" 列中的单元格。
¥aoa_to_sheet
and sheet_add_aoa
treat this as rows with one column of data
per row. By default, data will be written to cells in column "A".
可以使用 unshift
操作将标题添加到数据列,但使用 aoa_to_sheet
构建工作表效率更高:
¥Titles can be added to data columns with an unshift
operation, but it is more
efficient to build up the worksheet with aoa_to_sheet
:
/* sample data */
const data = new Float64Array([54337.95, 3.14159, 2.718281828]);
const title = "Values";
/* convert sample data to array */
const arr = Array.from(data, (value) => ([value]));
/* create worksheet from title (array of arrays) */
const ws = XLSX.utils.aoa_to_sheet([ [ "Values" ] ]);
/* add data starting at B1 */
XLSX.utils.sheet_add_aoa(ws, arr, { origin: "A2" });
Live Demo (click to hide)
在此示例中,导出两个类型化数组。aoa_to_sheet
创建工作表,sheet_add_aoa
将数据添加到工作表中。
¥In this example, two typed arrays are exported. aoa_to_sheet
creates the
worksheet and sheet_add_aoa
will add the data to the sheet.
function SheetJSeriesToCols() { return (<button onClick={() => { /* typed arrays */ const ta1 = new Float64Array([54337.95, 3.14159, 2.718281828]); const ta2 = new Float64Array([281.3308004, 201.8675309, 1900.6492568]); /* create worksheet from first title */ const ws = XLSX.utils.aoa_to_sheet([ [ "Values" ] ]); /* add first typed array starting from cell B1 */ const arr1 = Array.from(ta1, (value) => ([value])); XLSX.utils.sheet_add_aoa(ws, arr1, { origin: "A2" }); /* add second title to cell B1 */ XLSX.utils.sheet_add_aoa(ws, [["Value2"]], { origin: "B1" }); /* add second typed array starting from cell B2 */ const arr2 = Array.from(ta2, (value) => ([value])); XLSX.utils.sheet_add_aoa(ws, arr2, { origin: "B2" }); /* export to file */ const wb = XLSX.utils.book_new(ws, "Export"); XLSX.writeFile(wb, "SheetJSeriesToCols.xlsx"); }}><b>Click to export</b></button>); }
[^1]: 详情请参阅 "基本值" 于 "单元格对象"
¥See "Underlying Values" in "Cell Objects" for more details
[^2]: 见 sheet_to_json
于 "实用工具"
¥See sheet_to_json
in "Utilities"
[^3]: 见 aoa_to_sheet
于 "实用工具"
¥See aoa_to_sheet
in "Utilities"
[^4]: 见 sheet_add_aoa
于 "实用工具"
¥See sheet_add_aoa
in "Utilities"