API Reference

Core Module

class chunking_experiment.core.ChunkingExperiment(input_file: str, output_file: str, file_format: FileFormat = FileFormat.CSV, auto_run: bool = True, n_chunks: int = 4, chunking_strategy: str = 'rows', save_chunks: bool = False)

Bases: object

process_chunks(strategy: ChunkingStrategy) → List[DataFrame] | List[ndarray]

Process input data into chunks and optionally save them to output files.

Parameters:

strategy – ChunkingStrategy enum specifying how to split the data

Returns:

List of pandas DataFrames or NumPy arrays representing the chunks
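This reference does not say how each strategy divides the data. The stdlib-only sketch below is a rough illustration of what ROWS and COLUMNS chunking might look like with the default n_chunks=4. It splits a list-of-rows table into contiguous, near-equal parts and gives any remainder to the earliest chunks, the same way numpy.array_split does. The helpers split_evenly and chunk_table are hypothetical and are not part of chunking_experiment. Plain lists stand in for DataFrames and ndarrays.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_evenly(items: Sequence[T], n_chunks: int) -> List[List[T]]:
    """Split items into n_chunks contiguous parts whose sizes differ by at most one.

    Hypothetical helper: like numpy.array_split, the first `extra` chunks
    each receive one additional item.
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    base, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)
        chunks.append(list(items[start:start + size]))
        start += size
    return chunks


def chunk_table(rows: List[List[int]], strategy: str, n_chunks: int = 4) -> List[List[List[int]]]:
    """Hypothetical sketch of 'rows' and 'columns' chunking on a list-of-rows table."""
    if strategy == "rows":
        # Each chunk is a contiguous block of whole rows.
        return split_evenly(rows, n_chunks)
    if strategy == "columns":
        columns = list(zip(*rows))  # transpose: one tuple per column
        col_groups = split_evenly(columns, n_chunks)
        # Transpose each group of columns back into row-major form.
        return [[list(r) for r in zip(*group)] for group in col_groups]
    raise ValueError(f"unsupported strategy in this sketch: {strategy!r}")
```

For a 10-row, 6-column table, row chunking yields chunks of 3, 3, 2 and 2 rows, and column chunking yields groups of 2, 2, 1 and 1 columns.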

class chunking_experiment.core.ChunkingStrategy(value)

Bases: str, Enum

Strategies for splitting the input data into chunks, passed to ChunkingExperiment.process_chunks.

BLOCKS = 'blocks'
COLUMNS = 'columns'
NO_CHUNKS = 'None'
ROWS = 'rows'
TOKENS = 'tokens'
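ChunkingStrategy mixes in str, so its members compare equal to their string values and can be looked up from plain strings. The constructor's chunking_strategy: str = 'rows' argument is presumably mapped onto a member this way, but that conversion is an assumption because it is not documented here. The sketch re-declares the enum locally with the documented values so that it runs without the package installed. Note that NO_CHUNKS is looked up with the string 'None', not with Python's None.

```python
from enum import Enum


class ChunkingStrategy(str, Enum):
    # Local copy of the documented members, for illustration only.
    BLOCKS = "blocks"
    COLUMNS = "columns"
    NO_CHUNKS = "None"
    ROWS = "rows"
    TOKENS = "tokens"


# Value lookup: a plain string selects the matching member.
strategy = ChunkingStrategy("columns")
print(strategy is ChunkingStrategy.COLUMNS)  # True

# The str mixin makes members compare equal to their values.
print(ChunkingStrategy.ROWS == "rows")  # True

# NO_CHUNKS is keyed by the string 'None', not the None object.
print(ChunkingStrategy("None") is ChunkingStrategy.NO_CHUNKS)  # True
try:
    ChunkingStrategy(None)
except ValueError as exc:
    print("lookup failed:", exc)
```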

class chunking_experiment.core.FileFormat(value)

Bases: str, Enum

File formats selectable through the file_format argument of ChunkingExperiment.

CSV = 'csv'
JSON = 'json'
NUMPY = 'numpy'
PARQUET = 'parquet'
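ChunkingExperiment receives the format through its file_format argument, which defaults to FileFormat.CSV. When the format should follow the input path instead, a helper like the hypothetical infer_file_format below could map file extensions to the documented members. The extension table is an assumption and is not defined by chunking_experiment; this applies in particular to .npy for NUMPY. The enum is re-declared locally so the sketch runs on its own.

```python
from enum import Enum
from pathlib import Path


class FileFormat(str, Enum):
    # Local copy of the documented members, for illustration only.
    CSV = "csv"
    JSON = "json"
    NUMPY = "numpy"
    PARQUET = "parquet"


# Assumed extension mapping; not part of chunking_experiment.
_EXTENSION_TO_FORMAT = {
    ".csv": FileFormat.CSV,
    ".json": FileFormat.JSON,
    ".npy": FileFormat.NUMPY,
    ".parquet": FileFormat.PARQUET,
}


def infer_file_format(path: str) -> FileFormat:
    """Return the FileFormat matching the path's extension (hypothetical helper)."""
    suffix = Path(path).suffix.lower()  # case-insensitive match on the last suffix
    try:
        return _EXTENSION_TO_FORMAT[suffix]
    except KeyError:
        raise ValueError(f"cannot infer file format from extension {suffix!r}") from None


print(infer_file_format("data/train.PARQUET").value)  # parquet
```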

Gradio Interface