Row Association

Association

Associate row items

class row_association.grouper.Association(line_item_fields, predictions=None)

Bases: object

Class for assigning row_number to line item fields given workflow predictions

Example Usage:

    litems = Association(
        line_item_fields=["line_value", "line_date"],
        predictions=[{"label": "line_date", "start": 12, "text": "1/2/2021", …}],
    )
    litems.get_bounding_boxes(ocr_tokens=[{"position": …}, …])
    litems.assign_row_number()

    # Get your updated predictions
    updated_preds: List[dict] = litems.updated_predictions

assign_row_number(in_place=True)

Adds a row_number (int) key/value pair to each line item prediction, based on bounding box position and page.

Args:
    in_place (bool): if False, returns updated_predictions

Updates:
    self._line_item_predictions (list of dicts): predictions with row_number added
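The grouping idea can be illustrated with a minimal sketch. It is not the library's implementation: the function name assign_rows and the keys "top", "bottom", and "page_num" are hypothetical stand-ins for whatever bounding box metadata the predictions carry. The sketch assumes a prediction joins the current row when its box starts above the bottom edge of that row on the same page.

```python
from typing import Dict, List


def assign_rows(predictions: List[Dict]) -> List[Dict]:
    """Group predictions into rows by page and vertical overlap (illustrative only)."""
    # Hypothetical keys: "top", "bottom", "page_num".
    # Sort by page first, then top-to-bottom within the page.
    ordered = sorted(predictions, key=lambda p: (p["page_num"], p["top"]))
    row_number = 0
    current_page = None
    row_bottom = None
    for pred in ordered:
        new_page = pred["page_num"] != current_page
        # Start a new row on a new page, or when this box begins
        # at or below the bottom edge of the current row.
        if new_page or pred["top"] >= row_bottom:
            row_number += 1
            current_page = pred["page_num"]
            row_bottom = pred["bottom"]
        else:
            # Same row: extend the row's bottom edge if this box reaches lower.
            row_bottom = max(row_bottom, pred["bottom"])
        pred["row_number"] = row_number
    return ordered
```

Note that this sketch mutates the dictionaries it is given, which loosely mirrors the in_place=True behavior described above.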

get_bounding_boxes(ocr_tokens, add_boxes_to_all=False, raise_for_no_token=True, in_place=True)

Adds keys for bounding box top/bottom and page number to line item extraction predictions, and adds all predictions to the property self._line_item_predictions.

Args:
    ocr_tokens (list of dicts): OCR tokens from ‘ondocument’ config (workflow default)
    raise_for_no_token (bool): raise an exception if a matching token isn’t found for a prediction
    add_boxes_to_all (bool): add bounding box and page number metadata to non line item predictions
    in_place (bool): if False, returns tokens with bounding boxes

Return type

List[dict]
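As a rough illustration of matching predictions to OCR tokens, the sketch below assumes a simplified, flattened token schema in which each token carries "start", "end", "top", "bottom", and "page_num" keys, and each prediction carries "start" and "end" character offsets. These key names and the function add_bounding_boxes are assumptions made for the example; the actual OCR token layout is not described here.

```python
from typing import Dict, List


def add_bounding_boxes(predictions: List[Dict], ocr_tokens: List[Dict]) -> List[Dict]:
    """Attach top/bottom/page_num to each prediction from overlapping tokens (illustrative)."""
    updated = []
    for pred in predictions:
        # Collect tokens whose character span intersects the prediction's span.
        matches = [
            tok for tok in ocr_tokens
            if tok["start"] < pred["end"] and pred["start"] < tok["end"]
        ]
        if not matches:
            # Comparable in spirit to raise_for_no_token=True.
            raise ValueError(f"No OCR token found for prediction: {pred['text']!r}")
        boxed = dict(pred)  # copy so the input prediction is left unchanged
        boxed["top"] = min(tok["top"] for tok in matches)
        boxed["bottom"] = max(tok["bottom"] for tok in matches)
        boxed["page_num"] = matches[0]["page_num"]
        updated.append(boxed)
    return updated
```

Returning copies rather than mutating the input corresponds loosely to calling the method with in_place=False.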

get_workflow_predictions(workflow_result, pred_status='final', model_name=None, in_place=False)

Gets the predictions and model name from a workflow result.

Args:
    workflow_result (dict): output from a completed Indico workflow submission
    pred_status (string): get predictions from 'final' or 'pre_review'
    in_place (bool): if False, returns predictions
    model_name (string): optionally, specify the model name

Returns:
    predictions (list of dicts): predictions from Indico

Return type

List[dict]

static sequences_overlap(x, y)

Returns True if the two sequences overlap, False otherwise.

Return type

bool
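The types of x and y are not stated above. A minimal sketch of an overlap check, assuming each argument is a dict with hypothetical "start" and "end" keys describing a half-open span, could look like this (spans_overlap is an illustrative name, not the library method):

```python
def spans_overlap(x: dict, y: dict) -> bool:
    """Return True when the half-open spans [start, end) of x and y intersect."""
    # Two spans intersect when each one starts before the other ends.
    return x["start"] < y["end"] and y["start"] < x["end"]
```

Under this half-open convention, spans that merely touch at an endpoint are not counted as overlapping.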

property updated_predictions

Indico Wrapper

class row_association.indico_wrapper.Tokens(client)

Bases: object

Class for Indico API calls

get_ocr_tokens(workflow_result)

Gets OCR document tokens from the ETL file.

Args:
    workflow_result: Indico workflow result

Return type

List[dict]