RemoveRepetitiveElements
default
Remove elements with similar content at the same page positions, such as page numbers, license information, etc.
Extends
ToLineItemTransformation
Constructors
Constructor
new default(): default;
Returns
default
Overrides
Properties
name
name: any;
Defined in: packages/ai-research-agent/src/extractor/pdf-to-html/transformations/Transformation.js:11
Inherited from
itemType
itemType: any;
Defined in: packages/ai-research-agent/src/extractor/pdf-to-html/transformations/Transformation.js:12
Inherited from
Methods
transform()
transform(parseResult: any): default;
The idea is the following:
- For each page, collect all items of the first line and all items of the last line.
- Count how often these items occur across all pages, using a hash that ignores numbers, whitespace, and upper/lowercase.
- Delete items that occur on more than 2/3 of all pages.
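The steps above can be sketched in plain JavaScript. This is a minimal illustration, not the library's implementation: it assumes a hypothetical page model { items: [{ text }] } in which each item is one line in top-to-bottom order, and the names normalizedHash and removeRepetitiveElements are made up for the example. It applies the documented rule literally, dropping a first or last line only when its normalized hash occurs at that position on more than 2/3 of the pages.

```javascript
// Minimal sketch of the first/last-line repetition filter.
// ASSUMPTION (hypothetical data model): each page is { items: [{ text }] }
// with one entry per line, ordered top to bottom. The real LineItem class differs.

// Hash that ignores digits, whitespace and letter case, so
// "Page 3 of 10" and "page 12 of 10" map to the same key.
function normalizedHash(text) {
  return text.replace(/[0-9\s]/g, "").toLowerCase();
}

function removeRepetitiveElements(pages) {
  // Count first-line and last-line hashes separately, since the filter
  // targets content that repeats at the same page position.
  const firstCounts = new Map();
  const lastCounts = new Map();
  const bump = (counts, key) => counts.set(key, (counts.get(key) || 0) + 1);

  for (const page of pages) {
    if (page.items.length === 0) continue;
    bump(firstCounts, normalizedHash(page.items[0].text));
    bump(lastCounts, normalizedHash(page.items[page.items.length - 1].text));
  }

  // "More than 2/3 of all pages", checked with integers to avoid
  // floating-point rounding: count / pages > 2 / 3  <=>  3 * count > 2 * pages.
  const isRepetitive = (counts, item) =>
    3 * (counts.get(normalizedHash(item.text)) || 0) > 2 * pages.length;

  // Return new page objects; the input pages are left untouched.
  return pages.map((page) => {
    const lastIndex = page.items.length - 1;
    const items = page.items.filter((item, i) => {
      if (i === 0 && isRepetitive(firstCounts, item)) return false;
      if (i === lastIndex && isRepetitive(lastCounts, item)) return false;
      return true;
    });
    return { ...page, items };
  });
}

// Usage: the footer "Page N" appears on 3 of 3 pages and is removed;
// "ACME Report" heads 2 of 3 pages, which is not *more* than 2/3, so it stays.
const pages = [
  { items: [{ text: "ACME Report" }, { text: "Intro" }, { text: "Page 1" }] },
  { items: [{ text: "ACME Report" }, { text: "Methods" }, { text: "Page 2" }] },
  { items: [{ text: "Results" }, { text: "More results" }, { text: "Page 3" }] },
];
const cleaned = removeRepetitiveElements(pages);
console.log(JSON.stringify(cleaned.map((p) => p.items.map((it) => it.text))));
// → [["ACME Report","Intro"],["ACME Report","Methods"],["Results","More results"]]
```

Counting first and last lines in separate maps keeps the check position-specific: a string that opens some pages and closes others is not pooled into a single count.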
Parameters
Parameter | Type |
---|---|
parseResult | any |
Returns
default
Overrides
completeTransform()
completeTransform(parseResult: any): any;
Defined in: packages/ai-research-agent/src/extractor/pdf-to-html/transformations/ToLineItemTransformation.js:19
Sometimes transform() only visualizes a change; this method then applies the actual change.
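This split between transform() and completeTransform() resembles a mark-then-apply pattern. The sketch below illustrates that idea under stated assumptions: the class name MarkThenSweepTransformation, the predicate passed to its constructor, and the removed flag on items are all hypothetical and are not part of this library's API.

```javascript
// Hypothetical sketch of a two-phase "mark, then apply" transformation.
// ASSUMPTIONS: MarkThenSweepTransformation, its shouldRemove predicate and
// the per-item `removed` flag are illustrative names, not this library's API.
class MarkThenSweepTransformation {
  constructor(shouldRemove) {
    this.shouldRemove = shouldRemove;
  }

  // Phase 1: only flag the items, so a viewer can still render them and
  // highlight what would be removed.
  transform(parseResult) {
    for (const page of parseResult.pages) {
      for (const item of page.items) {
        if (this.shouldRemove(item)) item.removed = true;
      }
    }
    return parseResult;
  }

  // Phase 2: apply the change by dropping flagged items and clearing the
  // flag on the items that remain.
  completeTransform(parseResult) {
    for (const page of parseResult.pages) {
      page.items = page.items.filter((item) => !item.removed);
      for (const item of page.items) delete item.removed;
    }
    return parseResult;
  }
}

// Usage: after transform() both lines are still present (one is flagged);
// only completeTransform() actually removes the page-number line.
const t = new MarkThenSweepTransformation((item) => /^page \d+$/i.test(item.text));
const result = { pages: [{ items: [{ text: "Intro" }, { text: "Page 1" }] }] };
t.transform(result);
console.log(result.pages[0].items.length); // → 2
t.completeTransform(result);
console.log(JSON.stringify(result.pages[0].items.map((it) => it.text))); // → ["Intro"]
```

Keeping the flagged items in place during the first phase lets a preview show exactly what would be removed before anything is discarded.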
Parameters
Parameter | Type |
---|---|
parseResult | any |
Returns
any