I'm trying to get the plain-text version of an Alfresco node through its API. I assume this version is created, or at least the plain text is extracted, because I can search documents by their content: Alfresco finds them and shows the matching content on the search results page.
I'm using Alfresco 23.4 (Community) on Docker with the official acs-deployment repo, and the best approach for me is to avoid any changes to its configuration.
By default, a node has no plain-text rendition or child. For some file formats (like DOCX) a PDF rendition is available, and every node has a doclib child available as a PNG image, but there is no plain-text rendition or child.
So I ended up creating a new one with this configuration:
    {
      "renditions": [
        {
          "renditionName": "text",
          "targetMediaType": "text/plain"
        }
      ]
    }
Then I trigger the transformation with a JS script that runs on any change in a folder:
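    // Rendering engine and rendition definition to use
    var renderingEngineName = 'reformat';
    var renditionDefinitionName = 'cm:text';

    // Define a rendition that converts the content to plain text
    var renditionDef = renditionService.createRenditionDefinition(renditionDefinitionName, renderingEngineName);
    renditionDef.parameters['mime-type'] = 'text/plain';

    // Render the current document with that definition
    var textRendition = renditionService.render(document, renditionDef);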
Then I can get the content of this rendition using the REST API, but there are two problems:
- First, Alfresco is painfully slow after these changes (maybe I'm doing something wrong?)
- Second, this makes no sense if Alfresco already extracts this text and stores it internally
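For context, reading such a rendition over REST can be sketched roughly like this. This is only a sketch under assumptions, not necessarily the exact call: it assumes the public v1 renditions endpoint, a rendition ID of `text`, HTTP Basic auth, Node 18+ for the global `fetch`, and a placeholder host. The helper names `renditionContentUrl` and `fetchRenditionText` are made up for illustration.

```javascript
// Sketch only: the base URL is a placeholder for a local Docker deployment.
const API_BASE = 'http://localhost:8080/alfresco/api/-default-/public/alfresco/versions/1';

// Build the URL that serves the content of one rendition of a node.
// (Hypothetical helper name; the path follows the public v1 REST API.)
function renditionContentUrl(nodeId, renditionId) {
  return `${API_BASE}/nodes/${encodeURIComponent(nodeId)}` +
    `/renditions/${encodeURIComponent(renditionId)}/content`;
}

// Download the rendition body as text using HTTP Basic auth.
// Assumes Node 18+ (global fetch); credentials are supplied by the caller.
async function fetchRenditionText(nodeId, renditionId, user, password) {
  const auth = Buffer.from(`${user}:${password}`).toString('base64');
  const response = await fetch(renditionContentUrl(nodeId, renditionId), {
    headers: { Authorization: `Basic ${auth}` },
  });
  if (!response.ok) {
    throw new Error(`Rendition request failed with HTTP ${response.status}`);
  }
  return response.text();
}
```

Calling `fetchRenditionText(nodeId, 'text', user, password)` would then return the plain text, provided the rendition has already been created for that node.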
I also tried the CMIS API, but I can only get the original content, not the plain text. So, is there a way to get the plain-text version of a node without creating a new rendition? Is there an endpoint for this? Or is this text stored only in Solr? If so, how can I get the full content stored in Solr?
Thanks!