Virtualpaper
- Repository
- Official documents
- License: AGPL-v3
Virtualpaper is a document archiving solution that is heavily optimized for searching documents. The biggest difference between Virtualpaper and many other solutions is that Virtualpaper does not store documents in folders. In fact, there is no such entity as a folder in Virtualpaper. How are documents located and filtered then? Virtualpaper combines user-configurable key-value metadata with a powerful and fast full-text search to achieve the same effect, and much more.
For more information see the official documentation.
The screenshot below showcases the most important aspect of Virtualpaper: finding the documents you're looking for by typing any keywords, metadata or time ranges. The interactive search suggests keywords as you type.
Rather than storing documents in a traditional folder structure, Virtualpaper simply stores them in a single directory. The idea is to use metadata to store the same relational information that a folder structure would encapsulate. Instead of putting related documents in the same folder or subfolder, Virtualpaper uses metadata key-values to indicate that the documents are related.
For instance, instead of building folder structures by year and month, by category, or by alphabetical order, all of this data can be stored in each document's metadata. While this may seem complicated and unintuitive at first, the benefit is clear: instead of living in a single folder hierarchy, each document exists in several parallel contexts at once, and each context behaves like a folder. Documents can be filtered and sorted by any metadata, by dates, or by any combination of these. Instead of navigating to a document through the folder structure ("it was probably under year 2022 and under invoices"), we can just query for it with "date:2022 type:invoice", which lists the same documents. Examples of multiple contexts are:
- List all invoices from last year
- List all inquiries from company x that have the value completed:false and are dated within a given time range
- List all documents related to a project
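The idea of parallel contexts can be illustrated with a minimal sketch. This is not Virtualpaper's implementation or query engine; the `Document` class, the `matches` helper and the sample documents are all hypothetical and only show how one flat collection can answer several folder-like questions through metadata.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Document:
    # Hypothetical document record: a name, a date and free-form key-value metadata.
    name: str
    day: date
    metadata: dict = field(default_factory=dict)


def matches(doc, year=None, **wanted):
    """Return True if the document is from `year` (when given) and has every wanted key-value."""
    if year is not None and doc.day.year != year:
        return False
    return all(doc.metadata.get(key) == value for key, value in wanted.items())


# All documents live in one flat collection; there are no folders.
documents = [
    Document("acme-invoice.pdf", date(2022, 3, 14), {"type": "invoice", "company": "acme"}),
    Document("acme-inquiry.pdf", date(2022, 6, 1),
             {"type": "inquiry", "company": "acme", "completed": "false"}),
    Document("garden-plan.pdf", date(2021, 9, 30), {"type": "plan", "project": "garden"}),
]

# Roughly what a query such as "date:2022 type:invoice" expresses.
invoices_2022 = [d.name for d in documents if matches(d, year=2022, type="invoice")]

# The same collection seen through a different context.
open_inquiries = [d.name for d in documents if matches(d, company="acme", completed="false")]

print(invoices_2022)
print(open_inquiries)
```

The same document can appear in any number of such result sets, which is what a single folder hierarchy cannot offer.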
To benefit from this kind of filtering, you need to assign at least a few meaningful metadata values to your documents. To help automate this, Virtualpaper tries to match these values automatically from the document content during indexing. In addition to filtering by metadata, Virtualpaper features full-text search powered by Meilisearch, which covers all metadata as well as the content of the document itself.
This project is in the beta phase, and help with testing as well as general feedback is much appreciated.
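As a rough illustration of automatic matching, the sketch below scans document text for phrases associated with each metadata value. The rule format, the `RULES` table and the `match_metadata` function are assumptions made for this example; Virtualpaper's actual matching logic and configuration may differ.

```python
import re

# Hypothetical rules: metadata key -> value -> phrases that suggest the value.
RULES = {
    "type": {"invoice": ["invoice", "amount due"], "inquiry": ["inquiry"]},
    "company": {"acme": ["acme corp"]},
}


def match_metadata(content, rules=RULES):
    """Return the key-value pairs whose phrases occur as whole words in the content."""
    text = content.lower()
    found = {}
    for key, values in rules.items():
        for value, phrases in values.items():
            if any(re.search(r"\b" + re.escape(p) + r"\b", text) for p in phrases):
                found[key] = value
                break  # keep the first matching value for this key
    return found


sample = "ACME Corp\nInvoice 1042\nAmount due: 120.00 EUR"
print(match_metadata(sample))
```

Matching is case-insensitive and limited to whole words here so that a phrase such as "invoice" does not fire inside an unrelated longer word.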
Features
- Store text documents (text content is extracted from PDF and image files)
- Save any user-configurable key-value metadata to documents
- If configured, try to match key-values automatically from documents
- Detect document date
- User-configurable rules for modifying the data
- REST API (Swagger documentation is located at api/swaggerdocs/swagger.json or at /api/v1/swagger.json)
- Full-text-search
- The total number of users is limited to 200. This is because Meilisearch has a limit of 200 indices, and each user uses one index. The benefit of a per-user index is that each user can configure personal search settings (synonyms, stop words and result ranking), which gives users more powerful search over their own files. Support for more users may become possible in the future.
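The one-index-per-user design and its limit can be sketched as simple bookkeeping. The `IndexRegistry` class, the `user-<id>` naming scheme and the settings layout below are hypothetical; the sketch does not call Meilisearch and is not how Virtualpaper necessarily manages its indices.

```python
MAX_INDICES = 200  # limit stated in the feature list above


class IndexRegistry:
    """Hypothetical bookkeeping: one search index and one settings dict per user."""

    def __init__(self, limit=MAX_INDICES):
        self.limit = limit
        self.settings = {}  # index name -> that user's personal search settings

    def index_for(self, user_id):
        """Return the user's index name, creating the entry if there is room left."""
        name = f"user-{user_id}"  # naming scheme is an assumption
        if name not in self.settings:
            if len(self.settings) >= self.limit:
                raise RuntimeError("index limit reached; cannot add another user")
            self.settings[name] = {"synonyms": {}, "stop_words": []}
        return name


# A small limit keeps the demonstration short.
registry = IndexRegistry(limit=2)
alice = registry.index_for(1)
registry.settings[alice]["stop_words"].append("the")  # affects only this user's index
bob = registry.index_for(2)
```

Because settings are keyed by index, one user's stop words or synonyms never influence another user's search results, which is the trade-off made in exchange for the user limit.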