Indexing is noticeably slow. This is most obvious when, e.g., the actual embedding fails instantly, so the embedding itself cannot account for the time spent. I found two causes:
- The regular expression for finding the last heading at the end of a chunk is surprisingly slow.
- The Solr index is committed very frequently (after every document).
Both can easily be improved.
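
As a rough illustration of the two improvements, here is a minimal sketch rather than the actual change. It assumes headings are Markdown-style (`#` to `######` followed by a space), and `IndexClient` is a hypothetical stand-in with `add` and `commit` methods, not the API of a real Solr client. The names `last_heading`, `index_documents`, and `batch_size` are made up for this example.

```python
from typing import Iterable, Optional, Protocol


def last_heading(chunk: str) -> Optional[str]:
    """Return the last heading line in the chunk, or None if there is none.

    Walks the lines from the end and stops at the first heading, instead of
    running a regular expression over the whole chunk.
    Assumption: headings are Markdown-style ('#' to '######' plus a space).
    """
    for line in reversed(chunk.splitlines()):
        stripped = line.strip()
        hashes = len(stripped) - len(stripped.lstrip("#"))
        if 1 <= hashes <= 6 and stripped[hashes:hashes + 1] == " ":
            return stripped
    return None


class IndexClient(Protocol):
    """Hypothetical minimal interface; a real Solr client will differ."""

    def add(self, docs: list) -> None: ...

    def commit(self) -> None: ...


def index_documents(
    client: IndexClient, docs: Iterable[dict], batch_size: int = 100
) -> int:
    """Send documents in batches and commit once at the end.

    Returns the number of documents sent to the index.
    """
    batch: list = []
    total = 0
    for doc in docs:
        batch.append(doc)
        if len(batch) >= batch_size:
            client.add(batch)
            total += len(batch)
            batch = []  # start a fresh list; the client keeps the old one
    if batch:
        client.add(batch)
        total += len(batch)
    client.commit()  # a single commit instead of one per document
    return total
```

Whether a single commit at the end is acceptable depends on how soon documents need to become searchable; committing every few batches would be a middle ground.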