Changes by Marius Dumitru Florea on 03/Sep/25 10:38
Assignee: Marius Dumitru Florea
Resolution: Invalid
Status: Open → Closed
2 comments
Marius Dumitru Florea on 03/Sep/25 10:38
A wiki page that has an attachment produces (when indexed) at least 2 entries in the Solr (search core) index:
an entry for the wiki page itself
an entry for the attachment
Solr index entries have multiple fields. Not all entries have the same fields:
some fields are specific to a particular type of indexed entity
some fields are common to two or more entities
while others are common to all indexed entities
See the Solr schema configuration for details. One of the fields that are common to all indexed entities is type. This means that for our wiki page with an attachment we get:
an entry with type=DOCUMENT
an entry with type=ATTACHMENT
Document and attachment entries have other fields in common, including filename, attcontent and attauthor_display. In other words, attachment information is indexed "twice", both for the document type entry and for the attachment type entry. This redundancy allows for simpler search queries. Being able to search for wiki pages that have some specific information in attachments is an important use case, which would be complex to achieve without this redundancy.
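To make the redundancy concrete, here is a minimal sketch of what the two entries for one page with one attachment might look like. Only `type`, `filename`, `attcontent` and `attauthor_display` come from the explanation above; `title`, `doccontent`, `mimetype` and all the values are made up for illustration and are not taken from the real schema.

```python
# Illustrative shape of the two index entries produced for one wiki page
# that has one attachment. Field names other than type, filename,
# attcontent and attauthor_display are hypothetical.
page_entry = {
    "type": "DOCUMENT",
    "title": "Release notes",            # hypothetical document-only field
    "doccontent": "What changed in 1.2",  # hypothetical document-only field
    # Attachment information is repeated on the document entry:
    "filename": ["notes.pdf"],
    "attcontent": ["text extracted from notes.pdf"],
    "attauthor_display": ["Alice"],
}

attachment_entry = {
    "type": "ATTACHMENT",
    "mimetype": "application/pdf",        # hypothetical attachment-only field
    "filename": ["notes.pdf"],
    "attcontent": ["text extracted from notes.pdf"],
    "attauthor_display": ["Alice"],
}

# Fields present on both entries: this overlap is the redundancy.
shared = sorted(set(page_entry) & set(attachment_entry))
print(shared)
```

Because the attachment fields are also present on the DOCUMENT entry, a query that only looks at document entries can still match on attachment content.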
When you perform a search query, the free text (that is not bound to an explicit index field) is matched against some of the index fields. This is controlled by the qf (query fields) parameter, which has a default value specified in solrconfig.xml. This default value is overridden when the "Result type" facet is used, in order to provide more relevant results for that type. In other words:
when there is no Result type selected (or there are multiple values selected) we match fields from all entity types, with some generic boost values (that favor wiki pages over attachments, for instance)
when there is a single result type selected we adapt the list of query fields and their boosts to get more relevant results (and faster)
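The selection rule described above can be sketched roughly as follows. This is not the actual search UI code: the field lists and boost values are invented placeholders (the real ones live in solrconfig.xml and in the search configuration), and `choose_qf` and `format_qf` are hypothetical helper names. The only Solr convention relied on is the `field^boost` syntax of the `qf` parameter.

```python
# Hypothetical default, standing in for the value from solrconfig.xml.
DEFAULT_QF = {
    "title": 4.0,
    "doccontent": 1.0,
    "filename": 0.5,
    "attcontent": 0.5,
    "attauthor_display": 0.5,
}

# Hypothetical per-type overrides, used when exactly one type is selected.
QF_BY_TYPE = {
    "DOCUMENT": {"title": 6.0, "doccontent": 2.0, "attcontent": 0.5},
    "ATTACHMENT": {"filename": 3.0, "attcontent": 1.0, "attauthor_display": 1.0},
}


def choose_qf(selected_types):
    """Return the query fields to use for the given Result type selection."""
    if len(selected_types) == 1:
        # A single type is selected: use the fields tuned for that type.
        return QF_BY_TYPE.get(selected_types[0], DEFAULT_QF)
    # Nothing selected, or several types selected: fall back to the default.
    return DEFAULT_QF


def format_qf(qf):
    """Render the fields in Solr's `field^boost` syntax for the qf parameter."""
    return " ".join(f"{field}^{boost}" for field, boost in qf.items())


print(format_qf(choose_qf([])))
print(format_qf(choose_qf(["ATTACHMENT"])))
print(format_qf(choose_qf(["DOCUMENT", "ATTACHMENT"])))
```

The point of the sketch is that the per-type values replace the default entirely, so the two have to be kept in sync by hand.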
This works fine as long as the fields used for the DOCUMENT type are included in the default query fields value. But you modified the DOCUMENT query fields without updating the default query fields, which leads to facet inconsistencies when toggling the DOCUMENT result type. The problem is that you can't update the default query fields without removing attachment results completely, because the fields you want to remove are common to both document and attachment results. By removing filename, attcontent and attauthor_display from the default query fields value you will:
fix the Result type facet for DOCUMENT results
BUT get 0 attachment results when unchecking all result types or when selecting multiple result types
still get attachment results when checking only the ATTACHMENT result type
Maybe this is acceptable to you, but the fact is that what you are trying to achieve (not matching attachments when looking for documents) is not really possible in a clean way currently, and this is by design, not a bug.
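The trade-off can be reproduced with a toy model. The sketch below replaces Solr with a plain substring match, so it says nothing about scoring or analysis; the entries, the `title` and `doccontent` field names, and the `search` helper are all hypothetical. It only illustrates which entries can match under each combination of default query fields and Result type selection.

```python
# Toy substring matcher standing in for Solr. All data is illustrative.
ENTRIES = [
    # A page that matches on its own title.
    {"id": "page-a", "type": "DOCUMENT",
     "title": "forecast overview", "doccontent": "plans"},
    # A page that matches only through its attachment's content.
    {"id": "page-b", "type": "DOCUMENT",
     "title": "budget", "doccontent": "numbers",
     "filename": "q3.pdf", "attcontent": "quarterly forecast",
     "attauthor_display": "alice"},
    # The attachment entry of page-b, carrying the same attachment fields.
    {"id": "file-b", "type": "ATTACHMENT",
     "filename": "q3.pdf", "attcontent": "quarterly forecast",
     "attauthor_display": "alice"},
]

ATTACHMENT_FIELDS = ["filename", "attcontent", "attauthor_display"]
DOCUMENT_QF = ["title", "doccontent"]   # customized: attachment fields removed
ATTACHMENT_QF = ATTACHMENT_FIELDS
ORIGINAL_DEFAULT_QF = ["title", "doccontent"] + ATTACHMENT_FIELDS
TRIMMED_DEFAULT_QF = ["title", "doccontent"]


def search(text, default_qf, selected_types=()):
    """Return the ids of entries matching `text` for the given selection."""
    if len(selected_types) == 1:
        qf = {"DOCUMENT": DOCUMENT_QF, "ATTACHMENT": ATTACHMENT_QF}[selected_types[0]]
    else:
        qf = default_qf
    return [
        entry["id"]
        for entry in ENTRIES
        if (not selected_types or entry["type"] in selected_types)
        and any(text in entry.get(field, "") for field in qf)
    ]


# Original default: two documents without a filter, one with DOCUMENT selected.
print(search("forecast", ORIGINAL_DEFAULT_QF))
print(search("forecast", ORIGINAL_DEFAULT_QF, ("DOCUMENT",)))

# Trimmed default: document counts agree, but attachments disappear
# unless ATTACHMENT is the only selected type.
print(search("forecast", TRIMMED_DEFAULT_QF))
print(search("forecast", TRIMMED_DEFAULT_QF, ("DOCUMENT", "ATTACHMENT")))
print(search("forecast", TRIMMED_DEFAULT_QF, ("ATTACHMENT",)))
```

In this model the attachment entry has no field left in the trimmed default that could match, which is why it only comes back when ATTACHMENT is the single selected type.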
Marius Dumitru Florea on 03/Sep/25 10:45
A wiki page that has an attachment produces (when indexed) at least 2 entries in the Solr (search core) index:
* an entry for the wiki page itself
* an entry for the attachment
Solr index entries have multiple fields. Not all entries have the same fields:
* some fields are *specific* to a particular type of indexed entity
* some fields are common to two or more entities
* while others are common to *all* indexed entities
See the [Solr schema configuration|https://github.com/xwiki/xwiki-platform/blob/master/xwiki-platform-core/xwiki-platform-search/xwiki-platform-search-solr/xwiki-platform-search-solr-server/xwiki-platform-search-solr-server-core-search/src/main/resources/conf/managed-schema.xml#L176] for details. One of the fields that are common to all indexed entities is {{type}}. This means that for our wiki page with an attachment we get:
* an entry with {{type=DOCUMENT}}
* an entry with {{type=ATTACHMENT}}
Document and attachment entries have other fields in common, including {{filename}}, {{attcontent}} and {{attauthor_display}}. In other words, attachment information is indexed "twice", both for the document type entry and for the attachment type entry. This redundancy allows for simpler search queries. Being able to search for wiki pages that have some specific information in attachments is an important use case, which would be complex to achieve without this redundancy.
When you perform a search query, the free text (that is not bound to an explicit index field) is matched against some of the index fields. This is controlled by the {{qf}} (query fields) parameter, which has a *default value* specified in [solrconfig.xml|https://github.com/xwiki/xwiki-platform/blob/master/xwiki-platform-core/xwiki-platform-search/xwiki-platform-search-solr/xwiki-platform-search-solr-server/xwiki-platform-search-solr-server-core-search/src/main/resources/conf/solrconfig.xml#L708]. This default value is overridden when the "Result type" facet is used, in order to provide more relevant results for that type. In other words:
* when there is no Result type selected (or there are multiple values selected) we match fields from all entity types, with some generic boost values (that favor wiki pages over attachments, for instance)
* when there is a single result type selected we adapt the list of query fields and their boosts to get more relevant results (and faster)
This works fine as long as the fields used for the DOCUMENT type are included in the default query fields value. But you modified the DOCUMENT query fields without updating the default query fields, which leads to facet inconsistencies when toggling the DOCUMENT result type. The problem is that you can't update the default query fields without removing attachment results completely, because the fields you want to remove are *common to both document and attachment results*. By removing {{filename}}, {{attcontent}} and {{attauthor_display}} from the default query fields value you will:
* fix the Result type facet for DOCUMENT results
* BUT get 0 attachment results when unchecking all result types or when selecting multiple result types
* still get attachment results when checking only the ATTACHMENT result type
Maybe this is acceptable to you, but the fact is that what you are trying to achieve (not matching attachments when looking for documents) is not really possible in a clean way currently, and this is by design, not a bug.