There is 1 update and there are 2 comments.
 
 
XWiki Platform / XWIKI-18419 Open

Simplified Chinese content is not supported by the Solr search

 
 

1 update

 
Changes by KevinGao on 21/Jun/24 10:30
 
Attachment: image-2024-06-21-16-29-42-721.png
 
 

2 comments

 
KevinGao on 21/Jun/24 10:32
 

IMPORTANT: Lucene officially supports Chinese content search through its smartcn analysis module, and so does Solr.

 

In XWiki, however, Chinese content search has to be enabled manually, with the following steps:

1. Find out which Lucene (Solr) version your XWiki instance uses.

Check [permanentDirectory]/store/solr/search/conf/solrconfig.xml; it contains a line like:

`<luceneMatchVersion>9.8.0</luceneMatchVersion>`
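If you would rather read the version with a script, the following Python sketch (standard library only) pulls the value out of solrconfig.xml. The function name `read_lucene_version` is hypothetical, and it assumes the file contains a `<luceneMatchVersion>` element like the one above; it searches the whole XML tree, so the exact nesting does not matter.

```python
import xml.etree.ElementTree as ET


def read_lucene_version(solrconfig_xml: str) -> str:
    """Return the text of the first non-empty <luceneMatchVersion> element.

    `solrconfig_xml` is the file content as a string, e.g. read from
    [permanentDirectory]/store/solr/search/conf/solrconfig.xml.
    """
    root = ET.fromstring(solrconfig_xml)
    # iter() walks the whole tree, so the element is found at any depth.
    for element in root.iter("luceneMatchVersion"):
        if element.text and element.text.strip():
            return element.text.strip()
    raise ValueError("no <luceneMatchVersion> element found")


if __name__ == "__main__":
    sample = "<config><luceneMatchVersion>9.8.0</luceneMatchVersion></config>"
    print(read_lucene_version(sample))
```

To use it on the real file, pass the file content, e.g. `open(path, encoding="utf-8").read()`, where `path` is your solrconfig.xml.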

2. Download the smartcn jar matching that Lucene version from https://repo1.maven.org/maven2/org/apache/lucene:

Lucene 9 and later: lucene-analysis-smartcn/

Lucene 4 to 8: lucene-analyzers-smartcn/

Example: for Lucene 9.8.0, I downloaded https://repo1.maven.org/maven2/org/apache/lucene/lucene-analysis-smartcn/9.8.0/lucene-analysis-smartcn-9.8.0.jar
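The choice between the two artifact names, and the resulting Maven Central URL, can be sketched in Python as follows. These are hypothetical helpers, assuming (as the steps here do) that `luceneMatchVersion` matches the Lucene version actually bundled with XWiki and is written as a full version such as 9.8.0; the sketch does not check that the jar really exists on Maven Central.

```python
MAVEN_BASE = "https://repo1.maven.org/maven2/org/apache/lucene"


def smartcn_artifact(lucene_version: str) -> str:
    """Pick the smartcn artifact name for a version string like "9.8.0"."""
    major = int(lucene_version.split(".")[0])
    if major >= 9:
        return "lucene-analysis-smartcn"   # module name from Lucene 9 on
    if 4 <= major <= 8:
        return "lucene-analyzers-smartcn"  # module name in Lucene 4 to 8
    raise ValueError(f"unsupported Lucene version: {lucene_version}")


def smartcn_jar_url(lucene_version: str) -> str:
    """Build the Maven Central URL of the smartcn jar (standard Maven layout)."""
    artifact = smartcn_artifact(lucene_version)
    return f"{MAVEN_BASE}/{artifact}/{lucene_version}/{artifact}-{lucene_version}.jar"


if __name__ == "__main__":
    print(smartcn_jar_url("9.8.0"))
```

For 9.8.0 this reproduces the URL from the example above. The file could then be fetched with `urllib.request.urlretrieve(url, filename)` or simply with a browser.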

3. Put the downloaded jar (e.g. lucene-analysis-smartcn-X.X.X.jar) in [permanentDirectory]/store/solr/search/lib, and make sure the user running the application has read permission on it.
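A minimal Python sketch of this step, assuming `solr_core_dir` points at [permanentDirectory]/store/solr/search (the function and parameter names are hypothetical). It copies the jar into `lib`, makes it readable by all users (mode 644 on POSIX systems), and checks readability. Note that `os.access` answers for the user running the script, so run it as the user that runs XWiki, or check the owner and mode bits yourself.

```python
import os
import shutil
from pathlib import Path


def install_jar(jar: Path, solr_core_dir: Path) -> Path:
    """Copy `jar` into <solr_core_dir>/lib and return the path of the copy.

    `solr_core_dir` stands for [permanentDirectory]/store/solr/search.
    """
    lib_dir = solr_core_dir / "lib"
    lib_dir.mkdir(parents=True, exist_ok=True)  # create lib/ if it does not exist yet
    target = lib_dir / jar.name
    shutil.copy2(jar, target)  # copy contents and metadata
    target.chmod(0o644)  # owner read/write, everyone else read-only (POSIX)
    # os.access answers for the user running this script, not necessarily XWiki's user.
    if not os.access(target, os.R_OK):
        raise PermissionError(f"{target} is not readable")
    return target
```

For example, `install_jar(Path("lucene-analysis-smartcn-9.8.0.jar"), Path("/path/to/permanentDirectory/store/solr/search"))`, where both paths are placeholders for your own setup.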

4. Edit [permanentDirectory]/store/solr/search/conf/managed-schema.xml (in XWiki 15 and older it may be named `managed-schema`, without an extension, but it is still an XML file)

and add the following inside the `<schema>` element:

 

    <!-- Route fields with a Chinese locale suffix to the smartcn field type -->
    <dynamicField name="*_zh"    type="text_smartcn" indexed="true" stored="true" multiValued="true" />
    <dynamicField name="*_zh_CN" type="text_smartcn" indexed="true" stored="true" multiValued="true" />
    <dynamicField name="*_zh_TW" type="text_smartcn" indexed="true" stored="true" multiValued="true" />

    <!-- Text type using the smartcn HMM-based Chinese word tokenizer -->
    <fieldType name="text_smartcn" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <!-- Synonyms are expanded at query time only -->
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
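Before restarting, it can help to sanity-check the edited schema. The Python sketch below (hypothetical helper, standard library only) parses the schema text and reports a missing `text_smartcn` type, a tokenizer class whose name does not end in `HMMChineseTokenizerFactory`, Chinese dynamic fields that are missing or not mapped to `text_smartcn`, and duplicate definitions, which Solr is likely to reject at startup. It only checks structure; it cannot tell whether the tokenizer class can actually be loaded.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import List

ZH_PATTERNS = ("*_zh", "*_zh_CN", "*_zh_TW")


def check_smartcn_schema(schema_xml: str) -> List[str]:
    """Return a list of problems in the schema text; an empty list means none found."""
    root = ET.fromstring(schema_xml)
    problems = []

    # The text_smartcn type must exist once, and every tokenizer in it must be
    # HMMChineseTokenizerFactory (only the end of the class name is compared).
    smartcn = [ft for ft in root.iter("fieldType") if ft.get("name") == "text_smartcn"]
    if not smartcn:
        problems.append("fieldType text_smartcn is missing")
    else:
        classes = [t.get("class", "") for t in smartcn[0].iter("tokenizer")]
        if not classes or not all(c.endswith("HMMChineseTokenizerFactory") for c in classes):
            problems.append("text_smartcn does not use HMMChineseTokenizerFactory")
    if len(smartcn) > 1:
        problems.append("fieldType text_smartcn is defined more than once")

    # Each Chinese pattern must be declared exactly once and use text_smartcn.
    dynamic = list(root.iter("dynamicField"))
    counts = Counter(df.get("name") for df in dynamic)
    types = {df.get("name"): df.get("type") for df in dynamic}
    for pattern in ZH_PATTERNS:
        if counts[pattern] == 0:
            problems.append(f"dynamicField {pattern} is missing")
        elif counts[pattern] > 1:
            problems.append(f"dynamicField {pattern} is defined {counts[pattern]} times")
        elif types[pattern] != "text_smartcn":
            problems.append(f"dynamicField {pattern} does not use text_smartcn")
    return problems
```

Run it on the content of managed-schema.xml (or `managed-schema`) and fix anything it reports before restarting XWiki.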

5. Finally, restart XWiki, then reindex the wiki from the Search section of the administration page (AdminPage/Search).

Chinese content search now works (for the zh_CN and zh languages).
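The schema declares three dynamic fields because a Solr dynamic field pattern with a leading `*` only matches field names ending with the rest of the pattern, so `*_zh` does not cover a name ending in `_zh_CN`. The Python sketch below imitates Solr's matching rule (longer patterns are tried first, and among equal-length patterns the one declared first wins). The helper and the field names in the example are hypothetical; it assumes XWiki stores localized text in fields whose names end with the locale.

```python
from typing import Optional, Sequence

# The patterns declared in the schema fragment above, in declaration order.
ZH_PATTERNS = ("*_zh", "*_zh_CN", "*_zh_TW")


def match_dynamic_field(field: str, patterns: Sequence[str]) -> Optional[str]:
    """Return the dynamicField pattern that would apply to `field`, or None.

    Imitates Solr's rule: a pattern has '*' only at its start or its end,
    longer patterns are tried first, and ties go to the earlier pattern.
    """
    # sorted() is stable even with reverse=True, so equal-length patterns
    # keep their declaration order.
    for pattern in sorted(patterns, key=len, reverse=True):
        if pattern.startswith("*") and field.endswith(pattern[1:]):
            return pattern
        if pattern.endswith("*") and field.startswith(pattern[:-1]):
            return pattern
    return None


if __name__ == "__main__":
    # Hypothetical locale-suffixed field names.
    for name in ("title_zh_CN", "doccontent_zh", "title_zh_TW", "title_en"):
        print(name, "->", match_dynamic_field(name, ZH_PATTERNS))
```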

[Screenshot: image-2024-06-21-16-29-42-721.png]

 

(It looks like I can write in Chinese here?)

 

 

Reference: https://jeshs.github.io/2020/10/xwiki%E7%9A%84%E9%85%8D%E7%BD%AE%E5%92%8C%E6%8F%92%E4%BB%B6/

 
KevinGao on 21/Jun/24 10:37
 
Note: if an exception occurs with a Lucene version I have not tested, you may need to check the official Lucene documentation for the package (the fully qualified class name) of HMMChineseTokenizerFactory in that version.
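Since a jar is a zip archive, one way to find the factory's fully qualified class name in a given smartcn jar is to list its entries. The Python sketch below (hypothetical helper name, standard library `zipfile` only) returns every class named `HMMChineseTokenizerFactory` in the jar, converted to the dotted form used in the `class` attribute of the `<tokenizer>` element.

```python
import zipfile
from typing import List

TARGET = "HMMChineseTokenizerFactory.class"


def factory_class_names(jar) -> List[str]:
    """Return fully qualified names of HMMChineseTokenizerFactory classes in `jar`.

    `jar` may be a path or a binary file object, as accepted by zipfile.ZipFile.
    """
    with zipfile.ZipFile(jar) as archive:
        return [
            # "org/apache/.../HMMChineseTokenizerFactory.class" -> dotted class name
            entry[: -len(".class")].replace("/", ".")
            for entry in archive.namelist()
            if entry.rsplit("/", 1)[-1] == TARGET
        ]


if __name__ == "__main__":
    import sys

    for jar_path in sys.argv[1:]:
        print(jar_path, factory_class_names(jar_path))
```

For example, save it as a script (any name) and run it with the downloaded jar as argument. An empty list means the jar does not contain a class with that name, so a different tokenizer factory is probably needed for that Lucene version.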
