Jump to content

How to use SphinxSearch with KVS for better search and related videos performance


Tech Support

Recommended Posts

SphinxSearch is a 3rd-party software that provides better text search capabilities with many additional features, such as:

  • Fast indexing (for 99% KVS sites, indexing takes seconds to tens of seconds).
  • Fast search (up to 150-250 queries / second with 1,000,000 database items).
  • Stop words.
  • Morphological search: Sphinx has built-in modules for English, Russian, and Czech; modules for French, Spanish, Portuguese, Italian, Romanian, German, Dutch, Swedish, Norwegian, Danish, Finnish, and Hungarian are available.

Using Sphinx in KVS brings the following benefits specifically for KVS sites:

  • Better text search that can be further customized using Sphinx PHP API.
  • Reduce of MySQL database load by moving search and related videos queries to Sphinx, which works much faster than MySQL.
  • Ability to move Sphinx installation to a separate server to remove load from the primary server. This would make sense even if MySQL database is kept on the primary server, since Sphinx doesn't directly uses MySQL when processing queries.

 

Downloading Sphinx extension for KVS

Please download and extract the contents of this archive. If you plan to use Sphinx on the same server where KVS is installed, extract the contents into /admin/sphinx_search subdirectory of your KVS project and that's it.

Set 777 filesystem permissions to data subdirectory within /admin/sphinx_search directory. This directory will contain Sphinx index and data files.

 

Installing and configuring SphinxSearch on your server

We recommend using Sphinx 2.x version, as this is the latest version we tested this manual. It looks like the latest 3 version (3.5.1) is not supported.

First, you need to install Sphinx using their official documentation. After the installation is finished, by default it will come with no configuration file. So you need to create one with specifying KVS index and data source.

Please use root user to perform all operations.

NOTE: it might be possible that your server already has Sphinx used for some other projects, in this case you will need to add KVS data source and index to the existing sphinx.conf file. Same happens when adding additional KVS sites into the same Sphinx configuration - the sphinx.conf file is global and for each KVS project you need to add index + data source descriptions (see example below).

So create or modify /etc/sphinxsearch/sphinx.conf file with this data:

source kvs_videos
{
        type = mysql
        sql_host = <DB host>
        sql_user = <DB username>
        sql_pass = <DB password>
        sql_db = <DB name>
        sql_port = 3306

        sql_query_pre = SET NAMES utf8
        sql_query_pre = SET CHARACTER SET utf8

        sql_query =  SELECT video_id, video_id AS video_id_, title,  description, duration, video_viewed, rating/rating_amount as rating, UNIX_TIMESTAMP(post_date) as post_date, \
                        (SELECT  GROUP_CONCAT(CONCAT_WS(',',ktvs_tags.tag)) FROM ktvs_tags \
                        JOIN ktvs_tags_videos ON (ktvs_tags_videos.tag_id = ktvs_tags.tag_id) \
                        WHERE ktvs_tags_videos.tag_id = ktvs_tags.tag_id AND ktvs_tags_videos.video_id=ktvs_videos.video_id) AS `tags`, \
                        (SELECT  GROUP_CONCAT(CONCAT_WS(',',ktvs_categories.title,ktvs_categories.synonyms)) FROM ktvs_categories \
                        JOIN ktvs_categories_videos ON (ktvs_categories_videos.category_id = ktvs_categories.category_id) \
                        WHERE ktvs_categories_videos.category_id = ktvs_categories.category_id AND ktvs_categories_videos.video_id=ktvs_videos.video_id) AS `categories`, \
                        (SELECT  GROUP_CONCAT(CONCAT_WS(',',ktvs_models.title)) FROM ktvs_models \
                        JOIN ktvs_models_videos ON (ktvs_models_videos.model_id = ktvs_models.model_id) \
                        WHERE ktvs_models_videos.model_id = ktvs_models.model_id AND ktvs_models_videos.video_id=ktvs_videos.video_id) AS `models`, \
                        (SELECT  CONCAT_WS(',',title) FROM ktvs_content_sources WHERE content_source_id=ktvs_videos.content_source_id) AS content_source_title, \
                        (SELECT  CONCAT_WS(',',title) FROM ktvs_dvds WHERE dvd_id=ktvs_videos.dvd_id) AS dvd_title \
                    FROM ktvs_videos where status_id=1 and load_type_id>0

        sql_attr_timestamp = post_date
        sql_attr_bigint = duration
        sql_attr_bigint = video_viewed
        sql_attr_float = rating
        sql_attr_bigint = video_id_

        sql_attr_multi  = uint category_id_general FROM query; \
                          SELECT video_id, category_id \
                          FROM ktvs_categories_videos

}

index videos
{
        source = kvs_videos
        morphology = stem_en
        min_word_len = 2
        min_infix_len = 2
        index_exact_words = 1

        path = /PATH/admin/sphinx_search/data/videos

        charset_table = 0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F
}

indexer
{
        mem_limit = 512M
}

searchd
{
        listen = 9312
        listen = 9306

        log = /var/log/sphinxsearch/searchd.log

        max_children = 30

        pid_file = /var/run/sphinxsearch/searchd.pid
}

 

Sections indexer and searchd in this file are global, and sections source kvs_videos and index videos are project-specific.

In source section you need to update database connection details to KVS project database (e.g. <DB host> and other). You can find KVS database connection details in /admin/include/setup_db.php file. But please note that if your Sphinx is located on remote server, you will need to enable remote access to your database and specify its remote IP for DB host.

In index section you need to modify this line and change PATH token to KVS project document root:

path = /PATH/admin/sphinx_search/data/videos

NOTE: in our example config we are using English language morphology and some other word-related settings that are recommended. There are more options that you can use here to tweak how your search works, including adding morphology for other languages, using stop words and etc. Please refer to official SphinxSearch docs for more information about the available options.

 

After config file is created, try to run Sphinx indexer to see if everything is in place:

sudo indexer --all

It should produce response like this:

Sphinx 2.2.11-id64-release (95ae9a6)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)

using config file '/etc/sphinxsearch/sphinx.conf'...
indexing index 'videos'...
collected 225 docs, 0.0 MB
collected 399 attr values
sorted 0.0 Mvalues, 100.0% done
sorted 0.0 Mhits, 100.0% done
total 225 docs, 27634 bytes
total 0.040 sec, 689075 bytes/sec, 5610.55 docs/sec
total 627 reads, 0.000 sec, 0.1 kb/call avg, 0.0 msec/call avg
total 14 writes, 0.000 sec, 13.9 kb/call avg, 0.0 msec/call avg

If there are no errors, your config file is OK.

NOTE: if you see error about failure to lock some file, highly likely your Sphinx daemon is already running and you need to try rotate command instead:

sudo indexer --rotate --all

 

Once indexer was verified, you need to start Sphinx daemon (searchd) or Sphinx service. Please refer to Sphinx documentation about how to do that on your server OS. It is also important to make sure that Sphinx service is added to your system autorun configuration. When Sphinx is running, you can verify it using rotate command:

sudo indexer --rotate --all

This command should update all indexes defined in sphinx configuration:

Sphinx 2.2.11-id64-release (95ae9a6)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)

using config file '/etc/sphinxsearch/sphinx.conf'...
indexing index 'videos'...
collected 225 docs, 0.0 MB
collected 399 attr values
sorted 0.0 Mvalues, 100.0% done
sorted 0.0 Mhits, 100.0% done
total 225 docs, 27634 bytes
total 0.046 sec, 590028 bytes/sec, 4804.09 docs/sec
total 627 reads, 0.000 sec, 0.1 kb/call avg, 0.0 msec/call avg
total 14 writes, 0.000 sec, 13.9 kb/call avg, 0.0 msec/call avg
rotating indices: successfully sent SIGHUP to searchd (pid=769449).

 

Finally you need to put a cron task that will automatically rotate (update) Sphinx indexes every 1 hour or so. Otherwise your video data will be old and no new videos will appear in Sphinx results. Set the following command on cron every hour:

sudo indexer --rotate --all

 

Finalizing KVS to start using Sphinx

First of all check that Sphinx connection is working and results are provided. Use this URL for local Sphinx installations, and similar URL adjusted with considering usage of external service if your Sphinx is installed remotely:

https://domain.com/admin/sphinx_search/kvs_sphinx_search_videos.php?query=test&limit=10

If everything is configured correctly, you should see XML with results (adjust query text if needed to make sure there are some results from your database):

<search_feed total_count="23" from="0">
  <gallery>
    <weight>50</weight>
    <kvs_data>1069</kvs_data>
  </gallery>
  ...
</search_feed>

If something is wrong, you should see error message from Sphinx API that may give some clue what is wrong.

 

Then go to KVS admin panel -> Plugins -> External search and configure the following values (replace domain.com with your project domain):

Once validated and saved, go to your site and test how search works. Now your site's search is using Sphinx.

 

In order to change related videos to Sphinx as well, please go to Website UI -> Pages section and search for "related". There are typically several related videos blocks in a project, one for site and the other for in-player related. Open each block and modify the following block parameter:

  • mode_related: Related by title (external search plugin)

Check your videos and test how related videos work. Now your related videos are using Sphinx as well.

 

Multiple projects on a single server

When having multiple projects connected to a single Sphinx server you will need to put their index and data source descriptions into the same sphinx.conf file, considering naming uniqueness as the following:

  • Each source should have unique name and should have different database connection details (at least DB name will be different).
  • Each index should have unique name and also it should refer the correct data source (source = kvs_videos line).
  • Each index should have unique path name (path = /PATH/admin/sphinx_search/data/videos).
  • Each /admin/sphinx_search/kvs_sphinx_search_videos.php file should refer the correct index name in this line
    $config['sphinx_index'] = 'videos';

 

SphinxSearch for albums and search queries

Starting from KVS 6.2.0 it is possible to use external search for albums and search queries. The configuration is almost same as for videos, here are fragments of Sphinx configuration that should be additionally added into sphinx.conf file for albums and search queries. Please note that you need to modify MySQL connection details and /PATH placeholder to your project path.

Also in KVS External search plugin GUI you need to specify different URLs in API call option for albums and search queries (see hints in plugin GUI). If your project misses the needed files, please take them from the updated version of this archive.

Albums:

source kvs_albums
{
        type = mysql
        sql_host = <DB host>
        sql_user = <DB username>
        sql_pass = <DB password>
        sql_db = <DB name>
        sql_port = 3306

        sql_query_pre = SET NAMES utf8
        sql_query_pre = SET CHARACTER SET utf8

        sql_query =  SELECT album_id, album_id AS album_id_, title,  description, album_viewed, rating/rating_amount as rating, UNIX_TIMESTAMP(post_date) as post_date, \
                        (SELECT  GROUP_CONCAT(CONCAT_WS(',',ktvs_tags.tag)) FROM ktvs_tags \
                        JOIN ktvs_tags_albums ON (ktvs_tags_albums.tag_id = ktvs_tags.tag_id) \
                        WHERE ktvs_tags_albums.tag_id = ktvs_tags.tag_id AND ktvs_tags_albums.album_id=ktvs_albums.album_id) AS `tags`, \
                        (SELECT  GROUP_CONCAT(CONCAT_WS(',',ktvs_categories.title,ktvs_categories.synonyms)) FROM ktvs_categories \
                        JOIN ktvs_categories_albums ON (ktvs_categories_albums.category_id = ktvs_categories.category_id) \
                        WHERE ktvs_categories_albums.category_id = ktvs_categories.category_id AND ktvs_categories_albums.album_id=ktvs_albums.album_id) AS `categories`, \
                        (SELECT  GROUP_CONCAT(CONCAT_WS(',',ktvs_models.title)) FROM ktvs_models \
                        JOIN ktvs_models_albums ON (ktvs_models_albums.model_id = ktvs_models.model_id) \
                        WHERE ktvs_models_albums.model_id = ktvs_models.model_id AND ktvs_models_albums.album_id=ktvs_albums.album_id) AS `models`, \
                        (SELECT  CONCAT_WS(',',title) FROM ktvs_content_sources WHERE content_source_id=ktvs_albums.content_source_id) AS content_source_title \
                    FROM ktvs_albums where status_id=1

        sql_attr_timestamp = post_date
        sql_attr_bigint = album_viewed
        sql_attr_float = rating
        sql_attr_bigint = album_id_

        sql_attr_multi  = uint category_id_general FROM query; \
                          SELECT album_id, category_id \
                          FROM ktvs_categories_albums

}

index albums
{
        source = kvs_albums
        morphology = stem_en
        min_word_len = 2
        min_infix_len = 2
        index_exact_words = 1

        path = /PATH/admin/sphinx_search/data/albums

        charset_table = 0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F
}

 

Search queries:

source kvs_searches
{
        type = mysql
        sql_host = <DB host>
        sql_user = <DB username>
        sql_pass = <DB password>
        sql_db = <DB name>
        sql_port = 3306

        sql_query_pre = SET NAMES utf8
        sql_query_pre = SET CHARACTER SET utf8

        sql_query =  SELECT search_id, query, query_length, query_results_videos, query_results_albums, query_results_total, amount, added_date \
                    FROM ktvs_stats_search where status_id=1

        sql_attr_timestamp = added_date
        sql_attr_bigint = query_length
        sql_attr_bigint = query_results_videos
        sql_attr_bigint = query_results_albums
        sql_attr_bigint = query_results_total
        sql_attr_bigint = amount
}

index searches
{
        source = kvs_searches
        morphology = stem_en
        min_word_len = 2
        min_infix_len = 2
        index_exact_words = 1

        path = /PATH/admin/sphinx_search/data/searches

        charset_table = 0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F
}

 

Multiple languages

It is also possible to configure Sphinx to search in titles and descriptions of multiple languages if your KVS installation has languages defined in Settings -> Languages and has titles / descriptions translated into them.

In order to do that you need to modify the following line in sphinx.conf file for your kvs_videos data source:

sql_query =  SELECT video_id, video_id AS video_id_, title,  description, duration, video_viewed....

In this line the title should be changed with the following construct (you can add all languages that are have title translations):

concat(title, ' ', title_de, ' ', title_es) as title

Similar if you have descriptions translated, you can replace description with this:

concat(description, ' ', description_de, ' ', description_es) as description

Here is the example of the full line:

sql_query =  SELECT video_id, video_id AS video_id_, concat(title, ' ', title_de, ' ', title_es) as title, concat(description, ' ', description_de, ' ', description_es) as description, duration, video_viewed, rating/rating_amount as rating, UNIX_TIMESTAMP(post_date) as post_date, \

 

Then run this command to rotate indexes:

sudo indexer --rotate --all

If there are no errors, your videos can now be searched in different locales.

Link to comment
Share on other sites

  • 1 month later...
On 8/10/2023 at 9:23 AM, Tech Support said:

we do not see much demand in it.

I understand that because many webmasters don't know the power of using Sphnix. even if some one have 1K albums Sphnix will decrease the CPU load 10% from those albums. and can save more resource.

 

On 8/10/2023 at 9:23 AM, Tech Support said:

KVS is a video software at first.

And The KVS moto is also described that is not only a video software but a great software. I just suggested a good Idea 💡 

A great site starts with an idea.

An idea you can make happen with KVS.

Link to comment
Share on other sites

  • 2 months later...

Ubuntu repository still uses Sphinx 2.2:

sudo apt-get install sphinxsearch
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
sphinxsearch is already the newest version (2.2.11-8).

We are not able to test the latest version due to that.

Also our addon is using some kind of official PHP API from here:

https://github.com/sphinxsearch/sphinx/blob/master/api/sphinxapi.php

This file was last updated 8 years ago, so it looks like it may not support recent sphinx versions.

  • Like 1
Link to comment
Share on other sites

Hello.  I'm trying to figure out how to use Sphinx full-text search with KVS in Flexible Matching mode.

 For example

 Kaia=Kaya

 Ethel=Etel

 So that only part of the request matches and a message does not pop up saying that nothing was found if the user missed or wrote 1-2 letters incorrectly.

 We are talking about Flexible Matching mode for Sphinx.

 Does anyone have an understanding of how this is correctly written in his config?

 

 

Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...