
How to create a custom tube video grabber for KVS


Tech Support

KVS provides an API for using youtube-dl (installed on your server) to scrape videos from other tube sites.

You can implement your own grabber class in PHP and upload it into KVS. Here is how to do this. The example below is a fully working custom YouTube grabber (by the way, KVS already has a built-in YouTube grabber).

NOTE: using the youtube-dl API is not strictly required; you can also create a completely custom grabber with your own code.

 

Implementing grabber class using youtube-dl API

Create CustomGrabberYoutube.php with the following code (it is also attached to this post as a text file):

<?php

// when you change the class name, also change it at the very bottom in this line:
// $grabber = new CustomGrabberYoutube();
class CustomGrabberYoutube extends KvsGrabberVideoYDL
{
   // ===============================================================================================================
   // infrastructure methods
   // ===============================================================================================================

   public function get_grabber_id()
   {
       //prefix your grabber ID with "custom_"
       return "custom_videos_youtube";
   }

   public function get_grabber_name()
   {
       // name displayed in admin panel
       return "youtube.com";
   }

   public function get_grabber_version()
   {
       // this is required for grabbers that are autoupdated from KVS
       return "1";
   }

   public function get_grabber_domain()
   {
       // domain name, KVS will check this to find out if this grabber is suitable for the given URL
       return "youtube.com";
   }

   public function get_supported_url_patterns()
   {
       // returns list of regexp patterns that describe video URLs, for youtube this pattern will match
       // https://www.youtube.com/watch?v=htOroIbxiFY
       return array("/https?:\/\/(www\.)?youtube\.com\/watch.*/i");
   }

   public function can_grab_description()
   {
       // return true if your grabber is going to provide description for each video
       return false;
   }

   public function can_grab_categories()
   {
       // return true if your grabber is going to provide categories for each video
       return false;
   }

   public function can_grab_tags()
   {
       // return true if your grabber is going to provide tags for each video
       return false;
   }

   public function can_grab_models()
   {
       // return true if your grabber is going to provide models for each video
       return false;
   }

   public function can_grab_content_source()
   {
       // return true if your grabber is going to provide content source for each video
       return false;
   }

   public function can_grab_date()
   {
       // return true if your grabber is going to provide date for each video
       return false;
   }

   public function can_grab_rating()
   {
       // return true if your grabber is going to provide rating for each video
       return false;
   }

   public function can_grab_views()
   {
       // return true if your grabber is going to provide views for each video
       return false;
   }


   public function can_grab_video_files()
   {
       // this should be true for youtube-dl
       return true;
   }

   public function get_supported_qualities()
   {
       // list of supported video qualities, should match what youtube-dl returns in its info under formats
       // run this command:
       // youtube-dl --dump-json https://www.youtube.com/watch?v=PhDXRCLsqz4 > test.json
       // and open test.json in Firefox, find "formats" array and look into the available formats
       // youtube has too many formats, KVS only supports formats with "ext"="mp4"
       // you can list them here and you will be able to select from them in grabber settings
       return array('360p', '720p');
   }

   public function get_downloadable_video_format()
   {
       // for youtube-dl grabber KVS only supports mp4 formats
       return 'mp4';
   }

   public function can_grab_lists()
   {
       // return true if you want to allow this grabber to grab lists and thus be used on autopilot
       // if true, you will also need to implement grab_list() method - see below
       return false;
   }

   // ===============================================================================================================
   // parsing methods - modify if you need to parse lists or add additional info
   // ===============================================================================================================

   public function grab_list($list_url, $limit)
   {
       // this method is used to grab lists of videos from the given list URL
       // $limit parameter means the number of videos to grab (including pagination)
       // if $limit == 0, then you just need to find all videos on the given URL, no need to care about pagination

       $result = new KvsGrabberListResult();

       // $page_content here is the HTML code of the given page
       $page_content = $this->load_page($list_url);

       // parse $page_content and add all video URLs to the result
       // consider pagination if needed
       // you can use $this->load_page($list_url) method to get HTML from any URL
       $result->add_content_page("https://youtube.com/video1");
       $result->add_content_page("https://youtube.com/video2");
       $result->add_content_page("https://youtube.com/video3");

       return $result;
   }

   protected function grab_video_data_impl($page_url, $tmp_dir)
   {
       // by default the base class will populate these fields (if provided by youtube-dl):
       // - title
       // - MP4 video files for the qualities listed in get_supported_qualities() function
       // - description (should be enabled in can_grab_description() function)
       // - date (should be enabled in can_grab_date() function)
       // - tags (should be enabled in can_grab_tags() function)
       // - categories (should be enabled in can_grab_categories() function)

       $result = parent::grab_video_data_impl($page_url, $tmp_dir);
       if ($result->get_error_code() > 0)
       {
           return $result;
       }

       // do any custom grabbing here for additional fields, which are not supported by youtube-dl
       // $page_content here is the HTML code of the given video page
       //$page_content = $this->load_page($page_url);

       // parse HTML code and set additional data into $result, e.g. data which is not provided by youtube-dl
       //$result->set_rating(85);
       //$result->set_votes(10);
       //$result->set_views(123874);
       //$result->set_content_source("Content Source Name");
       //$result->add_model("Model 1");
       //$result->add_model("Model 2");

       return $result;
   }
}

$grabber = new CustomGrabberYoutube();
KvsGrabberFactory::register_grabber_class(get_class($grabber));
return $grabber;

The code is commented where needed. youtube-dl provides the main video info: title, description, tags, categories, date and video files. If this is enough for you, you only need to modify the methods at the top, grouped under the "infrastructure methods" section. These methods integrate the grabber into KVS, so change them as described in their comments.

You should also change the grabber class name in 2 places (the class declaration at the top and the instantiation at the bottom). Make sure the class name is unique and contains Custom, to avoid conflicts with any grabbers we add in the future.

If you want to parse lists or add additional info, modify the methods in the "parsing methods" section as explained in the code comments.
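grab_list() above is only a stub that adds hard-coded URLs. The parsing part could look roughly like the standalone sketch below, assuming the list page links to its videos with plain <a href> tags. find_video_urls() is a hypothetical helper, not part of the KVS API; it uses PHP's DOMDocument and preg_match(), resolves protocol-relative and root-relative links, drops duplicates and respects $limit (0 meaning no limit, as described in the grab_list() comments). Pagination is not handled.

```php
<?php
// Hypothetical helper for grab_list() (not part of the KVS API): returns the
// absolute URLs of all links on a list page that match one of the grabber's
// URL patterns. $limit == 0 means "no limit", as in grab_list().
function find_video_urls(string $html, string $page_url, array $patterns, int $limit = 0): array
{
    if ($html === '') {
        return [];
    }
    $base = parse_url($page_url);
    $origin = $base['scheme'] . '://' . $base['host'] . (isset($base['port']) ? ':' . $base['port'] : '');

    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();

    $found = [];
    foreach ($doc->getElementsByTagName('a') as $link) {
        $href = trim($link->getAttribute('href'));
        if ($href === '') {
            continue;
        }
        // resolve protocol-relative (//host/...) and root-relative (/path) links
        if (strpos($href, '//') === 0) {
            $href = $base['scheme'] . ':' . $href;
        } elseif ($href[0] === '/') {
            $href = $origin . $href;
        }
        foreach ($patterns as $pattern) {
            if (preg_match($pattern, $href) === 1) {
                $found[$href] = true; // array keys drop duplicate links
                break;
            }
        }
        if ($limit > 0 && count($found) >= $limit) {
            break;
        }
    }
    return array_keys($found);
}

$patterns = ["/https?:\/\/(www\.)?youtube\.com\/watch.*/i"];
$html = '<a href="/watch?v=aaa">A</a><a href="/watch?v=aaa">A again</a>'
      . '<a href="https://www.youtube.com/watch?v=bbb">B</a><a href="/channel/xyz">C</a>';
print_r(find_video_urls($html, 'https://www.youtube.com/results?search_query=test', $patterns));
```

Inside grab_list() you would pass it $this->load_page($list_url), $list_url, $this->get_supported_url_patterns() and $limit, then call $result->add_content_page() for each returned URL.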

 

Implementing grabber class without youtube-dl

Here is an example grabber class that does not use youtube-dl. Replace the demo values in grab_video_data_impl() and grab_list() with your own parsing logic:

<?php

// when you change the class name, also change it at the very bottom in this line:
// $grabber = new CustomGrabberYoutube();
class CustomGrabberYoutube extends KvsGrabberVideo
{
   // ===============================================================================================================
   // infrastructure methods
   // ===============================================================================================================

   public function get_grabber_id()
   {
       //prefix your grabber ID with "custom_"
       return "custom_videos_youtube";
   }

   public function get_grabber_name()
   {
       // name displayed in admin panel
       return "youtube.com";
   }

   public function get_grabber_version()
   {
       // this is required for grabbers that are autoupdated from KVS
       return "1";
   }

   public function get_grabber_domain()
   {
       // domain name, KVS will check this to find out if this grabber is suitable for the given URL
       return "youtube.com";
   }

   public function get_supported_url_patterns()
   {
       // returns list of regexp patterns that describe video URLs, for youtube this pattern will match
       // https://www.youtube.com/watch?v=htOroIbxiFY
       return array("/https?:\/\/(www\.)?youtube\.com\/watch.*/i");
   }

   public function can_grab_description()
   {
       // return true if your grabber is going to provide description for each video
       return true;
   }

   public function can_grab_categories()
   {
       // return true if your grabber is going to provide categories for each video
       return true;
   }

   public function can_grab_tags()
   {
       // return true if your grabber is going to provide tags for each video
       return true;
   }

   public function can_grab_models()
   {
       // return true if your grabber is going to provide models for each video
       return true;
   }

   public function can_grab_content_source()
   {
       // return true if your grabber is going to provide content source for each video
       return true;
   }

   public function can_grab_date()
   {
       // return true if your grabber is going to provide date for each video
       return true;
   }

   public function can_grab_rating()
   {
       // return true if your grabber is going to provide rating for each video
       return true;
   }

   public function can_grab_views()
   {
       // return true if your grabber is going to provide views for each video
       return true;
   }

   public function can_grab_video_files()
   {
       // return true if your grabber is going to provide video files for each video
       return true;
   }

   public function can_grab_video_embed()
   {
       // return true if your grabber is going to provide embed code for each video
       return true;
   }

   public function can_grab_video_duration()
   {
       // return true if your grabber is going to provide duration for each video
       return true;
   }

   public function can_grab_video_screenshot()
   {
       // return true if your grabber is going to provide screenshot for each video
       return true;
   }

   public function get_supported_qualities()
   {
       // list of supported video qualities that your grabber provides
       return array('360p', '720p');
   }

   public function get_downloadable_video_format()
   {
       // only grabbers that return MP4 files are supported
       return 'mp4';
   }

   public function can_grab_lists()
   {
       // return true if you want to allow this grabber to grab lists and thus be used on autopilot
       // if true, you will also need to implement grab_list() method - see below
       return false;
   }

   // ===============================================================================================================
   // parsing methods
   // ===============================================================================================================

   public function grab_list($list_url, $limit)
   {
       // this method is used to grab lists of videos from the given list URL
       // $limit parameter means the number of videos to grab (including pagination)
       // if $limit == 0, then you just need to find all videos on the given URL, no need to care about pagination

       $result = new KvsGrabberListResult();

       // $page_content here is the HTML code of the given page
       $page_content = $this->load_page($list_url);

       // parse $page_content and add all video URLs to the result
       // consider pagination if needed
       // you can use $this->load_page($list_url) method to get HTML from any URL
       $result->add_content_page("https://youtube.com/video1");
       $result->add_content_page("https://youtube.com/video2");
       $result->add_content_page("https://youtube.com/video3");

       return $result;
   }

   protected function grab_video_data_impl($page_url, $tmp_dir)
   {
       $result = new KvsGrabberVideoInfo();

       // $page_code here is the HTML code of the given video page
       $page_code = $this->load_page($page_url);
       if (!$page_code)
       {
           $result->log_error(KvsGrabberVideoInfo::ERROR_CODE_PAGE_UNAVAILABLE, "Page can't be loaded: $page_url");
           return $result;
       }

       // parse HTML code and set data into $result
       // replace with your parsing logic

       $result->set_canonical($page_url);
       $result->set_title("Demo title");
       $result->set_description("Demo description long description long description long description long description.");

       $result->set_screenshot("http://www.localhost.com/test/test.jpg");
       $result->set_duration(30);
       $result->set_date(time());

       $result->set_views(1526);
       $result->set_rating(87);
       $result->set_votes(11);

       $result->set_embed("<div>embed code</div>");

       $result->add_category("Category 1");
       $result->add_category("Category 2");
       $result->add_category("Category 3");

       $result->add_tag("Tag 1");
       $result->add_tag("Tag 2");
       $result->add_tag("Tag 3");

       $result->add_model("Model 1");
       $result->add_model("Model 2");
       $result->add_model("Model 3");

       $result->set_content_source("Content Source 1");

       $result->add_video_file("360p", "http://www.localhost.com/test/test_360p.mp4");
       $result->add_video_file("720p", "http://www.localhost.com/test/test_720p.mp4");

       $result->add_custom_field(1, "Custom1");
       $result->add_custom_field(3, "Custom3");

       return $result;
   }
}

$grabber = new CustomGrabberYoutube();
KvsGrabberFactory::register_grabber_class(get_class($grabber));
return $grabber;
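grab_video_data_impl() above only sets demo values. As a starting point for real parsing, the sketch below reads <meta> tags (such as Open Graph og:title or description) and converts a duration label like "12:34" into seconds. It assumes the source site exposes these tags, and it assumes set_duration() expects seconds (the demo passes 30, which suggests so). read_meta_tags() and duration_to_seconds() are hypothetical helpers, not part of the KVS API.

```php
<?php
// Hypothetical helpers for grab_video_data_impl() (not part of the KVS API).
// Assumes the source page exposes <meta> tags such as og:title / description.
function read_meta_tags(string $html): array
{
    $meta = [];
    if ($html === '') {
        return $meta;
    }
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid
    // common workaround: without it libxml may assume ISO-8859-1 instead of UTF-8
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    foreach ($doc->getElementsByTagName('meta') as $tag) {
        // Open Graph tags use property="og:...", most others use name="..."
        $key = $tag->getAttribute('property') ?: $tag->getAttribute('name');
        if ($key !== '' && !isset($meta[$key])) {
            $meta[$key] = trim($tag->getAttribute('content'));
        }
    }
    return $meta;
}

// Converts a "HH:MM:SS" or "MM:SS" label into a number of seconds.
function duration_to_seconds(string $label): int
{
    $seconds = 0;
    foreach (explode(':', trim($label)) as $part) {
        $seconds = $seconds * 60 + (int)$part;
    }
    return $seconds;
}

$html = '<html><head><meta property="og:title" content="Demo title">'
      . '<meta name="description" content="Demo description"></head></html>';
$meta = read_meta_tags($html);
echo $meta['og:title'], "\n";              // Demo title
echo duration_to_seconds('1:02:03'), "\n"; // 3723
```

In grab_video_data_impl() the values could then go into $result->set_title(), $result->set_description() and $result->set_duration(). A date string from the page can be turned into a Unix timestamp for set_date() with PHP's strtotime().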

 

Testing grabber class

Put the grabber class file into your project root folder. Then create a test_grabber.php file in the same folder with the following code:

<?php

header('Content-Type: text/plain; charset=utf-8');
ini_set('display_errors', 1);
error_reporting(E_ERROR | E_PARSE | E_COMPILE_ERROR);

require_once('admin/plugins/grabbers/classes/KvsGrabber.php');

$grabber = require_once('CustomGrabberYoutube.php');
$grabber->init(new KvsGrabberSettings(), "");
if ($grabber instanceof KvsGrabberVideoYDL)
{
   $grabber->set_ydl_binary('/usr/local/bin/youtube-dl');
}
print_r($grabber->grab_video_data('https://www.youtube.com/watch?v=htOroIbxiFY', 'tmp'));

Change the file name in require_once() to your grabber class file and replace the URL with a demo URL from your source site.

Then run via browser:

http://domain.com/test_grabber.php

If everything is fine, you should see the info dumped from the scraped video.

 

Installing grabber into KVS

Go to Plugins -> Grabbers in the admin panel and upload your grabber class file into the Custom grabber field. After you save the form, your grabber will appear in the list, marked in red. Open this grabber's settings, select Content mode = Download and enable the needed fields under Data.

NOTE: If you don't see any fields under Data, then none of your grabber class's can_grab_xxx() methods return true.

To update a grabber class, simply upload it again. It is recommended to increment the version in get_grabber_version() so that you can verify which version KVS is using.

 

Finding the list of supported video files to grab

If you don't know which formats the source site provides (usually a subset of 240p, 360p, 480p, 720p and 1080p), you can check this with youtube-dl:

youtube-dl --dump-json https://www.youtube.com/watch?v=PhDXRCLsqz4 > test.json

This generates a test.json file, which can be opened in Firefox to view its JSON structure.

Find the node called formats; it is a list with one item describing each supported format.

KVS can only import formats with ext = mp4. List them in the get_supported_qualities() method using XXXp notation, e.g. 360p, 720p.

Here is sample screenshot for youtube:

[Screenshot: formats list in youtube-dl JSON output for a YouTube video (youtube_dl_formats.png)]
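Instead of browsing test.json by hand, the mp4 qualities can also be listed with a few lines of PHP. This is a minimal sketch, assuming each entry in formats carries the ext and height fields as youtube-dl prints them; mp4_qualities() is a hypothetical helper, and the sample JSON below is made up for illustration.

```php
<?php
// Hypothetical helper (not part of KVS): lists the "XXXp" qualities of all
// mp4 formats in the JSON printed by `youtube-dl --dump-json <url>`.
function mp4_qualities(string $json): array
{
    $info = json_decode($json, true);
    if (!is_array($info) || !isset($info['formats']) || !is_array($info['formats'])) {
        return [];
    }
    $heights = [];
    foreach ($info['formats'] as $format) {
        // skip non-mp4 formats and entries without a video height (audio only)
        if (($format['ext'] ?? '') !== 'mp4' || empty($format['height'])) {
            continue;
        }
        $heights[(int)$format['height']] = true;
    }
    ksort($heights);
    return array_map(fn($height) => $height . 'p', array_keys($heights));
}

// Made-up sample in the shape of youtube-dl output; for a real check use
// mp4_qualities(file_get_contents('test.json'))
$sample = '{"formats": ['
        . '{"format_id": "140", "ext": "m4a", "height": null},'
        . '{"format_id": "18", "ext": "mp4", "height": 360},'
        . '{"format_id": "22", "ext": "mp4", "height": 720},'
        . '{"format_id": "43", "ext": "webm", "height": 360}'
        . ']}';
print_r(mp4_qualities($sample));
```

For the sample this prints 360p and 720p, which are the values you would return from get_supported_qualities().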

Attachment: CustomGrabberYoutube.txt


  • 4 months later...

Here is sample code for an album grabber:

<?php

class KvsGrabberAlbumCustomSample extends KvsGrabberAlbum
{
   public function get_grabber_id()
   {
       return "albums_custom_sample";
   }

   public function get_grabber_name()
   {
       return "Sample custom grabber";
   }

   public function get_grabber_version()
   {
       return "1";
   }

   public function get_grabber_domain()
   {
       return "domain1.com";
   }

   public function get_supported_url_patterns()
   {
       return array("/https?:\/\/(www\.)?domain1\.com\/.*/i");
   }

   public function can_grab_description()
   {
       return true;
   }

   public function can_grab_categories()
   {
       return true;
   }

   public function can_grab_tags()
   {
       return true;
   }

   public function can_grab_models()
   {
       return true;
   }

   public function can_grab_content_source()
   {
       return true;
   }

   public function can_grab_rating()
   {
       return true;
   }

   public function can_grab_views()
   {
       return true;
   }

   public function can_grab_date()
   {
       return true;
   }

   public function can_grab_lists()
   {
       return true;
   }

   public function grab_list($list_url, $limit)
   {
       $result = new KvsGrabberListResult();
       $result->add_content_page("http://domain1.com/album1/");
       $result->add_content_page("http://domain1.com/album2/");
       return $result;
   }

   protected function grab_album_data_impl($page_url, $tmp_dir)
   {
       $result = new KvsGrabberAlbumInfo();

       $page_code = $this->load_page($page_url);
       if (!$page_code)
       {
           $result->log_error(KvsGrabberAlbumInfo::ERROR_CODE_PAGE_UNAVAILABLE, "Page can't be loaded: $page_url");
           return $result;
       }

       $result->set_canonical($page_url);
       $result->set_title("Demo title");
       $result->set_description("Demo description long description long description long description long description.");
       $result->set_date(time());

       $result->set_views(1526);
       $result->set_rating(87); //0-100%
       $result->set_votes(11);

       $result->add_category("Category 1");
       $result->add_category("Category 2");
       $result->add_category("Category 3");

       $result->add_tag("tag 1");
       $result->add_tag("tag 2");
       $result->add_tag("tag 3");

       $result->add_model("Model 1");
       $result->add_model("Model 2");
       $result->add_model("Model 3");

       $result->set_content_source("Content Source 1");

       $result->add_image_file("http://www.domain1.com/test/test.jpg?v=1");
       $result->add_image_file("http://www.domain1.com/test/test.jpg?v=2");

       return $result;
   }
}

$grabber = new KvsGrabberAlbumCustomSample();
KvsGrabberFactory::register_grabber_class(get_class($grabber));
return $grabber;
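grab_album_data_impl() above adds two hard-coded images. One possible way to collect real image URLs is sketched below, assuming the gallery links each thumbnail to its full-size image file (many sites do, but check your source site). find_album_images() is a hypothetical helper, not part of the KVS API; it keeps only links ending in a common image extension, resolves root-relative and protocol-relative links and drops duplicates.

```php
<?php
// Hypothetical helper for grab_album_data_impl() (not part of the KVS API).
// Assumes the gallery links each thumbnail to its full-size image file, e.g.
// <a href="/photos/1.jpg"><img src="/thumbs/1.jpg"></a>
function find_album_images(string $html, string $page_url): array
{
    if ($html === '') {
        return [];
    }
    $base = parse_url($page_url);
    $origin = $base['scheme'] . '://' . $base['host'] . (isset($base['port']) ? ':' . $base['port'] : '');

    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();

    $images = [];
    foreach ($doc->getElementsByTagName('a') as $link) {
        $href = trim($link->getAttribute('href'));
        // keep only links to image files; a query string such as ?v=1 is allowed
        if (!preg_match('/\.(jpe?g|png|gif|webp)(\?.*)?$/i', $href)) {
            continue;
        }
        if (strpos($href, '//') === 0) {
            $href = $base['scheme'] . ':' . $href;
        } elseif ($href[0] === '/') {
            $href = $origin . $href;
        }
        // other relative forms (e.g. "photos/1.jpg") are skipped in this sketch
        if (preg_match('#^https?://#i', $href)) {
            $images[$href] = true; // array keys keep the first occurrence only
        }
    }
    return array_keys($images);
}

$html = '<a href="/photos/1.jpg?v=1"><img src="/thumbs/1.jpg"></a>'
      . '<a href="//cdn.domain1.com/photos/2.png">2</a>'
      . '<a href="/album2/">next album</a>'
      . '<a href="/photos/1.jpg?v=1">duplicate</a>';
print_r(find_album_images($html, 'http://domain1.com/album1/'));
```

Each returned URL would then be passed to $result->add_image_file() in grab_album_data_impl().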
 

  • 5 months later...

Fatal error: Call to a member function is_import_categories_as_tags() on a non-object in /home/admin/web/xxxxxxxx/public_html/admin/plugins/grabbers/classes/KvsGrabber.php on line 2842

 

We have updated the test code in the original post to fix this issue. The new grabber API works differently from the old one.

 

Hello, how can I make the grabber detect all videos and all albums on a given URL?

 

Content URLs on the page should be detected automatically based on what you provide in this function:

 

public function get_supported_url_patterns()
{
    return array("/regexp here/i");
}
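Because content URLs are detected based on these patterns, it can be worth checking a pattern against a few sample URLs before uploading the grabber. Below is a minimal standalone sketch using PHP's preg_match(); matches_any() is a hypothetical helper, the pattern is the YouTube one from the first post, and the sample URLs are only illustrative.

```php
<?php
// Check grabber URL patterns against sample URLs before uploading the grabber.
// matches_any() is a hypothetical helper; the pattern is the one returned by
// get_supported_url_patterns() in the first post.
function matches_any(string $url, array $patterns): bool
{
    foreach ($patterns as $pattern) {
        if (preg_match($pattern, $url) === 1) {
            return true;
        }
    }
    return false;
}

$patterns = ["/https?:\/\/(www\.)?youtube\.com\/watch.*/i"];
// sample URL => whether it is expected to match
$samples = [
    'https://www.youtube.com/watch?v=htOroIbxiFY' => true,
    'http://youtube.com/watch?v=htOroIbxiFY'      => true,
    'https://youtu.be/htOroIbxiFY'                => false, // short links are not covered
    'https://www.youtube.com/channel/xyz'         => false,
];
foreach ($samples as $url => $expected) {
    $ok = matches_any($url, $patterns) === $expected;
    echo ($ok ? 'OK   ' : 'FAIL ') . $url . "\n";
}
```

Note that this pattern is not anchored with ^, so it would also match a YouTube URL that appears inside another URL; add ^ at the start if that matters for your source site.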

