

Posted

Google lists 1,000 example URLs under "Crawled - currently not indexed". (There are approximately 20k in total, but only 1,000 URLs can be listed.)

Most of these are in the format:

....../get_file/0/b04c151f4453927a9b37d6774df63e17bb57891a1b/0/20/screenshots/1.jpg/

...../get_file/6/93d0af9c5eccc35429879aae59d9155863ae321c7b/64000/64707/64707_720p.mp4/?rnd=1722643200142

These URLs are not intended to be indexed. Is it possible to prevent Google from crawling them?

I don't know whether Google discovering and crawling these URLs harms its overall assessment of the site or the indexing of its pages. Does anyone have experience with this?

  • 2 weeks later...
Posted

Why is the default robots.txt set to allow robots to crawl page content URLs (files)? There doesn't appear to be any SEO purpose in allowing robots to crawl these pages. If there is an intended purpose, could you please explain it?

If there is no operational or SEO benefit to allowing robots to crawl (and inspect) these page content URLs (files), then it would seem appropriate to block robots from them.

One way might be to add lines such as these to the robots.txt file:

User-agent: *
Disallow: /get_file/

Would this block robots from the example page content URLs listed above without otherwise affecting or stopping them from crawling the page URLs that are intended to be indexed?
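One way to test this before changing the live file is to feed the proposed rule to a robots.txt parser and check a few URLs. The sketch below uses Python's standard urllib.robotparser. It assumes the elided part of the example URLs is only the scheme and domain (example.com is a placeholder), so that /get_file/ sits directly under the site root. The /videos/64707/some-title/ page URL is invented for contrast. This parser does plain prefix matching and does not implement the `*` and `$` path wildcards Google supports, so it only approximates Googlebot's behaviour.

```python
from urllib.robotparser import RobotFileParser

# The rule proposed above, written as one record: no blank line between
# the User-agent line and its Disallow line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /get_file/
"""

# Hypothetical URLs: example.com stands in for the elided domain, and the
# /videos/... page URL is invented to represent a page meant to be indexed.
URLS = [
    "https://example.com/get_file/0/b04c151f4453927a9b37d6774df63e17bb57891a1b/0/20/screenshots/1.jpg/",
    "https://example.com/get_file/6/93d0af9c5eccc35429879aae59d9155863ae321c7b/64000/64707/64707_720p.mp4/?rnd=1722643200142",
    "https://example.com/videos/64707/some-title/",
    # If get_file is NOT directly under the site root, a /get_file/ prefix rule misses it.
    "https://example.com/some/prefix/get_file/0/abc/1.jpg/",
]


def check(robots_txt, urls, agent="Googlebot"):
    """Return {url: True if the agent may fetch it under robots_txt}."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return {url: parser.can_fetch(agent, url) for url in urls}


if __name__ == "__main__":
    for url, allowed in check(ROBOTS_TXT, URLS).items():
        print("allowed" if allowed else "blocked", url)
```

The two get_file examples come back as blocked and the page URL as allowed. If the real URLs have a path segment before get_file (the last test URL), a rule starting with /get_file/ would not match them. Some parsers, including this one, treat a blank line as the end of a record, so it is safest to keep the User-agent and Disallow lines together.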


Posted

To index videos, Google needs to download the video files.

https://developers.google.com/search/docs/crawling-indexing/sitemaps/video-sitemaps

Quote

Additionally, the following requirements apply to video sitemaps specifically:

  • Don't list videos that are unrelated to the content of the host page. For example, a video that is a small addendum to the page, or unrelated to the main text content.
  • All files referenced in the video sitemap must be accessible to Googlebot. This means that all URLs in the video sitemap:
    • must not be disallowed for crawling by robots.txt rules,
    • must be accessible without metafiles and without logging in,
    • must not be blocked by firewalls or similar mechanism,
    • and must be accessible on a supported protocol: HTTP and FTP (streaming protocols are not supported).
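A quick way to see the conflict this requirement describes is to check every file URL in a video sitemap against robots.txt. This is a sketch using only Python's standard library. The sitemap below is hypothetical (invented URLs on example.com). Whether this site's sitemap actually points `video:thumbnail_loc` or `video:content_loc` at /get_file/ URLs is an assumption.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "video": "http://www.google.com/schemas/sitemap-video/1.1",
}

# Hypothetical video sitemap entry; the URLs only mimic the get_file pattern.
SITEMAP_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
  <url>
    <loc>https://example.com/videos/64707/some-title/</loc>
    <video:video>
      <video:thumbnail_loc>https://example.com/get_file/0/abc/0/20/screenshots/1.jpg/</video:thumbnail_loc>
      <video:title>Some title</video:title>
      <video:description>Some description</video:description>
      <video:content_loc>https://example.com/get_file/6/def/64000/64707/64707_720p.mp4/</video:content_loc>
    </video:video>
  </url>
</urlset>"""

# The rule proposed earlier in the thread.
PROPOSED_ROBOTS_TXT = "User-agent: *\nDisallow: /get_file/\n"


def blocked_video_urls(sitemap_xml, robots_txt, agent="Googlebot"):
    """Return the thumbnail/content URLs in the sitemap that robots_txt disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    root = ET.fromstring(sitemap_xml)
    blocked = []
    for tag in ("video:thumbnail_loc", "video:content_loc"):
        for el in root.iterfind(f"sm:url/video:video/{tag}", NS):
            url = el.text.strip()
            if not parser.can_fetch(agent, url):
                blocked.append(url)
    return blocked


if __name__ == "__main__":
    for url in blocked_video_urls(SITEMAP_XML, PROPOSED_ROBOTS_TXT):
        print("disallowed by robots.txt:", url)
```

With the proposed `Disallow: /get_file/` rule, both file URLs are reported as disallowed, which is exactly the situation the quoted requirement rules out for video sitemaps.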


Posted

OK, thanks. It looks like there is no choice but to keep these URLs accessible, even if they fill up the Google report with content URLs that are not intended to be indexed.
