Google Search Console - Soft 404


Jim

Google Search Console lists some URLs as not indexed, with the reason "Soft 404". The URLs are of the form https://www.tubedomainname.com/embed/83622. Why does KVS create these URLs, and is this correct? They point to a page on the other tube site. What would be the best solution for these pages?


Do you mean these links contain a domain name other than yours? If so, you are likely using 3rd-party embed codes for your videos, and these URLs come from those embed codes.

For example, this URL is an embed code:

https://www.kvs-demo.com/embed/123

By default, robots.txt prevents indexing of these URLs, as they are normally duplicates of your video pages and refer to your original video pages via a canonical link:

<link href="https://www.kvs-demo.com/videos/123/tdwp-danger-wildman/" rel="canonical"/>
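To confirm that an embed page really carries such a canonical link, the page HTML can be inspected. This is only a minimal sketch using Python's standard library; the `CanonicalFinder` class and `find_canonical` helper are hypothetical names for illustration, and the sample markup is just the tag shown above wrapped in a minimal page:

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <link ... /> are also routed here.
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower()
        if tag == "link" and rel == "canonical" and self.canonical is None:
            self.canonical = attributes.get("href")


def find_canonical(html):
    """Return the canonical URL declared in the HTML, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Sample page built around the canonical tag from the demo site.
SAMPLE_HTML = """
<html><head>
<link href="https://www.kvs-demo.com/videos/123/tdwp-danger-wildman/" rel="canonical"/>
</head><body></body></html>
"""

canonical_url = find_canonical(SAMPLE_HTML)
print(canonical_url)
```

In practice you would feed it the HTML source of your own embed URL instead of the sample string.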

 


To clarify:

Google Search Console lists some crawled URLs as not indexed with reference "Soft 404". The URLs are of the format https://www.mykvstubedomainname.com/embed/83622.

This "embed" URL plays, in full screen, the video that is shown on the page https://www.mykvstubedomainname.com/videos/83622/title-of-video-etc-etc

Google only started discovering these "embed" pages in the last few weeks; there are now 45.

Is it the case then that the robots.txt file is not effective in preventing indexing of these "embed" URLs and if so how can it be corrected? 


1 hour ago, Jim said:

Is it the case then that the robots.txt file is not effective in preventing indexing of these "embed" URLs and if so how can it be corrected?

I don't think robots.txt is ineffective here. By default, these pages should not be indexed.

You can check your embed URL in URL Inspection like this; for our demo site it shows that indexing is blocked:

[Screenshot: url_inspection.png]


The embed page is not indexed. However, the real issue is that crawling of the page is allowed according to the URL inspection below, whereas for the demo URL above crawling is blocked by robots.txt.

It appears the robots.txt file is not effective in preventing crawling of these "embed" URLs. How can this be corrected?

Crawl
Last crawl: Dec 25, 2023, 7:12:15 PM
Crawled as: Googlebot smartphone
Crawl allowed? Yes
Page fetch: Successful
Indexing allowed? Yes

Would adding the previously suggested Googlebot allow rule have the effect of overriding all the general disallows, including ../embed/.., which would explain why robots.txt was not blocking crawling? Otherwise the robots.txt file is the same as the KVS default, so I have removed it.
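For what it's worth, a crawler obeys only the most specific user-agent group that matches it, so a separate `User-agent: Googlebot` group makes Googlebot ignore the rules under `User-agent: *`. The sketch below illustrates this with Python's standard `urllib.robotparser`. The actual rule that was added is not shown in this thread, so both robots.txt texts, the `/static/` path, the example.com URL and the `can_crawl` helper are assumptions made up for illustration. Python's parser is also not Googlebot (it matches rules in file order rather than by longest path), but for these simple rules the group selection works the same way:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a general group blocking /embed/, plus a
# Googlebot-specific group that contains only an Allow rule.
ROBOTS_WITH_GOOGLEBOT_GROUP = """\
User-agent: *
Disallow: /embed/

User-agent: Googlebot
Allow: /static/
"""

# Hypothetical fix: repeat the disallow inside the Googlebot group.
ROBOTS_FIXED = """\
User-agent: *
Disallow: /embed/

User-agent: Googlebot
Allow: /static/
Disallow: /embed/
"""

EMBED_URL = "https://www.example.com/embed/83622"


def can_crawl(robots_txt, user_agent, url):
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Googlebot matches its own group, which has no rule for /embed/.
googlebot_allowed = can_crawl(ROBOTS_WITH_GOOGLEBOT_GROUP, "Googlebot", EMBED_URL)
# Any other crawler falls back to the "*" group.
other_allowed = can_crawl(ROBOTS_WITH_GOOGLEBOT_GROUP, "SomeOtherBot", EMBED_URL)
# With the disallow repeated, Googlebot is blocked again.
googlebot_allowed_fixed = can_crawl(ROBOTS_FIXED, "Googlebot", EMBED_URL)

print(googlebot_allowed, other_allowed, googlebot_allowed_fixed)
```

If that is what happened here, either removing the Googlebot-specific group or repeating the `/embed/` disallow inside it should make the URL inspection report crawling as blocked again once Google has re-fetched robots.txt.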

 

 

