To discourage search crawlers such as Google from indexing your site, you can create a robots.txt file in your web root. For example, the following tells all crawlers not to crawl any part of your website:
User-agent: *
Disallow: /
You can also change your WordPress privacy settings, as described in Optimize Your Site for Search Engines (SEO):
https://en-support.files.wordpress.com/2023/08/privacy-settings-public.png?w=1024
In my experience, all three of the following countermeasures are needed:
- Add a WordPress plugin or custom theme code to set the canonical URL link element in the HTML head of every page, for example: "<link rel='canonical' href='https://blog.domain.tld/actual/url/here' />". Note that the "type" attribute is omitted to work around a bug in some versions of Google's crawler. In some versions of WordPress, the predefined rel_canonical action emits this slightly wrong, hence the need for code to override it.
- Change the web server configuration to include the right domain name in a canonical URL HTTP Link header for non-HTML files, for example: 'Link: <https://blog.domain.tld/actual/url/here.png>; rel="canonical"'. The exact instructions depend on the web server software.
- Contact the sites that link to your IP address and ask them to link to the right domain name instead.
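A minimal sketch of the first countermeasure, written as a small WordPress plugin. It assumes blog.domain.tld stands in for your real canonical hostname, and it relies on the WordPress runtime (wp_head hooks, get_permalink, esc_url), so it only runs inside a WordPress install:

```php
<?php
/**
 * Plugin Name: Force canonical host (sketch)
 * Replaces WordPress's built-in rel_canonical output with a
 * canonical <link> that always points at the real domain name.
 */

// Remove the default canonical tag that core prints in wp_head.
remove_action( 'wp_head', 'rel_canonical' );

// Print our own canonical tag with the correct hostname.
add_action( 'wp_head', function () {
    if ( ! is_singular() ) {
        return;
    }
    // get_permalink() returns the URL of the current post/page;
    // rewrite whatever host it contains to the canonical one.
    $url = preg_replace(
        '#^https?://[^/]+#',
        'https://blog.domain.tld', // placeholder: your real domain
        get_permalink()
    );
    echo '<link rel="canonical" href="' . esc_url( $url ) . "\" />\n";
} );
```

Dropping this in wp-content/mu-plugins/ activates it without any admin steps; the anonymous-function hook keeps it self-contained.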
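For the second countermeasure, here is a sketch for Nginx, again assuming blog.domain.tld is the canonical hostname and that images and PDFs are the non-HTML files you care about:

```nginx
# Add a canonical Link header to non-HTML responses, pointing at the
# real domain regardless of which hostname or IP was requested.
location ~* \.(png|jpe?g|gif|webp|pdf)$ {
    add_header Link '<https://blog.domain.tld$request_uri>; rel="canonical"';
}
```

Note that add_header inside a location block suppresses add_header directives inherited from outer blocks, so repeat any others you rely on there. Apache can do the equivalent with mod_headers in a FilesMatch block.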
The proposed robots.txt would also be served under the public name of the WordPress site, so the answer as given would block crawling of the real site too. Furthermore, Google is extremely bad at obeying robots.txt, so even if the OP had a mechanism to serve a different robots.txt for IP-only URLs, it might not work.