PSA from my inbox: check what traffic your firewalls and CDNs are blocking. By far the most common issue in my inbox is related to firewalls or CDNs blocking Googlebot traffic. If I reach out to the blocking site, in the vast majority of cases the blockage is unintended. I've said this before, but I want to emphasize it again: make a habit of checking your block rules. We publish our IP ranges, so it should be very easy to run an automation that checks the block rules against the Googlebot subnets. https://lnkd.in/e4DiAbx4
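A minimal sketch of the kind of automation the post describes, using only Python's standard `ipaddress` module. It assumes the published ranges have been loaded into a dict with a `prefixes` list whose entries carry an `ipv4Prefix` or `ipv6Prefix` key, and that your block rules can be exported as plain IPs or CIDR strings. The function names and the sample addresses in the usage note (documentation ranges, not real Googlebot subnets) are illustrative assumptions.

```python
import ipaddress


def parse_prefixes(ranges_doc):
    """Turn the published ranges document into a list of network objects.

    Assumes the shape {"prefixes": [{"ipv4Prefix": "..."}, {"ipv6Prefix": "..."}]}.
    """
    nets = []
    for entry in ranges_doc.get("prefixes", []):
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            nets.append(ipaddress.ip_network(cidr))
    return nets


def find_conflicts(block_rules, crawler_nets):
    """Return (block_rule, crawler_subnet) pairs where a rule overlaps a crawler subnet."""
    conflicts = []
    for rule in block_rules:
        # strict=False accepts rules written with host bits set, e.g. "192.0.2.7/24".
        blocked = ipaddress.ip_network(rule, strict=False)
        for net in crawler_nets:
            # Only compare networks of the same IP version.
            if blocked.version == net.version and blocked.overlaps(net):
                conflicts.append((rule, str(net)))
    return conflicts
```

To use it, load the published JSON with `json.load`, pass the result to `parse_prefixes`, and feed your exported block list to `find_conflicts`. Any pair it returns is a rule worth reviewing. For example, with a crawler subnet of `192.0.2.0/24`, a block rule of `192.0.2.128/25` would be reported, while `203.0.113.0/24` would not.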
Thank you for sharing
Gold dust in these documents, Gary, and every brand should be checking them! Thank you 🔥
Gary Illyes what does PSA stand for? Google is telling me "prostate-specific antigen" (Google France, mobile desktop) but I'm quite sure it's not what you intended 😅 (it's not a joke)
If you don't know, here is SEO Gold.
I'm currently trying to filter out bad bots and non-human traffic on purpose and only let good bots like Google in. To me, it looks like the machine traffic, whatever is happening there, is how content ends up being rehashed by AI, content spinners, etc. Is it not possible to run all Googlebot crawler IP addresses on ASN 15169 instead of ASN 396982? Then I would just have a short, easily handled nginx rule: if ( $geoip2_data_autonomous_system_number ~* (15169|...)) { set $badbotasn 0; } Some of the IPs run on ASN 396982 (Google Cloud), and I don't have a good feeling about allowing that whole ASN. The rule for the individual addresses is also much longer (I hope this is correct): if ($remote_addr ~* (34\.100\.182\.96|34\.101 ... .|35\.247\.243\.240)) { set $badbotasn 0; } Indeed, requests like this one also come from the Google Cloud platform: GET /robots.txt HTTP/1.1" 301 162 "-" "Apache-HttpClient/4.5.13 (Java/17.0.3) I'm sure it's not John. :-) The more bits, the longer the rules. Fun fact: the last person who stole my content 1:1 also copied the internal links at the same time. That made the plagiarism check easy, thanks to the valuable link hints in GSC :-) And here is a nice tool for checking the reputation of an IP and what has been reported for it: https://www.abuseipdb.com/check/35.245.188.175
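As an alternative to ASN or long IP regex rules, a crawler claim can be checked with a reverse DNS lookup followed by a confirming forward lookup. The sketch below is one way to do that in Python, not a definitive implementation. It assumes genuine crawler hostnames end in `googlebot.com` or `google.com`. The function name and the injectable `reverse_lookup` / `forward_lookup` parameters are hypothetical and exist only so the logic can be exercised without network access.

```python
import socket

# Assumed hostname suffixes for genuine Google crawlers.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_google_crawler(ip, reverse_lookup=None, forward_lookup=None):
    """Check an IP via reverse DNS, then confirm with a forward lookup."""
    if reverse_lookup is None:
        # gethostbyaddr returns (hostname, aliases, addresses).
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        # Collect every address the hostname resolves to.
        forward_lookup = lambda host: {
            info[4][0] for info in socket.getaddrinfo(host, None)
        }
    try:
        host = reverse_lookup(ip).rstrip(".").lower()
    except OSError:
        return False  # no PTR record or lookup failure
    if not host.endswith(GOOGLE_SUFFIXES):
        return False  # hostname is not in an accepted domain
    try:
        # The hostname must resolve back to the original IP, otherwise
        # the PTR record could have been set by anyone.
        return ip in forward_lookup(host)
    except OSError:
        return False
```

Called as `is_verified_google_crawler(remote_ip)`, it performs live DNS lookups, so results are best cached per IP rather than computed on every request. A request from a cloud VM with a generic HTTP client user agent would fail the hostname check and could then be handled by your existing bad-bot rules.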
This is a very common issue with improperly configured Cloudflare...
Never thought I'd appreciate someone's inbox that much. Keep sending those, Gary Illyes 🙏
It would be great if we had better debugging tools. We have a persistent bug that *only* triggers on the real Google WRS, occasionally, on a few select pages. We were never able to identify the conditions that caused it (maybe Cloudflare rules, maybe an API not running; live tests were always fine), so we had to rely on a system of fallbacks to mitigate the damage whenever it randomly occurred.
This is a common issue with shared hosting. Their firewalls block part of the legitimate traffic along with Googlebot and whatnot.
I have a site tonight that is showing that in tools. How can I confirm that the tools are right, since they use an emulated Googlebot? Gary Illyes