Server Log Analysis in WordPress [Guide 2021]

This guide shows how to do server log analysis directly from the WordPress dashboard, using the IndexJet plugin.

How To Do Server Log Analysis with IndexJet

Server log analysis used to be something only very experienced SEOs worked on.

With IndexJet's ability to track Googlebot on your WordPress website, now anyone can get started with this important SEO technique.

How Often To Check Log Files?

The larger the website, the more often you want to check.

For a 50 page site: 1 check per month

For a 1000 page site: 2 checks per month

For a 100k page site: Check each week at minimum

In general, crawl budget is only an issue if your website has somewhere from 50k to 100k+ pages.

For websites with 500 or even 1,000 pages, it's not a problem.

Find Large Pages that Google could not crawl due to size

Issue: Crawl budget can be wasted on these large pages

Find pages or types of pages that Google did not crawl

IndexJet Workflow:

  1. Go to Main Dashboard
  2. Filter for Crawl Frequency 0 (zero)

Issue: Google can't access certain sections of your website, e.g. domain.com/shoes/red/ladies/

Solution: Point internal links to these deeper pages, create specific sitemaps for these pages
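Outside of IndexJet, the same check can be approximated by comparing the URLs you know about with the paths Googlebot actually requested. The sketch below is only an illustration: the function name uncrawled_by_section and the idea of grouping by the first path segment are my own assumptions, not part of the plugin. It counts uncrawled pages per top-level section so a neglected section such as /shoes stands out.

```python
from collections import Counter
from urllib.parse import urlsplit


def uncrawled_by_section(known_urls, crawled_paths):
    """Count known URLs that Googlebot never requested, grouped by
    their first path segment (hypothetical helper, for illustration).

    known_urls    -- full URLs you expect to be crawled, e.g. from a sitemap
    crawled_paths -- request paths seen for Googlebot, e.g. from server logs
    """
    # Ignore trailing slashes so /shoes and /shoes/ count as the same page.
    crawled = {p.rstrip("/") or "/" for p in crawled_paths}
    counts = Counter()
    for url in known_urls:
        path = urlsplit(url).path.rstrip("/") or "/"
        if path in crawled:
            continue
        # "/shoes/red/ladies" -> "/shoes"
        section = "/" + path.strip("/").split("/")[0]
        counts[section] += 1
    return counts


if __name__ == "__main__":
    known = [
        "https://domain.com/shoes/red/ladies/",
        "https://domain.com/shoes/blue/",
        "https://domain.com/about/",
    ]
    crawled = ["/about", "/shoes/blue/"]
    for section, missing in uncrawled_by_section(known, crawled).most_common():
        print(section, missing)
```

Sections with a high count are the first candidates for extra internal links or a dedicated sitemap.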

Find 404 Errors that Googlebot landed on:

IndexJet Workflow:

Crawl Optimizer -> Filter for 404

Issue: You are sending Googlebot to a page that has no useful content

Fix: Create redirects to relevant pages

Find 5xx Errors that Googlebot landed on:

IndexJet Workflow:

Crawl Optimizer -> Filter for 5xx

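Issue: Google is getting an error response from the server. Googlebot then slows down crawling and discovering your website

Fix: Find and resolve the server-side cause (for example a faulty plugin, timeouts or overloaded hosting) so that these URLs return a normal response again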

Find Redirects that Googlebot landed on:

IndexJet Workflow:

Crawl Optimizer -> Filter for 3xx status

Issue: Googlebot is landing on redirects such as HTTP -> HTTPS or WWW -> non-WWW

Fix: Reduce the number of redirects so that crawl budget is not wasted. For example, make sure the links pointing to your website all use HTTPS
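If you want to run the 404, 5xx and 3xx checks above against a raw access log instead of the plugin, a minimal sketch could look like the following. It assumes the log is in the common "combined" format used by Apache and Nginx, and it identifies Googlebot by user-agent string only, which can be spoofed. The function name googlebot_hits_by_status is hypothetical.

```python
import re

# One line of the "combined" access log format (an assumption about your
# server; adjust the pattern if your host logs in a different layout).
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits_by_status(lines):
    """Group the paths Googlebot requested into 404, 5xx and 3xx buckets."""
    buckets = {"404": set(), "5xx": set(), "3xx": set()}
    for line in lines:
        match = LOG_LINE.match(line)
        # Skip lines that do not parse and requests from other user agents.
        # Note: the user agent can be faked; this sketch does not verify it.
        if not match or "Googlebot" not in match.group("agent"):
            continue
        status = int(match.group("status"))
        path = match.group("path")
        if status == 404:
            buckets["404"].add(path)
        elif 500 <= status <= 599:
            buckets["5xx"].add(path)
        elif 300 <= status <= 399:
            buckets["3xx"].add(path)
    return buckets


if __name__ == "__main__":
    agent = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    sample = [
        f'66.249.66.1 - - [12/Jan/2021:10:00:00 +0000] "GET /old-page/ HTTP/1.1" 404 512 "-" "{agent}"',
        f'66.249.66.1 - - [12/Jan/2021:10:00:05 +0000] "GET /shop/ HTTP/1.1" 503 0 "-" "{agent}"',
        f'66.249.66.1 - - [12/Jan/2021:10:00:09 +0000] "GET /blog HTTP/1.1" 301 0 "-" "{agent}"',
    ]
    for bucket, paths in googlebot_hits_by_status(sample).items():
        print(bucket, sorted(paths))
```

To use it on a real log, open the file and pass the file object in place of the sample list. Each bucket then maps to one of the three fixes described above.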

Remove Hacked Pages from Google [Ultimate Guide 2021]

This guide discusses how to clean up a hacked WordPress website from an SEO perspective, not a security perspective. 

In particular, it covers the hack where hackers create hundreds or thousands of pages on your website and get these pages indexed in search results.

How Hacked URLs Tank Rankings

Why care about cleaning up hacked URLs from Google’s search engine result pages (SERPs)? 

I have found that it affects rankings and crawl budget negatively when Google “believes” that these hacked pages are part of your website.

Google will process these pages and determine that your website is no longer only about topic a), e.g. dentistry in New York. It is now also about topic b), the topic of the hacked pages, e.g. Viagra sales in Japan.

If you had 40 dentistry pages known to Google to start with and hackers added 2000 Viagra pages, Google may think you are now mainly a Viagra outlet and will no longer trust you for rankings in the dental industry in New York.

As a Google user and dental patient, I also prefer dentists that don't peddle Viagra on the side.

I have seen this happen in live SERPs: once the hacked pages were removed, the rankings recovered.

1. Get A List Of Hacked URLs

There are three ways to get the list of URLs that hackers have inserted into your website. Some are better than others...

     a) Decrypting the hackers' files on your server (best method, as this gets you the full list immediately)

     b) Getting the list of URLs from Google Search Console (second best)

     c) Using the site:domain.com search operator (use Serp Extractor)

Decrypting files on your server

Decrypting files on your server is the ONLY way you will ever get all URLs that the hackers created all at once. 

DO NOT delete the hacked files before decrypting them.

The problem with all other methods is that you are relying on a secondary source, such as Google search results or Google Search Console. Secondary sources will only give you a small percentage of the total URLs, and you will spend a lot of time getting the full list.

I use a guy on Fiverr who specializes in decryption. I won’t mention him here but he is easy to find and charges start from $10. 

Search: “decrypt php” on Fiverr. 

2. Set Hacked Pages To 410 Error Status

A 410 (Gone) status code tells search engines that the page has been deleted. GSC reports do not support 410 as its own status.

It will show as an error in GSC for reporting purposes only. Google will still process the 410 correctly on its own side.

There are two ways to create 410 errors on your WP site that I recommend. I have coded two plugins that help you do this.

You can choose either the paid or the free route at this point.

     Plugin 1) IndexJet (paid)

     Plugin 2) 410 Delete Pages SEO (free)

IndexJet also allows you to request, in bulk, that Google visit your 410 pages again, and to generate XML sitemaps. More on that below.

Go to the 410 Tool/section in either of the WP plugins and copy/paste your list of hacked URLs that you want to be 410 errors.

Benefits of 410 errors over 404:

With a 410 we are telling Google clearly this page has been deleted. We are saying once Google has discovered the 410, there is no need for them to come back and check again, because the page is gone. This can save crawl budget when compared to a 404. 

A 404 forces Google to come back regularly to check if your page is now back online. This is not something you want if you have thousands of hacked pages that Google would have to keep rechecking.
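To make the mechanics concrete, here is a minimal sketch of what "set these URLs to 410" means at the HTTP level. It is written as a small Python WSGI wrapper purely for illustration. It is not how IndexJet or 410 Delete Pages SEO are implemented, and the names load_gone_paths and gone_middleware are hypothetical. It also assumes the hacked URLs differ by path only. Hacked URLs that differ only by query string would need the query included in the comparison.

```python
from urllib.parse import urlsplit


def load_gone_paths(urls):
    """Turn a pasted list of hacked URLs into a set of normalised paths."""
    return {
        urlsplit(u.strip()).path.rstrip("/") or "/"
        for u in urls
        if u.strip()
    }


def gone_middleware(app, gone_paths):
    """Wrap a WSGI app so that listed paths answer with 410 Gone."""
    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "/").rstrip("/") or "/"
        if path in gone_paths:
            body = b"410 Gone"
            start_response("410 Gone", [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Every other request is passed through to the real site.
        return app(environ, start_response)
    return wrapped


if __name__ == "__main__":
    from wsgiref.simple_server import demo_app, make_server

    # Placeholder URL; replace with your own list of hacked URLs.
    gone = load_gone_paths(["https://example.com/cheap-pills/"])
    with make_server("127.0.0.1", 8000, gone_middleware(demo_app, gone)) as server:
        server.serve_forever()
```

The key point is the status line: the listed paths answer with "410 Gone" instead of "404 Not Found", while the rest of the site is untouched.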

4. Create a Sitemap Of The Hacked URLs

Inside the IndexJet plugin you can directly create a sitemap.xml that includes all hacked pages, which you have now set to 410. This is important because we want Google to visit all these pages to discover our new 410 deleted status.

If you do not have IndexJet, the workaround is to use this Google sheet (make a copy) that I created to generate an XML sitemap from a list of URLs that you can copy and paste. You would then need to manually upload this XML sitemap to your server. The last step is to submit your sitemap to Google via GSC.
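If you would rather script this step than use a spreadsheet, the sketch below builds the same kind of sitemap from a plain list of URLs. It is a minimal example under my own assumptions: the function name build_sitemap is hypothetical, and it writes only the <loc> element for each URL. Keep in mind that the sitemap protocol allows at most 50,000 URLs per file, so a larger list of hacked URLs needs to be split across several files.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML (as a string) for an iterable of URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    seen = set()
    for raw in urls:
        url = raw.strip()
        # Skip blank lines and duplicates from the pasted list.
        if not url or url in seen:
            continue
        seen.add(url)
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & in the URL for us.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Placeholder URLs; replace with your own list of hacked URLs.
    hacked = [
        "https://example.com/cheap-pills/",
        "https://example.com/?p=99&lang=ja",
    ]
    with open("hacked-sitemap.xml", "w", encoding="utf-8") as handle:
        handle.write(build_sitemap(hacked))
```

The resulting file is then uploaded to your server and submitted in GSC exactly as described above.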

5. Submit Sitemap To Google Search Console

Copy and paste your sitemap URL into Google Search Console and submit it.

6. Get Google To Discover The New Status (410) Of Your Hacked URLs 

We have submitted a sitemap to Google, but this is not enough. Hackers have likely also hijacked your GSC and submitted their own sitemap previously to get the hacked pages indexed.

Google is reluctant to visit these hacked pages again because they are of low quality.

So how do we force Googlebot to come back as soon as possible to look at these hacked pages, in order to make Google realise that they are now 410 - deleted/gone?

The solution is IndexJet’s integration with three indexing services:

  • OmageIndexer
  • EliteIndexer
  • SpeedLinks

Via these services you can request that Google come and discover your pages, IN ONE CLICK, directly from your WP dashboard.

At this point you need to bring your own API key. We might offer plans later that include a set number of indexing credits per month/year.

An alternative for SEOs who are not IndexJet users would be to go to Google Search Console and submit the hacked pages for indexing manually.

7. Monitor Googlebot While It Crawls Your URLs

We want to know when Google has discovered your 410 status codes.

At this point Google is aware that you have deleted the hacked pages from your server and can start to reprocess the signals (such as topical relevance of your website and internal link signals).

You can use the "Crawl Optimizer" section in IndexJet to monitor URLs that have been visited by Googlebot (this section is currently being worked on and will be released on 12th Jan 2021.)

An alternative to using IndexJet would be to look at your server logs. A tool such as Screaming Frog Log File Analyser would be needed in this case.
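If you go the server log route and want a quick progress check without a dedicated tool, a small script can tell you which hacked URLs Googlebot has already seen as 410 and which are still pending. This is only a sketch under stated assumptions: the log is in the common "combined" format, Googlebot is identified by user-agent string alone (which can be spoofed), and the function names are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Pulls the request target, status code and user agent out of one
# "combined" format log line (an assumption about your server's logs).
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<target>\S+)[^"]*" (?P<status>\d{3}) .*"(?P<agent>[^"]*)"$'
)


def request_target(url):
    """Reduce a URL or log path to path plus query string for comparison."""
    parts = urlsplit(url.strip())
    return parts.path + ("?" + parts.query if parts.query else "")


def discovery_progress(hacked_urls, log_lines):
    """Return (discovered, pending) sets of hacked request targets.

    A target counts as discovered once Googlebot received a 410 for it.
    """
    wanted = {request_target(u) for u in hacked_urls if u.strip()}
    discovered = set()
    for line in log_lines:
        match = LOG_LINE.search(line.rstrip("\n"))
        if not match or "Googlebot" not in match.group("agent"):
            continue
        target = request_target(match.group("target"))
        if match.group("status") == "410" and target in wanted:
            discovered.add(target)
    return discovered, wanted - discovered


if __name__ == "__main__":
    agent = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    # Placeholder URLs and log lines, for illustration only.
    hacked = ["https://example.com/cheap-pills/", "https://example.com/?p=99"]
    logs = [
        f'66.249.66.1 - - [12/Jan/2021:10:00:00 +0000] "GET /cheap-pills/ HTTP/1.1" 410 0 "-" "{agent}"',
    ]
    done, pending = discovery_progress(hacked, logs)
    print("discovered:", sorted(done))
    print("pending:", sorted(pending))
```

Running this every few days against fresh logs shows how quickly the pending list shrinks, which is the signal that Google is picking up the 410s.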

Conclusion

I hope the article has helped you map out a straightforward way to clean up hacked pages. Both free and paid tools were covered.

My goal is to help SEOs and WordPress users make their workflows easier. Cleaning up a hacked website can be a real headache!

If you need any further advice please join our free Facebook Group or Skype group to chat further there.