The Tech SEO

Because SEO Is More Than Keywords

Discover Lost URLs with Wayback Machine’s API

After reading Patrick Stox’s post on Search Engine Land about Fixing Historical Redirects, I was inspired to start using the Wayback Machine’s API in one of my Google Sheets. After a little experimenting with some formulas, I developed the Discover Lost URLs Google Sheet.

Discover Lost URLs Spreadsheet

How The Sheet Works

Step 1: Enter Domain

Enter a domain name that you are looking to audit. The URL can be entered in several ways.

  • domain.com
  • subdomain.domain.com
  • www.domain.com/sub-folder/

Once the URL is entered, the sheet will begin to populate.
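The three input styles above could map onto a Wayback CDX query roughly as follows. This is a minimal sketch, not the sheet's actual formula: `normalize_target` and `build_cdx_url` are hypothetical helper names, the choice of `matchType` for each input style is my assumption, and the 20,000-row `limit` mirrors the cap described under Sheet Limitations.

```python
from urllib.parse import urlencode, urlsplit

# Public CDX search endpoint of the Wayback Machine.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def normalize_target(entry):
    """Strip the scheme and pick a CDX matchType for the entered value.

    Assumption: a bare host is queried with matchType=domain (host plus
    subdomains) and anything with a path is queried as a prefix.
    """
    entry = entry.strip()
    if "://" not in entry:
        entry = "http://" + entry  # give urlsplit a scheme so it finds the host
    parts = urlsplit(entry)
    host = parts.netloc.lower()
    if parts.path in ("", "/"):
        return host, "domain"
    return host + parts.path, "prefix"


def build_cdx_url(entry, limit=20000):
    """Return a CDX query URL for the entered domain, subdomain, or folder."""
    target, match_type = normalize_target(entry)
    params = {
        "url": target,
        "matchType": match_type,
        "output": "json",
        "fl": "timestamp,original,mimetype",  # only the columns the sheet shows
        "limit": limit,
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


for example in ("domain.com", "subdomain.domain.com", "www.domain.com/sub-folder/"):
    print(normalize_target(example), build_cdx_url(example))
```

Nothing here makes a network request; it only builds the URL that a fetch step would call.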

Step 2: Modify Dates

Set the “Date From” and “Date To” fields to limit the results to URLs that the Wayback Machine captured within that date range.
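For anyone scripting the same lookup, the two date fields correspond to the CDX API's `from` and `to` parameters, which accept timestamps of 1 to 14 digits. A small sketch, assuming the dates are sent as `yyyyMMdd` (the helper name `date_range_params` is hypothetical, and I have not confirmed how the sheet formats them):

```python
from datetime import date
from urllib.parse import urlencode


def date_range_params(date_from, date_to):
    """Turn two dates into CDX 'from' and 'to' query parameters."""
    if date_from > date_to:
        raise ValueError("Date From must not be later than Date To")
    return {
        "from": date_from.strftime("%Y%m%d"),
        "to": date_to.strftime("%Y%m%d"),
    }


params = {"url": "domain.com"}
params.update(date_range_params(date(2015, 1, 1), date(2016, 12, 31)))
query = urlencode(params)
print(query)  # url=domain.com&from=20150101&to=20161231
```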

Step 3: Choose What To Show

By default, the Wayback API returns all URLs, including images, CSS, JS, and HTML. The sheet is set to show “Just HTML” by default. To show everything, change the selection box to “All”.
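The “Just HTML” versus “All” choice boils down to a filter on the MIME type column. Here is a rough sketch of that idea. The rows are made-up sample data and `filter_rows` is a hypothetical name. The sheet may instead filter on the server side, which the CDX API supports through a parameter such as `filter=mimetype:text/html`.

```python
def filter_rows(rows, show="Just HTML"):
    """Keep only HTML captures unless the selection is 'All'.

    Each row is assumed to be (timestamp, original_url, mimetype).
    """
    if show == "All":
        return list(rows)
    return [row for row in rows if row[2] == "text/html"]


# Made-up sample rows, for illustration only.
sample_rows = [
    ("20120709000203", "http://www.website.com/homepage.html", "text/html"),
    ("20120709000204", "http://www.website.com/style.css", "text/css"),
    ("20120709000205", "http://www.website.com/logo.png", "image/png"),
]

print(len(filter_rows(sample_rows)))         # 1
print(len(filter_rows(sample_rows, "All")))  # 3
```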

Reading the Sheet

Time Stamp: The Time Stamp column shows when the Wayback Machine captured the URL, in the format yyyyMMddhhmmss.
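If you export the sheet and want real dates, the 14-digit timestamp is easy to parse. A minimal sketch, assuming the hour part is on a 24-hour clock (the function name is hypothetical):

```python
from datetime import datetime


def parse_wayback_timestamp(stamp):
    """Convert a 14-digit Wayback timestamp into a datetime."""
    return datetime.strptime(stamp, "%Y%m%d%H%M%S")


captured = parse_wayback_timestamp("20120709000203")
print(captured.isoformat())  # 2012-07-09T00:02:03
```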

Wayback Machine URL: This is the URL as recorded in the Wayback Machine’s database.
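The sheet does not show a link to the archived copy, but one can be assembled from the Time Stamp and the URL. A small sketch (the helper `snapshot_url` is hypothetical and not part of the sheet):

```python
def snapshot_url(timestamp, original):
    """Build the web.archive.org address for one archived capture."""
    return f"https://web.archive.org/web/{timestamp}/{original}"


print(snapshot_url("20120709000203", "http://www.website.com/homepage.html"))
# https://web.archive.org/web/20120709000203/http://www.website.com/homepage.html
```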

Mime Type: This indicates whether the URL is for an HTML page, CSS file, JS file, etc…

Current Status: This is a custom function that returns the URL’s current status code. If there is a redirect in place, it follows the redirect and reports the path as “301 > 200”. If there is a redirect chain, it reports the whole chain, such as “301 > 301 > 302 > 404”.
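The sheet's custom function is not shown here, so the following is only a sketch of the same idea in Python: request a URL without following redirects automatically, record each status code, and move on to the `Location` target until a non-redirect response appears. The names `fetch_once` and `trace_redirects`, the HEAD method, the 10-hop cap, and the “error” marker are all my assumptions.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin

REDIRECT_CODES = (301, 302, 303, 307, 308)


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # do not follow; the 3xx then surfaces as an HTTPError


def fetch_once(url):
    """Return (status, location) for a single request, redirects not followed."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=10) as response:
            return response.status, None
    except urllib.error.HTTPError as err:
        return err.code, err.headers.get("Location")
    except urllib.error.URLError:
        return "error", None  # DNS failure, refused connection, and so on


def trace_redirects(url, fetch=fetch_once, max_hops=10):
    """Return ('301 > 200'-style chain, final URL) for a starting URL."""
    codes = []
    for _ in range(max_hops):
        code, location = fetch(url)
        codes.append(str(code))
        if code not in REDIRECT_CODES or not location:
            break
        url = urljoin(url, location)  # Location may be relative
    return " > ".join(codes), url


# Offline demonstration with a fake fetcher, so no network access is needed.
fake_site = {
    "http://a.test/old": (301, "/mid"),
    "http://a.test/mid": (301, "http://a.test/new"),
    "http://a.test/new": (200, None),
}
print(trace_redirects("http://a.test/old", fetch=lambda u: fake_site[u]))
# ('301 > 301 > 200', 'http://a.test/new')
```

Passing the fetcher in as a parameter keeps the chain logic testable without hitting live servers. Call `trace_redirects(url)` with no `fetch` argument to make real requests.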

Final URL: This URL is the current final destination of the older Wayback URL. This is great to see where SEO value is being lost.

Download The Discover Lost URLs Google Sheet Now!

Sheet Limitations

Google Sheets can choke on large amounts of data. I set the API to pull in a maximum of 20,000 rows, and the sheet may struggle even at that, depending on your RAM and browser. If the sheet is running too slowly, try a few runs with smaller date ranges.
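One way to plan those smaller runs is to cut the full period into consecutive windows and enter each pair as “Date From” and “Date To”. A minimal sketch (the helper name and the default window size are arbitrary choices of mine):

```python
from datetime import date, timedelta


def split_date_range(start, end, days_per_run=365):
    """Yield (from, to) yyyyMMdd pairs that cover start..end without overlap."""
    current = start
    while current <= end:
        window_end = min(current + timedelta(days=days_per_run - 1), end)
        yield current.strftime("%Y%m%d"), window_end.strftime("%Y%m%d")
        current = window_end + timedelta(days=1)


for run in split_date_range(date(2015, 1, 1), date(2015, 1, 10), days_per_run=4):
    print(run)
# ('20150101', '20150104')
# ('20150105', '20150108')
# ('20150109', '20150110')
```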

Comments

  1. Peter says

    September 27, 2016 at 10:44 pm

    When we find the broken links, How can we fix, resolve them?

    Thanks,

    • The Tech SEO says

      September 27, 2016 at 11:47 pm

      This tool only discovers them. To fix them, you need to create a 301 redirect from the old URL to the new URL. How to do that depends on what server you are using.

  2. poli says

    December 13, 2017 at 5:05 am

    Thank you very much for this post. Regarding the current status custom function, could you please explain the meaning and difference of the following cases:
    301 > 200
    301 > 301 > 200
    301 > 301 > 301 > 200
    301 > 301 > 303
    301 > 302 > 200
    301 > 302 > 404
    301 > 303
    301 > 404
    302 > 404
    error
    blank
    Actually what i want to understand is the sequence logic.

    Thanks a lot in advance,
    Regards,
    P.

    • The Tech SEO says

      January 2, 2018 at 9:32 am

      So sorry I didn’t see your response until now. These are the status codes of the requests it takes to get to the final destination. For example 301 > 301 > 200 means the main URL redirected twice with a 301 before it ended at a 200. Hope that helped.

  3. Tyler C says

    June 8, 2019 at 5:08 am

    I may be a bit late to the party, but would it be possible to tweak the script a bit in order to display an additional column with the actual archive url?

    https://web.archive.org/web/20120709000203/http://www.website.com/homepage.html

    ^ For example. Look forward to hearing from you soon. Thanks!
    -Tyler

  4. Dav says

    March 21, 2020 at 1:28 am

    Thank you for this article on wayback machine. Difficult to find reliable services.


Welcome To The Tech SEO

Welcome to The Tech SEO website. My name is Jeff Louella. This site is dedicated to the SEO tools I build to help with my daily SEO job. I make them available to the world for free so that you can use them yourself. Please share the tools with your teams, and mention where you got them. If you have any recommendations for making these tools better, feel free to reach out to me on any of my social channels.

Copyright © 2025