When crawling a page, it fails with an "origin error" message (code 520)

Occasionally you might try to copy a website, only to get a 520 error code from each page.

What is status code 520?

HTTP status code 520 is not defined in any RFC. Cloudflare's reverse proxies use it to signal an "unknown connection issue between CloudFlare and the origin web server" to the client that made the request through the proxy.
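As a quick illustration that 520 sits outside the standard codes, Python's built-in HTTPStatus enumeration, which covers the status codes defined in the RFCs that Python supports, has no entry for it. This is only a minimal sketch, and the helper name is_registered_status is hypothetical.

```python
from http import HTTPStatus


def is_registered_status(code):
    """Return True if Python's HTTPStatus enum has an entry for this code."""
    try:
        HTTPStatus(code)
        return True
    except ValueError:
        # HTTPStatus raises ValueError for codes it does not define.
        return False
```

With this helper, a standard code such as 502 (Bad Gateway) is recognised, while 520 is not.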

How can I work around it?

The most likely cause of this message is the default user agent of your crawl project. By default, the user agent consists of the name and version of the Cyotek product doing the crawling, followed by the name and version of the crawling library itself. For WebCopy projects, this is generally CyotekWebCopy/1.0 CyotekWebCrawler/1.0. Notice that this user agent does not identify itself as a web browser, and some sites refuse or mishandle requests from clients that do not look like browsers.

Changing your user agent to that of an actual browser may help.

How do I change the user agent of my project?

  • Open the Project menu and choose User Agent
  • Select Use custom user agent
  • Either select one of the predefined agents, or enter your own custom string
  • Click OK to save the changes

How can I check before crawling an entire site?

You can use the Test URI feature of WebCopy to check whether the site you want to crawl will reject the user agent. Click Test URI on the toolbar, enter the URL of the site to test, select a predefined or custom user agent, and click Test. WebCopy will try to access the URL and will notify you of any problems.
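
The same kind of check can be approximated outside WebCopy with a short Python script. This is a minimal sketch under stated assumptions, not a description of how Test URI works internally: the function name probe_status and the browser-style user agent string are illustrative choices, and only the crawler user agent comes from this article.

```python
import urllib.error
import urllib.request

# Default WebCopy user agent, as quoted earlier in this article.
CRAWLER_UA = "CyotekWebCopy/1.0 CyotekWebCrawler/1.0"
# Illustrative browser-style user agent (an assumption; any real browser string will do).
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:125.0) "
    "Gecko/20100101 Firefox/125.0"
)


def probe_status(url, user_agent, timeout=10.0):
    """Request url with the given User-Agent header and return the HTTP status code."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises HTTPError for 4xx and 5xx responses (including 520);
        # the status code is available on the exception.
        return error.code
```

Call probe_status once with CRAWLER_UA and once with BROWSER_UA against the site you intend to copy. If the first call returns 520 and the second returns 200, the user agent is the likely cause, and switching the project to a browser user agent may help.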