Cyotek WebCopy Revision History

Copy websites locally for offline browsing

Date Released: 18 February 2012, Version 1.0.0.9

Please report any errors in how websites are crawled so that we can continue to improve Cyotek WebCopy. The more information you can provide, the better we can make the product.

Changes and new features

  • Added a Replace section to the Regular Expression dialog to make it easier to test replacement expressions
  • Various performance enhancements
  • The Errors tab no longer lists "Unknown Response" for non-200 HTTP codes, but instead includes the code description
  • Added the ability to run user-defined custom tools from within the application
  • Attempting to open a recent file which no longer exists now prompts to remove the missing file from the recent files list

Bug fixes

  • Fixed a crash when crawling if a rule was created with an invalid regular expression
  • Reworked application mutex to avoid silent startup and shutdown exceptions
  • Fixed regular expression cache not being thread safe
  • Status bar wasn't correctly cleared if there was a problem populating a view which required a valid crawlmap
  • Fixed status bar messages occasionally not appearing

Date Released: 04 December 2011, Version 1.0.0.8

Changes and new features

  • Product help is now available and the product is now out of beta
  • Added the ability to enable the "multi line" option in the Regular Expression editor, making it easier to test patterns using ^ and $ on lists of URLs
  • Added a Test URL option for Forms, allowing you to test that your forms can be successfully POSTed prior to running a full crawl
  • Changed settings dialogs to use a tabbed interface
  • Holding down Shift when clicking the Copy Website or Analyze buttons forces the download of all resources, skipping last modified checks
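
To illustrate what the "multi line" option changes, here is a minimal Python sketch. It is not WebCopy's own code; the URLs and the pattern are made-up examples, and Python's re.MULTILINE flag stands in for the editor option. Without the flag, ^ and $ anchor to the whole text only; with it, they anchor to each line of a URL list.

```python
import re

# Hypothetical list of URLs, one per line, as you might paste into a test box.
urls = (
    "http://example.com/index.html\n"
    "http://example.com/about.html\n"
    "http://example.org/logo.png"
)

# Pattern anchored with ^ and $ so it must match an entire URL.
pattern = r"^http://example\.com/.*\.html$"

# Without multi-line mode, ^ and $ only anchor to the start and end of the text.
without_multiline = re.findall(pattern, urls)

# With multi-line mode, ^ and $ anchor to the start and end of every line.
with_multiline = re.findall(pattern, urls, flags=re.MULTILINE)

print(without_multiline)
print(with_multiline)
```

In this sketch the first call finds nothing, while the second returns the two example.com pages.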

Bug fixes

  • Fixed a large number of issues with the application services libraries and components
  • Fixed an issue where attributes of posted URLs were not correctly loaded if encountered at a later point during the crawl
  • Fixed a crash which could occur when using the title replacement options and a page had a null title
  • Fixed a crash which could occur when scanning an HTML tag containing a malformed URL
  • Fixed an issue where email addresses were stripped if they contained the # character and the "strip fragments" option was enabled
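
The fragment-stripping fix above can be pictured with a short Python sketch. This is an assumption about the general approach, not WebCopy's implementation: the helper name strip_fragment is hypothetical, and the rule shown is simply "leave mailto: links alone, strip the fragment from everything else".

```python
from urllib.parse import urldefrag, urlsplit


def strip_fragment(link: str) -> str:
    """Remove the #fragment from a link, but leave mailto: links untouched.

    Hypothetical helper: a # inside an email address is part of the
    address, not a fragment, so it must not be cut off.
    """
    if urlsplit(link).scheme.lower() == "mailto":
        return link
    return urldefrag(link).url


print(strip_fragment("http://example.com/page.html#section"))
print(strip_fragment("mailto:sales#1@example.com"))
```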

Date Released: 24 August 2011, Beta, Version 1.0.0.7

Note: Unfortunately help is not available in this build

Changes and new features

  • The Link Map window now remembers its size and position
  • The URI control for selecting the website to analyze is now tied to the system URI history
  • Removed the confirmation prompt when rebuilding a crawlmap from saved history information
  • The link scanner now supports the use of the base tag. If present, the URI value will be combined with links on the page.
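
As a rough illustration of how a base tag affects link resolution, the following Python sketch combines a page URL, an optional base href, and a link found on the page. The function name resolve_link and the example URLs are hypothetical; the sketch only shows the standard URL-joining behaviour, not WebCopy's scanner.

```python
from urllib.parse import urljoin


def resolve_link(page_url, base_href, link):
    """Resolve a link against the page's <base href>, if one is present.

    Hypothetical helper: base_href is the value of the base tag, or None
    when the page has no base tag.
    """
    # The base href may itself be relative, so resolve it against the page first.
    effective_base = urljoin(page_url, base_href) if base_href else page_url
    return urljoin(effective_base, link)


page = "http://example.com/docs/page.html"
print(resolve_link(page, None, "img/logo.png"))
print(resolve_link(page, "http://example.com/assets/", "img/logo.png"))
```

Without a base tag the link resolves under /docs/; with the base tag it resolves under /assets/.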

Bug fixes

  • Fixed various problems which could occur when trying to crawl a site with malformed links containing double slashes after the domain
  • If the copy process crashes, the application now continues to run after the exception reporting dialog is dismissed
  • Fixed a crash which would occur if a generated file name was the same as an existing directory name
  • Fixed several crashes which occurred if a valid content type was downloaded as an empty file
  • The list of incoming URIs for any given URI was being incorrectly populated
  • Fixed an issue where, if a URI was referenced in multiple locations, its outgoing and incoming links were not updated correctly after the first encounter
  • When reloading a project, the link map is no longer crawled looking for pages directly matching the root element. Instead, all non-excluded internal URIs are formed into the map, resolving a problem where the crawl map generated by reloading a project might not match the crawl map generated by analyzing a website
  • Fixed the & character not appearing correctly in the status bar
  • Fixed issue with application window being sent behind other top level windows when cancelling a crawl
  • Fixed tab order on main window
  • Fixed one occurrence where links were not combined correctly, causing an infinite cascade (or at least until you hit the path limit for your OS). Additional causes of this bug may still be present; investigations are continuing.

Date Released: 03 July 2011, Beta, Version 1.0.0.6

Bug fixes

  • If the root URL for a project included a document file name, no files were copied unless the Crawl above Root option was enabled

Date Released: 29 May 2011, Beta, Version 1.0.0.5

Changes and new features

  • A new rule option has been added that can be used to prevent a rule from matching a child URI
  • If-Modified-Since header and the NotModified HTTP status code are now supported
  • Added a new option to always download the latest version of a file, skipping the If-Modified-Since checks
  • A new "Missing" tab has been added that shows URL matches in a previous scan that were not matched in the latest scan
  • Redirect processing now honors 303 and 307 response codes
  • Report lists now display tooltips
  • If a link redirects to another, the destination is now stored with the original link
  • Content length is now stored with link information, regardless of whether headers are stored
  • Link properties dialog now shows redirect information and content length
  • Added the ability to view the size of a website by content type
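
For readers unfamiliar with conditional requests, the If-Modified-Since support above can be sketched in Python as follows. The helper names conditional_headers and should_save_body and the always_download flag are hypothetical and only mirror the options described in this list; this is not WebCopy's code. The server answers 304 Not Modified when the file has not changed since the given time, in which case there is no new body to save.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from http import HTTPStatus


def conditional_headers(last_downloaded_utc, always_download=False):
    """Build request headers for a conditional GET (hypothetical helper).

    When always_download is set, or nothing was downloaded before, no
    If-Modified-Since header is sent, so the server returns the full file.
    """
    if always_download or last_downloaded_utc is None:
        return {}
    # HTTP dates are always expressed in GMT.
    return {"If-Modified-Since": format_datetime(last_downloaded_utc, usegmt=True)}


def should_save_body(status_code):
    """A 304 Not Modified response carries no new content to save."""
    return status_code != HTTPStatus.NOT_MODIFIED


last_downloaded = datetime(2011, 5, 29, 12, 0, 0, tzinfo=timezone.utc)
headers = conditional_headers(last_downloaded)
print(headers)
```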

Bug fixes

  • Exception reports were using the file version instead of the product version
  • Fixed a rare XML crash when saving a project
  • Fixed a crash which would sometimes occur when editing a rule or a form
  • When downloading a file, the Last Downloaded timestamp is now stored as UTC
  • Fixed an error where the content type was not set correctly if HEAD checking was disabled
  • Fixed a problem where the local file for a URL would be continuously regenerated if the "Empty Save Folder" option was not set
  • Fixed a problem where it was possible for a URL to be crawled even though pre-processing had rejected the URL
  • Empty directories are no longer generated for URLs which fail pre-processing, such as redirects or unsupported content types
  • Fixed a crash which would occur if the referring URL was not available
  • Fixed a crash which would occur if the "content-type" header wasn't present when pre-processing a URL
  • URIs which end with / but point to a valid text/html document no longer have the final segment stripped when generating the local filename and the flatten directories option is disabled
  • Link properties dialog now correctly includes the time when a file was last downloaded
  • Buttons in the main window now correctly follow the colors of the main theme

Date Released: 08 March 2011, Beta, Version 1.0.0.4

Changes and Updates

  • Meta refresh redirects are now crawled and remapped
  • Changed how redirects are handled; these will now appear in the main report lists
  • Files list now displays the content type of entries
  • Skipped list now displays the content type of entries
  • Added new Not Found and Redirect exclusion reasons; redirects and missing files will no longer appear as "None" in skip lists.

Bug Fixes

  • Two URLs whose hosts differ only by the www prefix (e.g. http://cyotek.com/ and http://www.cyotek.com/) are now treated the same when determining if a URL is external.
  • URIs were not correctly combined on pages being crawled as a result of a redirect.
  • Reloading a sitemap which contained redirects did not display a map for any content discovered after the redirect
  • No longer attempts to download content for redirected responses
  • Projects weren't always being correctly marked as changed
  • Application wouldn't start on 64-bit Windows (regression from 1.0.0.3).
  • Lists are now correctly cleared before an analyze or copy action (regression from 1.0.0.3).
  • When creating or opening a project, the contents of the Files tab were not being cleared (regression from 1.0.0.3).
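
The www prefix fix in this list amounts to normalizing host names before comparing them. A minimal Python sketch of that idea follows; the helper names normalized_host and is_external are hypothetical, and the rule shown (drop a leading "www." and compare case-insensitively) is an assumption about the general approach rather than WebCopy's exact logic.

```python
from urllib.parse import urlsplit


def normalized_host(url):
    """Return the lower-cased host of a URL with any leading 'www.' removed."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def is_external(url, root_url):
    """Treat a URL as external only if its normalized host differs from the root's."""
    return normalized_host(url) != normalized_host(root_url)


print(is_external("http://www.cyotek.com/products", "http://cyotek.com/"))
print(is_external("http://example.org/", "http://cyotek.com/"))
```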

Date Released: 21 November 2010, Beta, Version 1.0.0.3

Changes and Updates

  • Substantial performance improvements have been made when loading large projects containing many links.
  • Updated to use Html Agility Pack 1.4
  • A new option to control if headers should be saved in the project file has been added. This option is disabled by default.
  • Cut, copy and paste commands are now available from the main window. However, lists and trees currently only support copy.

Bug Fixes

  • The application attempted to obtain titles and descriptions from all files, causing a rare crash.
  • The Accept GZip Compression option was never correctly read from the project file.
  • Toolbar visibility was not preserved between sessions

Date Released: 02 October 2010, Beta, Version 1.0.0.2

Changes and Updates

  • Add-ins can now be enabled and disabled.
  • Appearance themes are now enabled.
  • The views Skipped and Files now have a context menu.
  • The Speed, Time Elapsed and Time Remaining columns have been removed as they aren't working.

Bug Fixes

  • Relative paths weren't being saved in project files correctly
  • The application wasn't correctly attached to the error handling system
  • Command line arguments are now correctly processed.
  • Filenames were not being regenerated when opening a project.
  • Completion messages now correctly warn when errors were detected during copying.
  • Fixed a problem where running on XP either didn't display disabled images or crashed.

Date Released: 17 July 2010, Beta, Version 1.0.0.1

Changes and Updates

  • A new options page for controlling the local copy options has been added.
  • The project properties dialog now displays several of the common editors to provide access to properties which could not be changed in the alpha build.
  • The context menu for various lists now has an Edit Local File option.
  • Added a new option to control if extensions are remapped based on their content type.
  • Results list now shows elapsed time and estimated time of downloads.
  • 401 authentication requests are now supported, either via predefined credentials or during the crawl via a password dialog.
  • The default buffer size has been increased to a larger value, allowing for faster downloads. In addition, the buffer size is now configurable.
  • Gzip compression is now supported.
  • Deflate compression is now supported.
  • Crawling is now performed on a separate thread, resolving sluggish behaviour with the user interface. (Disabled for this build.)
  • The Link Map Viewer now has a tab for displaying all links found. All lists in this dialog have had new columns added with more details on the links.
  • The project properties dialog now provides access to properties which could not be changed in previous builds.
  • Object model simplified, some confusing class inheritance has been removed.
  • Added the ability for additional content type handlers to be used.
  • Added the ability to specify multiple seed URIs.
  • A new configuration section has been added allowing you to store authentication credentials in a project file and to disable the password dialog when crawling.
  • Added new viewer extensibility options allowing new tabs to be added to the interface.
  • Major refactoring of the base IApplication implementation.
  • Response headers are now stored in the link map. The Link Properties dialog now displays these headers.
  • The Link Properties dialog now displays local path information and the ability to open, open the containing folder, or edit the local file.
  • Scanning of subdomains is now supported.
  • You can now select from a common list of user agents.
  • Crawling will no longer occur above the root level by default. A new option has been added to toggle this behaviour.
  • Exclusions have been renamed to Rules to reflect their changing nature in this build and future planned enhancements.
  • When using the Add Rule context menu item from a result list, the editing dialog is now displayed, allowing the entire rule to be configured.
  • The Add Rule command now includes any applicable query string in the URL for the rule.
  • A basic Regular Expression Editor is available and can be accessed via the Function button displayed next to supported fields.
  • Error text associated with a page error is now stored in the link map.
  • The page errors list will now be regenerated on loading a project with a saved link map.
  • The Link Map Viewer now displays link titles and error text.

Bug Fixes

  • Redirects were not followed for 301 or 307 status codes.
  • The error list wasn't properly recording all errors which occurred during a crawl.
  • The failure to download a file due to a non-HTTP related error should no longer crash the application.
  • The prompt to create a missing save folder now includes the folder name instead of a formatting placeholder.
  • Fixed an issue where local file names contained escaped HTML entities.
  • Fixed an issue where it was possible for local file names to contain illegal characters.
  • Analyzing a website now only downloads files supported for crawling.
  • CSS contained within comment blocks is no longer crawled.
  • Page links found in an IFrame or Frameset were not scanned.
  • Cancelling a crawl now also correctly aborts the current transfer instead of waiting for it to complete.
  • If a list was scrolled horizontally, the context menu displayed from the filter bar wasn't positioned correctly.
  • Fixed a bug where response headers were not available if the request was not an expected response code.
  • The result expression editor no longer displays results for a blank expression.
  • Duplicate keyboard accelerators have been fixed.
  • The Sorted property of a crawl map now correctly defaults to false.
  • Fixed a problem where it was possible for the CommandManager to try and load classes it had no business loading, causing error messages to be displayed on startup.
  • Fixed a problem where command interface elements were not always given a name, leading to a problem where items could not be accessed unless the full text was known.
  • The failure to load an image resource for a command interface element will no longer cause the application to fail to initialize.
  • The Add Rule and Add Form dialogs caused a crash when being used to create rather than modify items.
  • If a link to a child of a page which has been matched to a rule with the DisableCrawl option is detected, the entire link will now be excluded.
  • Fixed some selection inconsistencies in rules and forms editors.
  • The Add Rule command now automatically escapes regular expression elements within the URL, such as the ? of a query string.
  • Fixed some layout problems in Windows XP.
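
The escaping fix for the Add Rule command matters because characters such as ? and . have special meaning in regular expressions. The Python sketch below shows the effect using re.escape; the URL is a made-up example and this is not WebCopy's own code.

```python
import re

# Hypothetical URL with a query string, as might be picked from a result list.
url = "http://example.com/page?id=1"

# Used as-is, the ? makes the preceding "e" optional instead of matching a literal ?.
unescaped_match = re.fullmatch(url, url)

# Escaping the URL first turns every special character into a literal.
escaped_pattern = re.escape(url)
escaped_match = re.fullmatch(escaped_pattern, url)

print(unescaped_match is not None)
print(escaped_match is not None)
```

In this sketch the unescaped pattern fails to match its own URL, while the escaped pattern matches it.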

Date Released: 15 June 2010, Alpha, Version 1.0.0.0

  • Initial Release
