I manage a few different websites, and one in particular that I inherited is a real mess: years of adding images, PDFs, web pages, etc. to the web server and never removing any of them. The site is now populated with a lot of files that are no longer referenced by any of its web pages. But there are probably some files that still do have references out there. I don't have time to look at each and every web page to determine which files are safe to delete and which need to remain so the site doesn't break.
I'm looking for what I would call a "Link Checker" program. But I want it to check not only links to other web pages but also links to images and other files. And lastly, I would love it if it could give me a report of all the files that are NOT referenced by any currently active web page.
This seems like a simple programming project to me. If I were writing the program, I would use FTP to build a list of every file name and its location in the public web folder. Then I would use HTTP to open the first (home) page of the website and scan its source for links to other pages and files, store that information in a data file, and then open each web page link from that list and repeat until the program had opened every page and gleaned every file name.
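A minimal Python sketch of the HTTP half of that idea, using only the standard library. It assumes the whole site lives on one host, that pages are UTF-8, and that links appear in plain href/src attributes; links built by JavaScript or referenced from CSS url(...) would be missed. The names crawl, fetch_html and LinkCollector are hypothetical, the "data file" is just an in-memory set here, and the fetch function is passed in so the crawl can be tried without a live server. Pages that fail to load are recorded as broken links, which covers the link-checker part.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects every href/src attribute value found in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def fetch_html(url):
    """Return page text for HTML responses, None for anything else.

    Raises OSError (urllib's URLError/HTTPError) if the URL cannot be fetched.
    Assumes UTF-8 and replaces undecodable bytes.
    """
    with urlopen(url) as response:
        if "html" not in response.headers.get_content_type():
            return None  # image, PDF, etc.: referenced, but nothing to parse
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch_html, max_pages=5000):
    """Breadth-first crawl of one site.

    Returns (referenced, broken): every same-host URL found in a link,
    and the subset of those URLs that could not be fetched.
    """
    host = urlparse(start_url).netloc
    referenced = {start_url}
    broken = set()
    queue = deque([start_url])
    fetched = 0
    while queue and fetched < max_pages:
        page = queue.popleft()
        fetched += 1
        try:
            html = fetch(page)
        except OSError:
            broken.add(page)  # e.g. a 404 for a link that points nowhere
            continue
        if html is None:
            continue  # non-HTML file: nothing further to scan
        collector = LinkCollector()
        collector.feed(html)
        for raw in collector.links:
            # Resolve relative links and drop "#fragment" parts.
            url, _ = urldefrag(urljoin(page, raw))
            # Skip other hosts and schemes like mailto: (empty netloc),
            # and anything already queued.
            if urlparse(url).netloc != host or url in referenced:
                continue
            referenced.add(url)
            queue.append(url)
    return referenced, broken
```

Each URL is queued at most once, so the loop ends once every reachable page and file has been requested or max_pages is hit.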
Then, the final step would be to compare the file list gathered from the initial FTP connection with the data file listing gathered via the HTTP crawl. Any files not matched up are "safe" to delete from the web server.
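The comparison step could look like the sketch below. list_ftp_files relies on ftplib's FTP.mlsd, which only works if the server supports the MLSD command (RFC 3659); older servers that only offer LIST would need their output parsed separately. find_orphans assumes the FTP directory given as web_root maps to the site's "/" URL and that a folder URL is served by one of a few common index file names. Both function names and the index list are illustrative assumptions, and the matching is case-sensitive, which suits most Unix hosts. As the quotes around "safe" suggest, a file reported here may still be linked from other sites, emails or scripts, so the result is a list of candidates to review rather than a delete list.

```python
from urllib.parse import unquote, urlparse

# Assumed default documents; adjust to match the server's configuration.
INDEX_NAMES = ("index.html", "index.htm", "index.php", "default.asp")


def list_ftp_files(ftp, path="/"):
    """Recursively list every file below `path`.

    `ftp` is a connected, logged-in ftplib.FTP instance whose server
    supports MLSD.
    """
    files = []
    for name, facts in ftp.mlsd(path, facts=["type"]):
        kind = facts.get("type", "").lower()
        full = path.rstrip("/") + "/" + name
        if kind == "file":
            files.append(full)
        elif kind == "dir":  # "cdir"/"pdir" are the . and .. entries
            files.extend(list_ftp_files(ftp, full))
    return files


def find_orphans(server_files, referenced_urls, web_root="/"):
    """Return server files whose URL path never appears in referenced_urls."""
    root = web_root.rstrip("/")
    referenced_paths = set()
    for url in referenced_urls:
        path = unquote(urlparse(url).path) or "/"  # "%20" -> " "
        referenced_paths.add(path)
        if path.endswith("/"):
            # A folder URL is normally answered by its index page.
            referenced_paths.update(path + name for name in INDEX_NAMES)
    orphans = []
    for file_path in server_files:
        if not file_path.startswith(root + "/"):
            continue  # outside the public web folder
        if file_path[len(root):] not in referenced_paths:
            orphans.append(file_path)
    return sorted(orphans)
```

Against a live server, list_ftp_files would be called inside `with FTP(host) as ftp:` after `ftp.login(user, password)`, and its result passed to find_orphans together with the set of URLs the crawl collected.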
Of course, the software would also need to accept a list of subfolders to ignore, to save time: for example, private folders or program folders on the web server that you know you don't want it to look into, and/or links that lead into shopping cart programs or other such places. You should also be able to enter filename extensions that you want it to skip.
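The skip rules in that paragraph could be a small filter applied both before a URL is queued during the crawl and before a file is listed in the unreferenced-files report. This is a sketch under simple assumptions: folder names are matched as path prefixes, extensions as case-insensitive suffixes, and is_excluded and filter_paths are hypothetical names.

```python
def normalise_folder(folder):
    """Turn 'private', '/private' or '/private/' into '/private/'."""
    return "/" + folder.strip("/") + "/"


def is_excluded(path, skip_folders=(), skip_extensions=()):
    """Return True if a URL path or server path should be ignored.

    Folders match as prefixes ("/cart/" skips "/cart/checkout.php" but not
    "/cartoons/"); extensions match case-insensitively, with or without the
    leading dot, so ".htaccess" works too.
    """
    if any(path.startswith(normalise_folder(f)) for f in skip_folders):
        return True
    suffixes = tuple("." + e.lower().lstrip(".") for e in skip_extensions)
    return bool(suffixes) and path.lower().endswith(suffixes)


def filter_paths(paths, skip_folders=(), skip_extensions=()):
    """Keep only the paths that no skip rule matches."""
    return [p for p in paths if not is_excluded(p, skip_folders, skip_extensions)]
```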
So... does anyone know of software that will do this? I see a real need for it. I think it could be a great time saver for someone like me who manages several company websites, or who inherits sites from previous managers who simply let files accumulate.
I don't use this particular product myself; I use one of their other products, InSite, and am happy with it. Support is pretty good: you get help directly from the people who write the software when you write in to ask about features or to suggest them.