Static website converter?
Jun 21, 2020
Does anyone know a good tool that can convert a dynamic website into a static website?
The Thimbleweed Park blog was built using PHP and a MongoDB database and it's quite complex. The website won't ever change and I like to turn it into a static site so I can host the files somewhere else and can shut down the server.
Ideally it would be a tool that you just point at the url and it scrapes the site, producing a bunch of .html files.
Does such a thing exist?
Darren
Jun 21, 2020
I'd give HTTrack a try; it's worked well for me in the past.
Misel
Jun 21, 2020
What about the good old
wget --recursive --no-parent <url>

Ron Gilbert
Jun 21, 2020
wget pulls down the site, but it doesn't fix up the links, so the result isn't navigable in a browser. The downloaded "files" don't end in .html, and the links that point to them don't get .html appended.
Chris Armstrong
Jun 21, 2020
Have you looked into adding a REST API to turn the existing site into a 'headless CMS', and then using a static site generator like Gatsby (https://github.com/gatsbyjs/gatsby/) or Jekyll (https://github.com/contentful/jekyll-contentful-data-import) to generate the HTML pages?

Ron Gilbert
Jun 21, 2020
That sounds like a lot of work. I just want to get all the html files and host them statically. If it takes me more than an hour, it's not worth my time.
Daniel Lewis
Jun 21, 2020
HTTrack
Darius
Jun 21, 2020
Maybe wget with the HTTP option "--adjust-extension" could do the trick?
http://www.gnu.org/software/wget/manual/wget.html#HTTP-Options
Maxime Priv
Jun 21, 2020
I used SiteSucker in the past for this. I think it will do what you need (if you're on a Mac). You can try the free version on their website ;)
https://ricks-apps.com/osx/sitesucker/index.html
Jon B
Jun 21, 2020
I'd second the recommendation for Gatsby. It might take a bit over an hour, but it has a wonderful model for pulling dynamic sources into structured data and then formatting it into static output. Tons of plugins on the source side to pull from just about anything. I haven't personally used a Mongo source, but I see there's a first-party source plugin for it:
https://www.gatsbyjs.org/packages/gatsby-source-mongodb/
AlgoMango
Jun 22, 2020
It's already archived on archive.org (or you can request that it rescan the latest version), so you can easily download it from there as a complete static site with wayback-machine-downloader. It's a Ruby script. Install Ruby and then run "gem install wayback_machine_downloader". After it installs, all you need to do is type "wayback_machine_downloader http://grumpygamer.com/" and wait for the magic to happen ;)
Only issue might be that you really just get the front-end, no logins etc. but I've found it useful. Just takes five minutes to try so I think it's worth a try... Enjoy!
Repo:
https://github.com/hartator/wayback-machine-downloader
Brian
Jun 22, 2020
Setting aside Jekyll-based hosting, this sort of scraping is employed by ArchiveTeam. They have resources up at https://github.com/dhamaniasad/WARCTools
I suspect more than one of them will convert your front-end into a warc without issue. The trick is then rendering that warc as a non-awful output. I'd say this is very much a plan B if wget doesn't adjust paths for you.
Kevin
Jun 22, 2020
Gitlab Pages & Hugo - powerful yet simple to use.
Matteo
Jun 22, 2020
Björn Tantau
Jun 22, 2020
wget should work with the --mirror option. I've used it quite often for exactly this purpose.
aasasd
Jun 22, 2020
Ron, wget absolutely does the things you want; I've used it for this very purpose. The difficult part is picking the right options among the couple hundred it has: recursive download with assets, under the specified directory. Alas, I don't remember the proper options offhand, but you can be sure that leafing through the man page will get you the desired results.
aasasd
Jun 22, 2020
Specifically, some options you'll want are --page-requisites --convert-links --adjust-extension .
Steve
Jun 22, 2020
wget does a pretty good job. Also, there's HTTrack. See https://stackoverflow.com/questions/6348289/download-a-working-local-copy-of-a-webpage
Simounet
Jun 22, 2020
Hi Ron,
I think it should do the trick.
wget --no-verbose \
--mirror \
--adjust-extension \
--convert-links \
--force-directories \
--backup-converted \
--span-hosts \
--no-parent \
-e robots=off \
--restrict-file-names=windows \
--timeout=5 \
--warc-file=archive.warc \
--page-requisites \
--no-check-certificate \
--no-hsts \
--domains blog.thimbleweedpark.com \
"https://blog.thimbleweedpark.com/"
Stay safe and have a nice-not-so-grumpy day.
Dan Jones
Jun 23, 2020
It looks like your site is some sort of home-grown CMS, right?
Then you've already got all your site's posts in a database somewhere.
You should be able to write a basic PHP script to pull those entries in the database, and dump them to a bunch of HTML files.
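Dan's suggestion can be sketched in a few lines. Here it is in Python rather than PHP, purely for illustration, with a hypothetical in-memory list of posts standing in for the real MongoDB query (the field names "slug", "title", and "body" are assumptions, not the blog's actual schema):

```python
# Sketch: pull posts (a stand-in list here, instead of a database query)
# and write one static .html file per post.
import html
import pathlib

posts = [  # hypothetical rows, standing in for the MongoDB results
    {"slug": "first-post", "title": "First Post", "body": "Hello, world."},
    {"slug": "second-post", "title": "Second Post", "body": "More news."},
]

out = pathlib.Path("static_site")
out.mkdir(exist_ok=True)

for post in posts:
    page = (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{html.escape(post['title'])}</title></head>\n"
        f"<body><h1>{html.escape(post['title'])}</h1>"
        f"<p>{html.escape(post['body'])}</p></body></html>\n"
    )
    # One file per post; slugs become filenames, so links stay simple.
    (out / f"{post['slug']}.html").write_text(page, encoding="utf-8")
```

The real version would fetch rows from the database and reuse the existing page template, but the shape of the loop is the same.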
Johnny Walker
Jun 29, 2020
You're on a Mac, right? Got Homebrew? It should be as simple as:
brew install httrack
Then
httrack "https://blog.thimbleweedpark.com/" -O "/blog.thimbleweedpark.com/" -v
(-O means the output directory -- so you can change the second part; -v means verbose). If you need anything on another subdomain, then:
httrack "https://blog.thimbleweedpark.com/" -O "/blog.thimbleweedpark.com/" "+*.thimbleweedpark.com//*" -v
David Choy
Jul 16, 2020
Curious: what did you pick? I manage lots of websites and have used HTTrack in the past, but it doesn't always work.
Rez
Aug 11, 2020
You can use Hugo
Ingo Rogerson
Aug 13, 2021
I really love your games.
Ingo Rogerson
Aug 13, 2021
Already read "On Stranger Tides" and loved it. Saw it on one of your posts. Thanks for the recommendation.