Jun 21, 2020

Static website converter?

Does anyone know a good tool that can convert a dynamic website into a static website?

The Thimbleweed Park blog was built using PHP and a MongoDB database and it's quite complex. The website won't ever change and I'd like to turn it into a static site so I can host the files somewhere else and shut down the server.

Ideally it would be a tool that you just point at the url and it scrapes the site, producing a bunch of .html files.

Does such a thing exist?

Darren

Jun 21, 2020
I'd give HTTrack a try, it's worked well for me in the past.

Misel

Jun 21, 2020
What about the good old

wget --recursive --no-parent <url>

Ron Gilbert

Jun 21, 2020
wget pulls down the site, but it doesn't fix up the links, so the result isn't a navigable site in a browser. The downloaded "files" don't end in .html and the links that point to them don't get a .html added.

Chris Armstrong

Jun 21, 2020
Have you looked into adding a REST API to turn the existing site into a ‘headless CMS', and then using a static site generator like Gatsby (https://github.com/gatsbyjs/gatsby/) or Jekyll (https://github.com/contentful/jekyll-contentful-data-import) to generate the HTML pages?

Ron Gilbert

Jun 21, 2020
That sounds like a lot of work. I just want to get all the html files and host them statically.  If it takes me more than an hour, it's not worth my time.

Daniel Lewis

Jun 21, 2020
HTTrack

Darius

Jun 21, 2020
Maybe wget with the HTTP option "--adjust-extension" could do the trick?
http://www.gnu.org/software/wget/manual/wget.html#HTTP-Options
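For example, bolting it onto the earlier wget suggestion (just a sketch, using the blog URL that comes up later in the thread):

wget --recursive --no-parent --adjust-extension https://blog.thimbleweedpark.com/

That takes care of the missing .html extensions; the links inside the pages would still need --convert-links to be rewritten.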

Maxime Priv

Jun 21, 2020
I used SiteSucker in the past for this. I think it will do what you need (if you're on a Mac). You can try the free version on their website ;)

https://ricks-apps.com/osx/sitesucker/index.html

Jon B

Jun 21, 2020
I'd second the recommendation for Gatsby. It might be a bit over an hour, but it has a wonderful model for pulling dynamic sources into structured data and then formatting it into static output. Tons of plugins on the source side to pull from just about anything. I haven't personally used a Mongo source, but I see that there's a first-party source plugin for it:

https://www.gatsbyjs.org/packages/gatsby-source-mongodb/

AlgoMango

Jun 21, 2020
It's already archived on archive.org (or you can request that it rescan the latest version), so you can easily download it from there as a complete static site with wayback-machine-downloader. It's a Ruby script: install Ruby and then run

gem install wayback_machine_downloader

After it installs, all you need to do is type

wayback_machine_downloader http://grumpygamer.com/

and wait for the magic to happen ;)
The only issue might be that you really just get the front-end, no logins etc., but I've found it useful. It only takes five minutes, so I think it's worth a try... Enjoy!
Repo:
https://github.com/hartator/wayback-machine-downloader

Brian

Jun 21, 2020
Setting aside Jekyll-based hosting, this sort of scraping is employed by ArchiveTeam. They have resources up at https://github.com/dhamaniasad/WARCTools

I suspect more than one of them will convert your front-end into a warc without issue. The trick is then rendering that warc as a non-awful output. I'd say this is very much a plan B if wget doesn't adjust paths for you.

Kevin

Jun 21, 2020
GitLab Pages & Hugo - powerful yet simple to use.
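Rebuilding in Hugo does mean porting the posts over, but trying the tooling only takes a few minutes (the site and post names below are just placeholders):

hugo new site twp-blog
cd twp-blog
hugo new posts/first-post.md
hugo

hugo builds everything into public/, which is the folder a GitLab Pages CI job would then publish. A theme still needs to be added before it renders anything useful.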

Matteo

Jun 21, 2020
As @Darren already said, HTTrack should work like a charm:
https://www.httrack.com/

Björn Tantau

Jun 21, 2020
wget should work with the --mirror option. I've used it quite often for exactly this purpose.

aasasd

Jun 21, 2020
Ron, wget absolutely does what you want; I used it for this very purpose. The difficult part is picking the right options among the couple hundred that it has: recursive download with assets, under the specified directory. Alas, I don't remember the proper options offhand, but you can be sure that leafing through the man page will get you the desired results.

aasasd

Jun 21, 2020
Specifically, some options you'll want are --page-requisites, --convert-links, and --adjust-extension.
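Pulled together with the recursive flags from earlier in the thread, that would be something like this (untested against this particular site, so treat it as a starting point):

wget --recursive --no-parent --page-requisites --convert-links --adjust-extension https://blog.thimbleweedpark.com/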

Steve

Jun 21, 2020
wget does a pretty good job. Also, there's httrack. See https://stackoverflow.com/questions/6348289/download-a-working-local-copy-of-a-webpage

Simounet

Jun 22, 2020
Hi Ron,
I think this should do the trick:
wget --no-verbose \
--mirror \
--adjust-extension \
--convert-links \
--force-directories \
--backup-converted \
--span-hosts \
--no-parent \
-e robots=off \
--restrict-file-names=windows \
--timeout=5 \
--warc-file=archive.warc \
--page-requisites \
--no-check-certificate \
--no-hsts \
--domains blog.thimbleweedpark.com \
"https://blog.thimbleweedpark.com/"

Stay safe and have a nice-not-so-grumpy day.

Dan Jones

Jun 23, 2020
It looks like your site is some sort of home-grown CMS, right?
Then you've already got all your site's posts in a database somewhere.
You should be able to write a basic PHP script to pull those entries from the database and dump them to a bunch of HTML files.
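If you want to poke at the data before writing any PHP, mongoexport can dump the raw posts to JSON first (the database and collection names here are just guesses, swap in whatever the blog actually uses):

mongoexport --db blog --collection posts --out posts.json

From there, a small script in PHP or anything else just has to wrap each entry in the existing templates and write one .html file per post.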

Johnny Walker

Jun 29, 2020
You're on a Mac, right? Got Homebrew? It should be as simple as:

brew install httrack

Then

httrack "https://blog.thimbleweedpark.com/" -O "/blog.thimbleweedpark.com/" -v

(-O means the output directory, so you can change the second part; -v means verbose). If you need anything on another subdomain then:

httrack "https://blog.thimbleweedpark.com/" -O "/blog.thimbleweedpark.com/" "+*.thimbleweedpark.com//*" -v