wget spider

wget --mirror --convert-links --page-requisites --adjust-extension <site-url>

WARC

If I'm crawling a site for an offline backup, I probably want a ZIM file. Wget can't write ZIM files, but it can record a WARC file (--warc-file=<name>), which warc2zim can then convert to ZIM.
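A minimal sketch of driving that crawl from Python, assuming a wget build with WARC support. wget_warc_cmd and crawl_to_warc are hypothetical helper names. wget compresses WARC output by default and appends .warc.gz to the name you give it. The resulting file is what you'd hand to warc2zim; check its --help for the options your version expects.

```python
import subprocess


def wget_warc_cmd(url: str, warc_name: str) -> list[str]:
    """Build a wget command that mirrors `url` and also records a WARC file.

    wget writes <warc_name>.warc.gz (compression is on by default).
    """
    return [
        "wget",
        "--mirror",
        "--page-requisites",
        f"--warc-file={warc_name}",
        url,
    ]


def crawl_to_warc(url: str, warc_name: str) -> int:
    """Run the crawl and return wget's exit status.

    Not using check=True: recursive crawls often exit with 8 ("server issued
    an error response", e.g. a few 404s) even when the mirror is usable.
    """
    return subprocess.run(wget_warc_cmd(url, warc_name)).returncode
```

Building the argument list separately from running it keeps the flags easy to inspect or log before starting a long crawl.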

Cross domain

Netcat Webserver

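while true; do { printf 'HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n'; cat index.html; } | nc -lp 8080 -q 1; done

This serves index.html on port 8080, one connection at a time. Pipe the file through cat rather than echo $(cat index.html): unquoted command substitution collapses the file's newlines and repeated spaces. HTTP headers end in \r\n, which is why printf is used for the status line. -q 1 (quit one second after stdin hits EOF) is supported by netcat-traditional and Debian's netcat-openbsd; other netcat variants may need different flags.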
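For comparison, a minimal sketch of the same one-file server using only Python's standard socket module. build_response and serve are hypothetical names. Unlike the netcat loop, it sends a Content-Length header, so the client knows where the body ends instead of relying on the connection closing. If you just want to serve a directory, python3 -m http.server 8080 is the stock option.

```python
import socket
from pathlib import Path


def build_response(body: bytes, content_type: str = "text/html") -> bytes:
    """Raw HTTP/1.1 response: status line, headers, blank line, body.

    Header lines end in CRLF, and an empty CRLF line separates them from
    the body.
    """
    head = (
        "HTTP/1.1 200 OK\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


def serve(path: str = "index.html", port: int = 8080) -> None:
    """Answer every request with the same file, one connection at a time."""
    with socket.create_server(("", port)) as srv:
        while True:
            conn, _addr = srv.accept()
            with conn:
                conn.recv(65536)  # read and ignore the request
                # Re-read the file per request so edits show up without a restart.
                conn.sendall(build_response(Path(path).read_bytes()))
```

Call serve("index.html", 8080) to start it; it runs until interrupted.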

Strip everything except ASCII from text files

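A really quick way of doing this is to use strings. However, it doesn't keep whitespace intact: by default strings only prints runs of at least four printable characters (space and tab count, newlines don't), treating newlines and other control characters as separators, so blank lines and short fragments are dropped.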
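A rough Python approximation of what strings does by default, which makes it easy to see what gets lost. It assumes GNU strings' defaults (printable ASCII plus space and tab, minimum run length 4); the real tool has more options and encodings, and printable_runs is a hypothetical name.

```python
import re


def printable_runs(data: bytes, min_len: int = 4) -> list[bytes]:
    """Return runs of printable ASCII (plus tab) at least `min_len` long.

    Approximates GNU strings' default: newlines and other control bytes are
    not "printable", so they split runs and never appear in the output.
    """
    pattern = re.compile(rb"[\t\x20-\x7e]{%d,}" % min_len)
    return pattern.findall(data)
```

For example, printable_runs(b"ab\x00hello world\nxyz\x01tool") keeps only b"hello world" and b"tool": "ab" and "xyz" are too short, and the newline is gone.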

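If you need whitespace intact, convert to ASCII first and then delete only the control characters outside tab through carriage return (\011 to \015). tr -d "[:cntrl:]" would strip tabs and newlines too, since they are control characters:

iconv -c -f utf-8 -t ascii <file> | tr -d '\000-\010\016-\037\177'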
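A Python sketch of the same filtering, keeping printable ASCII plus the whitespace control characters (tab, newline, vertical tab, form feed, carriage return). keep_ascii_text and clean_file are hypothetical names.

```python
# Whitespace control characters to keep: tab, newline, VT, form feed, CR.
_KEEP = set("\t\n\v\f\r")


def keep_ascii_text(text: str) -> str:
    """Keep printable ASCII (space through '~') plus whitespace controls."""
    return "".join(ch for ch in text if ch in _KEEP or " " <= ch <= "~")


def clean_file(src: str, dst: str) -> None:
    # Invalid UTF-8 bytes decode to U+FFFD, which keep_ascii_text then drops.
    # newline="" stops Python translating \r\n on read or write.
    with open(src, encoding="utf-8", errors="replace", newline="") as f:
        text = f.read()
    with open(dst, "w", encoding="ascii", newline="") as f:
        f.write(keep_ascii_text(text))
```

Writing with encoding="ascii" doubles as a check: if anything non-ASCII slipped through, the write raises instead of silently producing a mixed file.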

Python One-Liners

Take a list of URLs on stdin (one per line, blank lines skipped) and print them as Markdown links, using the URL path as the link text:

python3 -c "import sys; from urllib.parse import urlsplit; urls = [l.strip() for l in sys.stdin if l.strip()]; print('\n'.join(f'[{urlsplit(u).path.lstrip(\"/\")}]({u})' for u in urls))"
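The same conversion as a small script function, which is easier to tweak than the one-liner. urls_to_markdown and main are hypothetical names; this version also skips blank lines.

```python
import sys
from urllib.parse import urlsplit


def urls_to_markdown(lines) -> str:
    """Turn one-URL-per-line input into Markdown links, one per line.

    Link text is the URL path without its leading slash.
    """
    links = []
    for line in lines:
        url = line.strip()
        if not url:
            continue  # skip blank lines
        text = urlsplit(url).path.lstrip("/")
        links.append(f"[{text}]({url})")
    return "\n".join(links)


def main() -> None:
    # Call this from an `if __name__ == "__main__":` guard when saved as a script.
    print(urls_to_markdown(sys.stdin))
```

Because the link text is only the path, a bare domain like https://example.com gets empty link text; using `text or url` would fall back to the full URL in that case.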