LinkedIn Data Set

US:Consumer Data Set

Clean up the data:

cat * > WP-Consumer1.txt
LANG=C sed -e 's/\"//g' WP-Consumer1.txt > WP-Consumer2.txt
LANG=C cut -d, -f1,2,3,4,5,16,17,20,25 WP-Consumer2.txt > WP-Consumer3.txt
rg -a -F -i -N "michael bazzell" WP-Consumer3.txt
rg -a -F -i -N bob WP-Consumer3.txt | rg -a -F -i -N smith
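The clean-up steps above can be tried on a tiny, made-up sample before running them on a large file. This is a minimal sketch under stated assumptions: the sample rows, the file names under `$workdir`, and the column choice `-f1,2,5` are hypothetical, and plain `grep` stands in for `rg` so the sketch runs without ripgrep installed.

```shell
# Build a small, invented sample so the pipeline can be tested safely.
workdir=$(mktemp -d)
cat > "$workdir/sample1.txt" <<'EOF'
"Jane","Doe","1 Main St","Springfield","IL","x","y"
"Bob","Smith","2 Oak Ave","Shelbyville","IL","x","y"
EOF

# Step 1: strip every double quote (same sed expression as in the notes).
LANG=C sed -e 's/\"//g' "$workdir/sample1.txt" > "$workdir/sample2.txt"

# Step 2: keep only selected comma-separated columns (here 1, 2 and 5).
LANG=C cut -d, -f1,2,5 "$workdir/sample2.txt" > "$workdir/sample3.txt"

# Step 3: chained case-insensitive fixed-string searches, so both terms
# must appear on the same line (grep used here in place of rg).
result=$(grep -a -F -i "bob" "$workdir/sample3.txt" | grep -a -F -i "smith")
echo "$result"   # prints: Bob,Smith,IL
```

One caveat with this approach: `cut -d,` does not understand CSV quoting, so a field that itself contains a comma shifts every later column once the quotes are stripped. Spot-check a few lines of the output before relying on the column numbers.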

SpecialK Data Set

Clean up the data:

wget -i links.txt
# Check for RARs
cat * > _SpecialK1.txt
LANG=C sed -e 's/\"//g' _SpecialK1.txt > _SpecialK2.txt
LANG=C sort -u -f _SpecialK2.txt > _SpecialK3.txt
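The last step relies on `sort -u -f` to drop duplicate lines while ignoring case. A minimal sketch of that behaviour follows; the input lines and file names are invented for illustration.

```shell
# Four invented lines: three are the same entry in different case.
workdir=$(mktemp -d)
printf '%s\n' 'alpha' 'ALPHA' 'beta' 'alpha' > "$workdir/in.txt"

# -f folds case for comparison, -u keeps one line per group of equal lines.
LANG=C sort -u -f "$workdir/in.txt" > "$workdir/out.txt"

# Count the surviving lines (tr strips the padding some wc builds add).
count=$(wc -l < "$workdir/out.txt" | tr -d ' ')
echo "$count"   # prints: 2
```

Which spelling of a duplicated line survives (`alpha` or `ALPHA`) depends on the `sort` implementation, so downstream searches should stay case-insensitive, as the `rg -i` flag used throughout these notes already ensures.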

Usenet Archive

Usenet Archive I

Clean up the data:

```bash tab="macOS"
sudo easy_install pip
pip install internetarchive
# or
brew install python
pip3 install internetarchive
```

```bash tab="Ubuntu"
pip install internetarchive
```

```bash tab="Windows"
# Install Python first: https://www.python.org/ftp/python/3.7.1/python-3.7.1.exe
pip install internetarchive
```

```bash
ia search collection:giganews --itemlist > gig.txt
ia download --itemlist gig.txt --glob="*.csv.gz"
gunzip -r .
find . -type f -name \*.csv -print0 | xargs -0 cut -f3,4 > _Usenet1.txt
sort -u -f _Usenet1.txt > _Usenet2.txt
sed -e 's/[\"]//g' _Usenet2.txt > _Usenet3.txt
```

Usenet Archive II

Clean up the data:

```bash tab="macOS"
sudo easy_install pip
pip install internetarchive
# or
brew install python
pip3 install internetarchive
```

```bash tab="Ubuntu"
pip install internetarchive
```

```bash tab="Windows"
# Install Python first: https://www.python.org/ftp/python/3.7.1/python-3.7.1.exe
pip install internetarchive
```

```bash
ia search collection:usenethistorical --itemlist > list.txt
ia download --itemlist list.txt --glob="*.csv.gz"
# Note: the next step expects .zip archives; adjust --glob if none are downloaded.
find . -name "*.zip" -exec unzip {} \;
rg -a -F -i -N "From: " > 1.txt
sed -e 's/\.mbox\:From\://g' 1.txt > 2.txt
sed -e 's/[\"]//g' 2.txt > 3.txt
```

Elastic Search Databases

shodan.io

https://www.engadget.com/2019/04/29/database-exposes-80-million-us-households/
https://www.vpnmentor.com/blog/report-millions-homes-exposed/
Shodan: product:elastic port:9200 users
http://0.0.0.0:9200/
http://0.0.0.0:9200/_cat/indices?v
http://0.0.0.0:9200/users/_search?size=100
http://0.0.0.0:9200/users/_search?size=1000
Shodan: product:elastic port:9200 leads
http://0.0.0.0:9200/leads/_search?size=100

YouTube Comments / Channel Names

Software

ripgrep

```bash tab="macOS"
brew install ripgrep
```

```bash tab="Ubuntu"
sudo apt-get install -y ripgrep
```

```bash tab="Windows"
choco install ripgrep
# or
scoop install ripgrep
# or
# Visit https://github.com/BurntSushi/ripgrep/releases
# and download ripgrep-{version}-{processor}-pc-windows-msvc.zip
# e.g. https://github.com/BurntSushi/ripgrep/releases/download/11.0.2/ripgrep-11.0.2-x86_64-pc-windows-msvc.zip
```

Recommended flags for searching large data sets quickly (`-a` treats binary files as text, `-F` matches a fixed string, `-i` ignores case, `-N` hides line numbers):

```bash
rg -a -F -i -N yoursearchterm filename
```

Download multiple files

Copy Selected Links - Firefox Extension

Windows GNU/coreutils Utils

Core Utils for Windows

GNU Utils for Windows

macOS wget

Install Homebrew & wget

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
brew install wget
