LinkedIn Data Set¶
US:Consumer Data Set¶
Clean up the data:
cat * > WP-Consumer1.txt
LANG=C sed -e 's/\"//g' WP-Consumer1.txt > WP-Consumer2.txt
LANG=C cut -d, -f1,2,3,4,5,16,17,20,25 WP-Consumer2.txt > WP-Consumer3.txt
rg -a -F -i -N "michael bazzell" WP-Consumer3.txt
rg -a -F -i -N bob WP-Consumer3.txt | rg -a -F -i -N smith
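The clean-up steps above can be tried on a small sample before running them against the full data set. The sketch below is only an illustration: `clean_csv` is a hypothetical helper, the sample rows are made up, and the column list is passed as a parameter so the sample can stay narrow. It also assumes that no field contains an embedded comma, because the quotes are stripped before `cut` splits on commas.

```bash
#!/bin/sh
# clean_csv FIELDS FILE  (hypothetical helper, not part of the original notes)
# Strips every double quote, then keeps only the requested comma-separated columns.
clean_csv() {
    fields=$1
    file=$2
    LANG=C sed -e 's/"//g' "$file" | LANG=C cut -d, -f"$fields"
}

# Made-up sample rows, for illustration only.
sample=$(mktemp)
printf '%s\n' \
    '"Bob","Smith","123 Main St","Springfield","IL","extra"' \
    '"Ann","Jones","9 Oak Ave","Dayton","OH","extra"' > "$sample"

# Keep first name, last name, and state (columns 1, 2, and 5).
clean_csv 1,2,5 "$sample"
rm -f "$sample"
```

Calling `clean_csv 1,2,3,4,5,16,17,20,25 WP-Consumer1.txt` would combine the `sed` and `cut` steps from the notes into a single pass.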
SpecialK Data Set¶
Clean up the data:
wget -i links.txt
#Check for RARs
cat * > _SpecialK1.txt
LANG=C sed -e 's/\"//g' _SpecialK1.txt > _SpecialK2.txt
LANG=C sort -u -f _SpecialK2.txt > _SpecialK3.txt
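The `sort -u -f` step folds case while deduplicating, so lines that differ only in capitalization collapse into a single entry. A minimal sketch on made-up input, using a hypothetical `dedupe_ci` helper:

```bash
#!/bin/sh
# dedupe_ci [FILE...]  (hypothetical helper)
# Sorts its input and drops duplicates, ignoring letter case.
# Reads standard input when no file is given.
dedupe_ci() {
    LANG=C sort -u -f "$@"
}

# Three made-up lines; the two "bob" variants count as duplicates.
printf 'bob@example.com\nBOB@example.com\nalice@example.com\n' | dedupe_ci
```

Two lines come out: the `alice` entry and one of the two `bob` variants. If the exact capitalization matters later, deduplicate with plain `sort -u` instead.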
Usenet Archive¶
Usenet Archive I¶
Clean up the data:

```bash tab="macOS"
sudo easy_install pip
pip install internetarchive
# or
brew install python
pip3 install internetarchive
```

```bash tab="Ubuntu"
pip install internetarchive
```

```bash tab="Windows"
# Install Python from https://www.python.org/ftp/python/3.7.1/python-3.7.1.exe, then:
pip install internetarchive
```

```bash
ia search collection:giganews --itemlist > gig.txt
ia download --itemlist gig.txt --glob="*.csv.gz"
gunzip -r .
find . -type f -name \*.csv -print0 | xargs -0 cut -f3,4 > _Usenet1.txt
sort -u -f _Usenet1.txt > _Usenet2.txt
sed -e 's/[\"]//g' _Usenet2.txt > _Usenet3.txt
```
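The extraction stage above can be rehearsed on a throwaway directory before touching the downloaded archive. This sketch assumes tab-separated files (the default delimiter for `cut`) and uses made-up rows; `extract_fields` is a hypothetical helper, and the meaning of columns 3 and 4 in the real files is not asserted here.

```bash
#!/bin/sh
# extract_fields DIR  (hypothetical helper)
# Pulls columns 3 and 4 from every .csv file under DIR,
# deduplicates case-insensitively, and strips double quotes.
extract_fields() {
    find "$1" -type f -name '*.csv' -print0 \
        | xargs -0 cut -f3,4 \
        | LANG=C sort -u -f \
        | sed -e 's/"//g'
}

# Two made-up files containing the same row.
demo=$(mktemp -d)
printf '1\t2\t"Alice <alice@example.com>"\t"Re: hello"\n' > "$demo/a.csv"
printf '1\t2\t"Alice <alice@example.com>"\t"Re: hello"\n' > "$demo/b.csv"

extract_fields "$demo"
rm -rf "$demo"
```

The `-print0` and `-0` pair keeps file names with spaces intact, which is why the original command uses them.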
Usenet Archive II¶
Clean up the data:

```bash tab="macOS"
sudo easy_install pip
pip install internetarchive
# or
brew install python
pip3 install internetarchive
```

```bash tab="Ubuntu"
pip install internetarchive
```

```bash tab="Windows"
# Install Python from https://www.python.org/ftp/python/3.7.1/python-3.7.1.exe, then:
pip install internetarchive
```

```bash
ia search collection:usenethistorical --itemlist > list.txt
ia download --itemlist list.txt --glob="*.zip"
find . -name "*.zip" -exec unzip {} \;
rg -a -F -i -N "From: " > 1.txt
sed -e 's/\.mbox\:From\://g' 1.txt > 2.txt
sed -e 's/[\"]//g' 2.txt > 3.txt
```
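Because `rg` is run without a file argument, each match is prefixed with the path of the file it came from, and the first `sed` removes the `.mbox:From:` join so the group name sits next to the sender. The sketch below shows that effect on a made-up mailbox file. It uses `grep -r` as a stand-in for `rg` so that it runs without ripgrep installed, and `extract_from` is a hypothetical helper.

```bash
#!/bin/sh
# extract_from DIR  (hypothetical helper; grep -r stands in for rg here)
# Lists every "From: " header under DIR as "<path without .mbox> <sender>".
extract_from() {
    grep -r -a -F -i 'From: ' "$1" \
        | sed -e 's/\.mbox:From://g' \
        | sed -e 's/"//g'
}

# Made-up mailbox with a single message header.
demo=$(mktemp -d)
printf 'From: "Alice" <alice@example.com>\nSubject: hi\n' > "$demo/alt.test.mbox"

extract_from "$demo"
rm -rf "$demo"
```

Each output line ends with the group file name followed by the sender, for example `alt.test Alice <alice@example.com>`, preceded by the directory path.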
Elasticsearch Databases¶
https://www.engadget.com/2019/04/29/database-exposes-80-million-us-households/
https://www.vpnmentor.com/blog/report-millions-homes-exposed/
Shodan: product:elastic port:9200 users
http://0.0.0.0:9200/
http://0.0.0.0:9200/_cat/indices?v
http://0.0.0.0:9200/users/_search?size=100
http://0.0.0.0:9200/users/_search?size=1000
Shodan: product:elastic port:9200 leads
http://0.0.0.0:9200/leads/_search?size=100
YouTube Comments / Channel Names¶
Software¶
ripgrep¶
```bash tab="macOS"
brew install ripgrep
```

```bash tab="Ubuntu"
sudo apt-get install -y ripgrep
```

```bash tab="Windows"
choco install ripgrep
# or
scoop install ripgrep
# or
# Visit https://github.com/BurntSushi/ripgrep/releases
# and download ripgrep-{version}-{processor}-pc-windows-msvc.zip
# e.g. https://github.com/BurntSushi/ripgrep/releases/download/11.0.2/ripgrep-11.0.2-x86_64-pc-windows-msvc.zip
```

Recommended for searching large data sets quickly:

```bash
rg -a -F -i -N yoursearchterm filename
```
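The flags in the search command each do one job: `-a` treats binary-looking files as text, `-F` matches the term as a literal string instead of a regular expression, `-i` ignores case, and `-N` suppresses line numbers. The wrapper below is a sketch only; `search_fixed` is a hypothetical name, and the `grep` fallback is an assumption added so the example still runs where ripgrep is not installed.

```bash
#!/bin/sh
# search_fixed TERM FILE  (hypothetical wrapper)
# Case-insensitive literal search that prints matching lines only.
search_fixed() {
    if command -v rg >/dev/null 2>&1; then
        rg -a -F -i -N -- "$1" "$2"
    else
        # Fallback with the equivalent grep flags; slower on large files.
        grep -a -F -i -- "$1" "$2"
    fi
}

# Made-up sample: with -F the dot is literal, so only the first line matches.
sample=$(mktemp)
printf 'A.B one\naxb two\n' > "$sample"
search_fixed 'a.b' "$sample"
rm -f "$sample"
```

Literal matching matters for terms such as email addresses, where an unescaped `.` would otherwise match any character.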
Download multiple files¶
Copy Selected Links - Firefox Extension
Windows GNU/coreutils Utils¶
macOS wget¶
Install Homebrew & wget
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
brew install wget