[-] CapableStaircase@lemmy.zip 3 points 3 weeks ago

At the moment this is the source I’ve been relying on. Not an exact file list per se but a comprehensive list of dataset magnets:

https://github.com/yung-megafone/Epstein-Files

[-] CapableStaircase@lemmy.zip 4 points 4 weeks ago

I took the same list provided by this post and added a few more extensions to the search. In doing so I was able to successfully download 2327/2542 NATIVE files. I searched by making a HEAD request for each URL before trying to download it with a GET request. This method also turned up 3 additional files that returned Content-Type and Content-Length in the HEAD response but ultimately "disappeared" and returned a 404 on the GET request.
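Roughly what the HEAD-before-GET probing looks like, as a minimal sketch and not my exact script. The function names, the `opener` parameter and the URL layout (`base_url/stem.ext`) are all made up for illustration:

```python
import urllib.error
import urllib.request


def candidate_urls(base_url, stem, extensions):
    """Build one candidate URL per extension for a given file stem."""
    return [f"{base_url.rstrip('/')}/{stem}.{ext}" for ext in extensions]


def head_status(url, opener=urllib.request.urlopen, timeout=30):
    """Send a HEAD request; return (status, content_type, content_length)."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with opener(req, timeout=timeout) as resp:
            headers = resp.headers
            return resp.status, headers.get("Content-Type"), headers.get("Content-Length")
    except urllib.error.HTTPError as err:
        # 404 and other error codes arrive as exceptions in urllib
        return err.code, None, None


def find_existing(base_url, stem, extensions, opener=urllib.request.urlopen):
    """Return (url, content_type, content_length) for every HEAD that gives 200."""
    hits = []
    for url in candidate_urls(base_url, stem, extensions):
        status, ctype, length = head_status(url, opener)
        if status == 200:
            hits.append((url, ctype, length))
    return hits
```

A 200 on HEAD only says the server claims the file exists. As the 3 files above show, the GET can still come back 404, so the download step needs its own error handling.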

NOTE:

  • All MS office files (.doc(x), .xls(x), .ppt(x)) are exactly ZERO bytes long.

  • There are two password-protected sqlite .db files, which I have not yet tried to crack.

  • Lots of jail footage

  • I think the very small .avi videos with many sequential Bates numbers are actually single frames that need to be recombined into the original video. I have not done so.
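For the last point, a sketch of how one could find the candidate frame sequences before attempting any recombination. It assumes filenames look like letters followed by a Bates number plus `.avi`, and the 1 MB size cutoff is an arbitrary guess. Neither is confirmed against the actual files:

```python
import re

# Assumed naming pattern: alphabetic prefix + digits + ".avi"
BATES_RE = re.compile(r"^([A-Za-z]+)(\d+)\.avi$", re.IGNORECASE)


def sequential_runs(files, max_size=1_000_000, min_run=2):
    """Group small .avi files into runs of consecutive Bates numbers.

    files: iterable of (filename, size_in_bytes) pairs.
    Returns a list of runs, each a list of filenames in numeric order.
    """
    numbered = []
    for name, size in files:
        match = BATES_RE.match(name)
        if match and size <= max_size:
            numbered.append((match.group(1).upper(), int(match.group(2)), name))
    numbered.sort()

    runs, current = [], []
    for prefix, number, name in numbered:
        follows = current and (prefix, number) == (current[-1][0], current[-1][1] + 1)
        if follows:
            current.append((prefix, number, name))
        else:
            if len(current) >= min_run:
                runs.append([n for _, _, n in current])
            current = [(prefix, number, name)]
    if len(current) >= min_run:
        runs.append([n for _, _, n in current])
    return runs
```

Each run is only a candidate. Whether the files in a run really are frames of one video still has to be checked by hand.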

Extensions I tried:

dataset10:

avi, mp4, mov, mp3, wav, m4a, m4v, wmv, ts, vob, 3gp, amr, opus, csv, xlsx, xls, docx, doc, pluginpayloadattachment

common-audio:

m4a, mp3, wav, aac, flac, ogg, wma, aiff, opus, m4b

common-video:

mp4, mov, avi, wmv, mkv, webm, m4v, mpg, mpeg, 3gp

uncommon-audio:

ac3, amr, mka, au, ra, mid, aif, dts, caf, gsm, ape, wv, spx, mpc, snd, voc, tta, tak, dsf, dff

uncommon-video:

flv, vob, ts, ogv, m2ts, mts, asf, 3g2, f4v, divx, rm, rmvb, m2v, dv, xvid, swf, m4s, hevc, h264, h265

rare-audio:

8svx, amb, au, avr, cda, cvs, cvsd, cvu, dss, dvms, fap, fssd, gsrt, hcom, htk, ima, ircam, maud, nist, paf, prc, pvf, sd2, sds, sf, smp, sou, txw, vms, w64, wve, xa, aifc, al, ul, la, sb, sw, ub, uw

rare-video:

264, 265, 302, 3p2, 787, 890, aec, aep, aepx, ajp, ale, am, amc, amv, arcut, arf, avb, avc, avd, avp, avs, awlive, axm, bdm, bdmv, bik, bix, bmk, bnp, box, bs4, bsf, bu, camproj, camrec, ced, cine, cip, clpi, cmmp, cmmtpl, cmproj, cmrec, cpi, cst, cx3, d2v, d3v, dash, dat, dce, dck, dcr, ddat, dif, dir, dlx, dmb, dmsd, dmsd3d, dmsm, dmsm3d, dmss, dnc, dpa, dpg, dream, dsy, dv4, dvdmedia, dvr, dvr-ms, dvx, dxr, dzm, dzp, dzt, edl, evo, eye, f4p, fbr, fbz, fcp

documents:

pdf, doc, docx, txt, rtf, odt, xls, xlsx, csv, ppt, pptx, odp, html, htm, xml, json, md, tex, epub, mobi

images:

jpg, jpeg, png, gif, bmp, tiff, tif, webp, svg, ico, raw, cr2, nef, orf, sr2, psd, ai, eps, heic, heif

archives:

zip, rar, 7z, tar, gz, bz2, xz, iso, dmg, cab, lz, lzma, zst, lz4, sz, z, tgz, tbz2, txz, tlz, tar.gz, tar.bz2, tar.xz, tar.zst, tar.lz, tar.lzma, tar.lz4, tar.z, tar.sz

epstein:

apmaster, apversion, attr, bmp, bup, dat, data, db, db-journal, doc, ds\_store, f catalog, f\_catalog, ifo, images #1, images #2, iphoto, ivc, mpg, NULL, pdf, pps, ps, psb, psd, raf, tif, tiff, tropez, txt, xml

Torrent file: https://archive.org/details/data-set-9-native.tar.xz

NOTE: See INFO folder for more information.

[-] CapableStaircase@lemmy.zip 1 points 4 weeks ago

For anyone watching this post, I just dropped an update on that issue. I'll soon be posting a new magnet link for the 84GB I was able to download.

[-] CapableStaircase@lemmy.zip 1 points 1 month ago

Can you also check and see if dataset 8/10/11 have all the native files they should based on the presence of these placeholders?

[-] CapableStaircase@lemmy.zip 1 points 1 month ago

I found this in a random doc today. I’ll add it to your list and give it a shot tonight. It’ll be slow going so I don’t get rate limited again. I think if you hit too many 404s in a row the CDN locks you out for a bit.
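A sketch of the kind of throttle I mean. The CDN's real limits are unknown to me, so the delay, streak length and cooldown below are pure guesses, and the class name is made up:

```python
import time


class NotFoundThrottle:
    """Sleep between requests and back off after a streak of 404s.

    All thresholds are guesses, not measured CDN limits.
    """

    def __init__(self, base_delay=1.0, max_streak=5, cooldown=300.0, sleep=time.sleep):
        self.base_delay = base_delay
        self.max_streak = max_streak
        self.cooldown = cooldown
        self.sleep = sleep  # injectable so the logic can be tested without waiting
        self.streak = 0

    def record(self, status):
        """Call after each response; blocks until the next request may go out."""
        if status == 404:
            self.streak += 1
        else:
            self.streak = 0
        if self.streak >= self.max_streak:
            # Too many misses in a row: take a long break and start over
            self.sleep(self.cooldown)
            self.streak = 0
        else:
            self.sleep(self.base_delay)
```

Usage is just `throttle.record(status)` after every HEAD or GET. Any non-404 response resets the streak.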

[-] CapableStaircase@lemmy.zip 2 points 1 month ago

I could only grab ~44 of the NATIVEs you’ve listed and they total up to a tiny portion of the expected 80GB remaining. The hard part is guessing what file extension these files will have without getting rate limited by DOJ. I was hoping to get a copy of the zip file’s EOCD but it’s still down.

If anyone ever sees that zip come back please try and download the last 150-200MB. That’s where the zip archive’s table of contents is gonna live.
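If it does come back, something like this would grab the tail and find the end-of-central-directory record. It's a minimal sketch: the helper names are mine, it assumes the server honours HTTP Range requests, and it only reads the classic 22-byte EOCD:

```python
import struct
import urllib.request

EOCD_SIG = b"PK\x05\x06"  # end-of-central-directory signature
EOCD_LEN = 22             # fixed part of the record, without the comment


def tail_request(url, n_bytes):
    """Build a GET request for the last n_bytes of a file (suffix byte range)."""
    return urllib.request.Request(url, headers={"Range": f"bytes=-{n_bytes}"})


def parse_eocd(tail):
    """Find the last EOCD record in the tail bytes and return its key fields."""
    pos = tail.rfind(EOCD_SIG)
    if pos < 0 or pos + EOCD_LEN > len(tail):
        return None
    (_sig, _disk, _cd_disk, _entries_on_disk, entries,
     cd_size, cd_offset, _comment_len) = struct.unpack(
        "<4sHHHHIIH", tail[pos:pos + EOCD_LEN])
    return {"entries": entries, "cd_size": cd_size, "cd_offset": cd_offset}
```

For an archive this large the zip is almost certainly ZIP64, in which case these fields come back saturated (0xFFFF / 0xFFFFFFFF) and the real values sit in the ZIP64 EOCD record just before this one. The sketch does not parse that record.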

[-] CapableStaircase@lemmy.zip 4 points 1 month ago

You rock. I didn’t realize NATIVEs had a placeholder PDF. I’ll try and scrape the media files tonight to add to the existing, more complete dataset 9 archive.

[-] CapableStaircase@lemmy.zip 2 points 1 month ago

What’s your method for getting the zip file without being cut off by the CDN?

[-] CapableStaircase@lemmy.zip 1 points 1 month ago

Hi, OG 101GB dataset uploaded here. The DAT/OPT files are exactly what I used to fetch the files for this dataset.

I want to go through the other partial dataset 9 zips and check for deltas in the contents of the DAT/OPT files but haven’t had the time yet.
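A rough sketch of the delta check for the OPT side. It assumes the usual Opticon layout (comma-separated, image key in the first field, image path in the third), which I have not verified against these particular files. DAT files use different delimiters and are not handled here:

```python
def opt_image_paths(opt_text):
    """Map image key -> image path from Opticon-style .OPT text.

    Assumed field order: key, volume, path, doc break, folder break,
    box break, page count.
    """
    entries = {}
    for line in opt_text.splitlines():
        fields = line.strip().split(",")
        if len(fields) >= 3 and fields[0]:
            entries[fields[0]] = fields[2]
    return entries


def opt_delta(old_text, new_text):
    """Report keys added, removed, or pointing at a different path."""
    old = opt_image_paths(old_text)
    new = opt_image_paths(new_text)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }
```

Feed it the text of two OPT files from different partial zips and anything under "added" is a file one zip knows about and the other doesn't.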
