submitted 1 month ago* (last edited 1 month ago) by xodoh74984@lemmy.world to c/datahoarder@lemmy.ml

Epstein Files Jan 30, 2026

Data hoarders on reddit have been hard at work archiving the latest Epstein Files release from the U.S. Department of Justice. Below is a compilation of their work with download links.

Please seed all torrent files to distribute and preserve this data.

Ref: https://old.reddit.com/r/DataHoarder/comments/1qrk3qk/epstein_files_datasets_9_10_11_300_gb_lets_keep/

Epstein Files Data Sets 1-8: INTERNET ARCHIVE LINK

Epstein Files Data Set 1 (2.47 GB): TORRENT MAGNET LINK
Epstein Files Data Set 2 (631.6 MB): TORRENT MAGNET LINK
Epstein Files Data Set 3 (599.4 MB): TORRENT MAGNET LINK
Epstein Files Data Set 4 (358.4 MB): TORRENT MAGNET LINK
Epstein Files Data Set 5 (61.5 MB): TORRENT MAGNET LINK
Epstein Files Data Set 6 (53.0 MB): TORRENT MAGNET LINK
Epstein Files Data Set 7 (98.2 MB): TORRENT MAGNET LINK
Epstein Files Data Set 8 (10.67 GB): TORRENT MAGNET LINK


Epstein Files Data Set 9 (Incomplete). Contains only 49 GB of the full 180 GB. Multiple reports of the DOJ server cutting off downloads at offset 48995762176.

ORIGINAL JUSTICE DEPARTMENT LINK

  • TORRENT MAGNET LINK (removed due to reports of CSAM)

/u/susadmin's More Complete Data Set 9 (96.25 GB)
De-duplicated merge of the 45.63 GB and 86.74 GB versions

  • TORRENT MAGNET LINK (removed due to reports of CSAM)

Epstein Files Data Set 10 (78.64 GB)

ORIGINAL JUSTICE DEPARTMENT LINK

  • TORRENT MAGNET LINK (removed due to reports of CSAM)
  • INTERNET ARCHIVE FOLDER (removed due to reports of CSAM)
  • INTERNET ARCHIVE DIRECT LINK (removed due to reports of CSAM)

Epstein Files Data Set 11 (25.55 GB)

ORIGINAL JUSTICE DEPARTMENT LINK

SHA1: 574950c0f86765e897268834ac6ef38b370cad2a


Epstein Files Data Set 12 (114.1 MB)

ORIGINAL JUSTICE DEPARTMENT LINK

SHA1: 20f804ab55687c957fd249cd0d417d5fe7438281
MD5: b1206186332bb1af021e86d68468f9fe
SHA256: b5314b7efca98e25d8b35e4b7fac3ebb3ca2e6cfd0937aa2300ca8b71543bbe2
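
For anyone mirroring Data Sets 11 and 12, a minimal way to check a downloaded copy against the hashes above (the filename is a placeholder for whatever the DOJ download is saved as):

shasum -a 256 <downloaded-file>   # should match the SHA256 above
shasum -a 1 <downloaded-file>     # should match the SHA1
md5sum <downloaded-file>          # should match the MD5 (use `md5` on macOS)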


This list will be edited as more data becomes available, particularly with regard to Data Set 9 (EDIT: NOT ANYMORE)


EDIT [2026-02-02]: After being made aware of potential CSAM in the original Data Set 9 releases and seeing confirmation in the New York Times, I will no longer support any effort to maintain links to archives of it. There is suspicion of CSAM in Data Set 10 as well. I am removing links to both archives.

Some in this thread may be upset by this action. It is right to be distrustful of a government that has not shown signs of integrity. However, I do trust journalists who hold the government accountable.

I am abandoning this project and removing any links to content that commenters here and on reddit have suggested may contain CSAM.

Ref 1: https://www.nytimes.com/2026/02/01/us/nude-photos-epstein-files.html
Ref 2: https://www.404media.co/doj-released-unredacted-nude-images-in-epstein-files

[-] Arthas@lemmy.world 3 points 3 weeks ago

Epstein Files - Complete Dataset Audit Report

Generated: 2026-02-16 | Scope: Datasets 1–12 (VOL00001–VOL00012) | Total Size: ~220 GB


Background

The Epstein Files consist of 12 datasets of court-released documents, each containing PDF files identified by EFTA document IDs. These datasets were collected from links shared throughout this Lemmy thread, with Dataset 9 cross-referenced against a partial copy we had downloaded independently.

Each dataset includes OPT/DAT index files (the standard Opticon and Concordance load files used in e-discovery), which serve as the authoritative manifest of what each dataset should contain. This audit was compiled to:

  1. Verify completeness — compare every dataset against its OPT index to identify missing files (a sketch of this check follows the list)
  2. Validate file integrity — confirm that all files are genuinely the file types they claim to be, not just by extension but by parsing their internal structure
  3. Detect duplicates — identify any byte-identical files within or across datasets
  4. Generate checksums — produce SHA256 hashes for every file to enable downstream integrity verification
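
A minimal sketch of the completeness check for a single dataset, assuming the common Opticon line layout of DocID,Volume,ImagePath,DocBreak,... and that the load-file paths use the same filenames as the files on disk (both are assumptions; VOL00009 is only an example):

# Filenames the OPT load file says should exist
cut -d',' -f3 VOL00009/*.opt | tr -d '\r' | sed 's#.*[\\/]##' | sort -u > expected_files.txt
# Filenames actually present on disk
find VOL00009 -type f -name '*.pdf' | sed 's#.*/##' | sort -u > on_disk_files.txt
# Lines in the first list but not the second are the missing files
comm -23 expected_files.txt on_disk_files.txt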

Executive Summary

Metric                              Value
Total Unique Files                  1,380,939
Total Document IDs (OPT)            2,731,789
Missing Files                       25 (Dataset 9 only)
Corrupt PDFs                        3 (Dataset 9 only)
Duplicates (intra + cross-dataset)  0
Mislabeled Files                    0
Overall Completeness                99.998%

Dataset Overview

                      EPSTEIN FILES - DATASET SUMMARY
  ┌─────────┬──────────┬───────────┬──────────┬─────────┬─────────┬─────────┐
  │ Dataset │  Volume  │   Files   │ Expected │ Missing │ Corrupt │  Size   │
  ├─────────┼──────────┼───────────┼──────────┼─────────┼─────────┼─────────┤
  │    1    │ VOL00001 │    3,158  │   3,158  │    0    │    0    │  2.5 GB │
  │    2    │ VOL00002 │      574  │     574  │    0    │    0    │  633 MB │
  │    3    │ VOL00003 │       67  │      67  │    0    │    0    │  600 MB │
  │    4    │ VOL00004 │      152  │     152  │    0    │    0    │  359 MB │
  │    5    │ VOL00005 │      120  │     120  │    0    │    0    │   62 MB │
  │    6    │ VOL00006 │       13  │      13  │    0    │    0    │   53 MB │
  │    7    │ VOL00007 │       17  │      17  │    0    │    0    │   98 MB │
  │    8    │ VOL00008 │   10,595  │  10,595  │    0    │    0    │   11 GB │
  │    9    │ VOL00009 │  531,282  │ 531,307  │   25    │    3    │   96 GB │
  │   10    │ VOL00010 │  503,154  │ 503,154  │    0    │    0    │   82 GB │
  │   11    │ VOL00011 │  331,655  │ 331,655  │    0    │    0    │   27 GB │
  │   12    │ VOL00012 │      152  │     152  │    0    │    0    │  120 MB │
  ├─────────┼──────────┼───────────┼──────────┼─────────┼─────────┼─────────┤
  │  TOTAL  │          │1,380,939  │1,380,964 │   25    │    3    │ ~220 GB │
  └─────────┴──────────┴───────────┴──────────┴─────────┴─────────┴─────────┘

Notes

  • DS1: Two identical copies found (6,316 files on disk), byte-for-byte identical via SHA256. The table above reflects a single copy (3,158 files); the second copy is redundant.
  • DS2: 699 document IDs map to 574 files (multi-page PDFs)
  • DS3: 1,847 document IDs across 67 files (~28 pages/doc avg)
  • DS5: 1:1 document-to-file ratio (single-page PDFs)
  • DS6: Smallest dataset by file count. ~37 pages/doc avg.
  • DS9: Largest dataset. 25 missing from OPT index, 3 structurally corrupt.
  • DS10: Second largest. 950,101 document IDs across 503,154 files.
  • DS11: Third largest. 517,382 document IDs across 331,655 files.

Dataset 9 — Missing Files (25)

EFTA00709804    EFTA00823221    EFTA00932520
EFTA00709805    EFTA00823319    EFTA00932521
EFTA00709806    EFTA00877475    EFTA00932522
EFTA00709807    EFTA00892252    EFTA00932523
EFTA00770595    EFTA00901740    EFTA00984666
EFTA00774768    EFTA00912980    EFTA00984668
EFTA00823190    EFTA00919433    EFTA01135215
EFTA00823191    EFTA00919434    EFTA01135708
EFTA00823192

Dataset 9 — Corrupted Files (3)

File              Size    Error
EFTA00645624.pdf  35 KB   Missing trailer dictionary, broken xref table
EFTA01175426.pdf  827 KB  Invalid xref entries, no page tree (0 pages)
EFTA01220934.pdf  1.1 MB  Missing trailer dictionary, broken xref table

All three files have valid %PDF- headers but cannot be rendered due to structural corruption; they were likely damaged during the original document production or transfer.


File Type Verification

Two levels of verification were performed on all 1,380,939 files (both are sketched after the list):

  1. Magic Byte Detection (file command) — All files contain valid %PDF- headers. 0 mislabeled.
  2. Deep PDF Validation (pdfinfo, poppler 26.02.0) — Parsed xref tables, trailer dictionaries, and page trees. 3 structurally corrupt (Dataset 9 only).
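
A minimal sketch of both checks for one dataset (VOL00009 as an example). Note that the pdfinfo pass below only catches files poppler refuses to open; the audit additionally inspected xref tables, trailer dictionaries, and page trees:

# Level 1: anything whose magic bytes do not identify it as a PDF
find VOL00009 -type f -name '*.pdf' -exec file --mime-type {} + | grep -v ': application/pdf$'
# Level 2: files that pdfinfo (poppler) cannot parse at all
find VOL00009 -type f -name '*.pdf' -exec sh -c \
  'for f in "$@"; do pdfinfo "$f" >/dev/null 2>&1 || echo "CORRUPT: $f"; done' _ {} +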

Duplicate Analysis

  • Within Datasets: 0 intra-dataset hash duplicates across all 12 datasets.
  • Cross-Dataset: All 1,380,939 SHA256 hashes compared. 0 cross-dataset duplicates — every file is unique (a sketch of this check follows the list).
  • Dataset 1 Two Copies: Both copies are byte-for-byte identical (SHA256 verified); one is redundant (~2.5 GB).
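
A minimal sketch of the cross-dataset check, assuming the per-dataset checksum files described in the next section (dataset_*_SHA256SUMS.txt) are all in the current directory:

# Any SHA256 value that appears more than once across all datasets
awk '{print $1}' dataset_*_SHA256SUMS.txt | sort | uniq -d > duplicate_hashes.txt
# If that file is non-empty, show the colliding entries
grep -F -f duplicate_hashes.txt dataset_*_SHA256SUMS.txt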

Integrity Verification

SHA256 checksums were generated for every file across all 12 datasets. Individual checksum files are available per dataset:

File                         Hashes   Size
dataset_1_SHA256SUMS.txt      3,158   256 KB
dataset_2_SHA256SUMS.txt        574   47 KB
dataset_3_SHA256SUMS.txt         67   5.4 KB
dataset_4_SHA256SUMS.txt        152   12 KB
dataset_5_SHA256SUMS.txt        120   9.7 KB
dataset_6_SHA256SUMS.txt         13   1.1 KB
dataset_7_SHA256SUMS.txt         17   1.4 KB
dataset_8_SHA256SUMS.txt     10,595   859 KB
dataset_9_SHA256SUMS.txt    531,282   42 MB
dataset_10_SHA256SUMS.txt   503,154   40 MB
dataset_11_SHA256SUMS.txt   331,655   26 MB
dataset_12_SHA256SUMS.txt       152   12 KB

To verify any individual file, compute its SHA256 and compare it against the corresponding entry in the checksum file:

shasum -a 256 <filename>
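
To verify an entire dataset in one pass, assuming the checksum files use the standard "<hash>  <path>" lines that shasum emits (run from the directory the recorded paths are relative to; Dataset 9 shown as an example):

shasum -a 256 -c dataset_9_SHA256SUMS.txt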

If you'd like access to the SHA256 checksum files or can help host them, send me a DM.


Methodology

  1. Hash Generation: SHA256 checksums via shasum -a 256 with 8-thread parallel processing (sketched after this list)
  2. OPT Index Comparison: Each dataset's OPT load file parsed for expected file paths, compared against files on disk
  3. Intra-Dataset Duplicate Detection: SHA256 hashes compared within each dataset
  4. Cross-Dataset Duplicate Detection: All 1,380,939 hashes compared across all 12 datasets
  5. File Type Verification (Level 1): Magic byte detection via file command
  6. Deep PDF Validation (Level 2): Structure validation via pdfinfo (poppler 26.02.0) — xref tables, trailer dictionaries, page trees
  7. Cross-Copy Comparison: Dataset 1's two copies compared via full SHA256 diff
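
A minimal sketch of step 1 for one dataset, assuming GNU parallel is available (it buffers each worker's output, so checksum lines from the 8 jobs do not interleave):

find VOL00009 -type f -name '*.pdf' -print0 \
  | parallel -0 -j 8 -m shasum -a 256 > dataset_9_SHA256SUMS.txt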

Recommendations

  1. Remove Dataset 1 duplicate copy — saves ~2.5 GB (see the check sketched after this list)
  2. Document the 25 missing Dataset 9 files — community assistance may help locate these
  3. Preserve OPT/DAT index files — authoritative record of expected contents
  4. Distribute SHA256SUMS.txt files — for downstream integrity verification
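
For recommendation 1, it is cheap to re-confirm that the two Dataset 1 directories really are identical before deleting one; the directory names below are placeholders for wherever the two copies live:

diff -rq DATASET1_COPY_A/ DATASET1_COPY_B/ && echo "identical - safe to remove one copy"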

Report generated as part of the Epstein Files preservation and verification project.
