I've been hearing about Usenet for a long time, and although I like to think I'm tech savvy, I was still deeply confused about the concept.

So I decided to learn what the deal was, and I think I have some rough idea now. My questions more or less concern indexers in particular. The first thing I thought about was that if headers contained the title and other easily identifiable information there shouldn't be much thought to choosing an indexer.

However, people have pointed out the obvious: files with unobfuscated headers get taken down immediately through DMCA requests. But now I'm not sure how indexers accomplish their job. Is it more or less a manual process? Do people upload files with obfuscated headers and then map them to the true headers on specific indexers?

Either way, the concept is so unique and damn interesting. Usenet seems to be a very good compromise between decentralization and reliability/speed.

  • If we told you we would have to hug you. Sorry

  • Short answer: in the context of filename obfuscation, indexers do not do any indexing
    When making and posting a filename-obfuscated post, the uploader makes his own NZB and shares the NZB with one or more indexers

    Indexing:
    Find the yEnc (yEncode) specification. It's a very short text file. Binary files are split into segments smaller than 1 million bytes, because Usenet has always been a text messaging platform and 1 million bytes is a common maximum message size. Each segment is yEncoded and a Usenet message is created: the yEncoded binary chunk is the message body, and the body is appended to a Usenet message header. The yEnc spec defines a format for the Subject line to contain the filename. All the thousands of segments (as Usenet articles, or messages) are posted to the same newsgroup
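
    To make the segment/encode step concrete, here is a minimal Python sketch of the core yEnc byte transform. It is simplified on purpose: it leaves out line wrapping, the =ybegin/=ypart/=yend control lines and the CRC32 check that a real poster adds. The function names and the exact Subject layout are my own illustration, not taken from any particular posting tool.

```python
# Simplified yEnc body encoding (sketch only: no line wrapping,
# no =ybegin/=yend control lines, no CRC32).

ESCAPE = ord("=")
# Values that must be escaped after the +42 shift: NUL, LF, CR and '=' itself.
CRITICAL = {0x00, 0x0A, 0x0D, ESCAPE}


def yenc_encode(data: bytes) -> bytes:
    """Shift every byte by 42 (mod 256) and escape the critical values."""
    out = bytearray()
    for byte in data:
        shifted = (byte + 42) % 256
        if shifted in CRITICAL:
            out.append(ESCAPE)
            out.append((shifted + 64) % 256)
        else:
            out.append(shifted)
    return bytes(out)


def yenc_decode(encoded: bytes) -> bytes:
    """Reverse yenc_encode: undo the escape, then undo the +42 shift."""
    out = bytearray()
    stream = iter(encoded)
    for byte in stream:
        if byte == ESCAPE:
            byte = (next(stream) - 64) % 256
        out.append((byte - 42) % 256)
    return bytes(out)


def split_segments(data: bytes, segment_size: int) -> list[bytes]:
    """Cut a binary file into chunks that each fit in one article."""
    return [data[i:i + segment_size] for i in range(0, len(data), segment_size)]


def subject_line(filename: str, part: int, total: int) -> str:
    """Illustrative Subject layout: quoted filename, 'yEnc', then (part/total)."""
    return f'"{filename}" yEnc ({part}/{total})'
```

    Each segment from split_segments would be encoded and posted as its own article, with subject_line giving the header that indexers later scan.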

    Indexing reads the article headers from one or more chosen newsgroups, scans the headers for filenames, and builds an NZB using the article headers' Message-IDs
    Use the web UI at binsearch.info to view a list of articles in an arbitrary newsgroup that match a specific filename
    Binsearch can dynamically create an NZB from the articles you select from its web list
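
    A rough Python sketch of that indexing loop, assuming the headers have already been fetched into plain dicts. The dict keys (subject, message_id, from, date, bytes) and both function names are hypothetical, and this is not how Binsearch or any specific indexer is implemented. The XML layout follows the usual NZB structure of file, groups and segments.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Matches subjects such as: "movie.mkv" yEnc (3/120)
SUBJECT_RE = re.compile(r'"(?P<name>[^"]+)" yEnc \((?P<part>\d+)/(?P<total>\d+)\)')


def group_by_filename(headers):
    """Map filename -> {part number -> header dict} by parsing Subject lines."""
    files = defaultdict(dict)
    for header in headers:
        match = SUBJECT_RE.search(header["subject"])
        if match is None:
            continue  # no recognizable filename, so nothing to index
        files[match["name"]][int(match["part"])] = header
    return files


def build_nzb(files, group):
    """Serialize the grouped headers as NZB XML for a single newsgroup."""
    root = ET.Element("nzb", xmlns="http://www.newzbin.com/DTD/2003/nzb")
    for name, parts in sorted(files.items()):
        first = parts[min(parts)]
        file_el = ET.SubElement(
            root, "file",
            poster=first["from"], date=str(first["date"]), subject=first["subject"],
        )
        groups_el = ET.SubElement(file_el, "groups")
        ET.SubElement(groups_el, "group").text = group
        segments_el = ET.SubElement(file_el, "segments")
        for number in sorted(parts):
            header = parts[number]
            segment = ET.SubElement(
                segments_el, "segment",
                bytes=str(header["bytes"]), number=str(number),
            )
            # NZB files store the Message-ID without the angle brackets.
            segment.text = header["message_id"].strip("<>")
    return ET.tostring(root, encoding="unicode")
```

    Note that everything hinges on SUBJECT_RE finding a filename. Take the filename out of the Subject and group_by_filename returns nothing.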

    Indexing is trivial. Anybody can choose a newsgroup and index the binary files it contains within its articles. That's a problem, because copyright trolls use indexing to collate a list of articles, and send that list of Message-IDs to Usenet providers in the form of a takedown demand

    The first thing I thought about was that if headers contained the title and other easily identifiable information there shouldn't be much thought to choosing an indexer

    In the indexing context, the user's choice of indexer depends on whether all indexers are indexing the same groups. If Binsearch covers the groups you need, use it as your indexer

    In the obfuscation context, join an indexer that carries properly obfuscated posts containing the files you want to download

    Obfuscation:
    Over several years, uploaders and a couple of indexers collaborated to develop a filename obfuscation method which can resist copyright takedown demands. Obviously, this means removing the filename from the Subject field of the message header. After about 4 iterations they settled on the method used by many uploaders today

    No filename, no more indexing by filename matching

    An uploader knows all the Message-IDs of all the articles he posts. He has all the information necessary to make an NZB file for his own uploads. If an uploader creates his own NZB, he can send it to his favorite indexers
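
    As a sketch of why the uploader is the only one who can build the NZB: the Python below plans an upload with random Subjects and random Message-IDs and keeps the mapping locally. Everything here (function names, the dict layout, the example.invalid domain, picking a random group per article) is an assumption for illustration, not the actual method any uploader or tool uses.

```python
import secrets


def random_token(nbytes: int = 16) -> str:
    """Random hex string that carries no information about the file."""
    return secrets.token_hex(nbytes)


def plan_obfuscated_post(filename, segment_sizes, groups, domain="example.invalid"):
    """Return the uploader's private record of an obfuscated upload.

    Only this record links the original filename to the articles; the
    Subject and Message-ID of each article are random.
    """
    articles = []
    for number, size in enumerate(segment_sizes, start=1):
        articles.append({
            "number": number,
            "bytes": size,
            "subject": random_token(),
            "message_id": f"<{random_token()}@{domain}>",
            "group": secrets.choice(groups),
        })
    return {"original_filename": filename, "articles": articles}
```

    The returned record has everything an NZB needs (Message-ID, size and order of each segment), and nobody reading the newsgroup headers can rebuild it.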

    If you're a member of that indexer, you can download that NZB. If you're not a member, you miss out - unless someone who is a member shares the NZB outside the indexer

    Minor details:
    The PAR2 files are created using the original filenames
    The PAR2 mechanism includes a table of hashes (magic cryptography) which are used for verification and repair
    The downloader app (or the downloader using his favorite PAR2 app) invokes a PAR2 repair on the downloaded filename-obfuscated files. The PAR2 software hashes the files, matches them against the hashes stored in the PAR2, and uses the original filenames stored in the PAR2 to rename each file to its original name
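
    The rename step boils down to a hash lookup. The Python sketch below assumes the hash-to-filename table has already been read out of the PAR2 set; parsing real PAR2 packets is not shown, and a real PAR2 client also works with block-level checksums for repair. The function names and the table shape are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 of a whole file, read in chunks to keep memory use flat."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def rename_by_hash(directory, hash_to_name):
    """Rename files whose MD5 appears in hash_to_name.

    hash_to_name stands in for the table stored in the PAR2 set:
    {md5 hex digest: original filename}. Returns {old name: new name}.
    """
    renamed = {}
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        original = hash_to_name.get(md5_of(path))
        if original is None or path.name == original:
            continue
        target = path.with_name(original)
        if target.exists():
            continue  # never overwrite an existing file in this sketch
        path.rename(target)
        renamed[path.name] = original
    return renamed
```

    The lookup goes from file content to original name, never from obfuscated name to original name, which is the point made in the next paragraph.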

    There is no direct lookup from obfuscated filename to original filename. If you read someone claiming indexers are able to deobfuscate, do not believe that

    In the best obfuscation method (some inferior methods are still in use), the articles are spread arbitrarily across 6 to 10 different newsgroups. The idea that articles are contained inside newsgroups is an illusion. An article has a globally unique Message-ID. It is not necessary to tell the server which newsgroup the article is in. The Message-ID is all that's needed. Of course, by not being in a single newsgroup, the articles can't be indexed
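
    To show that the Message-ID alone is enough, here is a bare-bones Python sketch that requests an article with the NNTP ARTICLE command and never sends a GROUP command. It assumes an unauthenticated, unencrypted server on port 119; commercial providers normally require a login and TLS, which are left out. The host argument and the function names are placeholders.

```python
import socket


def article_command(message_id: str) -> bytes:
    """Build 'ARTICLE <message-id>'; no newsgroup appears anywhere."""
    if not (message_id.startswith("<") and message_id.endswith(">")):
        message_id = f"<{message_id}>"
    return f"ARTICLE {message_id}\r\n".encode("ascii")


def fetch_article(host: str, message_id: str, port: int = 119, timeout: float = 30.0) -> bytes:
    """Fetch one article (headers + body) by Message-ID only.

    Sketch under assumptions: no authentication, no TLS.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        reader = sock.makefile("rb")
        greeting = reader.readline()
        if not greeting.startswith((b"200", b"201")):
            raise ConnectionError(greeting.decode("ascii", "replace").strip())
        sock.sendall(article_command(message_id))
        status = reader.readline()
        if not status.startswith(b"220"):  # 430 means no such article
            raise LookupError(status.decode("ascii", "replace").strip())
        lines = []
        for line in reader:
            if line == b".\r\n":  # a lone dot ends the multi-line response
                break
            if line.startswith(b"."):  # undo NNTP dot-stuffing
                line = line[1:]
            lines.append(line)
        return b"".join(lines)
```

    A downloader reading an NZB does essentially this for every segment, which is why the group names in the NZB matter so little.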

    Because they embed the original filenames, the PAR2 files are posted to different newsgroups

  • Back in the 1990s, this was all done manually; one would hope someone has figured out how to automate it by now. At the time, some sites even compensated users with additional access time for manually sifting through posts and piecing together posts and PAR files for NZBs.

  • The nzb file contains this information. That is partly why you download the nzb from the indexer.

    Then the question becomes how does the indexer know what to put in the nzb, if the headers are obfuscated?

    The people at the indexer create the NZB files. They don't scrape newsgroups, as everything is obfuscated. If you don't have the correct NZB file for a post, you won't be able to know what's inside.

    They still scrape some. Not everything is obfuscated.

    The indexer doesn't build the NZB because it doesn't know what to put in the NZB
    The uploader creates the NZB while processing the upload. The uploader then shares his NZB with one or more indexers

    Oh sorry I guess I didn’t read the question properly.

    The indexer can detect original filenames in the PAR files, for example. They can also match file sizes against known scene releases, parse NFO files, etc. As alluded to in the OP, some stuff can also be manually edited by staff.