BitCannon – A torrent index archiving, browsing, and backup tool (bitcannon.io)
54 points by fortytw2 on Jan 15, 2015 | 24 comments


A torrent index that I can view through an Emacs mode would be the most useful. If the data format were just a text file with each torrent taking one line, that would be amazing. It would be long, but it's just text, so memory consumption is negligible. Kinda the same style as Org-mode, I imagine.

You could just search like usual, open a magnet with Return, press 'c' for its comments. When opening the file, the mode would check for seeders/leechers and display them inline.


That's an interesting idea, and I'm sure people familiar with Emacs would find it very useful. BitCannon has a pretty simple API (at least I think so), so feel free to tinker around and make alternative clients for it!
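
For anyone curious what an alternative client could look like, here's a rough sketch in Go. Everything specific in it is a guess: the port, the /api/search path, and the JSON field names are hypothetical stand-ins, not BitCannon's documented API.

    package main

    import (
        "encoding/json"
        "fmt"
        "net/http"
    )

    // torrent mirrors a guessed response shape; the real fields may differ.
    type torrent struct {
        Title    string `json:"title"`
        Infohash string `json:"infohash"`
    }

    func main() {
        // Hypothetical local instance and search route; adjust to the real API.
        resp, err := http.Get("http://localhost:3000/api/search?q=ubuntu")
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()

        var results []torrent
        if err := json.NewDecoder(resp.Body).Decode(&results); err != nil {
            panic(err)
        }
        for _, t := range results {
            fmt.Printf("magnet:?xt=urn:btih:%s  %s\n", t.Infohash, t.Title)
        }
    }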


Hi, excellent idea, but each time I try to import a file I get "File parsing ended. New: 0 Dup:xxxxxx", and on the site there are no torrents at all. It looks like the import runs but thinks everything is already imported? I'm on Windows 8.1 64-bit and I tried both the 64-bit and 32-bit .exe. I'm running the latest version of MongoDB, 2.6. Any idea what I'm doing wrong? Thanks!


BitCannon Windows Guide with autostart script, for the curious: http://www.htpcguides.com/install-bitcannon-personal-kickass...


I would love it if this were a desktop client. I don't want to mess with Mongo and Node for my own torrent index. Maybe I could persist the index on Dropbox or something and point my client at it. NW.js seems like a good option for this!


I'm not using Node in the web app itself, just for dependency management and development. Once the app is built, it's just Angular and regular JS.

Give it a try; I think that will clear up any confusion. All you need is MongoDB and the BitCannon binary.

(i.e. you only need Go/Node/Bower/Grunt if you wish to compile it yourself)


This is an interesting concept; until reading this, it had never occurred to me that torrent indexes are typically centralized, in contrast to the decentralized nature of torrenting itself.


That was exactly my goal. I was thinking about how easy it is for people to download the backups that a couple of torrent sites provide, but hardly anybody does so because the backups are hard to use. I also hope that, in addition to encouraging people to download torrent archives, I may help encourage more torrent sites to provide archive downloads of their torrent database.


Would it be possible to populate it just by indexing everything tracked in DHT? There was a research paper featured here recently, but I can't find it right now. They had a way of getting a decent chunk of all the DHT torrents and did some analysis based on that data.


I think that would be a nice way to import torrents without relying on a site that could be taken down. Unfortunately, I don't know a ton about DHT and the BitTorrent protocols. I'm still trying to figure out how to scrape a UDP tracker in Go to get seed/leech counts. I like this idea, though!
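
For reference, the UDP tracker scrape protocol (BEP 15) is small enough to sketch directly. This is a minimal, unofficial Go example, not BitCannon code: the tracker address and info-hash are placeholders, and checks on the returned action/transaction IDs are omitted for brevity.

    package main

    import (
        "bytes"
        "encoding/binary"
        "fmt"
        "math/rand"
        "net"
        "time"
    )

    // scrape asks a UDP tracker (BEP 15) for the seed/leech counts of one info-hash.
    func scrape(tracker string, infoHash [20]byte) (seeders, leechers uint32, err error) {
        conn, err := net.Dial("udp", tracker)
        if err != nil {
            return 0, 0, err
        }
        defer conn.Close()
        conn.SetDeadline(time.Now().Add(10 * time.Second))

        // Connect request: magic protocol ID, action 0 (connect), transaction ID.
        buf := new(bytes.Buffer)
        binary.Write(buf, binary.BigEndian, uint64(0x41727101980))
        binary.Write(buf, binary.BigEndian, uint32(0))
        binary.Write(buf, binary.BigEndian, rand.Uint32())
        if _, err = conn.Write(buf.Bytes()); err != nil {
            return 0, 0, err
        }

        // Connect response: action, transaction ID, then a 64-bit connection ID.
        resp := make([]byte, 16)
        if _, err = conn.Read(resp); err != nil {
            return 0, 0, err
        }
        connID := binary.BigEndian.Uint64(resp[8:16])

        // Scrape request: connection ID, action 2 (scrape), transaction ID, info-hash.
        buf.Reset()
        binary.Write(buf, binary.BigEndian, connID)
        binary.Write(buf, binary.BigEndian, uint32(2))
        binary.Write(buf, binary.BigEndian, rand.Uint32())
        buf.Write(infoHash[:])
        if _, err = conn.Write(buf.Bytes()); err != nil {
            return 0, 0, err
        }

        // Scrape response: 8-byte header, then seeders/completed/leechers per hash.
        resp = make([]byte, 20)
        if _, err = conn.Read(resp); err != nil {
            return 0, 0, err
        }
        return binary.BigEndian.Uint32(resp[8:12]), binary.BigEndian.Uint32(resp[16:20]), nil
    }

    func main() {
        var hash [20]byte // fill with a real info-hash before using this
        s, l, err := scrape("tracker.example.org:80", hash)
        fmt.Println(s, l, err)
    }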


> I may help encourage more torrent sites to provide archive downloads of their torrent database.

Could the underlying archive be managed with git, for distributed versioning, SHA verification, etc.?


So you have a Node.js server for the web interface and a Martini server for the API part? Would it not have made more sense to choose either Node.js or Martini for the whole project?


Initially it was separate, but I thought the same thing, so I embedded the web app in the binary and Martini now serves the whole thing.
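
For anyone curious what that looks like, here's a stripped-down sketch. It is not BitCannon's actual code: the directory and route names are placeholders, and the real app embeds its assets in the binary rather than serving them from a directory on disk.

    package main

    import "github.com/go-martini/martini"

    func main() {
        m := martini.Classic()

        // Serve the compiled Angular front end (placeholder directory name).
        m.Use(martini.Static("webapp"))

        // Hypothetical API route living alongside the static app.
        m.Get("/api/stats", func() string {
            return `{"torrents": 0}`
        })

        m.Run() // listens on :3000 by default, or on HOST/PORT from the environment
    }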


@Stephen304, what are the mechanisms for gathering index content?

Does the user manually download those from centralized sites?

How do the centralized sites get their index content?


Currently, users have to manually download archives from torrent sites that provide them (Kickass and Demonoid are two that I know work with BitCannon).

I know this isn't ideal, so I intend to implement an auto-update function that periodically downloads and imports updates. You can see the beginnings of this in the config file in the current release.
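
The shape of that updater might be something like the rough Go sketch below; the archive URL, the interval, and the import step are placeholders, not what the config file actually specifies.

    package main

    import (
        "fmt"
        "io"
        "net/http"
        "os"
        "time"
    )

    // downloadArchive fetches a site's archive dump to a local file.
    func downloadArchive(url, dest string) error {
        resp, err := http.Get(url)
        if err != nil {
            return err
        }
        defer resp.Body.Close()

        out, err := os.Create(dest)
        if err != nil {
            return err
        }
        defer out.Close()

        _, err = io.Copy(out, resp.Body)
        return err
    }

    func main() {
        // Placeholder archive URL and interval; a real config would supply these.
        const archiveURL = "https://example.org/daily-dump.txt.gz"

        for range time.Tick(24 * time.Hour) {
            if err := downloadArchive(archiveURL, "dump.txt.gz"); err != nil {
                fmt.Println("download failed:", err)
                continue
            }
            // Parsing and importing the downloaded dump would happen here.
        }
    }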


BitSnoop has a db of 24m+ torrents available, though it's (i) badly categorised and (ii) offline atm.

http://bitsnoop.com/api.html

So if I installed this, I would currently need to grab the dump from KAT or Demonoid or wherever, extract it, and add it to the database...? How does it deal with duplicates from the same site, or the same infohash found on different sites?


Yes, that's the current process for importing torrents. Making an automatic import/update is high on my priority list.

It uses a unique key on the infohash, so it just skips duplicates at the moment. I've thought about merging the category info when duplicates are found, so a torrent in Movies on Kickass and in Anime on Demonoid would merge into one torrent with [Movies, Anime], but it isn't too high on my priorities considering I still have to add a UDP tracker scraper.
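
A sketch of that dedup/merge idea, assuming the mgo MongoDB driver (the thread only says MongoDB) and made-up database, collection, and field names:

    package main

    import (
        "gopkg.in/mgo.v2"
        "gopkg.in/mgo.v2/bson"
    )

    // importTorrent inserts a torrent or, if the infohash already exists,
    // merges the new category into its category list.
    func importTorrent(c *mgo.Collection, infoHash, title, category string) error {
        _, err := c.Upsert(
            bson.M{"infohash": infoHash},
            bson.M{
                "$setOnInsert": bson.M{"title": title},
                "$addToSet":    bson.M{"categories": category}, // e.g. [Movies, Anime]
            },
        )
        return err
    }

    func main() {
        session, err := mgo.Dial("localhost")
        if err != nil {
            panic(err)
        }
        defer session.Close()

        c := session.DB("bitcannon").C("torrents")
        // The unique key on the infohash described above.
        c.EnsureIndex(mgo.Index{Key: []string{"infohash"}, Unique: true})

        importTorrent(c, "aabbccddeeff00112233445566778899aabbccdd", "Example torrent", "Movies")
    }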


Okay, so I installed it and added in today's KAT db.

A few initial thoughts:

1. It took 2-3 hours to add in the whole KAT db. No problems parsing the data from what I could tell, but a long time to do it. Maybe that's my machine, but it's not ridiculously slow. I'm not going to be adding in the BitSnoop db of 24m+ torrents at this rate.

2. Pulling up 'browse' takes 45-50s with just the KAT stuff indexed, and it takes that long every time I click on browse. Could you add some kind of caching to that page?

3. There's no paging. It loads one page of results and no more (see the paging sketch below).

4. Search is very, very fast. It would be nice to be able to specify the category when searching.

The main thing it's missing, compared to what makes other sites good, is all the ratings, seed/leech data, comments, and metadata. I wondered: maybe you could do a large scrape of KAT every now and again that grabs all of that, and provide it to people as a kick-start to their database? Then they'd only need to grab updated files every day or something.
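
On the paging point, a rough sketch of what a paged browse query could look like, again assuming the mgo driver and made-up collection and field names:

    package main

    import (
        "fmt"

        "gopkg.in/mgo.v2"
        "gopkg.in/mgo.v2/bson"
    )

    // browsePage returns one page of torrents in a category, newest first.
    func browsePage(c *mgo.Collection, category string, page, perPage int) ([]bson.M, error) {
        var results []bson.M
        err := c.Find(bson.M{"categories": category}).
            Sort("-added").       // hypothetical "date added" field
            Skip(page * perPage). // page is zero-based
            Limit(perPage).
            All(&results)
        return results, err
    }

    func main() {
        session, err := mgo.Dial("localhost")
        if err != nil {
            panic(err)
        }
        defer session.Close()

        first, _ := browsePage(session.DB("bitcannon").C("torrents"), "Movies", 0, 25)
        fmt.Println(len(first), "results on page 1")
    }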


I think @Stephen304 is working on a way to pull seed/leech data from the DHT.

The main problem with pulling the rest of the data is that you end up with 150+ GB of metadata rather easily.


Thanks. I'll give this a go tomorrow and see how it looks.


This is fantastic work and I can't wait to get this running on my home server. Thank you!


You're welcome! Let me know what you think, and if you have any ideas for improving it, feel free to email me or open an issue on GitHub!


Cool idea. Any screenshots or a demo instance of the web interface?


I've added a couple of screenshots under the features section of the website to show what it basically looks like. (I've been trying to add them all day, but I've been in and out of class.)



