I'm Starting A Search Engine For The Fediverse (lemmy.ca)

submitted 2 years ago by lautan@lemmy.ca to c/fediverse@lemmy.world

65 comments

Hey everyone,

This isn't an announcement, I just wanted people's thoughts on this.

I think everyone knows that searching the fediverse could be better. Googling doesn't work too well, for example. So I wanted to do my part and help out.

Indexing all posts, etc. is quite a lot to handle, so I wanted to start small and focus on video search. I've started indexing videos from PeerTube and other video websites. (Even YouTube, but this could be removed to focus on independent sites.)

I know PeerTube has its own search engine for videos, and I will be reaching out to them. Compared to theirs, I'm planning for my site to have other video sources and be easier to use.

So that leads to feedback from you guys.

What do you think about indexing videos posted on the fediverse and other independent platforms?
Are there similar services?
Am I just wasting my time?


[-] TrickDacy@lemmy.world 1 points 2 years ago

How much bandwidth do you suppose a crawler would use? I'd guess very little

[-] lautan@lemmy.ca 2 points 2 years ago

It will be very little if it isn't downloading full HTML pages.

[-] TimLovesTech@badatbeing.social 2 points 2 years ago

I was thinking more in terms of server resources (number of spider threads × posts/communities/users being indexed) that would now be dedicated to a bot, not so much network traffic, which is probably tiny if it isn't downloading images.

[-] TrickDacy@lemmy.world 1 points 2 years ago

Right, it would be an initial hit, but if the bot were properly built it wouldn't need to do a full reindexing very often. I'm no expert, but I think it could be done in a way that causes no noticeable spike in traffic.

[-] TimLovesTech@badatbeing.social 1 points 2 years ago

That's the thing: it would need to be done in chunks, with its revisits scheduled, if you want to do a complete indexing of an instance. For a large instance that's a lot of DB thrashing if you aren't spacing it out. You could just sample something like the "top 10 posts", but depending on the goal of the search engine, that kind of data is going to make it useless. If you wanted to just catalog the daily top posts of the fediverse, that might work, but if you want to catalog everything, it's going to take a lot of resources and a long time to make sure you're not hammering people's servers.
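To make the "chunks plus scheduled revisits" idea concrete, here is a rough Python sketch of how a crawler could space out its requests. Everything in it is an assumption for illustration: the `PoliteScheduler` name, the 5 second per-host delay, the one-day revisit interval, and the chunk size are made up, and it only decides which URLs to fetch next without doing any network access. It is not how any existing fediverse crawler works.

```python
import heapq
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass(order=True)
class Job:
    due: float                       # earliest time this URL may be fetched
    url: str = field(compare=False)  # not used for ordering


class PoliteScheduler:
    """Hypothetical helper: hands out URLs in small chunks, spaced per host."""

    def __init__(self, host_delay=5.0, revisit_after=86400.0, chunk_size=3):
        self.host_delay = host_delay        # assumed seconds between hits to one host
        self.revisit_after = revisit_after  # assumed seconds before re-crawling a URL
        self.chunk_size = chunk_size        # max URLs handed out per call
        self._queue = []                    # min-heap of Job, ordered by due time
        self._host_free_at = {}             # host -> time it may be hit again

    def add(self, url, now=0.0):
        heapq.heappush(self._queue, Job(now, url))

    def next_chunk(self, now):
        """Return up to chunk_size URLs that are due and whose host has rested."""
        chunk, deferred = [], []
        while self._queue and len(chunk) < self.chunk_size:
            if self._queue[0].due > now:
                break  # nothing else is due yet
            job = heapq.heappop(self._queue)
            host = urlparse(job.url).netloc
            free_at = self._host_free_at.get(host, 0.0)
            if free_at > now:
                # Host was hit too recently: retry once its delay has passed.
                deferred.append(Job(free_at, job.url))
                continue
            chunk.append(job.url)
            self._host_free_at[host] = now + self.host_delay
            # Schedule the revisit instead of re-indexing everything at once.
            deferred.append(Job(now + self.revisit_after, job.url))
        for job in deferred:
            heapq.heappush(self._queue, job)
        return chunk
```

With two posts on one instance and one on another, the first call hands out only one URL per instance, and the second post on the busy instance only comes up after the delay has passed. The trade-off is the one described above: the gentler the delay, the longer a full pass over a large instance takes.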

this post was submitted on 21 Dec 2023

275 points (100.0% liked)
