Content Scraping and a new addition

Today, I got an e-mail from some legitimate-looking site asking if they could use a web crawler to archive my blog on their site, so that people in 5/10/15/20 years can research gaming, critics, etc. 

I say “ask” but they actually only told me that they would do that… and that I could refuse if I wanted to. I responded that I don’t want them to do so.

Alas, I’m making a post about it as I didn’t really have a post for today.

Research is great and I support it fully, but I don’t get why researchers wouldn’t be able to just check out my blog in the future as well. Sure, WordPress may not work in the future… Nah, just kidding. Something may happen to my blog or my site that will stop me from ever posting on here again… But I’m sure that my posts will persist on the world-wide-web without any issues, even if I don’t want them to. Nothing gets lost on the internet after all, right?

But the way they did this was rather ugly. They worded everything in their e-mail in such an overly flowery way, hiding their intention, that I had to ask Frosti if he could translate it for me. At first, I was wondering if this was spam, but after checking site upon site and sources, as well as reverse-image-searching for the woman who mailed me, I found out that it’s actually legitimate. Still, I found it weird that they didn’t use language that is easier to understand.

Alas, I don’t really see how this wouldn’t affect my blog’s performance, or why people wouldn’t just ask me directly if they want to research my blog. If there was one researcher or scientist who asked for permission to use my site, I’d probably allow it (don’t take that as permission btw, e-mail me instead). It’s a different story to just scrape off content like that, effectively stealing it, and then upload it to another public site where it’s just going to get checked out by people who won’t have to visit my site.

My blog works in the same way that their archive works… with the simple difference that my blog and all content hosted on here is owned by me. I mean, the words I wrote and the thoughts I thought are my intellectual property, right?

So I declined the offer. But I’m sure there is some site somewhere that is doing that already and I don’t have the resources to check every single site on the internet, I guess.

Alas, I thought I’d introduce something to my blog that a lot of other bloggers also have on their sites… the following block:

This post originated on Indiecator and was first published on there by Dan Indiecator aka MagiWasTaken.

Is it gonna do a lot? Probably not. Will it protect me from worrying that my posts are getting used somewhere else to generate money for other people? A little bit.

The big idea here is that I’ll basically just put that in all of my 261 posts so far (or at least most of them) and the many more to come… At the same time, people who find one of those posts elsewhere will potentially get led to my site, where it actually originated. The catch is that I’ll have to add this reusable block to 261 posts… and I’m kinda annoyed by that already… oof.

Maybe I’m being a bit sensitive about this or a bit paranoid… but I don’t want other people to earn money off of stuff that I created, especially when I don’t earn a cent in the first place and when I wouldn’t receive anything from them. I feel like that’s fair enough, right? 

What are you guys’ thoughts on this? Have you had to deal with people stealing your posts before? What did you do about it? Any other suggestions on how to make this place safer against that? Let me know!

Cheers!

3 Comments

  1. I saw your post before I saw the request itself as I got one too. I was assuming some sort of dodgy scraped combo blog type thing, but as it turns out it’s just a collection for universities, run through the web archive (aka the Wayback Machine).

    The reason they want to host it there is that they consider blogs ‘at risk’ content, meaning there is no guarantee from one day to the next that the content will still be available. And given the nature of the medium, with bloggers vanishing and shuttering their blogs (even when free hosted on wp.com), this is a fair concern to have.

    While each to their own when it comes to decisions like this, I’m personally pleased — I was manually archiving my site on the wayback machine every so often anyway, and now I won’t have to. 😉

    1. What bothered me most was the way they worded it. I didn’t really understand it at first, that person’s e-mail address seemed dodgy, and before long it felt like an attempt to just “steal” my work here. For the last couple of hours, I’ve had mixed feelings about it again and about whether I should maybe reconsider… and the way you put it certainly makes sense. I would have liked it if they had mentioned that concern, too, especially as I thought that blogs would stay around forever.

      …Especially as I thought that the Wayback Machine already did that stuff on its own :c

      Thanks for the comment, Nait!
