19.11.07

Content Scrapers and My Blog's Feed

I've been publishing my blog with a full feed to make it easier for people who have found this blog interesting and subscribed to my feed to be notified of new posts. Recently I noticed that a site was scraping my feed in whole and publishing it to their site.



On the surface, the site scraping my feed looks like another social bookmarking website, but after some digging around it looks to be nothing more than a way to scrape content from sites and feeds and republish it in the hopes that people will click on their ads.

The unfortunate thing about all this is that, for the time being, I'll be using the FeedBurner settings that cut down the amount of content you'll see in the feed, so if an entry is interesting to you, you'll have to visit the blog to read it all. I don't plan to make this a permanent thing, just long enough to allow me to try to do something about my content being scraped from my feed. I don't mind if someone takes a snippet of my blog and uses it to expand on something or as an example, as long as it is done under the scope of my Creative Commons license. I believe in the free exchange of information on the internet, and I believe in share and share alike. Because of that, when someone working on a website for a school contacted me about using a photo of mine, I granted them permission. I didn't even ask to be paid because I often use the information that schools and universities publish on the web about plants.

The site where I've found my feed scraped is www.ppnow.net or www.ppnow.com. Like I said, it looks like a run-of-the-mill social bookmarking website until you notice that the profiles look made up, sometimes the girls have pictures of boys, and apparently none of the "members" there are social enough to fill out a profile. I looked around that website for contact information, because who doesn't have a contact form on their website these days? I couldn't find one; the only thing I found was an email address, god@ppnow.net, and when I sent an e-mail I got no response. Today I found my latest blog entry, about storing sweet potato vine tubers, published in whole on the same website. So I went to the site's FAQ again and noticed that the e-mail god@ppnow.net had been removed and replaced with ppnowcom@gmail.com, a generic e-mail account. I wish Google made it as easy to report scrapers as they do people who sell links on their websites.

Long story made even longer: the people who have been nice enough to subscribe to my feed will still get updates, just not entire posts (for a while), and people who visit the blog directly won't see a difference. If you're curious whether your blog is being scraped or your content is being used on another website, you can use Copyscape to scan the net and see if your blog or website's content is being republished.
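
If you already suspect a particular page and would rather check it by hand, here is a rough Python sketch of the general idea. To be clear, this is not how Copyscape works (I don't know what it does internally); it is just one simple way to measure overlap, and the function names `shingles` and `copied_fraction` are made up for this example. It assumes you've pasted the text of your post and the text of the suspect page into two strings, then checks how many runs of eight consecutive words from your post show up on the other page.

```python
import re


def shingles(text: str, size: int = 8) -> set:
    """Return the set of overlapping runs of `size` consecutive words."""
    # Lowercase and keep only word-like tokens so punctuation and
    # capitalization differences don't hide a copy.
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return set()
    if len(words) < size:
        # Text shorter than one run: treat the whole thing as a single run.
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def copied_fraction(original: str, suspect: str, size: int = 8) -> float:
    """Fraction of the original's word runs that also appear in the suspect."""
    source = shingles(original, size)
    if not source:
        return 0.0
    return len(source & shingles(suspect, size)) / len(source)


if __name__ == "__main__":
    # Placeholder strings; paste real text here.
    my_post = (
        "After the first frost I dig up the sweet potato vine tubers, "
        "let them dry for a few days and store them in a cool dark place."
    )
    suspect_page = "Click our sponsors! " + my_post + " More great deals below."
    print(f"{copied_fraction(my_post, suspect_page):.0%} of the post appears on the page")
```

A value near 100% means the page carries the post more or less in whole, while a snippet quoted under a Creative Commons license would score much lower. The run length of eight words is an arbitrary choice; shorter runs catch more rewording but also flag more coincidental matches.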

19 comments:

  1. Anonymous 8:27 PM

    Your Temporary Companion piece was linked to by someone else and all I found was the article surrounded by ads. I reported it to Google, and when I checked again later that day the site was gone.

    I don't know about other ad services but with Google you can click on "ads by Google" and a page will open that you can use to report a violation.

    Every time I go to a site now and they are either stealing content or playing funny with ad placement, trying to trick you into clicking, I report them.

    One reason there are so many parasites out there is because no one says anything, but if people started registering complaints maybe at least Google would start policing their "Publishers" a bit better.

  2. Hey John,

    I appreciate that. I keep an eye on my blog, and from time to time I come across places scraping blogs I know. When I come across that, I pass the info on to the blogger and let them know what their options are.

    I tried to report the site in question here but all I got was an e-mail from Google informing me of how to file a DMCA complaint.

    And I really wish there was a better way for Google to police the people that are part of the program because when these parasites are part of the program they bring us all down.

  3. Anonymous 10:00 PM

    It's too bad Google gives a small publisher the runaround. It's easy enough to check your post and then look at the offending site; you'd have to be blind not to see the violation.

    But if consumers started to flood the Google system with reports I'm pretty sure they would start to clean up. Adding something like "I refuse to click any ad anywhere until I see this site removed from your ad program" might get their attention.

    Until people get involved I guess we're stuck with the parasites. I can't say it enough: ordinary people need to get involved and help clean up the scammers out there. I'm tired of searching for something only to end up at some worthless site shoving ads in my face.

    I know it's not garden related but you should post some of the ad-sense help forum threads - I think people would get a kick seeing some of them.

  4. I had the same thing happen to me, someone was stealing all my content via the feed, so now I only use a shortened feed and I don't plan to switch back. It is too risky.

    Fortunately for me, I was able to determine what blogging service the thief was using, and I found a legitimate email address to send my complaint to. After providing some proof, the site was removed.

    It's one of the "hazards" of the Internet, and we have to remain vigilant for ourselves and for others. Like your commenter, John, if something doesn't seem right, we should try to report it to someone who can do something about it!

    Carol, May Dreams Gardens

  5. Ummmm ... what is a feed ?

  6. I'm sure glad I happened upon your blog to read this important message. I had never heard of content scrapers before -- it's pure plagiarism isn't it? Great site, btw!

    Diane at Sand to Glass
    and Dogs Naturally

  7. I never even thought about this happening! Not that I suffer from the delusion that anyone wants to steal my content, but you never know ... Thanks for the heads-up!

  8. Well it looks like after publishing this the website in question has decided to remove my entries so maybe I won't have to truncate my feed.

    Carol,
    Thanks for sharing your experience.

    Ohiomom,
    Wiki has a good explanation of a feed: Web Feed. It's a constantly updated stream of your blog or website. You can use it to subscribe to a blog or site and be notified of the latest posts or updates. Your blog has one: in the URL bar, click the orange square and you'll see your feed and the options for subscribing to it. You can place it on your Yahoo, MSN, or Google homepage, and it is updated when you post a new entry on your blog.

    Diane,
    Thanks for stopping by and commenting. People can call it what they want, but it is simply plagiarism.

    The Green Hornet,
    It can happen to anyone, and it isn't really about having a great site or blog. People who do this don't often care about the "importance" or "popularity" of a site/blog; the only thing they want is content to fill their site with.

    I'll post a new post on the subject with some of the information I've learned with this experience.

  9. mrbrownthumb,

    Thanks for the tip on registering the domain ... yet another thing I hadn't thought of! Glad your content is safe, at least momentarily, from the scrapers. Today's my first time reading your blog but I'll be back.

    GP

  10. Hi,
    After our recent exchange of info on this subject, I changed my feed from full to partial. About that time, Technorati quit indexing and updating my posts. When I wrote them, they said it was because of conflicting RSS feeds, and gave me some technical instructions to correct the problem. Nothing worked, so since it started after I changed from full to partial, I tried changing it back. Sure enough, Technorati started updating and indexing again. So now I have a dilemma--let the scrapers scrape or Technorati not index and update.
    I tried the Google Ad route, to no avail. I just wrote to another scraper this Sunday, but I see my posts still on that site. It is very disheartening.
    Aiyana

  11. Aiyana,

    John pointed out that he found some place where my posts were, reported it, and said that it was gone afterward. Maybe if a third party reports it, it gets faster results? E-mail me the current links and I'll report it also.

  12. The site is http://www.itswen.com/. She doesn't really have a blog, just reposts others' stuff. Her site uses Google Ads. If people have a regular blog and want to link, that's one thing, but these folks who don't really write anything and just copy and link to others' stuff without permission piss me off. At least it's a link, unlike some who just copy and claim it as their own work. I notice she deleted my comment to her about reposting but left the post and link.
    Aiyana

  13. Aiyana,

    I used the link at the bottom of the ads to leave feedback and I reported the site. If anything maybe they'll lose their adsense account.

    I'm sure you already tried a whois lookup and came across privacyprotect.org. I see they have the option of requesting the information that is kept private, especially if the domain owner is breaking the law. You can probably get the person's registrar and web host info through that and report it also.

  14. Anonymous 6:50 PM

    Aiyana,

    MBT's report must have worked: no ads on the site.

    Cheers

  15. John,

    They're still there; you have to click on the titles to see them.

    Aiyana,

    That site is just a blog (wordpress) with a plug-in that pulls in feeds and reposts them.

  16. Hi,
    Thanks for the help on this. I appreciate it. When I was doing another search for my reposts, I ran across some of your posts regarding cactus on a UK site. I'll see if I can find it again.
    Aiyana

  17. Aiyana,

    I'd appreciate that because I can't find them with copyscape. If you were signed into your Google account at the time it is probably under your webhistory and if you can narrow it down to the date/week you can probably find it easier than doing searches.

    BTW

    I did some Googling and noticed that what your scraper is doing is creating backlinks. If you haven't already, click on your post titles, see if her website shows up under "links to this post", and delete the links she has made to your posts.

  18. Hi all,

    Just wondering if anyone has put the plagiarism warning banner from copyscape on their blogs?

  19. Dee,

    I haven't but I've seen it on several garden blogs lately but I can't for the life of me name any right now.

