Markmonitor dotcom | Watchdog or What?
What is Markmonitor.com?
Markmonitor.com is a company providing brand protection to (mainly) global brands.
Markmonitor (supposedly) monitors the Internet looking for brand piracy, domain name hijacking and counterfeiting of branded goods, among its range of client services. To do that, the company must use search spiders to trawl websites looking for this information.
They also have another side to the business, as a domain registrar, and a number of large corporations including Apple.com have their domains under its ambit.
The IPs seen so far are part of Google’s network, specifically Google Plus1 IPs in the range 74.125.*.*
So far so good. Most of us will agree the (claimed) services offered by Markmonitor.com may be desirable, at least by international corporations trying to protect their brands. So why have I banned their spiders and their website IP addresses?
Markmonitor bots Use Faked User Agent Strings
I have watched activity from this bot for several months. Warning messages are regularly logged to my site’s server activity log: “cannot modify header information – headers already sent by…” The user agent in this case was identified as Google Feedfetcher! Feedfetcher is the well-known RSS feed reader from Google.
HOWEVER, the article this ‘feedfetcher’ is requesting has long ceased to be included in the site’s RSS feeds. In fact it’s an article posted last year, so where is ‘feedfetcher’ getting this RSS feed from?
I have also seen Markmonitor bots using normal Mozilla browser user agent strings, as well as missing or empty user agents.
As far as I’m concerned, any organisation using faked or misleading user agents is up to no good, and will be banned from my sites and my clients’ sites.
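One way to test whether a visitor claiming to be a Google bot really is one is a reverse DNS lookup followed by a forward lookup. The sketch below is only an illustration: the function name and the hostname suffixes are my assumptions (check Google's current documentation for the hostnames its crawlers and fetchers actually use), and the resolver arguments exist only so the check can be tried without touching the network.

```python
import socket

# Assumed suffixes for illustration; verify against Google's published list.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com", ".googleusercontent.com")


def is_verified_crawler(ip, suffixes=GOOGLE_SUFFIXES,
                        reverse=socket.gethostbyaddr,
                        forward=socket.gethostbyname_ex):
    """Return True if ip reverse-resolves to an allowed hostname
    and that hostname resolves back to the same ip."""
    try:
        host = reverse(ip)[0]          # (hostname, aliases, addresses)
    except OSError:
        return False                   # no reverse DNS record
    if not host.lower().endswith(suffixes):
        return False                   # hostname is not in an allowed domain
    try:
        addresses = forward(host)[2]   # list of IPv4 addresses for the host
    except OSError:
        return False
    return ip in addresses
```

A visitor whose user agent says Feedfetcher but whose IP fails this check is using a borrowed identity. A visitor that passes it really is coming from the named network, whatever else it is doing.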
Markmonitor.com bots are rule breakers
They ignore robots.txt disallow rules. They also look for files (so far only images) that do not exist in folders, usually because they have been moved or deleted… (so where are they getting these old links from?)
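To confirm that a bot really is ignoring disallow rules, the requests seen in the log can be checked against the site's robots.txt. This is a minimal sketch using Python's standard `urllib.robotparser`; the rules, paths and the helper name `violations` are made up for illustration and are not my actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, standing in for a real robots.txt file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
Disallow: /private/
"""


def violations(robots_txt, requests):
    """Return the (user_agent, path) pairs that robots.txt disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [(agent, path) for agent, path in requests
            if not parser.can_fetch(agent, path)]


# Hypothetical requests taken from a server log.
seen = [
    ("Feedfetcher-Google", "/images/old-photo.jpg"),
    ("Mozilla/5.0", "/blog/some-article.html"),
]
rule_breakers = violations(ROBOTS_TXT, seen)
```

Anything that turns up in `rule_breakers` fetched a path it was told to leave alone. Bear in mind that browsers driven by real people do not read robots.txt, so this only means something for automated visitors.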
Analysing the warning messages: they are generated when an image is loaded directly in a browser window (forbidden), hotlinked (forbidden) or trawled by a bot in a forbidden folder. The image is displayed as it should be when viewing the article directly, so normal visits are not the cause. Nor is the current RSS feed, which no longer includes this image (although it did last year).
This leads me to the opinion that the article has either been plagiarised or is being hotlinked from IPs in this range. The requests appear to come from several IPs in the range 74.125.*.*. My usual practice of banning a full range or sub-range when bad activity is seen from a number of individual IPs within it is not a good idea in this case: Google Plus1 also uses IPs in the 74.125 range, so some other means is needed.
If a single discovered IP used by MarkMonitor.com is banned, it will send bots from other IPs to the site (tested and confirmed). So this organisation is insistent it wants its bots on websites! The only effective way to keep this company’s spider out is to identify every IP being used and block them individually. All the shortcuts and group blocks end up blocking Google Plus1 as well. (The +1 button won’t work.)
I’m trying another way to block the spider by using “deny from markmonitor.com”, but will have to see if this works. No guarantees.
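The trade-off between blocking individual IPs and blocking a whole range can be shown with Python's standard `ipaddress` module. This is only a sketch: the helper name `is_blocked` and the specific addresses are hypothetical examples, not the IPs I actually banned.

```python
import ipaddress


def is_blocked(ip, blocked_ips, blocked_networks=()):
    """Return True if ip is listed individually or falls in a blocked range."""
    addr = ipaddress.ip_address(ip)
    if addr in {ipaddress.ip_address(b) for b in blocked_ips}:
        return True
    return any(addr in ipaddress.ip_network(n) for n in blocked_networks)


# Hypothetical addresses inside 74.125.*.*, for illustration only.
individual = ["74.125.0.10", "74.125.0.11"]

# Individual blocks catch only the listed addresses...
listed_hit = is_blocked("74.125.0.10", individual)
neighbour_hit = is_blocked("74.125.0.12", individual)

# ...while a range block catches every address in the range, wanted or not.
range_hit = is_blocked("74.125.0.12", [], ["74.125.0.0/16"])
```

Here `neighbour_hit` is False, which is why a bot can simply move to the next address, and `range_hit` is True for every address in 74.125.*.*, which is why the range block takes the +1 button down with it.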
So Is There Cause For Concern?
Does Markmonitor.com pose any cause for concern? To my mind the answer must be “yes it does”, if for no other reason than that they may use bots (spiders) without disclosing their identity and source. There is no public information about how they search the web, or the bots used, in the course of their business.
Quite simply, any spider or bot that is not correctly identified, whether it belongs to a bona fide company or a private individual, is unwelcome on any website. Worse still, if the bot is disguised as a genuine high-profile bot (e.g. feedfetcher or googlebot), it can only be considered undesirable in the absence of other public information.
Even more concerning is the apparent practice by Markmonitor.com of ‘pretending’ to be someone else. If this is the way they conduct business, it is at least dishonest – and do you really want dishonest people trawling your website?
Does Markmonitor.com Pose Any Current Threat?
At this time I would say probably not. Disregarding the conspiracy theorists and paranoid, there have been no bad events (according to Project Honeypot) recorded from visits by their IPs.
Reports are however starting to appear linking the markmonitor.com domain to phishing scams – of course it is highly likely the domain is itself being spoofed – and serves them right. If they want to spoof legitimate user agents, why shouldn’t their brand be spoofed in return…
However, when it comes to internet security, we should all err on the side of caution, and not trust anything without definite knowledge of what it is, what it is doing and where it comes from.
I am not happy that this organisation shares an IP range with Google… I am more unhappy that bots seen from this range are ignoring rules and trawling for images. Trawl my content only please…
Protecting Your Site from Markmonitor
Protecting your site or blog from Markmonitor.com bots requires vigilance. Ban one IP, user agent or host, and the company will use another IP – they seem to have many available. Deny one of their spoofed user agents – they will use another fake user agent string, or a blank user agent identity. The same goes when blocking a host address; they simply switch hosts.
The only way to combat this pernicious spidering of your site is to keep on top of suspicious activity. Watch server logs for missing user agents, and block all traffic to the site from any visitor without a valid user agent string (using .htaccess on Apache servers).
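Spotting missing user agents by eye gets tedious, so a small script can do the watching. The sketch below assumes the log is in Apache's "combined" format; the pattern, the function name and the sample lines are illustrative assumptions, and a log in a different format would need a different pattern.

```python
import re

# Assumes Apache "combined" log format:
# ip ident user [time] "request" status size "referrer" "user agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def missing_user_agent(lines):
    """Return (ip, request) for every log line with an empty or '-' user agent."""
    hits = []
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("agent").strip() in ("", "-"):
            hits.append((m.group("ip"), m.group("request")))
    return hits


# Made-up sample lines using documentation-only IP addresses.
sample = [
    '192.0.2.7 - - [10/Oct/2012:13:55:36 +0000] '
    '"GET /images/a.jpg HTTP/1.1" 403 199 "-" "-"',
    '198.51.100.4 - - [10/Oct/2012:13:56:01 +0000] '
    '"GET /blog/post.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]
suspects = missing_user_agent(sample)
```

In practice the lines would come from reading the access log file rather than from a list, and the resulting IPs are candidates for investigation, not an automatic ban list.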
A good clue to suspicious activity is a missing referrer. 99% of the time, the only wanted traffic not coming from a referrer is a search engine spider. (You could deny all non-referred traffic, but that would also block direct visits from people who have bookmarked content URLs, which is not visitor friendly.) It is better to use the non-referred traffic as a starting point to investigate the visit, find out more, and ban the source if it’s definitely a bad one…
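To turn non-referred traffic into a list of leads, the requests without a referrer can be counted per source. Again this is a rough sketch that assumes each log line ends with the quoted referrer and user agent fields, as Apache's "combined" format does; the names and sample lines are invented for the example.

```python
import re
from collections import Counter

# Assumes the line starts with the IP and ends with "referrer" "user agent".
TAIL = re.compile(
    r'^(?P<ip>\S+) .* "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"\s*$'
)


def unreferred_by_source(lines):
    """Count requests with no referrer, grouped by (ip, user agent),
    most frequent first."""
    counts = Counter()
    for line in lines:
        m = TAIL.match(line)
        if m and m.group("referrer") in ("", "-"):
            counts[(m.group("ip"), m.group("agent"))] += 1
    return counts.most_common()


# Made-up sample lines using documentation-only IP addresses.
sample_log = [
    '192.0.2.7 - - [10/Oct/2012:13:55:36 +0000] '
    '"GET /images/a.jpg HTTP/1.1" 403 199 "-" "Mozilla/5.0"',
    '192.0.2.7 - - [10/Oct/2012:13:55:40 +0000] '
    '"GET /images/b.jpg HTTP/1.1" 403 199 "-" "Mozilla/5.0"',
    '198.51.100.4 - - [10/Oct/2012:13:56:01 +0000] '
    '"GET /blog/post.html HTTP/1.1" 200 5120 '
    '"http://example.com/" "Mozilla/5.0"',
]
leads = unreferred_by_source(sample_log)
```

The sources at the top of `leads` are the ones worth looking up first. A bookmark visitor shows up once or twice; something requesting image after image with no referrer is a different matter.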
Markmonitor Can Explain
They can correctly identify any spider in use with a valid and true user agent ID accredited to their organisation, and they can ensure their spider obeys robots.txt rules, including the advanced rules mentioned in my earlier article “Go Away Baidu and Yandex”.
Until then I don’t want their bots on my sites, I don’t want them linking to content or images on my sites; GO AWAY
Related articles referring to Markmonitor.com
- One Company To Rule Them All? (abovetopsecret.com) Conspiracy theories, paranoia or what? While I don’t subscribe to conspiracy theories, the fact remains that suspicious behaviour has been seen from IPs controlled by Markmonitor.
- Warning fake paypal received on My Opera email A report claiming a phishing scam spoofed PayPal.com using the domain name Paipal.com, which appears to be owned by Markmonitor.
- PayPal Security | DSL Reports Forums This forum thread describes another PayPal phishing attempt where markmonitor.com domain name may have been spoofed (about halfway down the page)