WordPress Database Contains a Lot of Junk
WordPress stores a lot of junk in the database: WordPress news feeds, theme and plugin update notices, and leftover data from removed plugins and themes that don’t clean up after themselves. And that’s before getting to the useful junk, like post revisions and other data WordPress users actually need.
Storing junk in the database is nothing new to WordPress, as a .org support submission from 4 years ago shows.
I noticed my WordPress database was excessive in size, found it padded with 7,000 lines of WordPress “news.” Why is this stuff in my database, and how do I get it out? wordpress.org/support/topic/database-padded-with-junk-content
Interestingly, no one bothered to reply to the submission.
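To get a feel for how much of the options table is cached junk, something like the rough Python sketch below could be run over an export of option names and sizes. It relies on the name prefixes WordPress gives its transients (`_transient_`, `_site_transient_` and their `_timeout_` partners), with cached feeds using keys that start with `feed_`. The function names and the sample rows in the usage note are made up for illustration, and getting the rows out of the database is left to you.

```python
# Prefixes WordPress uses in the options table for cached, expiring data.
# The timeout variants are listed first so they are matched before the
# shorter prefixes they begin with.
TRANSIENT_PREFIXES = (
    "_transient_timeout_",
    "_site_transient_timeout_",
    "_transient_",
    "_site_transient_",
)


def classify_option(name):
    """Return 'feed cache', 'transient' or 'other' for an option name."""
    for prefix in TRANSIENT_PREFIXES:
        if name.startswith(prefix):
            key = name[len(prefix):]
            # Cached news feeds are stored under keys beginning with "feed_".
            return "feed cache" if key.startswith("feed_") else "transient"
    return "other"


def summarise(rows):
    """Total the bytes per category for (option_name, size_in_bytes) rows."""
    totals = {}
    for name, size in rows:
        category = classify_option(name)
        totals[category] = totals.get(category, 0) + size
    return totals
```

Feeding it a few hypothetical rows, such as `summarise([("_transient_feed_abc", 52000), ("siteurl", 30)])`, gives a per-category byte count, which is enough to see whether feed caches are what is padding the table.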
Unnecessary Data Stored in Database
Bing and MSN Bots Are Banned
I have banned Bing, Yahoo and MSN search engine spiders from my sites! I’m tired of the constant rule breaking and over-crawling by Bing and MSN search bots.
Bing is a Rule Breaker
Microsoft claims Bing honours robots.txt rules. In my experience that is a blatant lie. Bingbot and msnbot simply ignore robots.txt rules and crawl whatever they want. The specific rules broken include:
- crawling system folders
- crawling image folders (msn-media bot). Image folders and the extensions jpg, png, gif and bmp are disallowed
- crawling RSS feeds. All RSS feeds are disallowed; rss.xml, /feed/, etc
- crawling comment forms; DOMAIN/comment/184 – the path /comment/ is disallowed in robots.txt
The last straw was today. Two days ago I added the Bing and MSN user agent strings to the disallowed bots in robots.txt across all my sites; this morning I saw that these bots had read robots.txt, ignored it totally, and crawled the sites anyway.
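For anyone wanting to check their own logs, here is a minimal Python sketch of the idea, using the standard library's `urllib.robotparser`. The robots.txt content is a cut-down stand-in for the rules described above, not my actual file, and the sample requests are invented. Note that this parser only does simple path-prefix matching, so extension rules such as the image ones are left out, and it expects the short bot name rather than the full user agent header.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: ban the Bing and MSN bots outright and keep
# every other crawler out of the comment and feed paths.
ROBOTS_TXT = """\
User-agent: bingbot
User-agent: msnbot
Disallow: /

User-agent: *
Disallow: /comment/
Disallow: /feed/
"""


def load_rules(text):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def violations(parser, requests):
    """Return the (bot_name, path) requests that robots.txt forbids."""
    return [(bot, path) for bot, path in requests
            if not parser.can_fetch(bot, path)]


# Hypothetical requests, as they might be pulled from an access log.
SAMPLE_REQUESTS = [
    ("bingbot", "/about"),
    ("msnbot-media", "/images/logo.png"),
    ("Googlebot", "/comment/184"),
    ("Googlebot", "/about"),
]

if __name__ == "__main__":
    for bot, path in violations(load_rules(ROBOTS_TXT), SAMPLE_REQUESTS):
        print(f"{bot} fetched disallowed path {path}")
```

With these sample requests, only the last one (Googlebot fetching `/about`) is allowed; the other three are flagged as rule breaks.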
Website Offline after DoS Attack
GNAX Hosting – So Far So Good
Last week I moved my domain graphicline.co.za to GNAX VPS hosting. I’ve watched Google page load times get shockingly poor over the past four months. Nothing I’ve done on-site to improve performance has made any difference. I’d already tried several caching systems and offloaded some files to a CDN and other fast servers, with no improvement.
Eventually, after trying everything else, the only conclusion I could draw was that the long network path between Google’s Mountain View servers and the data centre hosting my domain was the main culprit in the time it took Big G to load pages.
Average page load times for two of the sites (WordPress) on the domain had gone from under 2.5 seconds in May to over 4 seconds in August and over 5 by September, while the main site (Drupal) had gone from under 2 seconds in May to nearly 4. The minimum page load time for one site had reached nearly 4 seconds by September.
My Domain is too Busy – Server Overloaded
I never thought I would be saying this – at least not within 12 months of switching my site(s) to a CMS. It’s been nearly twelve months since I installed Drupal for my main website, and less since adding two WordPress sub-domain sites. Now the shared server is having to push out more than 80 000 pages a month – or so the server stats show. And it’s causing problems!
It’s causing problems because my domain is exhausting the available server resources. The resource that most often exceeds its limit is CPU, which frequently hits 100% utilisation. Memory often goes over the 1GB share – that’s right, 1000MB – available for the domain. While maximum entry processes average between 4 and 6, there have been several instances where the limit of 20 was exceeded.
And I’ve been wondering why I’ve been battling to edit and publish content for the past 6 weeks.
Break to Renew Shop Front Page
The new shop front page finally got started today, after being a low priority for several weeks. I decided to take a break today from the tedious work of capturing to do something about the front page of the website, which was looking tacky after the change to the layout width.
First off, a new sliding banner got set up with 1080 pixel wide images, 100 px wider than the old slider. The banner is also 80 px lower in height. Changing the catalogue from WPOnlineStore to CartPress made the site pages wider, and the old banner looked very untidy.
False Valuation by Pansee.com
Ever had a site valuation by pansee.com? I got sent a mail informing me ‘someone’ had conducted a valuation of my website graphicline.co.za using pansee.com valuation tools, with a link to the valuation report. Interested to see what the report contained, I checked if Google had any information about malware on the site, then visited the page.
The valuation report had some interesting data, from the country that supplies most of the website’s traffic to the number of daily visitors, plus a claim about the value of advertising on the front page.
France is the Biggest Source of Traffic
This amused me… According to pansee.com, 12.2% of my traffic comes from France, while the USA only accounts for 8.1%.
Our Website Server Has a Problem
The horrible feeling of clicking on a page, and the site reports a server error, and stays down… Since just after midday yesterday the Apache server hosting graphicline.co.za and all our sub sites has experienced problems. Major configuration changes were rolled out starting on Sunday April 16, which have severely disrupted the function of these sites…
Memory Allocation Reset to Default
First of all, memory allocations were reset to the server default value of 32MB, which is totally inadequate to run Drupal and WordPress. Then today I discovered that sub-domains containing only static HTML files were also throwing up a server error, so the server wasn’t recognising the sub-domains as such.