As I mentioned in Complete Blog Remake, Part 2, there are lots of evil bots out there. They are relentless in their
automated search for known exploits, and a lot of those target WordPress installations and plugins. Most of these go
through the normal HTTP protocol, trying to find URLs that are routed to some badly written, exploitable PHP code.
In my logs, I find thousands of calls to /xmlrpc.php, /wp-admin/admin-ajax.php,
/wp-content/uploads/locate.php and others, where current or older versions expose known SQL
injection or script injection exploits.
Because of how my routing is built, all of these requests are interpreted as possible article titles and sent to
the ArticleController.Single(string postname) method, which searches for an article with a weird name, doesn't
find it, and responds with a 404 page. The request gets logged by Azure, and when there are many
bots (or just one unusually intense one), Azure alerts me that there have been many client errors in a short time period.
In the beginning, I used these logs to double-check that I hadn't missed any incoming links, but because of the huge number of bots out there, the requests that I'm really interested in get drowned out, leaving a very low signal-to-noise ratio.
Building the naughty list
Some requests could be people or crawlers (Google, Yahoo, Baidu, ...) just doing their job, following links that
may or may not lead somewhere, so I don't want to blindly and automatically block the IP address of everyone
who mistypes an address or follows a misspelled link. But if there are several bad requests from the same IP
address (say eight in 24 hours), I will block it.
Other requests are just blatant attempts at finding exploits. I will block the IP address of those calls instantly.
The Single method makes use of the PageNotFound method of the base class, so the result is really straightforward:
// ArticleController.cs
public ActionResult Single(string postname)
{
    if (postname.StartsWith("xmlrpc.php") ||
        postname.Contains("wp-admin") ||
        postname.Contains("wp-content/plugins"))
    {
        return PageNotFound(403);
    }

    /* Edited out: Code that searches for the requested article */

    if (article == null)
    {
        return PageNotFound();
    }
}
The PageNotFound method of the base class isn't too complicated either. It
calls ApplicationData.SuspectUserAddress to handle the list of suspicious or blocked IP addresses:
// BlogControllerBase.cs
public ActionResult PageNotFound(int statusCode = 404)
{
    if (applicationData.SuspectUserAddress(Request.UserHostAddress, statusCode == 403))
    {
        return new HttpStatusCodeResult(403);
    }
    else
    {
        /* Edited out: Code that gives a nice 404 page */
    }
}
And here is finally some of the code that keeps track of suspicious IP addresses:
// ApplicationData.cs
internal bool SuspectUserAddress(string address, bool confidentSuspicion)
{
    // Is this address already blocked? Just return true.
    if (BlockedAddresses.Contains(address)) return true;

    // If I'm not sure yet, check some more rules
    if (!confidentSuspicion)
    {
        // How many times has this address acted suspiciously already?
        int count = SuspiciousRequestAddresses.Count(sra => sra == address);
        if (count >= 5)
        {
            // Do a reverse DNS lookup. Is it NOT a known nice crawler?
            if (!IsNiceCrawler(address))
            {
                // Then this suspicion is a confident one!
                confidentSuspicion = true;
            }
        }
    }

    // Are we sure now?
    if (confidentSuspicion)
    {
        // Remove from list of suspicious requests
        SuspiciousRequestAddresses.RemoveWhere(sra => sra == address);

        // Add to list of blocked addresses
        BlockedAddresses.Add(address);
        return true;
    }
    else
    {
        // We are not sure... That means this request should be stored as a suspicious one
        SuspiciousRequestAddresses.Add(address);
        return false;
    }
}
private bool IsNiceCrawler(string address)
{
    var parsed = IPAddress.Parse(address);
    var hostInfo = Dns.GetHostEntry(parsed);

    // Something like (google.com$)|(googlebot.com$)|(msn.com$)|(crawl.baidu.com$)
    string validationRegex = ConfigurationManager.AppSettings["NiceCrawlersRegex"];

    // Check all of hostInfo's aliases for one that matches the regex
    bool isNice = hostInfo.Aliases.Any(
        alias => Regex.IsMatch(alias, validationRegex, RegexOptions.IgnoreCase)
    );

    return isNice;
}
After doing this, the number of 404s went down a lot, but the number of 403s started rising. I checked a few times to confirm
that the blocked requests really are exploit attempts, and I feel comfortable with this solution.
Also, I changed my Azure alerts to separate the different 4xx responses. I still want those unhandled 404s to generate
an alert so that I can fix broken links. This works really well for me.
This is the second part of a series of articles about my complete blog remake. If you haven't read the first part, here it is: Complete blog remake, part 1
Last week I wrote about completely remaking my blog, leaving WordPress, PHP, MySQL and Loopia behind. One
of my main concerns was to keep all URLs intact, since I know that some of my old articles have a lot of incoming links. Reverse-engineering the whole URL
scheme was the focus of the first part of this article series.
The ghost of WordPress unpatched
After taking a leap of faith and actually publishing the first version of my reverse-engineered blog engine (along with all of the
articles) to Azure, I kept a vigilant eye on the access logs. I wanted to make sure that I hadn't missed any incoming links. I
discovered two cases of misspelled addresses, plus the missing robots.txt and favicon.ico, which I could fix quickly. But
most of all there were hundreds of 404s for the WordPress administration panel, WordPress-specific AJAX URLs, some requests
for /xmlrpc.php, and a lot of requests for what turned out, after some searching, to be known security flaws in older WordPress versions.
Virtually every evil botnet out there is trying to exploit our blogs, hoping to find an unpatched WordPress installation. This is one
of the reasons I wanted to leave WordPress behind. It is also the reason I have chosen to not have an administration web interface for
my blog. Instead I am actually using Visual Studio, not just for coding, running unit tests, debugging, testing and publishing to Azure,
but also for writing my articles and publishing them.
Putting T4 text templates to work
My article data files are really simple text files, each containing a title, a category, some labels and the markup of the article
itself. I wrote a simple T4 template that converts these files to an XML file. When I have written an article, I simply run
the TextTemplatingFileGenerator tool and then click Publish to send the new article to Azure. Then I just wait for the scheduled
task (runs once per hour) to pick up the new article and make it visible.
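A minimal sketch of such a T4 template is shown below. The Articles folder name and the file layout (title on the first line, category on the second, labels on the third, markup after that) are assumptions for illustration, not necessarily how my actual files look:

<#@ template language="C#" hostspecific="true" #>
<#@ output extension=".xml" #>
<#@ assembly name="System.Core" #>
<#@ import namespace="System.IO" #>
<#@ import namespace="System.Linq" #>
<?xml version="1.0" encoding="utf-8"?>
<articles>
<#
    // Assumed layout: line 1 = title, line 2 = category, line 3 = labels, rest = markup
    foreach (var file in Directory.GetFiles(Host.ResolvePath("Articles"), "*.txt"))
    {
        var lines = File.ReadAllLines(file);
#>
  <article title="<#= lines[0] #>" category="<#= lines[1] #>" labels="<#= lines[2] #>">
    <![CDATA[<#= string.Join("\n", lines.Skip(3)) #>]]>
  </article>
<#
    }
#>
</articles>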
My favorite IDE, by far, is Visual Studio, and my favorite language is without
doubt C#. I have blogged a lot
about JavaScript and PHP too, but I have to admit that C# is my number one.
Being able to actually use Visual Studio as my main tool for blogging (both when writing the blog engine code and when writing articles) feels really great.
So far, everything that I have done fits well within the Free tier of my Azure subscription. So not only do I have a blogging tool that suits me
better, I have also reduced my web hosting cost by 100 %. There is still more to write about, like having ASP.NET MVC Areas that map to
my subdomains, like demo.atornblad.se, and I leave that for the next part of this series.
For a while now, I have been thinking of migrating this blog to a completely different environment. I have grown
really tired of the two most important technical aspects of this blog: the hosting provider, and the blog platform.
Leaving WordPress
For some bloggers, WordPress is the way to go. The platform is pretty easy to use, even
if there is a bit of a learning curve, and it's still capable enough for more advanced blogging solutions, especially
if you are ready to go down the messy plugin path. For lots of blogs, WordPress does the job.
However, for my needs, WordPress is too big, too small, and too messy.
Too big, because all I really need
is to serve some text to readers. That could be done by serving articles as static files. But I also need
archives, search and some structure in the form of categories and labels. WordPress does that, but so could I.
Too small, because I want to be able to show nicely formatted source code in different languages, display
some graphics based on data, link to my spare-time projects and GitHub repositories in a good way. There are
plugins for all of those, but WordPress plugins tend to be really messy.
Too messy, in fact, because the WordPress core and all those plugins tend to make it difficult to properly
keep blogs upgraded and secure. You should always apply security patches, but that might break some plugin whose
author hasn't got around to patching or upgrading yet.
There are more arguments for leaving WordPress behind, and I think Daniel Thornbury said it best.
Writing my own blog engine couldn't be that difficult. After all, it's just a matter of mapping URLs to information,
right? That's what web servers do. When looking at my WordPress configuration, I realized that I should be able to
create a very simple ASP.NET MVC site to do all this for me. This is a run-down of what addresses are handled:
Addresses for routing
/ – Start page: latest posts in reverse chronological order (first page)
/page/N – Latest posts in reverse chronological order (page N)
/yyyy/mm – Monthly archive (first page)
/yyyy/mm/page/N – Monthly archive (page N)
/category/name – Category archive (first page)
/category/name/page/N – Category archive (page N)
/label/name – Label archive (first page)
/label/name/page/N – Label archive (page N)
/search – Search results
/feed – Article feed
/postname – A single article
All of these could be handled with a pretty simple chain of calls to MapRoute, like this:
// RouteConfig.cs
public static void RegisterRoutes(RouteCollection routes)
{
routes.MapRoute(
name : "MonthlyArchivePage",
url : "{year}/{month}/page/{page}",
constraints : new { year = @"^\d{4}$", month = @"^\d{2}$", page = @"^\d+$" },
defaults : new { controller = "Archive", action = "MonthPage" }
);
routes.MapRoute(
name : "MonthlyArchiveFirstPage",
url : "{year}/{month}",
constraints : new { year = @"^\d{4}$", month = @"^\d{2}$" },
defaults : new { controller = "Archive", action = "MonthPage", page = 1 }
);
routes.MapRoute(
name : "LabelPage",
url : "label/{name}/page/{page}",
constraints : new { page = @"^\d+$" },
defaults : new { controller = "Archive", action = "LabelPage" }
);
routes.MapRoute(
name : "LabelPageFirstPage",
url : "label/{name}",
defaults : new { controller = "Archive", action = "LabelPage", page = 1 }
);
routes.MapRoute(
name : "CategoryPage",
url : "category/{name}/page/{page}",
constraints : new { page = @"^\d+$" },
defaults : new { controller = "Archive", action = "CategoryPage" }
);
routes.MapRoute(
name : "CategoryPageFirstPage",
url : "category/{name}",
defaults : new { controller = "Archive", action = "CategoryPage", page = 1 }
);
routes.MapRoute(
name : "Search",
url : "search",
defaults : new { controller = "Search", action = "Index" }
);
routes.MapRoute(
name : "Feed",
url : "feed",
defaults : new { controller = "Feed", action = "Index" }
);
routes.MapRoute(
name : "StartPage",
url : "page/{page}",
constraints : new { page = @"^\d+$" },
defaults : new { controller = "Start", action = "Page" }
);
routes.MapRoute(
name : "SingleArticle",
url : "{postname}",
defaults : new { controller = "Article", action = "Single" }
);
routes.MapRoute(
name : "StartFirstPage",
url : "",
defaults : new { controller = "Start", action = "Page", page = 1 }
);
}
With a total of five controller classes and just seven action methods, the MVC part is really simple. Each
method has a non-complex data query and a corresponding view, all sharing a common set of "master page" layout
views. The paginated views (start page and archive pages) all share their pagination technique, so I only
have to write it once. These are some examples of how simple the code can be:
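One of the paginated actions could, roughly, look like the sketch below; the Articles collection on applicationData, the page size and the view model are assumptions for illustration rather than the exact implementation:

// StartController.cs (sketch; property names and page size are assumptions)
public class StartController : BlogControllerBase
{
    private const int PageSize = 10;

    public ActionResult Page(int page)
    {
        // The entire dataset is assumed to be cached in memory (see below), newest first
        var articles = applicationData.Articles
            .OrderByDescending(a => a.Published)
            .Skip((page - 1) * PageSize)
            .Take(PageSize)
            .ToList();

        ViewBag.Page = page;
        return View(articles);
    }
}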
I also want to keep the data backend super-simple, so I decided to put the articles
as files in the App_Data folder and store the entire dataset in the HttpApplicationState object,
effectively caching everything in memory between app restarts. But since I don't want to restart
the application every time I add a new post, I also wrote a "data refresh" action method that clears and
rebuilds the cache. That method is set up with some added security so that it can only be called from localhost.
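The refresh action itself is not shown, but the localhost check can be done with Request.IsLocal. In this sketch, the placement of the action and the ReloadFromDisk helper are hypothetical:

// Sketch of the "data refresh" action; ReloadFromDisk is a hypothetical helper
public ActionResult Refresh()
{
    // Only allow cache rebuilds when the request comes from the local machine
    if (!Request.IsLocal)
    {
        return new HttpStatusCodeResult(403);
    }

    // Clear and rebuild the in-memory dataset from the article files in App_Data
    applicationData.ReloadFromDisk(Server.MapPath("~/App_Data"));
    return new HttpStatusCodeResult(200);
}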
All in all, writing all the C# code needed for migrating from WordPress to ASP.NET MVC took a couple of days.
Adding some nice CSS and JavaScript took another few days.
Leaving Loopia
For several years now, I have been a customer of Swedish web hosting provider Loopia. They
have always been almost good enough, not quite living up to 100 % of my expectations. When ASP.NET 4 came out, it
took Loopia almost two years to provide it to customers, and then ASP.NET 4.5 was already in the works. Their
current version of PHP is 5.5.11, which was released almost two years ago.
This is really bad for several reasons.
I'm not talking about web developers wanting the latest shiny toys to play with (even if that is also the case). No,
I'm talking about security and quality. Failing to upgrade means failing to install security patches. I'm not
demanding an upgrade to PHP 7, and not even PHP 5.6, but
at the very least they should upgrade to 5.5.30, which includes a lot of bugfixes and security patches made since 5.5.11.
Loopia has worked for me in the past, but they have also always been a little too expensive given what they
provide. Too little, too late, too expensive.
So I decided to move to Microsoft Azure. With a really nice set of tools available for
developers, and an exquisitely well-made integration with Visual Studio, the choice
was simple. I can design and create my blog, run all my tests, publish it and even debug it remotely from within the same tool.
Brave new world
So I did a full move. From PHP to ASP.NET and C#. From Loopia to Microsoft Azure. From MySQL to the filesystem.
From WordPress to my own creation. And it's all in the cloud, monitored, secure, load-balanced, and almost completely free!
Next time, I will talk a little about the Azure part of the move. Deployment, scheduled jobs, monitoring and so on.