Anders Tornblad

All about the code

Label archive for azure

403s for the Naughty List

As I mentioned in Complete Blog Remake, Part 2, there are lots of evil bots out there. They are relentless in their automated search for known exploits, and a lot of those target WordPress installations and plugins. Most of these go through the normal HTTP protocol, trying to find URLs that are routed to some badly written, exploitable PHP code. In my logs, I find thousands of calls to /xmlrpc.php, /wp-admin/admin-ajax.php, /wp-content/uploads/locate.php and others where there are current or older versions that expose known SQL injection or script injection exploits.

Because of how my routing is built, all of these requests are interpreted as possible article titles and sent to the ArticleController's Single(string postname) method, which searches for an article with a weird name, doesn't find it, and responds with a 404 page. The request gets logged by Azure, and when there are many bots (or just one unusually intense one), Azure alerts me about the many client errors in a short time period.

In the beginning, I used these logs to double-check that I hadn't missed any incoming links, but because of the huge number of bots out there, the requests that I'm really interested in get drowned out by the noise.

Building the naughty list

Some requests could be people or crawlers (Google, Yahoo, Baidu, ...) just doing their job, following links that may or may not lead somewhere, so I don't want to blindly and automatically block the IP address of everyone making mistakes in typing or following a misspelled link. But if there are a few bad requests from the same IP address (say eight in 24 hours), I will block them.

Other requests are just blatant attempts at finding exploits. I will block the IP address of those calls instantly. The Single method makes use of the PageNotFound method of the base class, so the result is really straightforward:

    public ActionResult Single(string postname)
    {
        if (postname.StartsWith("xmlrpc.php") ||
            postname.Contains("wp-admin") ||
            postname.Contains("wp-content/plugins"))
        {
            return PageNotFound(403);
        }

        /* Edited out: Code that searches for the requested article */

        if (article == null)
        {
            return PageNotFound();
        }
    }

The PageNotFound method of the base class isn't too complicated either. It calls the ApplicationData class to handle the list of suspicious or blocked IP addresses:

    public ActionResult PageNotFound(int statusCode = 404)
    {
        if (applicationData.SuspectUserAddress(Request.UserHostAddress, statusCode == 403))
        {
            return new HttpStatusCodeResult(403);
        }
        else
        {
            /* Edited out: Code that gives a nice 404 page */
        }
    }

And here is finally some of the code that keeps track of suspicious IP addresses:

    internal bool SuspectUserAddress(string address, bool confidentSuspicion)
    {
        // Is this address already blocked? Just return true.
        if (BlockedAddresses.Contains(address)) return true;

        // If I'm not sure yet, check some more rules
        if (!confidentSuspicion)
        {
            // How many times has this address acted suspiciously already?
            int count = SuspiciousRequestAddresses.Count(sra => sra == address);
            if (count >= 5)
            {
                // Do a reverse DNS lookup. Is it NOT a known nice crawler?
                if (!IsNiceCrawler(address))
                {
                    // Then this suspicion is a confident one!
                    confidentSuspicion = true;
                }
            }
        }

        // Are we sure now?
        if (confidentSuspicion)
        {
            // Remove from list of suspicious requests
            SuspiciousRequestAddresses.RemoveWhere(sra => sra == address);

            // Add to list of blocked addresses
            BlockedAddresses.Add(address);
            return true;
        }
        else
        {
            // We are not sure... That means this request should be stored as a suspicious one
            SuspiciousRequestAddresses.Add(address);
            return false;
        }
    }

    private bool IsNiceCrawler(string address)
    {
        var parsed = IPAddress.Parse(address);
        var hostInfo = Dns.GetHostEntry(parsed);

        // Something like ($)|($)|($)|($)
        string validationRegex = ConfigurationManager.AppSettings["NiceCrawlersRegex"];

        // Check all of hostInfo's aliases for one that matches the regex
        bool isNice = hostInfo.Aliases.Any(
            alias => Regex.IsMatch(alias, validationRegex, RegexOptions.IgnoreCase)
        );

        return isNice;
    }
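The snippet above doesn't show how old entries leave the list of suspicious requests. As a rough illustration of the "a few bad requests in 24 hours" idea, here is a minimal sketch of a per-address counter with a rolling time window. The class name SuspicionTracker, its constructor parameters and the Register method are all made up for this example; it is not the blog's actual ApplicationData code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: counts suspicious requests per address
// within a rolling time window.
public class SuspicionTracker
{
    private readonly int threshold;
    private readonly TimeSpan window;
    private readonly object sync = new object();

    // One queue of timestamps per address, oldest first.
    private readonly Dictionary<string, Queue<DateTime>> hits =
        new Dictionary<string, Queue<DateTime>>();

    public SuspicionTracker(int threshold, TimeSpan window)
    {
        this.threshold = threshold;
        this.window = window;
    }

    // Records one suspicious request. Returns true when the address has
    // reached the threshold inside the window (this request included).
    public bool Register(string address, DateTime now)
    {
        // Web requests arrive concurrently, so guard the shared state.
        lock (sync)
        {
            Queue<DateTime> queue;
            if (!hits.TryGetValue(address, out queue))
            {
                queue = new Queue<DateTime>();
                hits[address] = queue;
            }

            // Forget requests that have fallen out of the window.
            while (queue.Count > 0 && now - queue.Peek() > window)
            {
                queue.Dequeue();
            }

            queue.Enqueue(now);
            return queue.Count >= threshold;
        }
    }
}
```

Passing the current time in as a parameter, instead of reading the clock inside the method, keeps the window logic easy to test. With new SuspicionTracker(8, TimeSpan.FromHours(24)), the eighth bad request inside 24 hours would be the one that returns true.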

After doing this, the number of 404s went down by a lot, but the 403 errors started rising. I checked a few times to see that the blocked requests are really exploit attempts, and I feel comfortable with this solution.

Also, I changed my Azure alerts to separate the different 4xx responses. I still want those unhandled 404s to generate an alert so that I can fix broken links. This works really well for me.

Complete blog remake, part 1
Complete blog remake, part 2
403s for the Naughty List (this part)

Complete blog remake, part 2

This is the second part of a series of articles about my complete blog remake. If you haven't read the first part, here it is: Complete blog remake, part 1

Last week I wrote about completely remaking my blog, leaving WordPress, PHP, MySQL and Loopia behind. One of my main concerns was to keep all URLs intact, since I know that some of my old articles have a lot of incoming links. Reverse-engineering the whole URL scheme was the focus of the first part of this article series.

The ghost of WordPress unpatched

After taking a leap of faith and actually publishing the first version of my reverse-engineered blog engine (along with all of the articles) to Azure, I kept a vigilant eye on the access logs. I wanted to make sure that I hadn't missed any incoming links. I discovered two cases of misspelled addresses, plus the missing robots.txt and favicon.ico, all of which I could fix quickly. But most of all there were hundreds of 404s for the WordPress administration panel, WordPress-specific AJAX URLs, some requests for /xmlrpc.php, and a lot of requests for what turned out (after some searching) to be known security flaws in older WordPress versions.

Virtually every evil botnet out there is trying to exploit our blogs, hoping to find an unpatched WordPress installation. This is one of the reasons I wanted to leave WordPress behind. It is also the reason I have chosen not to have an administration web interface for my blog. Instead I am actually using Visual Studio, not just for coding, running unit tests, debugging, testing and publishing to Azure, but also for writing my articles and publishing them.

Putting T4 text templates to work

My article data files are really simple text files, each containing a title, a category, some labels and the markup of the article itself. I wrote a simple T4 template for converting this to an XML file. When I have written an article, I simply run the TextTemplatingFileGenerator tool and then click Publish to send the new article to Azure. Then I just wait for the scheduled task (runs once per hour) to pick up the new article and make it visible.
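To make the conversion step a bit more concrete, here is a minimal sketch of turning such a text file into XML. It is plain C# using System.Xml.Linq, not a T4 template, and the file layout is an assumption: "Title:", "Category:" and "Labels:" header lines, a blank line, then the markup. The element names and the ArticleFileConverter class are also invented for the example.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical converter; the header names and XML element names
// are assumptions, not the blog's real file format.
public static class ArticleFileConverter
{
    public static XElement ToXml(string text)
    {
        // Normalize line endings, then split header from body
        // at the first blank line.
        string normalized = text.Replace("\r\n", "\n");
        int split = normalized.IndexOf("\n\n", StringComparison.Ordinal);
        string header = split >= 0 ? normalized.Substring(0, split) : normalized;
        string body = split >= 0 ? normalized.Substring(split + 2) : "";

        // Parse "Name: value" header lines into a lookup.
        var fields = header.Split('\n')
            .Select(line => line.Split(new[] { ':' }, 2))
            .Where(parts => parts.Length == 2)
            .ToDictionary(
                parts => parts[0].Trim(),
                parts => parts[1].Trim(),
                StringComparer.OrdinalIgnoreCase);

        string value;
        string title = fields.TryGetValue("Title", out value) ? value : "";
        string category = fields.TryGetValue("Category", out value) ? value : "";
        string labelList = fields.TryGetValue("Labels", out value) ? value : "";

        // Labels are assumed to be comma-separated.
        var labels = labelList.Split(',')
            .Select(label => label.Trim())
            .Where(label => label.Length > 0);

        return new XElement("article",
            new XElement("title", title),
            new XElement("category", category),
            new XElement("labels", labels.Select(l => new XElement("label", l))),
            // CDATA keeps the article markup readable in the XML file.
            new XElement("content", new XCData(body)));
    }
}
```

Calling ArticleFileConverter.ToXml(File.ReadAllText(path)).Save(outputPath) would then write the XML file. A T4 template could run the same kind of logic at design time.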

My favorite IDE, by far, is Visual Studio, and my favorite language is without doubt C#. I have blogged a lot about JavaScript and PHP too, but I have to admit that C# is my number one. Being able to actually use Visual Studio as my main tool for blogging (both when writing the blog engine code, and when writing articles) feels really great.

So far, everything that I have done fits well within the Free tier of my Azure subscription. So not only do I have a blogging tool that suits me better, I have also reduced my web hosting cost by 100 %. There is still more to write about, like having ASP.NET MVC Areas that map to my subdomains, and I leave that for the next part of this series.

Complete blog remake, part 1
Complete blog remake, part 2 (this part)
403s for the Naughty List

Complete blog remake, part 1

For a while now, I have been thinking of migrating this blog to a completely different environment. I have grown really tired of the two most important technical aspects of this blog: the hosting provider, and the blog platform.

Leaving WordPress

For some bloggers, WordPress is the way to go. The platform is pretty easy to use, even if there is a bit of a learning curve, and it's still capable enough for more advanced blogging solutions, especially if you are ready to go down the messy plugin path. For lots of blogs, WordPress does the job.

However, for my needs, WordPress is too big, too small, and too messy. Too big, because what I really only need is to serve some text to readers. This can be done with a static filesystem serving articles. But I also need archives, search and some structure in the form of categories and labels. WordPress does that but so could I. Too small, because I want to be able to show nicely formatted source code in different languages, display some graphics based on data, link to my spare-time projects and GitHub repositories in a good way. There are plugins for all of those, but WordPress plugins tend to be really messy. Too messy, in fact, because the WordPress core and all those plugins tend to make it difficult to properly keep blogs upgraded and secure. You should always apply security patches, but that might break some plugin that hasn't got around to patching or upgrading yet.

There are more arguments for leaving WordPress behind, and I think Daniel Thornbury said it best.

Writing my own blog engine couldn't be that difficult. After all, it's just a matter of mapping urls to information, right? That's what web servers do. When looking at my WordPress configuration, I realized that I should be able to create a very simple ASP.NET MVC site to do all this for me. This is a run-down of what addresses are handled:

Addresses for routing

All of this could be handled with a pretty simple chain of calls to MapRoute, like this:

    public static void RegisterRoutes(RouteCollection routes)
    {
        routes.MapRoute(
            name: "MonthlyArchivePage",
            url: "{year}/{month}/page/{page}",
            constraints: new { year = @"^\d{4}$", month = @"^\d{2}$", page = @"^\d+$" },
            defaults: new { controller = "Archive", action = "MonthPage" }
        );

        routes.MapRoute(
            name: "MonthlyArchiveFirstPage",
            url: "{year}/{month}",
            constraints: new { year = @"^\d{4}$", month = @"^\d{2}$" },
            defaults: new { controller = "Archive", action = "MonthPage", page = 1 }
        );

        routes.MapRoute(
            name: "LabelPage",
            url: "label/{name}/page/{page}",
            constraints: new { page = @"^\d+$" },
            defaults: new { controller = "Archive", action = "LabelPage" }
        );

        routes.MapRoute(
            name: "LabelPageFirstPage",
            url: "label/{name}",
            defaults: new { controller = "Archive", action = "LabelPage", page = 1 }
        );

        routes.MapRoute(
            name: "CategoryPage",
            url: "category/{name}/page/{page}",
            constraints: new { page = @"^\d+$" },
            defaults: new { controller = "Archive", action = "CategoryPage" }
        );

        routes.MapRoute(
            name: "CategoryPageFirstPage",
            url: "category/{name}",
            defaults: new { controller = "Archive", action = "CategoryPage", page = 1 }
        );

        routes.MapRoute(
            name: "Search",
            url: "search",
            defaults: new { controller = "Search", action = "Index" }
        );

        routes.MapRoute(
            name: "Feed",
            url: "feed",
            defaults: new { controller = "Feed", action = "Index" }
        );

        routes.MapRoute(
            name: "StartPage",
            url: "page/{page}",
            constraints: new { page = @"^\d+$" },
            defaults: new { controller = "Start", action = "Page" }
        );

        routes.MapRoute(
            name: "SingleArticle",
            url: "{postname}",
            defaults: new { controller = "Article", action = "Single" }
        );

        routes.MapRoute(
            name: "StartFirstPage",
            url: "",
            defaults: new { controller = "Start", action = "Page", page = 1 }
        );
    }

With a total of five controller classes and just seven action methods, the MVC part is really simple. Each method has a non-complex data query and a corresponding view, all sharing a common set of "master page" layout views. The paginated views (start page and archive pages) all share their pagination technique, so I only have to write it once. These are some examples of how simple the code can be:

    public class ArchiveController : Controller
    {
        public ActionResult MonthPage(int year, int month, int page)
        {
            // Half-open interval: [first day of month, first day of next month)
            DateTime startDate = new DateTime(year, month, 1);
            DateTime endDate = startDate.AddMonths(1);

            var query = from article in ApplicationData.Current.Articles
                        where article.Published >= startDate && article.Published < endDate
                        select article;

            var model = new ArchivePageModel
            {
                AllArticles = query.ToArray(),
                Page = page,
                PageSize = 10,
                Title = "Monthly archive for " + startDate.ToString("MMMM yyyy")
            };

            return View("Page", model);
        }

        public ActionResult LabelPage(string name, int page)
        {
            // Label matching is case-insensitive
            var query = from article in ApplicationData.Current.Articles
                        where article.Labels.Contains(name, StringComparer.InvariantCultureIgnoreCase)
                        select article;

            var model = new ArchivePageModel
            {
                AllArticles = query.ToArray(),
                Page = page,
                PageSize = 10,
                Title = $"Label archive for {name}"
            };

            return View("Page", model);
        }
    }

    public class ArticleController : Controller
    {
        public ActionResult Single(string postname)
        {
            // Look up the article by its URL name, ignoring case
            var article = ApplicationData.Current.Articles.SingleOrDefault(
                a => a.PostName.Equals(postname, StringComparison.InvariantCultureIgnoreCase));

            if (article == null)
            {
                return HttpNotFound();
            }

            return View(article);
        }
    }
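The shared pagination technique mentioned above isn't shown in the article. A minimal sketch of what such a helper could look like follows; the Pagination class and its method names are hypothetical, and it only assumes the same inputs that ArchivePageModel receives (all articles, a 1-based page number and a page size).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical pagination helper, written once and shared by every
// paginated view.
public static class Pagination
{
    // Number of pages needed for itemCount items (rounds up).
    public static int PageCount(int itemCount, int pageSize)
    {
        if (pageSize < 1) throw new ArgumentOutOfRangeException(nameof(pageSize));
        return (itemCount + pageSize - 1) / pageSize;
    }

    // Items on the given 1-based page; empty when the page is past the end.
    public static IReadOnlyList<T> ItemsOnPage<T>(
        IReadOnlyList<T> allItems, int page, int pageSize)
    {
        if (page < 1) throw new ArgumentOutOfRangeException(nameof(page));
        if (pageSize < 1) throw new ArgumentOutOfRangeException(nameof(pageSize));

        return allItems
            .Skip((page - 1) * pageSize)
            .Take(pageSize)
            .ToList();
    }
}
```

A view could then render ItemsOnPage(model.AllArticles, model.Page, model.PageSize) and use PageCount to decide whether to show "older" and "newer" links.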

I also want to keep the data backend super-simple, so I actually decided to go with putting the articles in files in the App_Data folder, and storing the entire dataset in the HttpApplicationState object, effectively caching everything in-memory between app restarts. But since I don't want to actually restart the application every time I add a new post, I also wrote a "data refresh" action method that clears and rebuilds the cache. That method is set up with some added security so that it can only be called from localhost.
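As a rough, framework-free illustration of the refresh idea, here is a minimal sketch of an in-memory cache that only rebuilds when the caller's address is a loopback address. The ArticleCache class, its loader delegate and TryRefresh are invented names, and the sketch doesn't use HttpApplicationState at all. One deliberate difference from "clears and rebuilds": it builds the new list first and then swaps it in, so readers never see an empty cache.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

// Hypothetical cache: holds the whole dataset in memory and can be
// rebuilt on demand, but only for callers on localhost.
public class ArticleCache<T>
{
    private readonly Func<IReadOnlyList<T>> loader;

    // volatile so that readers on other threads see the swapped list.
    private volatile IReadOnlyList<T> items;

    public ArticleCache(Func<IReadOnlyList<T>> loader)
    {
        this.loader = loader;
        this.items = loader();
    }

    public IReadOnlyList<T> Items
    {
        get { return items; }
    }

    // Rebuilds the cache if callerAddress is a loopback address
    // (127.0.0.1 or ::1). Returns false and changes nothing otherwise.
    public bool TryRefresh(string callerAddress)
    {
        IPAddress parsed;
        if (!IPAddress.TryParse(callerAddress, out parsed) ||
            !IPAddress.IsLoopback(parsed))
        {
            return false;
        }

        // Build the complete new list, then replace the reference.
        items = loader();
        return true;
    }
}
```

Inside an ASP.NET MVC action method, the Request.IsLocal property could serve the same purpose as the loopback check here.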

All in all, writing all the C# code needed for migrating from WordPress to ASP.NET MVC takes about a day. Adding some nice CSS stuff and JavaScript takes another day.

Leaving Loopia

For several years now, I have been a customer of Swedish web hosting provider Loopia. They have always been almost good enough, not quite living up to 100 % of my expectations. When ASP.NET 4 came out, it took Loopia almost two years to provide it to customers, and by then ASP.NET 4.5 was already in the works. Their current version of PHP is 5.5.11, which was released almost two years ago. This is really bad for several reasons.

I'm not talking about web developers wanting the latest shiny toys to play with (even if that is also the case). No, I'm talking about security and quality. Failing to upgrade means failing to install security patches. I'm not demanding an upgrade to PHP 7, and not even PHP 5.6, but at the very least they should upgrade to 5.5.30, which has had a lot of bug fixes and security patches added since 5.5.11.

Loopia has worked for me in the past, but they have also always been a little too expensive given what they provide. Too little, too late, too expensive.

So I decided to move to Microsoft Azure. With a really nice set of tools available for developers, and an exquisitely well-made integration with Visual Studio, the choice was simple. I can design and create my blog, run all my tests, publish it and even debug it remotely from within the same tool.

Brave new world

So I did a full move. From PHP to ASP.NET and C#. From Loopia to Microsoft Azure. From MySQL to filesystem. From WordPress to my own creation. And it's all in the cloud, monitored, secure, load-balanced, and almost completely free!

Next time, I will talk a little about the Azure part of the move. Deployment, scheduled jobs, monitoring and so on.

Complete blog remake, part 1 (this part)
Complete blog remake, part 2
403s for the Naughty List