Predictive Analytics 101

by Ravi Kalakota

Data Mining

Source: Dr. Saed Sayad

An Introduction to Data Mining

Kenny Bastani: A Docker Image for Graph Analytics on Neo4j with A...: I've just released a useful new Docker image for graph analytics on a Neo4j graph database with Apache Spark GraphX. This image deploy...

Named Entity Extraction

A Survey of named entity recognition and classification
Evaluation of Named Entity Extraction Systems
NERD: A Framework for Unifying Named Entity Recognition and Disambiguation Extraction Tools
NERD: Evaluating Named Entity Recognition Tools in the Web of Data
NERD: an open source platform for extracting and disambiguating named entities in very diverse documents
NERD Ontology
Unsupervised Named-Entity Extraction from the Web: An Experimental Study

Stanford Named Entity Recognizer (Conditional Random Field) Whitepaper
GATE (General Architecture for Text Engineering) ANNIE (A Nearly-New Information Extraction) System
Illinois Named Entity Tagger
Balie: Multilingual Information Extraction from Text with Machine Learning and Natural Language Techniques
Mallet: Machine Learning

Apache Nutch (Web Crawler); Bixo (Web Mining); Behemoth (Hadoop Document Analysis); Apache OpenNLP (Natural Language Processing); Apache Stanbol (Semantic Content Management); Apache Tika (Metadata and text extraction); Apache UIMA (Unstructured Information Management Architecture); Apache Mahout (Machine Learning); Apache Avro (Data Serialization); Apache Solr/Lucene; Apache Clerezza (OSGi RESTful Web framework, Triplestore DB); Apache Jena (Semantic Web: RDF, Triplestore DB, OWL); Fedora (Flexible Extensible Digital Object Repository Architecture); Apache Ambari

Maui (Topic Indexing); Weka (Data Mining); LingPipe; FreeLing; OpenCalais; DBpediaSpotlight

Alchemy API; Evri API; Web ARChive (WARC) format

HBase; Bigtable: A Distributed Storage System for Structured Data; Apache Phoenix

Docker and DevOps

Docker Basics (Tutorial)
Getting started with Docker
Docker User Guide
Dockerizing Applications
Docker Network Configuration
Working with Containers; Automatically Start Containers
Docker Run Reference
Launching Containers with Fleet; Fleet Configuration and API
Getting Started with Etcd; Etcd Configuration
Getting started with systemd
Working with Docker Images
Google Compute Engine: Container Images

Microservices in a Nutshell

The following is an excerpt from an article that originally appeared on Martin Fowler's website.

"Microservices" - yet another new term on the crowded streets of software architecture. Although our natural inclination is to pass such things by with a contemptuous glance, this bit of terminology describes a style of software systems that we are finding more and more appealing. We've seen many projects use this style in the last few years, and results so far have been positive, so much so that for many of our colleagues this is becoming the default style for building enterprise applications. Sadly, however, there's not much information that outlines what the microservice style is and how to do it.

In short, the microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery. There is a bare minimum of centralized management of these services, which may be written in different programming languages and use different data storage technologies.
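To make this definition concrete, here is a minimal Python sketch (an illustration added here, not code from the article) of one small service that owns a single business capability and exposes it as an HTTP resource API. The inventory service, its `/stock/<sku>` route, and the `fetch_quantity` client are hypothetical. So the demo stays self-contained, the service runs in a background thread; in a real deployment it would run in its own process, for example in its own Docker container.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical "inventory" service: one business capability, its own data.
STOCK = {"sku-1": 12, "sku-2": 0}


class InventoryHandler(BaseHTTPRequestHandler):
    """Serves GET /stock/<sku> as a JSON resource."""

    def do_GET(self):
        parts = self.path.strip("/").split("/")
        if len(parts) == 2 and parts[0] == "stock" and parts[1] in STOCK:
            body = json.dumps({"sku": parts[1], "quantity": STOCK[parts[1]]}).encode()
            self.send_response(200)
        else:
            body = json.dumps({"error": "not found"}).encode()
            self.send_response(404)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the demo


def start_inventory_service(port=0):
    """Start the service in a daemon thread; port=0 lets the OS pick a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), InventoryHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_quantity(base_url, sku):
    """A caller (e.g. another service) using the inventory API over plain HTTP."""
    with urllib.request.urlopen(f"{base_url}/stock/{sku}") as resp:
        return json.load(resp)["quantity"]


if __name__ == "__main__":
    server = start_inventory_service()
    url = f"http://127.0.0.1:{server.server_address[1]}"
    print(fetch_quantity(url, "sku-1"))  # prints 12
    server.shutdown()
    server.server_close()
```

Because the only contract between the service and its callers is the HTTP/JSON interface, the service could be rewritten in another language or moved to a different data store without any change to `fetch_quantity`.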

To start explaining the microservice style it's useful to compare it to the monolithic style: a monolithic application built as a single unit. Enterprise Applications are often built in three main parts: a client-side user interface (consisting of HTML pages and JavaScript running in a browser on the user's machine), a database (consisting of many tables inserted into a common, and usually relational, database management system), and a server-side application. The server-side application will handle HTTP requests, execute domain logic, retrieve and update data from the database, and select and populate HTML views to be sent to the browser. This server-side application is a monolith - a single logical executable. Any changes to the system involve building and deploying a new version of the server-side application.
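A minimal Python sketch of the server-side monolith just described, assuming a hypothetical order-tracking application: request handling, domain logic, data access (an in-memory SQLite database stands in for the relational DBMS), and HTML rendering all live in one process. The table and function names are illustrative only.

```python
import sqlite3
from html import escape

# Hypothetical order-tracking monolith: every layer lives in one process.


def create_db():
    """Database layer: one relational store shared by the whole application."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT, qty INTEGER)")
    conn.executemany("INSERT INTO orders (item, qty) VALUES (?, ?)",
                     [("widget", 3), ("gadget", 0)])
    return conn


def open_orders(conn):
    """Domain logic: an order is 'open' while it still has quantity outstanding."""
    rows = conn.execute("SELECT id, item, qty FROM orders WHERE qty > 0 ORDER BY id")
    return rows.fetchall()


def render_orders(orders):
    """View layer: populate an HTML fragment to send back to the browser."""
    items = "".join(f"<li>{escape(item)} x{qty}</li>" for _, item, qty in orders)
    return f"<ul>{items}</ul>"


def handle_request(conn, path):
    """Request handling: route, run domain logic, render the view."""
    if path == "/orders":
        return 200, render_orders(open_orders(conn))
    return 404, "<p>Not found</p>"


if __name__ == "__main__":
    conn = create_db()
    print(handle_request(conn, "/orders"))  # (200, '<ul><li>widget x3</li></ul>')
```

Changing any one of these functions, even just `render_orders`, means rebuilding and redeploying the whole unit.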

Such a monolithic server is a natural way to approach building such a system. All your logic for handling a request runs in a single process, allowing you to use the basic features of your language to divide up the application into classes, functions, and namespaces. With some care, you can run and test the application on a developer's laptop, and use a deployment pipeline to ensure that changes are properly tested and deployed into production. You can horizontally scale the monolith by running many instances behind a load-balancer.
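Horizontally scaling a monolith means running identical copies of the whole application and spreading requests across them. The sketch below assumes a simple round-robin policy and models each instance as an in-process object; `MonolithInstance` and `RoundRobinBalancer` are hypothetical names, and a production setup would use a dedicated load balancer such as nginx or HAProxy instead.

```python
import itertools


class MonolithInstance:
    """Stand-in for one full copy of the application (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.handled = 0

    def handle(self, request):
        self.handled += 1
        return f"{self.name} handled {request}"


class RoundRobinBalancer:
    """Sends each incoming request to the next instance in turn."""

    def __init__(self, instances):
        if not instances:
            raise ValueError("need at least one instance")
        self._instances = itertools.cycle(instances)

    def dispatch(self, request):
        return next(self._instances).handle(request)


if __name__ == "__main__":
    instances = [MonolithInstance(f"app-{i}") for i in range(3)]
    lb = RoundRobinBalancer(instances)
    for n in range(7):
        print(lb.dispatch(f"GET /orders #{n}"))
    print([inst.handled for inst in instances])  # [3, 2, 2]
```

Every instance is a full copy of the application, so adding capacity for one busy feature means adding capacity for all of them.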

Monolithic applications can be successful, but increasingly people are feeling frustrations with them - especially as more applications are being deployed to the cloud. Change cycles are tied together - a change made to a small part of the application requires the entire monolith to be rebuilt and deployed. Over time it's often hard to keep a good modular structure, making it harder to keep changes that ought to only affect one module within that module. Scaling requires scaling of the entire application rather than parts of it that require greater resource.

These frustrations have led to the microservice architectural style: building applications as suites of services. As well as the fact that services are independently deployable and scalable, each service also provides a firm module boundary, even allowing for different services to be written in different programming languages. They can also be managed by different teams.

We do not claim that the microservice style is novel or innovative; its roots go back at least to the design principles of Unix. But we do think that not enough people consider a microservice architecture and that many software developments would be better off if they used it.

For more information:
James and Martin’s article goes on to define what a microservice architecture is by laying out nine common characteristics, discussing its relationship with Service-Oriented Architecture, and considering whether this style is the future of enterprise software. The full article is available on Martin Fowler's website.

James Lewis is a Principal Consultant at ThoughtWorks and member of the Technology Advisory Board. James' interest in building applications out of small collaborating services stems from a background in integrating enterprise systems at scale. He's built a number of systems using microservices and has been an active participant in the growing community for a couple of years.

Martin Fowler is an author, speaker, and general loud-mouth on software development. He's long been puzzled by the problem of how to componentize software systems, having heard more vague claims than he's happy with. He hopes that microservices will live up to the early promise its advocates have found.

Pattern: Microservices Architecture
The Scale Cube
SRP: The Single Responsibility Principle (.pdf)
Decomposing Applications for deployability and scalability

Building microservices with Spring Boot: Part 1, Part 2, Part 3 (Deploying Spring Boot-based microservices with Docker)

A Quick Introduction to CoreOS
An Introduction to CoreOS System Components
CoreOS Continued: etcd
Running Kubernetes on CoreOS Part 1, Part 2
CoreOS Continued: Fleet and Docker
Launching Containers with fleet
Deploying a NodeJS Application using Docker
Deploying Docker Containers on CoreOS using Fleet
Running CoreOS on Vagrant
Running CoreOS on Google Compute Engine
Running CoreOS on EC2

Toolkit: Spray; Akka; Scala; Clojure; Spring; Dropwizard (Jetty (Web Server), Jersey (RESTful), Jackson (JSON), JDBI (SQL), Logback, Yammer metrics, Guava (Core libraries), Hibernate Validator); NodeJS; Play; Python; GitHub

Apache Hue