Creating Intelligence from Big Data

Exploiting big data, including the 90% of data hidden in the deep web, provides new insight for law enforcement, business, government and research. Learn how to create actionable intelligence from big data in this white paper, including:
  • The value of big data
  • Who can benefit most from big data
  • How to create understanding from big data
Download this white paper to learn why big data matters and how to create actionable insight from the deep web.

Not Only Structured Query Language

Approaches:

Amazon Dynamo distributed key-value stores (Cassandra, VoltDB, Riak, Redis)

Google Bigtable (HBase)

Document-oriented databases (MongoDB, CouchDB, MarkLogic)

Graph databases (Neo4j)
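The four approaches differ mainly in how they shape data. The rough, product-neutral sketch below models one hypothetical user record in each style using plain Python structures. It does not use the API of Cassandra, HBase, MongoDB, Neo4j or any other product named above, and every name and value in it is illustrative only.

```python
import json

# Key-value (Dynamo-style): the store sees only a key and an opaque value.
kv_store = {"user:42": json.dumps({"name": "Ada", "city": "London"}).encode()}

# Wide-column (Bigtable-style): row key -> column family -> column -> value.
wide_column = {
    "user:42": {
        "profile": {"name": "Ada", "city": "London"},
        "activity": {"last_login": "2011-05-01"},
    }
}

# Document: self-describing nested records that can be queried by field.
documents = [
    {"_id": 42, "name": "Ada", "city": "London", "tags": ["analyst"]},
    {"_id": 43, "name": "Alan", "city": "Manchester", "tags": ["analyst", "admin"]},
]

# Graph: nodes connected by explicit, typed relationships (edges).
edges = {("Ada", "KNOWS", "Alan"), ("Alan", "KNOWS", "Grace")}


def kv_get(store, key):
    """A key-value read: fetch the blob; interpreting it is the application's job."""
    return json.loads(store[key].decode())


def find_documents(docs, field, value):
    """Return documents whose field equals value (or, for list fields, contains it)."""
    return [d for d in docs
            if d.get(field) == value
            or (isinstance(d.get(field), list) and value in d[field])]


def neighbours(edge_set, node, relation="KNOWS"):
    """Return the nodes one hop away from node along the given relation."""
    return sorted(dst for src, rel, dst in edge_set if src == node and rel == relation)
```

The key-value store can only answer "give me key X". The document and graph models can answer questions about fields and relationships, which is the main reason to pick one model over another.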
 
Native Big Data Connectors (source: jasperforge.org)

CLEAR (Consolidated Lead Evaluation and Reporting)

CLEAR is the next-generation version of the investigative tool AutoTrackXP, which has over a decade of history in the public records market. CLEAR launched to the law enforcement market in 2008 as ChoicePoint CLEAR (Consolidated Lead Evaluation and Reporting).

MeDICi Data Intensive Computing Framework

Pacific Northwest National Laboratory (PNNL) created a world-leading research program in Data Intensive Computing (DIC).

DIC is characterized by problems where data is the primary challenge, whether it is the complexity, size, or rate of the data acquisition. As the number of emerging scientific and national security problems continues to grow, so do our advancements in software and hardware architectures, analytics and visualization. We invite you to explore how PNNL is accelerating the speed of scientific discovery, decision support and threat detection across multiple disciplines.


Starlight Visual Information System (VIS)

A brief video demonstration of Starlight's capabilities, including examples of social network analysis (SNA) features and web reporting functionality:



 
Text and UAV Video Analysis
 

Big data: The next frontier for innovation, competition, and productivity

Report from McKinsey Global Institute
 

The amount of data in our world has been exploding, and analyzing large data sets—so-called big data—will become a key basis of competition, underpinning new waves of productivity growth, innovation, and consumer surplus, according to research by MGI and McKinsey's Business Technology Office. Leaders in every sector will have to grapple with the implications of big data, not just a few data-oriented managers. The increasing volume and detail of information captured by enterprises, the rise of multimedia, social media, and the Internet of Things will fuel exponential growth in data for the foreseeable future.
MGI studied big data in five domains—healthcare in the United States, the public sector in Europe, retail in the United States, and manufacturing and personal-location data globally. Big data can generate value in each. For example, a retailer using big data to the full could increase its operating margin by more than 60 percent. Harnessing big data in the public sector has enormous potential, too. If US healthcare were to use big data creatively and effectively to drive efficiency and quality, the sector could create more than $300 billion in value every year. Two-thirds of that would be in the form of reducing US healthcare expenditure by about 8 percent. In the developed economies of Europe, government administrators could save more than €100 billion ($149 billion) in operational efficiency improvements alone by using big data, not including using big data to reduce fraud and errors and boost the collection of tax revenues. And users of services enabled by personal-location data could capture $600 billion in consumer surplus. The research offers seven key insights.
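The healthcare figures leave out one step. The summary gives the savings both as a share of the value created and as a percentage of spending, but it never states the spending base itself. A back-of-the-envelope check can recover that base from the quoted numbers alone. Because it relies on them, it also inherits their "more than" and "about" approximations.

```python
# Figures as quoted in the MGI summary (US healthcare); all are approximate.
value_created_bn = 300       # "more than $300 billion in value every year"
spending_cut_share = 0.08    # "reducing US healthcare expenditure by about 8 percent"

# "Two-thirds of that" value comes from lower expenditure.
savings_bn = value_created_bn * 2 / 3            # 200.0

# If roughly $200 billion is about 8% of spending, the implied annual baseline is:
implied_spending_bn = savings_bn / spending_cut_share

print(f"savings ~ ${savings_bn:.0f}bn, implied baseline ~ ${implied_spending_bn:,.0f}bn")
```

Two-thirds of $300 billion is $200 billion. If $200 billion is about 8 percent of spending, annual US healthcare spending is on the order of $2.5 trillion. This is derived arithmetic, not a figure the summary states.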

1. Data have swept into every industry and business function and are now an important factor of production, alongside labor and capital. We estimate that, by 2009, nearly all sectors in the US economy had at least an average of 200 terabytes of stored data (twice the size of US retailer Wal-Mart's data warehouse in 1999) per company with more than 1,000 employees.

2. There are five broad ways in which using big data can create value. First, big data can unlock significant value by making information transparent and usable at much higher frequency. Second, as organizations create and store more transactional data in digital form, they can collect more accurate and detailed performance information on everything from product inventories to sick days, and therefore expose variability and boost performance. Leading companies are using data collection and analysis to conduct controlled experiments to make better management decisions; others are using data for everything from basic low-frequency forecasting to high-frequency nowcasting to adjust their business levers just in time. Third, big data allows ever-narrower segmentation of customers and therefore much more precisely tailored products or services. Fourth, sophisticated analytics can substantially improve decision-making. Finally, big data can be used to improve the development of the next generation of products and services. For instance, manufacturers are using data obtained from sensors embedded in products to create innovative after-sales service offerings such as proactive maintenance (preventive measures that take place before a failure occurs or is even noticed).


3. The use of big data will become a key basis of competition and growth for individual firms. From the standpoint of competitiveness and the potential capture of value, all companies need to take big data seriously. In most industries, established competitors and new entrants alike will leverage data-driven strategies to innovate, compete, and capture value from deep and up-to-real-time information. Indeed, we found early examples of such use of data in every sector we examined.

4. The use of big data will underpin new waves of productivity growth and consumer surplus. For example, we estimate that a retailer using big data to the full has the potential to increase its operating margin by more than 60 percent. Big data offers considerable benefits to consumers as well as to companies and organizations. For instance, services enabled by personal-location data can allow consumers to capture $600 billion in economic surplus.

5. While the use of big data will matter across sectors, some sectors are set for greater gains. We compared the historical productivity of sectors in the United States with the potential of these sectors to capture value from big data (using an index that combines several quantitative metrics), and found that the opportunities and challenges vary from sector to sector. The computer and electronic products and information sectors, as well as finance and insurance, and government are poised to gain substantially from the use of big data.
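The summary does not say how MGI built its index. Purely as a generic illustration, and not as MGI's method, one common way to combine several metrics into a single score works in two steps. First, min-max normalize each metric across sectors to the range 0 to 1. Then average the normalized values for each sector. The sector names, metric names and values below are hypothetical.

```python
def composite_index(scores):
    """Combine several metrics into one score per sector.

    scores: {sector: {metric: value}}; every sector must report the same metrics.
    Each metric is min-max normalized across sectors, then averaged per sector,
    giving an index in [0, 1].
    """
    metrics = list(next(iter(scores.values())))
    normalized = {sector: [] for sector in scores}
    for m in metrics:
        values = [row[m] for row in scores.values()]
        lo, hi = min(values), max(values)
        span = hi - lo
        for sector, row in scores.items():
            # A metric with no spread cannot rank sectors, so it scores 0.5 for all.
            normalized[sector].append((row[m] - lo) / span if span else 0.5)
    return {sector: sum(parts) / len(parts) for sector, parts in normalized.items()}


# Hypothetical inputs, for illustration only.
example = {
    "Sector A": {"data_intensity": 8.0, "digital_adoption_pct": 90},
    "Sector B": {"data_intensity": 2.0, "digital_adoption_pct": 30},
    "Sector C": {"data_intensity": 5.0, "digital_adoption_pct": 60},
}
print(composite_index(example))
```

Equal weighting is the simplest choice. A real index might weight metrics differently or use a different normalization.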

6. There will be a shortage of talent necessary for organizations to take advantage of big data. By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.

7. Several issues will have to be addressed to capture the full potential of big data. Policies related to privacy, security, intellectual property, and even liability will need to be addressed in a big data world. Organizations need not only to put the right talent and technology in place but also to structure workflows and incentives to optimize the use of big data. Access to data is critical: companies will increasingly need to integrate information from multiple data sources, often from third parties, and the incentives have to be in place to enable this.


Podcast: Distilling value and driving productivity from mountains of data

MGI senior fellow Michael Chui discusses how the scale and scope of companies' access to data is changing the way they do business.

Assessment of men; selection of personnel for the Office of Strategic Services [by] the OSS Assessment Staff

"This volume is the account of how a number of psychologists and psychiatrists attempted to assess the merits of men and women recruited for the Office of Strategic Services. The undertaking is reported because it represents the first attempt in America to design and carry out selection procedures in conformity with so-called organismic (Gestalt) principles. As a novel experiment it might interest a wide range of readers, but more specifically we hope it will invite the attention of those who are concerned with the problem of predicting human behavior, especially if they are engaged in practicing and developing clinical psychology and psychiatry and in improving present methods of diagnosis, assessment, and selection. All told, 5,391 recruits were studied intensively over a three-day period at one station or over a one-day period at another. These were the two areas in the United States where the bulk of assessment was done. Of these the performances of 1,187 who went overseas were described and rated by their superior officers and associates in the theater. Some standard procedures, elementalistic in design, were included in our program, because the best of these instruments are especially efficient in picking out disqualifying defects of function and so in eliminating men who are definitely inferior. Organismic methods, on the other hand, are to be recommended in addition whenever it is necessary to discriminate unusual talent, to measure ability in the range running from low average to high superior. The plan described in this book was devised to fit the special needs of the Office of Strategic Services, but it would not take much ingenuity to modify some of the techniques and to invent others of the same type to meet the requirements of other institutions. These methods were first used on a large scale by Simoneit, as described in Wehrpsychologie, and the German military psychologists, and after them by the British"--Introduction. 
(PsycINFO Database Record (c) 2006 APA, all rights reserved).

Turning Firefox into an Ethical Hacking Platform

Security-database.com's list of useful security auditing extensions:

- Information gathering

Whois and geo-location
ShowIP : Shows the IP address of the current page in the status bar. It also lets you query custom services, such as whois and Netcraft, by IP (right mouse button) and hostname (left mouse button).
Shazou : Shazou (pronounced "Shazoo"; the name is Japanese for "mapping") lets the user map and geo-locate any website they are currently viewing with one click.
HostIP.info Geolocation : Displays geolocation information for a website using hostip.info data. Works with all versions of Firefox.
Active Whois : Runs Active Whois to get details about any website's owner and its host server.
Bibirmer Toolbar : An all-in-one toolbox, though auditors need to spend some time learning it. It includes WhoIs, DNS Report, Geolocation, Traceroute and Ping, and is very useful in the information-gathering phase.
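The whois and geo-location tools above all start by turning a page's hostname into an IP address. The minimal standard-library sketch below covers only that first step. It does not query whois or any geolocation service, and the helper names are hypothetical. The `is_public` check matters because private and loopback addresses cannot be meaningfully geolocated.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def host_of(url):
    """Return the (lower-cased) hostname part of a URL, or None if it has none."""
    return urlsplit(url).hostname


def resolve_ipv4(url):
    """Look up one IPv4 address for the URL's host via the system resolver."""
    host = host_of(url)
    if host is None:
        raise ValueError(f"no host in {url!r}")
    return socket.gethostbyname(host)


def is_public(ip):
    """True if the address is globally routable (not private, loopback, etc.)."""
    return ipaddress.ip_address(ip).is_global
```

For example, `resolve_ipv4("https://www.example.com/")` returns whatever address the local resolver reports, which needs network access. The parsing and classification helpers work offline.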

Enumeration / fingerprinting
Header Spy : Shows HTTP headers in the status bar.
Header Monitor : Displays, in a status-bar panel, any HTTP response header of the top-level document returned by a web server, for example Server (the default), Content-Encoding, Content-Type, X-Powered-By and others.
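Both extensions simply display headers that the server already sends. The minimal standard-library sketch below does the same thing outside the browser. It assumes you already have the raw header block of a response, and the sample values used to exercise it are hypothetical.

```python
import io
from http.client import parse_headers

# Response headers that commonly reveal server software and frameworks.
INTERESTING = ("Server", "Content-Encoding", "Content-Type", "X-Powered-By")


def interesting_headers(raw_header_block: bytes) -> dict:
    """Parse a raw HTTP response header block and keep the fingerprinting fields.

    raw_header_block: the header lines (without the status line), ending in a
    blank line, as they appear on the wire.
    """
    msg = parse_headers(io.BytesIO(raw_header_block))
    # Lookups on the parsed message are case-insensitive, like HTTP header names.
    return {name: msg[name] for name in INTERESTING if msg[name] is not None}


raw = (b"Server: Apache/2.2.22\r\n"
       b"Content-Type: text/html; charset=UTF-8\r\n"
       b"x-powered-by: PHP/5.3.10\r\n"
       b"\r\n")
print(interesting_headers(raw))
```

With a live response there is no need to parse anything by hand. The `headers` attribute of `urllib.request.urlopen(url)` is already a parsed message object that supports the same case-insensitive lookups.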

Social engineering
People Search and Public Record: This Firefox extension is a handy menu tool for investigators, reporters, legal professionals, real estate agents, online researchers and anyone interested in doing their own basic people searches and public record lookups as well as background research.

Googling and spidering
Advanced dork : Gives quick access to Google’s advanced operators directly from the context menu. This can be used to scan for hidden files or to narrow in on a target anonymously (via the scroogle.org option). [Updated Definition. Thanks to CP author of Advanced Dork]
SpiderZilla : SpiderZilla is an easy-to-use website mirroring utility based on HTTrack (www.httrack.com).
View Dependencies : Adds a tab to the "Page Info" window that lists all the files loaded to display the current page (useful for spidering).
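View Dependencies reads the list of files the browser actually loaded. When only a page's HTML is available, a rough static approximation is to parse the HTML and collect the URLs referenced by resource-loading tags. This approach misses anything fetched later by scripts or CSS, and the tag list below is an assumption rather than an exhaustive one.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class DependencyCollector(HTMLParser):
    """Collect URLs of resources a page pulls in (scripts, styles, images, frames)."""

    # tag -> attribute that names an external resource (illustrative, not exhaustive)
    RESOURCE_ATTRS = {"script": "src", "img": "src", "iframe": "src",
                      "frame": "src", "embed": "src", "link": "href"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.dependencies = []

    def handle_starttag(self, tag, attrs):
        wanted = self.RESOURCE_ATTRS.get(tag)
        if wanted is None:
            return  # ordinary links (<a href>) are navigation, not dependencies
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative references against the page URL; keep first-seen order.
                url = urljoin(self.base_url, value)
                if url not in self.dependencies:
                    self.dependencies.append(url)


def list_dependencies(html, base_url):
    collector = DependencyCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.dependencies
```

Self-closing tags such as `<img .../>` are handled too, because `HTMLParser` routes them through `handle_starttag` by default.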

- Security Assessment / Code auditing

Editors
JSView : The "View Page Source" menu item now opens files according to the behavior you choose in the JSView options, so you can open the source code of any web page in a new tab or in an external editor.
Cert Viewer Plus : Adds two options to the certificate viewer in Firefox or Thunderbird: an X.509 certificate can either be displayed in PEM format (Base64/RFC 1421, opens in a new window) or saved to a file (in PEM or DER format - and PKCS#7 provided that the respective patch has been applied - cf.
Firebug : Firebug integrates with Firefox to put a wealth of development tools at your fingertips while you browse. You can edit, debug, and monitor CSS, HTML, and JavaScript live in any web page.
XML Developer Toolbar : Lets XML developers use standard tools directly from the browser.
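Cert Viewer Plus converts between two common certificate encodings: DER, which is binary, and PEM, which holds the same bytes Base64-encoded between BEGIN/END lines (Base64/RFC 1421). Python's `ssl` module provides the same conversion. The sketch below also adds a SHA-256 fingerprint helper, which is an illustrative extra and not a documented Cert Viewer Plus feature. PKCS#7 output is not covered.

```python
import hashlib
import ssl


def der_to_pem(der_bytes: bytes) -> str:
    """Base64-encode DER certificate bytes between BEGIN/END CERTIFICATE lines."""
    return ssl.DER_cert_to_PEM_cert(der_bytes)


def pem_to_der(pem_text: str) -> bytes:
    """Strip the PEM armour and Base64-decode back to the raw DER bytes."""
    return ssl.PEM_cert_to_DER_cert(pem_text)


def sha256_fingerprint(der_bytes: bytes) -> str:
    """Colon-separated upper-case SHA-256 fingerprint of the DER bytes."""
    digest = hashlib.sha256(der_bytes).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))
```

To try it against a live server, call `ssl.get_server_certificate(("example.com", 443))`. It returns that server's certificate as PEM text, which `pem_to_der` turns back into DER bytes.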

Headers manipulation
RefControl : Control what gets sent as the HTTP Referer on a per-site basis.
User Agent Switcher : Adds a menu and a toolbar button to switch the browser's user agent.
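RefControl and User Agent Switcher both override request headers that the browser would otherwise fill in itself. Outside the browser, the same effect for a single request can be sketched with `urllib`. The URL and header values are placeholders, and the sketch says nothing about how the extensions themselves are implemented.

```python
from urllib.request import Request


def build_request(url, referer=None, user_agent=None):
    """Build (but do not send) a GET request with overridden Referer/User-Agent.

    referer=None sends no Referer header at all; user_agent=None lets urllib
    fall back to its own default "Python-urllib/x.y" agent when the request is sent.
    """
    headers = {}
    if referer is not None:
        headers["Referer"] = referer
    if user_agent is not None:
        headers["User-Agent"] = user_agent
    return Request(url, headers=headers)


req = build_request("https://example.com/",
                    referer="https://example.org/start",
                    user_agent="Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0")
# urllib normalises header names with str.capitalize(), hence "User-agent".
print(req.get_header("Referer"), req.get_header("User-agent"))
```

To send the request, call `urllib.request.urlopen(req)`.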

Cookies manipulation
Add N Edit Cookies : A cookie editor that lets you add and edit session and saved cookies.
CookieSwap : CookieSwap lets you maintain numerous sets, or "profiles", of cookies that you can quickly swap between while browsing.
httpOnly : Adds httpOnly cookie support to Firefox by encrypting cookies marked as httpOnly on the browser side.
Allcookies : Dumps all cookies (including session cookies) to Firefox's standard cookies.txt file.
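The httpOnly entry refers to the HttpOnly attribute, which a server sets on a cookie so that page scripts cannot read it. Two checks are useful during an audit: which cookies carry that flag, and which are session cookies (the kind Allcookies also dumps). Both can be sketched with the standard library's cookie parser. The cookie names and values used to exercise it are hypothetical.

```python
from http.cookies import SimpleCookie


def parse_set_cookie(header_value):
    """Parse one Set-Cookie header value and report its security-relevant flags."""
    jar = SimpleCookie()
    jar.load(header_value)
    result = {}
    for name, morsel in jar.items():
        result[name] = {
            "value": morsel.value,
            "httponly": bool(morsel["httponly"]),  # hidden from page scripts
            "secure": bool(morsel["secure"]),      # sent over HTTPS only
            # No Expires/Max-Age: a session cookie, discarded when the browser closes.
            "session": not (morsel["expires"] or morsel["max-age"]),
        }
    return result


print(parse_set_cookie("sid=abc123; Path=/; HttpOnly; Secure"))
```

A session identifier without `httponly` is a common audit finding, because any injected script can then read it.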

Security auditing
HackBar : This toolbar helps you test for SQL injection, XSS holes and other site-security problems. It is NOT a tool for executing standard exploits and it will NOT teach you how to hack a site. Its main purpose is to help developers audit the security of their own code.
Tamper Data : Use Tamper Data to view and modify HTTP/HTTPS headers and POST parameters.
Chickenfoot : Chickenfoot is a Firefox extension that puts a programming environment in the browser’s sidebar so you can write scripts to manipulate web pages and automate web browsing. In Chickenfoot, scripts are written in a superset of JavaScript that includes special functions specific to web tasks.
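Tamper Data edits requests in flight. The core of "modify POST parameters" is decoding an `application/x-www-form-urlencoded` body, changing fields and re-encoding it. The minimal sketch below covers only that step. It is intended for testing your own application, and the field names are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode


def tamper_form_body(body: str, changes: dict) -> str:
    """Decode a form-encoded POST body, apply changes, and re-encode it.

    Field order is preserved. A field named in `changes` is replaced (or
    appended if absent); a value of None removes the field entirely. Repeated
    fields named in `changes` collapse to a single occurrence.
    """
    out, seen = [], set()
    for key, value in parse_qsl(body, keep_blank_values=True):
        if key in changes:
            if key in seen:
                continue          # drop later repeats of a changed field
            seen.add(key)
            if changes[key] is None:
                continue          # field removed
            value = changes[key]
        out.append((key, value))
    for key, value in changes.items():
        if key not in seen and value is not None:
            out.append((key, value))  # new field
    return urlencode(out)


print(tamper_form_body("user=alice&qty=1&note=hi+there",
                       {"qty": "5", "note": None, "debug": "1"}))
```

Server-side validation should reject or clamp values such as an altered quantity. Testing that it does is the point of tampering with your own requests.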

- Proxy/web utilities

FoxyProxy : FoxyProxy is an advanced proxy management tool that completely replaces Firefox’s proxy configuration. It offers more features than SwitchProxy, ProxyButton, QuickProxy, xyzproxy, ProxyTex, etc.
SwitchProxy : SwitchProxy lets you manage and switch between multiple proxy configurations quickly and easily. You can also use it as an anonymizer to protect your computer from prying eyes.
POW (Plain Old WebServer) : The Plain Old Webserver uses server-side JavaScript (SJS) to run a server inside your browser. Use it to distribute files from your browser. It supports server-side JS, GET, POST, uploads, cookies, SQLite and AJAX, and it has security features to password-protect your site. Users have created a wiki, a chat room and a search engine using SJS.

- Misc

Hacks for fun
Greasemonkey : Lets you customize the way a web page displays using small bits of JavaScript (user scripts).

Encryption
Fire Encrypter : FireEncrypter is a Firefox extension that gives you encryption/decryption and hashing functions right from your Firefox browser, mostly useful for developers or for education and fun.
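Hashing, unlike encryption, is one-way. The same input always produces the same fixed-length digest, and the digest is not designed to be reversed into the input. The sketch below covers the hashing half of what Fire Encrypter offers, using `hashlib`. Symmetric encryption would need a third-party library and is not shown.

```python
import hashlib


def digests(text, algorithms=("sha1", "sha256")):
    """Hex digests of a UTF-8 string under each named hash algorithm."""
    data = text.encode("utf-8")
    return {name: hashlib.new(name, data).hexdigest() for name in algorithms}


print(digests("abc"))
```

MD5, which tools of that era also offered, is available as `hashlib.new("md5")` in most Python builds. It should no longer be relied on for security purposes.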

Malware scanner
QArchive.org web files checker : Lets people check web files for malware (viruses, trojans, worms, adware, spyware and other unwanted content).
Dr.Web anti-virus link checker : This plugin lets you check any file you are about to download and any page you are about to visit.
ClamWin Antivirus Glue for Firefox : This extension scans every downloaded file automatically with ClamWin.

Anti Spoof
refspoof : Makes it easy to pretend that a request originates from a given site by overriding the URL referrer in an HTTP request. It does this through the pseudo-protocol spoof://, so the information can be stored in a "hyperlink" that can be used in any context, such as HTML pages or bookmarks.

Additional Links:

Blackbuntu is an Ubuntu-based distribution for penetration testing with the GNOME desktop environment. It is currently being built on Ubuntu 10.10.

The Metasploit® Framework is a free, open source penetration testing solution developed by the open source community & Rapid7.

The Social-Engineer Toolkit (SET) is specifically designed to perform advanced attacks against the human element.

The intelligence community gets social, by Brian Fung, WSJ

Digital media is mostly about entertainment for some, while for others, the value lies in being able to spread messages to a large audience. But, as many news organizations are discovering, Web 2.0 technologies are as good for listening as they are for broadcasting. The notion of social media as a trend-monitoring tool is spreading — and now U.S. spy agencies are jumping on board.

Intelligence Advanced Research Projects Activity (IARPA), the intelligence community’s research arm, says it hopes to use data gathered from social media to predict political unrest and natural disasters. While the proposal may rankle privacy critics, it’s just the latest example of the way intelligence officials are turning to the social Web to collect policy-relevant information.

The CIA already monitors social networks manually. In 2010, agency analysts became aware of a YouTube account allegedly belonging to the propaganda service of North Korea. Pyongyang soon had other identities set up on Twitter and Facebook (the latter of which was abandoned). The CIA issued several reports later that year on the regime’s entry into social media, concluding that the new Web offensive was primarily aimed at influencing the population of South Korea, one of the world’s most digitally enabled societies. Both countries are engaged in a tenuous military truce and longstanding public relations war.

Even as it was watching North Korea’s evolving positions on social media, the CIA was conducting a study of the social media landscape in India. Beyond uncovering some fascinating details about the country’s Internet usage patterns, analysts discovered that many of India’s controversial separatist groups were taking advantage of social media tools to advocate their agendas.

Spy agencies’ growing interest in digital media is perhaps unsurprising given that it is an industry that trades in information. But it also reflects broader, underlying trends in intelligence-gathering. Since the end of the Cold War, U.S. officials have embraced what are called “open sources”: non-classified information drawn from newspapers, radio broadcasts and other publicly accessible outlets. Open sources accounted for some 80 percent of what the CIA knew about the Soviet Union’s downfall in the early 1990s, according to then-deputy director William Studeman. Sherman Kent, one of the agency’s first analysts, once estimated that 80 percent of all U.S. intelligence needs could be met with open sources in peacetime.

The biggest victory for open-source proponents came in 2005, when the CIA launched a new center dedicated to gleaning intelligence from public information. The announcement signaled more of a rebranding than anything else — open source intelligence has always been a part of the mix to some degree — but the event finally lent recognition and credibility to a historically obscure tradition.

The open-source revolution has only accelerated with social media. Now, analysts can tap directly into millions of individual sources at the micro level, examining tweets, blog posts and videos for new information. They can also step back and survey entire social ecosystems, using vast amounts of metadata to identify significant patterns of behavior in the abstract. At the mid-range level, digital media can reveal important connections among small groups of users.
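At its simplest, the "mid-range" view described above is a graph problem. Treat each reply or mention as a link between two accounts, then look for clusters of connected users. The minimal sketch below assumes interactions arrive as pairs of hypothetical account names. Real analytic tools use far richer methods, such as weighted edges and community detection, than plain connected components.

```python
from collections import defaultdict


def interaction_groups(interactions):
    """Group accounts into connected components of an undirected interaction graph.

    interactions: iterable of (account_a, account_b) pairs, e.g. one per reply
    or mention. Returns a list of sets of accounts, largest group first.
    """
    graph = defaultdict(set)
    for a, b in interactions:
        graph[a].add(b)
        graph[b].add(a)

    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        # Iterative depth-first search collects everyone reachable from start.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        groups.append(component)
    return sorted(groups, key=len, reverse=True)


print(interaction_groups([("a", "b"), ("b", "c"), ("x", "y")]))
```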

Whether government scrutiny of social media is problematic for civil society depends on your conception of public and private. But it raises other questions, too. What is the intelligence value of an individual tweet? How does the study of social media affect signal-to-noise ratios and, more importantly, how does it affect ways in which the intelligence community allocates its resources to adapt? Does social media change the meaning of open-source intelligence?

Harris Direction-Finding and Geo-Location Systems

Cellular Phone Interception

Stingray/KingFish vehicular-borne analog and digital interrogation, direction finding (DF) and SIGINT collection; AmberJack phased-array DF antenna; Harpoon amplifier; Tarpon software; LoggerHead handheld device (survey, intercept and interrogation of analog and digital cellular networks); Seahorse interrogation and direction-finding system; Triggerfish multichannel analog and digital cellular network monitor.







Prophet Low Level Voice Intercept

Pen-Link Wireline, Wireless, VoIP, 3G, IP collection

In-Q-Tel: A New Partnership Between the CIA and the Private Sector

Archangel: CIA's Supersonic A-12 Reconnaissance Aircraft, by David Robarge

This history of the A-12 reconnaissance aircraft is occasioned by CIA’s acquisition on loan from the Air Force of the eighth A-12 in the production series of 15. Known as Article 128, the aircraft will be on display at the Agency’s Headquarters compound in Langley, Virginia. This history is intended to provide an accessible overview of the A‑12’s development and use as an intelligence collector.

Writing this story was a fascinating challenge because I am not an aviation historian and have never flown any kind of aircraft. Accordingly, I have tried to make the narrative informative to lay readers like myself, while retaining enough technical detail to satisfy those more knowledgeable about aeronautics and engineering. I have drawn on the sources listed in the bibliography and the extensive files on the A-12 program in CIA Archives. Hundreds of those documents will be declassified and released to the public in conjunction with the dedication of Article 128 in September 2007 as part of the Agency’s 60th anniversary commemoration. I have limited citations to specific documentary references and direct quotes from published works. When discrepancies arose among the sources regarding dates and other details, I have relied on the official records.

For their contributions to the substance and production of this work and to the documentary release, I would like to thank my colleagues on the CIA History Staff and at the Center for the Study of Intelligence, the information review officers in the Directorate of Science and Technology, designers and cartographers in the Directorate of Intelligence, and publication personnel at Imaging and Publishing Support. I also am grateful for historical material provided by the Lockheed Martin Corporation and the A-12 program veterans, the Roadrunners.

David Robarge
CIA Chief Historian
September 2007