Vlad Mihalcea 12/31/13
2 replies

NoSQL is Not Just About Big Data

After publishing a small experiment with MongoDB, the author was challenged by the jOOQ team to match his results against Oracle. He will explore the specifics of that challenge in a later post; in this one, he discusses a number of Small Data use cases in which MongoDB was the right tool for the job.

Alec Noller 12/31/13
0 replies

Lark: A "RESTy" Interface for Redis

Redis users might be interested in Lark, a new Python library designed to transform an HTTP request into a Redis command and provide a "RESTy" interface. Features include automatic JSON serialization and deserialization for Redis values, adapters for Flask and Django, and more.

Alec Noller 12/31/13
1 reply

Are You Really a Data Scientist?

According to this recent post, working with Hadoop a bit, knowing some Python, and having some database chops does not make you a data scientist. The author argues that it takes more than that, and in this article he provides some resources to help you get there.

Gareth Rushgrove 12/30/13
0 replies

Making the Web Secure, One Unit Test at a Time

Writing automated tests for your code is one of those things that, once you have gotten into it, you never want to see code without tests ever again. Why write pages and pages of documentation about how something should work when you can write tests to show exactly how something does work?

Lukas Eder 12/30/13
0 replies

The Great SQL Implementation Comparison Page

Fortunately, we have SQL standards. Or do we? It’s a well-known secret (or cynical joke) that the SQL standard is yet another SQL dialect among peers.

Joshua Gross 12/30/13
25 replies

Top Posts of 2013: Please stop using Twitter Bootstrap

Let’s be honest: a great many of us are tired of seeing the same old Twitter Bootstrap theme again and again. Twitter Bootstrap’s success has turned it into the Times New Roman of design.

John Sonmez 12/30/13
20 replies

Top Posts of 2013: There Are Only 2 Roles of Code

All code can be classified into two distinct roles: code that does work (algorithms) and code that coordinates work (coordinators). I would say that 90% of the code I have written does not nicely divide my classes into algorithms and coordinators.

Mikio Braun 12/30/13
0 replies

Top Posts of 2013: Big Data Beyond MapReduce: Google's Big Data Papers

Mainstream Big Data is all about MapReduce, but when looking at real-time data, limitations of that approach are starting to show. In this post, I’ll review Google’s most important Big Data publications and discuss where they are.

Lukas Eder 12/30/13
0 replies

MongoDB “Lightning Fast Aggregation” Challenged with Oracle

What does “Scale” even mean in the context of databases? When talking about scaling, people have jumped to the vendor-induced conclusion that SQL doesn’t scale, while NoSQL scales. In this article, the author takes a look at database scalability by comparing Oracle benchmarks to MongoDB.

Arthur Charpentier 12/30/13
0 replies

100 Blogs Worth Reading: R, Probability, Data Analysis and Visualization, and More

For the 100th installment of his collection of data science-related links, Arthur Charpentier has instead decided to provide a list of 100 blogs worth reading. Topics covered include statistics, probability, R, data analysis, graphs, maps, visualization, sciences, economics, and more.

Ayende Rahien 12/30/13
0 replies

Reducing the Cost of Writing to Disk

So, we found out that the major cost of random writes in our tests was actually writing to disk. Writing 500K sequential items resulted in about 300 MB being written. Writing 500K random items resulted in over 2.3 GB being written. So the obvious thing to do would be to use compression.

Adam Fowler 12/30/13
0 replies

MarkLogic Range Index Scoring in V7

A new feature of MarkLogic 7's search API is range index scoring – affecting relevancy based on a value within a document. In this article, the author details a couple of use cases: one involving ratings, and one involving distance from the center point of a geospatial query.

Mark Needham 12/29/13
3 replies

Neo4j: The 'Thinking in Graphs' Learning Curve

In a couple of Neo4j talks, the author has been asked how long it takes to get used to modeling data in graphs and whether he feels it is simpler than alternative approaches. His experience closely mirrors what he believes is a fairly common curve when learning technologies that change the way you think.

Duncan Brown 12/27/13
0 replies

Upgrading Spring Data Neo4j and Neo4j: "Gotchas" to Watch for

The author was in the middle of upgrading a small test project to a newer version of Spring Data Neo4j and Neo4j itself when he came across a few points that others might find useful. Here are a couple of "gotchas" he encountered.

Ofir Nachmani 12/27/13
0 replies

The Basics of Cloud Capacity

Traditional capacity planning, in which new servers were purchased to meet the demand of a single application and ran at a load of 20% at most, has been brought to an end by cloud computing. The comparison below shows some of the basic differences between the traditional DC and the cloud: