By Jeanie Straub, Douglas County Libraries
If you are interested – or think you should be interested – in Big Data, there are a couple of annual conferences that you may want to swish your feet in via internet offerings. They piqued my interest when I stumbled upon one presentation on Quora and will probably pique yours as well if you are not already familiar with these conferences: Very Large Databases (VLDB) and Extremely Large Databases (XLDB).
These conferences deal with terascale and petascale databases, respectively. (To get a sense of this size and the market, know that the data that makes up Wikipedia used 5.87 terabytes in January 2010 and that Teradata Corp. (NYSE: TDC) reported revenue of $602 million for the quarter ended Sept. 30, 2011, an increase of 23 percent.)
XLDB
The first one I’ll mention, XLDB 2011: 5th Extremely Large Databases Conference, was held Oct. 18-19 in Menlo Park, Calif.
According to the website, http://www-conf.slac.stanford.edu/xldb2011/, the conference is “in response to the exploding need for systems and tools capable of managing and analysing extremely large data sets.”
The main goals of the conference are:
- Encourage and accelerate the exchange of ideas between users trying to build extremely large databases worldwide and database solution providers
- Share lessons, trends, innovations, and challenges related to building extremely large databases
- Facilitate the development and growth of practical technologies for extremely large databases
- Strengthen, expand, and engage the XLDB community
In 2011, the XLDB conference focused on “practical solutions.”
People from academia represent about 14 percent of the participants in XLDB, scientists about 21 percent and industry about 47 percent; the balance, roughly 18 percent, belongs to vendors. Since the first conference in 2007, the share of scientists has gone down while industry numbers have climbed.
Topics range from “Industrial Perspective on Tools for Big Data” to “Data Infrastructure at LinkedIn” to “What’s New at Google?” to “Advanced Concepts and Techniques for Visualizing Large Data.”
You can see PPT slides from the conference at http://www-conf.slac.stanford.edu/xldb2011/Program.asp.
Here are quotes from typical abstracts – note that a lot were over my head but my intent was to swish my feet in them:
Low-rank matrix factorizations are effective tools for discovering and quantifying relationships between classes of entities such as documents and terms (as in keyword search), users and stories (news personalization), and users and items (recommendation systems). – “Techniques for Discovering Relationships in Massive-Scale Data,” Peter J Haas, IBM
The 1000 Genomes Project data represents a very large dataset, which is of significant interest to not only computer scientists and bioinformaticians but also bench biologists and clinicians. The Data Coordination Centre for the 1000 Genomes Project has worked on several measures to improve accessibility both for high end and naive users. Here I present some of these tools including our Data Slicer which allows individuals to download slices of data relevant to their gene or genomic region of interest and our Variation Pattern Finder which presents a view on variation data, showing how the genotypes for different variants are shared between different individuals. By providing these tools the 1000 Genomes project hopes to make the data we present as widely useful as possible. – “The 1000 Genomes Project, User Accessibility,” Laura Clarke, EBI
Although much of this material, as I mentioned, was over my head after the first couple of sentences, enough of it is understandable and interesting to make it worth picking through the slides and reading the abstracts. At minimum you will pick up the language and themes of Big Data and get a view of the issues surrounding extremely large data, as well as what’s coming down the road. As a professional, this is something I both want to understand and feel I should understand.
For a presentation that is easier to understand, try “Scaling Up Quickly on the Cloud” by Edmond Lau of Quora, the full text of which is available as one of his postings on Quora at http://www.quora.com/Edmond-Lau/Scaling-Up-Quickly-on-the-Cloud.
You can follow XLDB on Twitter at @XLDBConf; also search #xldb for links to presentations and the like. Video of presentations is available on iTunes U, a distribution system for lectures and other educational content.
VLDB
The other conference I found while searching for XLDB was VLDB, an annual conference held by the U.S. non-profit Very Large Data Base Endowment Inc.; the mission of VLDB is “to promote and exchange scholarly work in databases and related fields throughout the world.”
The VLDB conference began in 1975, and a complete archive is available at http://www.vldb.org/archives/website.html.
According to the website, http://www.vldb.org/2011/?q=node/2:
VLDB is a premier annual international forum for data management and database researchers, vendors, practitioners, application developers, and users. The conference … cover[s] current issues in data management, database and information systems research. Data management and databases remain among the main technological cornerstones of emerging applications of the twenty-first century.
VLDB 2011 was held in Seattle, Aug. 29-Sept. 3. Various talks are available by searching “vldb conference 2011 site:youtube.com”. For example, the presentation “Databases will visualize queries, too” is at http://www.youtube.com/watch?v=kVFnQRGAQls – you’ll see on the YouTube page that the PPT slides, paper and online demo for that same talk are available at http://queryviz.com/.
By searching Twitter for #VLDB you can find links to more from current and previous years, such as “Data Markets in the Cloud: An Opportunity for the Database Community,” by Magdalena Balazinska, Bill Howe, and Dan Suciu, University of Washington, at http://www.vldb.org/pvldb/vol4/p1482-balazinska.pdf:
Cloud-computing is transforming many aspects of data management. Most recently, the cloud is seeing the emergence of digital markets for data and associated services. We observe that our community has a lot to offer in building successful cloud-based data markets. We outline some of the key challenges that such markets face and discuss the associated research problems that our community can help solve.
Again, most of VLDB is written by and for computer / data scientists, but you can go into it thinking you will get what you can from it and leave it at that. So have fun with it and let me know what you think: jstraub@dclibraries.org.