Bio IT World Expo 2016  

Track 8 - April 21 – 23, 2015

Pharmaceutical R&D Informatics 

Collaboration, Data Science and Biologics

With the increased generation of innumerable and varied data sets, pharma and biotech must effectively manage and integrate data from all stages of the pharmaceutical value chain to enable more informed decisions. This is seen in the emerging trend of data science groups. Track 8 explores the transformation of current IT and informatics teams into data science groups and current progress made by such groups in the analysis, integration and visualization of complex data sets, including genomic, imaging, clinical, external/internal collaboration and real world data.

Final Agenda


Tuesday, April 21

7:00 am Workshop Registration and Morning Coffee

8:00 – 11:30 Recommended Morning Pre-Conference Workshops*

Biologics, Bioassay, and Biospecimen Registration Systems

12:30 – 4:00 pm Recommended Afternoon Pre-Conference Workshops*

Finding Innovation in Collaboration Environments: Documentum, Sharepoint, Veeva, and Tigers, Oh My!

* Separate registration required

2:00 – 6:30 Main Conference Registration



5:00 – 7:00 Welcome Reception in the Exhibit Hall with Poster Viewing



Wednesday, April 22

7:00 am Registration Open and Morning Coffee



9:00 Benjamin Franklin Awards and Laureate Presentation

9:30 Best Practices Awards Program

Internet 2

9:45 Coffee Break in the Exhibit Hall with Poster Viewing


10:50 Chairperson’s Opening Remarks

Jose L. Alvarez, Principal Engineer, WW Director, Healthcare and Life Sciences, Seagate Cloud and Systems Solutions

11:00 The Evolution of Data Science in Translational Medicine

Anastasia Christianson, Head, Translational R&D IT, Bristol-Myers Squibb

Eric Carleen, Director, Data Integration, Bristol-Myers Squibb

The role of the data scientist continues to evolve while demand for the associated skills continues to rise. The ever-increasing volume of data generated on any one drug project, and the availability of even more relevant data externally, necessitate strong data handling and data analytics expertise. This presentation will describe how the role has evolved in one pharma company and how collaboration between data scientists and related skills across organizational boundaries has delivered valuable insights to project teams.

11:30 Data Science in Translational Clinical Research

James Cai, Head, Data Science, Roche, Translational Clinical Research Center (TCRC)

The intelligent use of Big Data has transformed many industries. It also presents numerous opportunities for pharmaceutical companies as we collect more genomic Big Data directly from patients. In this talk I will outline a Data Science model that emphasizes mixed-capability teams and impact on science and business decisions. I will discuss how quantitative analytical skills, agile programming, novel technologies and business acumen all contribute to this model. I will illustrate with examples where Data Science was applied to clinical research resulting in new scientific insights and better business decisions.

12:00 pm Text Mining from Bench to Bedside - Where’s the Value?

Jane Reed, Ph.D., Head, Life Science Strategy, Linguamatics

Accessing the right information is critical to bench-to-bedside translational research. Much of the data is locked in textual format, such as scientific literature, clinical trial reports or electronic health records.  This talk will demonstrate how advanced text analytics can provide a powerful solution to the challenges faced by researchers and clinicians, who need to extract the key facts rapidly and accurately to gain actionable insights for decision support.

12:15 Simplifying Analytical Knowledge Transfer in an Externalized World

Ryan Sasaki, Director, Global Strategy, ACD/Labs

The lion’s share of chemical R&D today is being outsourced to external organizations. Consequently, the risk of losing the ‘proof of identity’ for a sample in the transfer of materials between a contractor and client grows. As externalization and research virtualization continue to evolve, the task of mining legacy analytical chemistry datasets and methods to help monitor and identify raw materials, impurities, and metabolites will become ever more difficult because of deficiencies in the knowledge exchange mechanisms. Fortunately, solutions are emerging. This session will present a use case for a new laboratory informatics external collaboration model.

12:30 Session Break

12:40 Luncheon Presentation I: Utilizing Big Data and Linked Data to Explore Relationships between Biological Entities for Drug Repurposing, Translational Medicine and Target Finding

Tomasz Adamusiak, M.D., Ph.D., Senior Data Scientist, Technology Development, Thomson Reuters

With the advances in NGS technology and data generation and as traditional translational research is deemed inefficient and costly, pharmaceutical and biomedical industries are driven to seek new ways to better utilize their data to extract relevant biological information. Thomson Reuters Cortellis™ Data Fusion delivers a first-in-class Big Data solution to drive new scientific and strategic insights from all of the proprietary and public content.

1:10 Luncheon Presentation II: Where Science Intersects with Business – Creating Business Dashboards That Combine Data from Multiple Sources

Huijun Wang, Ph.D., Associate Principal Scientist, Cheminformatics, Merck & Co., Inc.

Eric Gifford, Ph.D., Principal Scientist, Systems Chemical Biology, Merck & Co., Inc.

Matthew Clark, Ph.D., Consultant, Life Science Services, Elsevier

In today’s highly competitive pharmaceutical environment it is imperative for project teams to monitor both business movements and scientific developments that can affect the business proposition for the program. Elsevier is collaborating with Merck to develop a series of dashboards that bring in information from multiple sources to create views with facets for drug-, target-, and disease-related information. These dashboards will monitor scientific information gleaned from journals, patents and grant applications to provide a rich context for monitoring project status and competitive position.

1:40 Session Break

1:50 Chairperson’s Remarks

Daniel H. Robertson, Ph.D., Senior Director, Research IT, Eli Lilly and Company

1:55 Transforming IT and Informatics at Biogen to Drive Research

Hank Wu, Director, R&D IT, Biogen

Transforming IT and Informatics at Biogen is at the heart of the company’s strategic commitment to use technology, data and analytics to inform the drug discovery process, unlock new insights, improve patient care and drive innovation. This presentation shares work in progress and lessons learned at Biogen.

2:25 PANEL DISCUSSION: Growing a Data Science Team

  • Enabling Innovative Data-driven Approaches at the Intersection of Science, Medicine & Economics
  • Assembly, Creation and Implementation of Data Science Groups for Pharma
  • The Data Scientist - an Essential Component of Big Data Analytics – Difficult to Identify
  • What are Data Sciences, Informatics and Bioinformatics?
  • Should data scientists be centralized or embedded within other product/functional teams?
  • How strong a coder/programmer should members of a data science team be?
  • How much domain knowledge does a data scientist need to have?

Moderator: Martin Leach, Ph.D., Vice President, Global Data Office, Biogen


Rainer Fuchs, CIO, Harvard Medical

Jason Johnson, Ph.D., Executive Vice President and Head of R&D, PatientsLikeMe

Jake Klamka, Founder, Insight Data Science Fellows Program

Daniel H. Robertson, Ph.D., Senior Director, Research IT, Eli Lilly and Company

Tom Plasterer, Ph.D., Director, US Cross-Science Lead, AstraZeneca

Sarah Aerni, Ph.D., Principal Data Scientist, Pivotal

2:55 Can Simplifying the Informatics Landscape Underpin Your Lab of the Future?

Paul Denny-Gouldson, Ph.D., Vice President, Strategic Solutions, IDBS

A core concept of the lab of the future is simplifying day-to-day tasks and providing easy access to information concerning materials, results and reports. To realize these aspirations, it is essential to modernize existing R&D data workflows but, importantly, not just to automate the current state. With an upgrade of infrastructure comes a great opportunity to reassess what is done, how it is done and how it can all be optimized. Removing paper and capturing IP can be drivers for change, but don’t miss the opportunity to get more out of that change. We will use case studies of the good and the bad to show what can be done and how it can be done.

3:10 BIOVIA ScienceCloud: Automating Collaboration Workflows

Ton van Daelen, Ph.D., ScienceCloud Product Director, BIOVIA

The amount of R&D spending beyond company boundaries is approaching 50% of the overall R&D budget, yet informatics infrastructures are challenged to support this changing environment. We will present a comprehensive, cloud-based solution stack for externalized, collaborative research for pharma/biotech and CROs that addresses these challenges and we will discuss how developing customized business rules and synchronizing cloud with on-prem data are critical success factors.

3:25 Refreshment Break in the Exhibit Hall with Poster Viewing


4:00 The Construction of a Scientific Modeling Culture and Technology Platform at Merck

Chris L. Waller, Ph.D., Director and Head, Scientific Modeling Platforms, Merck Research Laboratories

Merck Research Laboratories is undergoing a transformation in the way that it prosecutes R&D programs. Through the adoption of a “model-driven” culture, enhanced R&D productivity is anticipated. To support this emerging culture, an ambitious IT program has been initiated to implement a harmonized platform that facilitates cross-domain workflows and decision-making through agile, persona-driven access to data and predictive models.

4:30 Separating the Wheat from the Chaff: Using Proprietary and Public Genomic Information to Identify Biomarkers from Cancer Cell Line Profiling Studies

Yue Webster, Ph.D., Senior Research Scientist, LRL IT Informatics, Eli Lilly and Company

Like most companies, Lilly uses large panels of cancer cell lines to discover genes, transcripts, proteins and/or metabolites that influence response to treatment. The potential for generating false positive findings is significant, and low concordance was highlighted by a recent publication (Nature 504, 389–393). The use of co-expression networks and integration across various resources helps identify higher quality relationships. Advanced visualization tools help biologists navigate through thousands of putative relationships.

5:00 Helping Our Clients Succeed in Their Distributed R&D Environments by Delivering Excellence in Scientific and Laboratory Informatics

John F. Conway, Global Director, R&D Strategy and Solutions, LabAnswer

Many organizations have chosen to distribute or externalize large portions of their R&D. Consequently, these same organizations are struggling to collaborate with their external partners. Sharing and capturing data and information in these environments requires extra (inefficient) effort. Through discussion and case studies, attendees will see firsthand how LabAnswer is helping our clients develop strategies, technologies and best practices that help solve some of the headaches associated with the distributed R&D business model.

5:15 Co-Presentation: A Data Lake for Competitive and Clinical Trial Intelligence

Ben Szekely, Vice President, Solutions, Cambridge Semantics

Christine Blazynski, Ph.D., Chief Science Officer & Senior Vice President, New Product Development, Informa

Semantic Data Lakes combine rich, conceptual models with cloud storage and computing technologies to link multi-structured content. This paradigm enables user-friendly and intuitive search, analytics and visualization across wide and diverse data sets. In this talk, Cambridge Semantics and Informa will present the Semantic Data Lake they have created across Informa's rich content sources including Citeline and Sagient. We will walk through some interesting use cases that illustrate the value of developing a Semantic Data Lake.

5:30 Best of Show Awards Reception in the Exhibit Hall with Poster Viewing

6:30 Close of Day


Thursday, April 23

7:00 am Registration Open and Morning Coffee



10:00 Coffee Break in the Exhibit Hall and Poster Competition Winners Announced


10:30 Chairperson’s Remarks

Yuriy Gankin, Ph.D., Chief Life Science Officer, EPAM Systems

10:40 Translational R&D Analytics: Delivering ‘Big Insights’ to Drive Translational Research

Kaushal Desai, Associate Director, Translational R&D Analytics and Decision-Support, Research Informatics & Automation, Bristol-Myers Squibb

The emergence of immunotherapy and a focus on systems approaches has led to an unprecedented surge in translational research opportunities for discovery and development of newer treatment paradigms. Organizations leading the race for scientific breakthroughs in patient treatment have accumulated overwhelming quantities of efficacy, survival, safety and biomarker data from decades of preclinical studies and clinical trials. Translational R&D organizations face the arduous task of mining this data to deliver insights that drive translational research. This session will explore case studies demonstrating how translational R&D analytics can inform patient stratification and trial design in early clinical and translational research. The talk will focus on the journey from a lack of discoverability for disjointed datasets to insights that drive key decisions in translational research. Challenges associated with delivering actionable information at the point of decision-making will be highlighted and opportunities to deliver business value will be outlined using real examples from multiple disease areas.

11:10 Integrated Genomics Platform: Putting Patients and Their Genomes into the Focus of Our Research

Nora Manstein, Ph.D., IT Project Manager, Bayer Business Services GmbH

The fast progress in the generation of genomic data has reached the patient. In particular, the advent of next-generation sequencing and high-resolution microarrays enables accurate descriptions of diseases with a strong genetic component, ultimately leading to novel therapeutic approaches. Application of these technologies, however, produces large amounts of data in need of effective storage and analysis. As several data types (mutation, expression, microRNAs) now become available for each patient, patient-centric views and analyses become mandatory. Consistent data handling and storage is a scientific and technological challenge for both the research organization and the IT infrastructure. We have established the Integrated Genomics Platform (IGP) as a central tool for genomics research in Cardiology, Oncology and Clinical Sciences. The platform supports advanced data analysis and is intended to simplify discovery processes, e.g. for novel therapeutic targets and genetic biomarkers. In this strategic project, we have overcome known bottlenecks and enabled true translational research by establishing a company-wide mandatory repository and toolbox for storage and analysis of genomics data as well as common standards for data annotation, privacy and security.

Bina Technologies

11:40 Building a Globally Distributed, Hybrid NGS Sequence Analysis and Integration Infrastructure for Oncology Discovery and Translational R&D

Justin H. Johnson, Principal Scientist, AstraZeneca

Next-Generation Sequencing is changing the way pharmaceutical companies develop drugs, perform patient stratification, and evaluate treatment efficacy. However, managing the massive amounts of NGS data has introduced fundamental IT challenges. Here we discuss the implementation of a fast, flexible, scalable and validated IT infrastructure that can streamline the upkeep of the NGS analysis workflow and the distribution of genomic information throughout an organization for translational discovery.

12:10 pm Session Break

12:20 Luncheon Presentation (Sponsorship Opportunity Available) or Lunch on Your Own

1:20 Dessert Refreshment Break in the Exhibit Hall with Poster Viewing


1:55 Chairperson’s Remarks

Dermot McCaul, Director, PreClinical Development and Biologics IT, Merck

2:00 Best Practices to Thrive Under the BioPharma Big Data Deluge

Tom Plasterer, Ph.D., Director, US Cross-Science Lead, AstraZeneca

Information comes at the BioPharma industry from an abundance of sources, ranging from compound optimization in early R&D through to patients on social media discussing the virtues, or lack thereof, of our products. Knowing how to ingest, harmonize and query this information stream can be a tremendous advantage both internally and to the ecosystem of partners and providers we depend upon. Examples using linked data approaches illustrate how the information stream can be tamed, focusing first on getting data out of containers. Once this is established, terminologies can be applied to derive meaningful answers across otherwise-siloed content. The Open PHACTS and Bio2RDF projects show how this approach has been used to solve real big data questions for BioPharma.

2:30 Beyond Data Integration – Consumable Expert Knowledge in Chemical Biology

Jeremy L. Jenkins, Ph.D., Senior Investigator II, High Throughput Biology, Developmental & Molecular Pathways, Novartis Institutes for BioMedical Research

Much focus in pharmaceutical data management has been placed on data integration. While metadata standards have greatly facilitated integration of disparate datasets, the result can be too much information to display. An emerging challenge is how to create inferences from knowledge bases that enable automation of expert opinions at large scale. We present a system that creates summary-level assertions based on diverse chemical biology data sources to address the problem of ranking tool compounds for targets, and vice versa, quantifying target confidence for compounds. At first pass the approach required domain-specific scientific understanding to engineer; however, application of machine learning methods improved the inference system, as demonstrated by prospective testing of good and bad tool compounds in a panel of cellular reporter gene assays. Overall this approach provides data-driven opinions about compounds that reflect those of an informed chemical biologist.

3:00 Development and Implementation of a Nonclinical Data Warehouse

Gregory Woo, Principal IS Business Systems Analyst, Research & Development Informatics, Amgen, Inc.

Amgen has implemented an integrated data warehouse for nonclinical toxicology studies, including data from internal systems and from Contract Research Organizations. The goal of the system is to allow scientists to rapidly search, query, and visualize historical toxicology, pathology, and toxicogenomics data. This presentation will discuss the system’s design, key features, challenges and lessons learned.

3:30 The Data Integration Challenge

Mark Davies, Technical Lead, Computational Chemical Biology, European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus

The challenge of data integration is a common issue faced by developers of resources in the life science fields. As the size of external data sources grows and approaches ‘big data’ scale, is it still feasible to expect a resource to maintain up-to-date links to external resources? Are there alternative approaches that can alleviate the integration burden for the resource provider? A solution developed by the ChEMBL group led to the creation of the freely available UniChem resource. The UniChem resource allows users to quickly and dynamically integrate the chemical content from a growing number of sources, which currently stands at 25 and contains more than 70 million compound structures. We use the example of the new and open SureChEMBL patent system to demonstrate how UniChem can assist with data integration. We also identify the new challenges we face and how we can embrace other technologies and methodologies, such as Linked Data, to help stay on top of the data integration challenge.

4:00 Conference Adjourns

Platinum Sponsors

  • Cycle Computing
  • DDN Storage
  • Illumina
  • Intel

