Data Triggered Programming Model for Text Processing in Big Data

Sandhya N; Philip Samuel; Mariamma Chacko

Authors

Sandhya N
Philip Samuel
Mariamma Chacko

Keywords:

Big Data Computing, Data Centric Architectures, Data Parallelism, MapReduce, Scalable, Data Triggered Multithreading

Abstract

Processing large volumes of text has become a challenge in recent years. Text processing methods drive much of modern data analysis across engineering sciences and commercial applications. Text analytics refers to the extraction of useful information from text sources. The term describes tasks such as annotating text sources with meta-information, for example the places mentioned in the text, across a wide range of documents. The key/value pair generation of a MapReduce program creates memory overhead and deserialization overhead due to data redundancy. Data redundancy is one of the most important factors that consume space and affect system performance when working with large datasets. This overhead can be reduced considerably by using a novel approach that we developed, the Data Triggered Multithreaded Programming (DTMP) model. In this paper, we demonstrate the use of the DTMP model on a large dataset of author details and their publications. The DTMP model can allocate resources dynamically and can identify data repetition occurring during computation. When applied to the MapReduce programming model, DTMP improves system performance. The major contribution of this work is a simple and scalable method for processing text data that enables automatic parallelization and distribution of large-scale computations.
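
The contrast drawn in the abstract, between emitting a key/value pair for every input record and reacting only when the data actually differs, can be illustrated with a small sketch. This is not the authors' DTMP implementation, which the abstract does not describe in detail. The record list and the function names `mapreduce_distinct` and `triggered_distinct` are hypothetical, and the "trigger" is approximated here by a simple seen-set check in a single thread.

```python
from collections import Counter, defaultdict

# Hypothetical (author, publication) records; not taken from the paper's dataset.
RECORDS = [
    ("A. Author", "Paper one"),
    ("B. Author", "Paper two"),
    ("A. Author", "Paper three"),
    ("A. Author", "Paper one"),  # repeated record
    ("B. Author", "Paper two"),  # repeated record
]


def mapreduce_distinct(records):
    """Count distinct publications per author in a map/shuffle/reduce style.

    Returns the counts and the number of key/value pairs that were generated.
    """
    # Map: one key/value pair per input record, including repeated records.
    pairs = [(author, title) for author, title in records]
    # Shuffle: group values by key; the set drops duplicates only at this stage.
    groups = defaultdict(set)
    for author, title in pairs:
        groups[author].add(title)
    # Reduce: one count per key.
    counts = {author: len(titles) for author, titles in groups.items()}
    return counts, len(pairs)


def triggered_distinct(records):
    """Same result, but work is done only when a record was not seen before.

    Returns the counts and the number of records that triggered computation.
    """
    seen = set()
    counts = Counter()
    triggered = 0
    for record in records:
        if record in seen:
            continue  # repeated data: nothing is emitted or recomputed
        seen.add(record)
        triggered += 1
        counts[record[0]] += 1
    return dict(counts), triggered


if __name__ == "__main__":
    print(mapreduce_distinct(RECORDS))
    print(triggered_distinct(RECORDS))
```

Both functions return the same per-author counts for the sample records, but the first generates a pair for each of the five records while the second does work for only the three distinct ones. Under the stated assumptions, that difference is the kind of redundancy the abstract attributes to key/value pair generation.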

Published

2016-07-01

How to Cite

Sandhya N, Philip Samuel, & Mariamma Chacko. (2016). Data Triggered Programming Model for Text Processing in Big Data. Journal of Network and Innovative Computing, 4, 9. Retrieved from https://cspub-jnic.org/index.php/jnic/article/view/123

Issue

Vol. 4 (2016)

Section

Original Article

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

