Metadata-Version: 1.1
Name: mwstreaming
Version: 0.2.5
Summary: A collection of scripts and utilities to support the stream-processing of MediaWiki data.
Home-page: https://github.com/halfak/MediaWiki-Streaming
Author: Aaron Halfaker
Author-email: ahalfaker@wikimedia.org
License: MIT
Description: MediaWiki Streaming
        ===================
        
        A set of utilities for stream-processing MediaWiki data.
        
        
        Usage
        -----
            ``mwstream (-h | --help)``
            
            ``mwstream <utility> [-h|--help]``
        
        Data processing utilities
        +++++++++++++++++++++++++
            ``diffs2persistence``
                Generates token persistence statistics from revision JSON blobs that
                include diff information.
            ``dump2json``
                Converts an XML dump to a stream of revision JSON blobs.
            ``json2diffs``
                Computes and adds a "diff" field to a stream of revision JSON blobs.
            ``persistence2stats``
                Aggregates token persistence statistics into revision statistics.
            ``wikihadoop2json``
                Converts a Wikihadoop-processed stream of XML pages to JSON blobs.
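        
        The persistence steps above can be illustrated with a simplified,
        self-contained sketch. It is not mwstreaming's implementation: the
        input format (a chronological list of ``(rev_id, tokens)`` pairs) and
        the ``token_persistence`` function are hypothetical, and Python's
        ``difflib.SequenceMatcher`` stands in for the diffs that ``json2diffs``
        would compute. For each revision, the sketch counts the tokens that the
        revision added and sums, over those tokens, the number of later
        revisions in which each one was still present.
        
        .. code-block:: python
        
            import difflib
            from collections import defaultdict
        
        
            def token_persistence(revisions):
                """Summarize token persistence per revision (illustrative sketch).
        
                ``revisions`` is a chronological list of ``(rev_id, tokens)``
                pairs, a hypothetical input format. Returns a dict mapping each
                rev_id to ``(tokens_added, revisions_survived)``, where the second
                value sums, over the tokens that revision added, the number of
                later revisions in which each token was still present.
                """
                current = []  # (token, rev_id that added it) for the latest text
                added = defaultdict(int)
                survived = defaultdict(int)
                for rev_id, tokens in revisions:
                    old_tokens = [token for token, _ in current]
                    # SequenceMatcher is a stand-in diff; it is not guaranteed minimal.
                    matcher = difflib.SequenceMatcher(a=old_tokens, b=tokens, autojunk=False)
                    carried = []
                    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
                        if tag == "equal":
                            # Unchanged tokens keep their origin and survive one more revision.
                            for token, origin in current[i1:i2]:
                                carried.append((token, origin))
                                survived[origin] += 1
                        elif tag in ("insert", "replace"):
                            # New tokens are attributed to the current revision.
                            for token in tokens[j1:j2]:
                                carried.append((token, rev_id))
                                added[rev_id] += 1
                        # "delete": removed tokens are simply not carried forward.
                    current = carried
                return {rev_id: (added[rev_id], survived[rev_id]) for rev_id, _ in revisions}
        
        For example, if revision 1 contains the tokens ``a b c``, revision 2
        appends ``d``, and revision 3 removes ``b``, the sketch returns
        ``{1: (3, 5), 2: (1, 1), 3: (0, 0)}``: the three tokens added by
        revision 1 survive a combined five later revisions.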
        
        General utilities
        +++++++++++++++++
            ``json2tsv``
                Converts a stream of JSON blobs to tab-separated values based on a set
                of ``fieldnames``.
            ``normalize``
                Normalizes JSON documents that follow older versions of the
                RevisionDocument schema so that they conform to the most recent schema
                version.
            ``validate``
                Validates JSON against a provided schema.
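        
        As a rough illustration of what ``json2tsv`` does, the following sketch
        converts newline-delimited JSON blobs into tab-separated rows. It is a
        hypothetical stand-in, not the package's code: the ``json2tsv`` and
        ``encode_cell`` functions, the header row, the empty cells for missing
        fields, and the escaping of backslashes, tabs and newlines are all
        assumptions.
        
        .. code-block:: python
        
            import json
        
        
            def encode_cell(value):
                """Render one field value as a single TSV cell (hypothetical escaping)."""
                if value is None:
                    return ""
                # Non-string values (numbers, booleans, lists, dicts) are written as JSON.
                text = value if isinstance(value, str) else json.dumps(value)
                # Escape backslashes first so that the escapes added below stay unambiguous.
                return text.replace("\\", "\\\\").replace("\t", "\\t").replace("\n", "\\n")
        
        
            def json2tsv(lines, fieldnames, header=True):
                """Yield TSV rows from an iterable of JSON lines, one row per blob."""
                if header:
                    yield "\t".join(fieldnames)
                for line in lines:
                    if not line.strip():
                        continue  # skip blank lines in the stream
                    blob = json.loads(line)
                    # Missing top-level fields become empty cells.
                    yield "\t".join(encode_cell(blob.get(name)) for name in fieldnames)
        
        Because ``json2tsv`` accepts any iterable of lines, it can be fed
        ``sys.stdin`` directly; with ``fieldnames=["rev_id", "user"]``, the blob
        ``{"rev_id": 1, "user": "Alice"}`` becomes the row ``1`` and ``Alice``
        separated by a tab.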
        
        
        Installation
        ------------
        
            ``pip install mwstreaming``
        
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: Development Status :: 3 - Alpha
Classifier: License :: OSI Approved :: MIT License
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: System Administrators
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Topic :: Utilities
Classifier: Topic :: Scientific/Engineering
