Commit Graph

  • 4c77c0f12e Merge branch 'user_level_wikiq' of code.communitydata.cc:mediawiki_dump_tools into user_level_wikiq groceryheist 2018-08-31 16:01:07 -07:00
  • 915c864ee5 add more variables and support for persistence groceryheist 2018-08-31 15:57:48 -07:00
  • 3d12865c4e add more variables and support for persistence groceryheist 2018-08-31 15:57:48 -07:00
  • bc1f5428f0 add spark program for running group by users groceryheist 2018-08-31 20:40:22 +00:00
  • 317bafb50d Merge branch 'advanced_persistence' of code.communitydata.cc:mediawiki_dump_tools into advanced_persistence groceryheist 2018-08-23 18:52:54 -07:00
  • 7cd0bf3b9e Add parameter for selecting specific namespaces. groceryheist 2018-08-23 18:27:09 -07:00
  • d93769c21f Merge branch 'advanced_persistence' of code.communitydata.cc:mediawiki_dump_tools into advanced_persistence groceryheist 2018-08-23 18:27:09 -07:00
  • afd40c1a45 Merge branch 'advanced_persistence' of code.communitydata.cc:/mediawiki_dump_tools into advanced_persistence Nate E TeBlunthuis 2018-08-23 18:23:36 -07:00
  • e4222c45dd add namespace filter parameter Nate E TeBlunthuis 2018-08-23 18:25:08 -07:00
  • 829ffcffae Merge branch 'advanced_persistence' of code.communitydata.cc:/mediawiki_dump_tools into advanced_persistence Nate E TeBlunthuis 2018-08-23 18:23:36 -07:00
  • 776b73519a add namespace filter parameter Nate E TeBlunthuis 2018-08-23 18:02:56 -07:00
  • 5b6aaad862 add namespace filter parameter Nate E TeBlunthuis 2018-08-23 18:02:56 -07:00
  • f468d1a5b6 add support for persistence with segment matching groceryheist 2018-08-20 16:08:16 -07:00
  • ff689c71dd Merge branch 'user_level_wikiq' of code.communitydata.cc:mediawiki_dump_tools into user_level_wikiq groceryheist 2018-08-14 14:44:37 -07:00
  • 39b4e5698f Use dask to parallelize and scale user level datasets groceryheist 2018-08-14 14:37:03 -07:00
  • daf1851cbb Use dask to parallelize and scale user level datasets groceryheist 2018-08-14 14:37:03 -07:00
  • 418fa020e5 Merge branch 'user_level_wikiq' of code.communitydata.cc:mediawiki_dump_tools into user_level_wikiq groceryheist 2018-08-12 21:34:12 -07:00
  • 311810a36c refactor wikiq to separate script from classes and functions. Code reuse in testing. groceryheist 2018-08-12 21:33:19 -07:00
  • 118b8b1722 move tests to test folder groceryheist 2018-08-12 18:05:59 -07:00
  • f69e8b44a6 move tests to test folder groceryheist 2018-08-12 18:05:59 -07:00
  • bf396ad366 Prefix page titles with namespace names. mediawiki-utils-migration groceryheist 2018-07-09 22:11:17 -07:00
  • d1f5e7b44c undoing my changes to master for now. see branch mediawiki-utils-migration legacy groceryheist 2018-07-05 01:40:17 -07:00
  • dba793c6ac migrate to mwxml. This completes the migration away from python-mediawiki-utilities. Except for preserving legacy persistence behavior, we can safely use the nice updates from the mediawiki-utils project. groceryheist 2018-07-05 01:16:00 -07:00
  • d77b0a4965 migrate to mwpersistence. this fixes many issues. We preserve legacy persistence behavior using the --persistence-legacy. groceryheist 2018-07-04 19:06:07 -07:00
  • 7db6288923 migrate reverts to python-mwreverts groceryheist 2018-07-04 15:29:48 -07:00
  • a883cb536b add note to readme about dependency on compression software groceryheist 2018-07-04 15:20:52 -07:00
  • e925ac9da1 add tests for wikipedia, malformed xml, bzip2, correct bz2 bug in wikiq. groceryheist 2018-07-04 15:08:30 -07:00
  • d2746879d0 create baseline tests for xml dump processing groceryheist 2018-07-03 23:43:47 -07:00
  • ba886ecf4c a number of small updates and fixes Benjamin Mako Hill 2018-05-17 14:37:20 -07:00
  • 3f9da40747 support 7z archives with multiple files. add urlencode parameter groceryheist 2017-12-07 15:10:56 -08:00
  • 5d7dceb9e4 fix code to work with bzip files Benjamin Mako Hill 2017-02-06 18:25:17 -08:00
  • 7d8ec932dd added list of compressed dump files to .gitignore Benjamin Mako Hill 2015-07-23 12:16:31 -07:00
  • d934700ee9 added support to parse namespaces from title Benjamin Mako Hill 2015-07-23 12:12:20 -07:00
  • 108c8442b2 added README file to document the submodule Benjamin Mako Hill 2015-07-22 19:55:08 -07:00
  • eeb0742cc6 created new repository for wikiq with Mediawiki-Utilities as a submodule Benjamin Mako Hill 2015-07-22 19:44:52 -07:00