| Nate E TeBlunthuis | b4dd9acbd8 | Merge branch 'master' of code:cdsc_reddit | 2021-01-27 20:09:23 -08:00 |
|  | dbe4c87f8b | add cluster selection to visualization | 2021-01-27 20:08:07 -08:00 |
| Nate E TeBlunthuis | 3155600514 | remove nsfw subs from topN | 2020-12-28 21:11:44 -08:00 |
| Nate E TeBlunthuis | 4e20dce188 | Updating to support wang-style user overlaps. | 2020-12-24 22:38:04 -08:00 |
| Nate E TeBlunthuis | 56269deee3 | Some improvements to run affinity clustering on larger dataset and compute density. | 2020-12-12 20:42:47 -08:00 |
| Nate E TeBlunthuis | e6294b5b90 | Refactor and reorganze. | 2020-12-08 17:32:20 -08:00 |
| Nate E TeBlunthuis | a60747292e | Add code for running tf-idf at the weekly level. | 2020-12-01 22:54:48 -08:00 |
|  | db5879d6c9 | refactor visualization code. | 2020-11-17 16:46:49 -08:00 |
|  | 13eb95b3b0 | Merge remote-tracking branch 'refs/remotes/origin/master' into master | 2020-11-17 16:33:14 -08:00 |
|  | 2cc897543a | git-annex in nathante@nate-x1:~/cdsc_reddit | 2020-11-17 16:33:13 -08:00 |
| Nate E TeBlunthuis | 1bf206d219 | git-annex in nathante@mox2.hyak.local:/gscratch/comdata/users/nathante/cdsc-reddit | 2020-11-17 16:31:48 -08:00 |
| Nate E TeBlunthuis | f8ff8b2d0f | Update code for clustering + tsne. | 2020-11-17 15:59:20 -08:00 |
| Nate E TeBlunthuis | 82d184d9c6 | Update code for building simlarity matrices. | 2020-11-17 12:52:48 -08:00 |
| Nate E TeBlunthuis | e794214653 | bugfix in completing tfidf similarity matrices. | 2020-11-12 11:47:53 -08:00 |
| Nate E TeBlunthuis | 220a540beb | increase learning rate. | 2020-11-11 16:58:39 -08:00 |
| Nate E TeBlunthuis | cd43a94865 | increase iterations and perplectity and early_exaggeration | 2020-11-11 16:55:39 -08:00 |
| Nate E TeBlunthuis | ca6a8f0896 | increase learning rate | 2020-11-11 16:48:41 -08:00 |
| Nate E TeBlunthuis | ed0e1a8235 | Fix bug in tsne. | 2020-11-11 16:43:41 -08:00 |
| Nate E TeBlunthuis | 6baa08889b | git-annex in nathante@mox2.hyak.local:/gscratch/comdata/users/nathante/cdsc-reddit | 2020-11-11 16:39:44 -08:00 |
| Nate E TeBlunthuis | 4447c60265 | split fitting and plotting tsne. | 2020-11-11 16:38:22 -08:00 |
|  | db53c0138a | Add file to plot related subreddits using tsne. | 2020-11-11 16:05:36 -08:00 |
| Nate E TeBlunthuis | 4c8bd14992 | Bugfix (typo) | 2020-11-10 13:38:11 -08:00 |
| Nate E TeBlunthuis | 39c581bee9 | Reuse code for term and author cosine similarity. | 2020-11-10 13:18:57 -08:00 |
| Nate E TeBlunthuis | 5632a971c6 | Refactor tfidf code to for code resuse. | 2020-11-10 13:18:19 -08:00 |
| Nate E TeBlunthuis | 772f3a8fbd | rename 'idf' files to 'tfidf' | 2020-11-10 13:16:55 -08:00 |
| Nate E TeBlunthuis | 6edd155749 | Improvements to idf code | 2020-11-10 13:12:11 -08:00 |
| Nate E TeBlunthuis | 8b8c45ee2d | Merge branch 'master' of code:cdsc_reddit | 2020-11-02 10:40:12 -08:00 |
| Nate E TeBlunthuis | 3dc17bd27c | add term_cosine_similarity.py | 2020-11-02 10:40:02 -08:00 |
|  | 0882878166 | Add Cosine similarities to README.md | 2020-11-02 09:48:10 -08:00 |
|  | b50b08a3ea | Update Readme. | 2020-11-02 08:42:13 -08:00 |
|  | 9075a8153c | Merge branch 'master' of code:cdsc_reddit into master | 2020-11-01 21:50:44 -08:00 |
|  | 4c78f2c527 | Create README.md | 2020-11-01 21:50:27 -08:00 |
| Nate E TeBlunthuis | 4ced659d19 | Update reddit comments data with daily dumps. | 2020-10-03 16:42:22 -07:00 |
| Nate E TeBlunthuis | 2740f55915 | Compute IDF for terms and authors. | 2020-08-23 11:57:55 -07:00 |
| Nate E TeBlunthuis | 2d425600a8 | Update submissions to parse using the backfill queue. | 2020-08-11 22:37:36 -07:00 |
| Nate E TeBlunthuis | c92b50e050 | bugfix in checking submission shas | 2020-08-11 14:21:54 -07:00 |
| Nate E TeBlunthuis | c0da8f4dbf | Use multiword expressions in tf. | 2020-08-10 16:57:46 -07:00 |
| Nate E TeBlunthuis | 57951050c0 | Finish generating multiword expressions. | 2020-08-09 22:43:48 -07:00 |
| Nate E TeBlunthuis | 529b7f0511 | Bugfix | 2020-08-09 02:34:42 -07:00 |
| Nate E TeBlunthuis | 2d1c8013f2 | Use groupby - joins instead of windows | 2020-08-09 00:21:50 -07:00 |
| Nate E TeBlunthuis | f28effe2c3 | renamte tf_comments part 2. | 2020-08-04 13:39:49 -07:00 |
| Nate E TeBlunthuis | 39fde9884e | rename tf_reddit_comments.py step1. | 2020-08-04 13:39:20 -07:00 |
| Nate E TeBlunthuis | 78ab514d6b | Improve tokenization following data. Generate author counts. | 2020-08-04 13:24:37 -07:00 |
| Nate E TeBlunthuis | b3ffaaba1d | improve tokenizer. | 2020-08-03 22:55:10 -07:00 |
| Nate E TeBlunthuis | ddf2adb8a6 | TF reddit comments. | 2020-08-03 22:43:57 -07:00 |
| Nate E TeBlunthuis | 40be7bedb6 | code to sort tf | 2020-08-03 17:56:36 -07:00 |
| Nate E TeBlunthuis | c666302b4a | remove is_submitter field from submissions which doesn't exist. | 2020-07-09 17:12:14 -07:00 |
| Nate E TeBlunthuis | aa84a7df03 | Bugfixes in scripts. | 2020-07-07 23:29:36 -07:00 |
| Nate E TeBlunthuis | 06fd99e7cd | clean up comments in streaming example. | 2020-07-07 12:28:57 -07:00 |
| Nate E TeBlunthuis | 7d0e020f9d | update .gitignore | 2020-07-07 12:28:44 -07:00 |